Use faster copy when not overlapping

Use the built-in copy function when the source doesn't overlap the destination.
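
As a rough illustration of the idea (a minimal sketch, not the code in `decode_other.go`): the helper name `copyWithin` and the sample buffers are hypothetical, and `forwardCopy` is modeled on the conceptual function named in the decoder's comment. The source range `[d-offset, d-offset+length)` ends at or before `d` exactly when `offset >= length`, so in that case the built-in `copy` gives the same result as the byte-by-byte loop.

```go
package main

import "fmt"

// forwardCopy copies src into dst one byte at a time, always front to back.
// When the slices overlap, bytes written earlier are read again later,
// which repeats the pattern found between the two start positions.
func forwardCopy(dst, src []byte) int {
	n := len(dst)
	if len(src) < n {
		n = len(src)
	}
	for i := 0; i < n; i++ {
		dst[i] = src[i]
	}
	return n
}

// copyWithin is a hypothetical helper: it writes length bytes at position d,
// reading from offset bytes earlier in the same buffer, and returns the new
// write position. Callers must have validated d, offset and length already.
func copyWithin(buf []byte, d, offset, length int) int {
	if offset >= length {
		// Source ends at or before d, so the ranges do not overlap
		// and the built-in copy is safe to use.
		copy(buf[d:d+length], buf[d-offset:])
		return d + length
	}
	// Overlapping ranges: the copy must run strictly forwards.
	forwardCopy(buf[d:d+length], buf[d-offset:])
	return d + length
}

func main() {
	// Non-overlapping: offset 4, length 4.
	a := []byte("abcd????")
	copyWithin(a, 4, 4, 4)
	fmt.Println(string(a)) // abcdabcd

	// Overlapping: offset 2, length 6 repeats "ab".
	b := []byte("ab??????")
	copyWithin(b, 2, 2, 6)
	fmt.Println(string(b)) // abababab
}
```

In the overlapping case the built-in `copy` would behave like a memmove and preserve the original source bytes, so the repeated pattern would not be produced; that is why the slow path has to stay.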

Again, the benchmark results vary widely depending on how often this is the case, but this should be a solid improvement for all non-amd64 users.

Benchmarks measured on AMD64, but with `-tags=noasm`:

```
>benchstat old.txt new.txt
name        old time/op    new time/op    delta
_UFlat0-8      194µs ± 3%     130µs ± 2%   -33.14%  (p=0.000 n=10+10)
_UFlat1-8     1.62ms ± 1%    1.42ms ± 1%   -11.98%    (p=0.000 n=9+9)
_UFlat2-8     8.91µs ± 4%    8.73µs ± 1%      ~      (p=0.182 n=10+9)
_UFlat3-8      222ns ± 2%     219ns ± 6%    -1.36%   (p=0.022 n=10+9)
_UFlat4-8     28.4µs ± 2%    11.5µs ± 1%   -59.57%  (p=0.000 n=10+10)
_UFlat5-8      797µs ± 5%     536µs ± 1%   -32.77%  (p=0.000 n=10+10)
_UFlat6-8      565µs ± 1%     571µs ± 1%    +1.04%   (p=0.007 n=8+10)
_UFlat7-8      494µs ± 4%     496µs ± 3%      ~     (p=0.986 n=10+10)
_UFlat8-8     1.55ms ± 4%    1.53ms ± 3%      ~     (p=0.280 n=10+10)
_UFlat9-8     1.93ms ± 1%    1.98ms ± 3%    +2.57%  (p=0.000 n=10+10)
_UFlat10-8     186µs ± 2%     102µs ± 2%   -45.14%  (p=0.000 n=10+10)
_UFlat11-8     524µs ± 2%     510µs ± 1%    -2.56%   (p=0.000 n=10+8)

name        old speed      new speed      delta
_UFlat0-8    528MB/s ± 3%   790MB/s ± 1%   +49.54%  (p=0.000 n=10+10)
_UFlat1-8    434MB/s ± 1%   493MB/s ± 1%   +13.61%    (p=0.000 n=9+9)
_UFlat2-8   13.8GB/s ± 4%  14.1GB/s ± 2%      ~      (p=0.182 n=10+9)
_UFlat3-8    901MB/s ± 1%   912MB/s ± 6%    +1.18%    (p=0.026 n=9+9)
_UFlat4-8   3.60GB/s ± 2%  8.91GB/s ± 1%  +147.32%  (p=0.000 n=10+10)
_UFlat5-8    514MB/s ± 5%   764MB/s ± 2%   +48.59%  (p=0.000 n=10+10)
_UFlat6-8    269MB/s ± 1%   266MB/s ± 1%    -1.03%   (p=0.009 n=8+10)
_UFlat7-8    253MB/s ± 4%   252MB/s ± 3%      ~     (p=0.985 n=10+10)
_UFlat8-8    276MB/s ± 4%   279MB/s ± 3%      ~     (p=0.288 n=10+10)
_UFlat9-8    249MB/s ± 1%   243MB/s ± 3%    -2.51%  (p=0.000 n=10+10)
_UFlat10-8   637MB/s ± 2%  1162MB/s ± 2%   +82.29%  (p=0.000 n=10+10)
_UFlat11-8   352MB/s ± 2%   361MB/s ± 1%    +2.62%   (p=0.000 n=10+8)
```

Co-Authored-By: Nigel Tao <nigeltao@golang.org>
Klaus Post, 6 years ago
parent
commit
efb0d863a3
1 changed file, 9 additions and 2 deletions

+ 9 - 2
decode_other.go

@@ -85,8 +85,15 @@ func decode(dst, src []byte) int {
 		if offset <= 0 || d < offset || length > len(dst)-d {
 			return decodeErrCodeCorrupt
 		}
-		// Copy from an earlier sub-slice of dst to a later sub-slice. Unlike
-		// the built-in copy function, this byte-by-byte copy always runs
+		// Copy from an earlier sub-slice of dst to a later sub-slice.
+		// If no overlap, use the built-in copy:
+		if offset >= length {
+			copy(dst[d:d+length], dst[d-offset:])
+			d += length
+			continue
+		}
+
+		// Unlike the built-in copy function, this byte-by-byte copy always runs
 		// forwards, even if the slices overlap. Conceptually, this is:
 		//
 		// d += forwardCopy(dst[d:d+length], dst[d-offset:])