6 Commits (b25b4021c3b09b52db56af4ea42fca69182bcbde)

Author SHA1 Message Date
Stuart Carnie 369a4610e6
fix(storage): Don't panic when length of source slice is too large
StringArrayEncodeAll will panic if the total length of strings
contained in the src slice is > 0xffffffff. This change adds a unit
test to replicate the issue and an associated fix to return an error.
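As a rough illustration of the kind of guard described above (not the actual patch), the total source length can be summed in a 64-bit counter and rejected with an error once it passes the limit. The helper name `checkTotalLen` and its `limit` parameter are hypothetical; the real function may also need to account for per-string length prefixes.

```go
package main

import (
	"fmt"
	"math"
)

// maxTotalLen mirrors the 0xffffffff limit quoted in the commit message.
const maxTotalLen = math.MaxUint32

// checkTotalLen is a hypothetical helper, not part of tsm1. It sums the
// string lengths in a uint64 so the sum cannot wrap at 32 bits, and returns
// an error instead of panicking once the running total exceeds limit.
func checkTotalLen(src []string, limit uint64) error {
	var total uint64
	for _, s := range src {
		total += uint64(len(s))
		if total > limit {
			return fmt.Errorf("source length too large: %d > %d", total, limit)
		}
	}
	return nil
}

func main() {
	// A tiny limit stands in for 0xffffffff so the example runs without
	// gigabytes of input.
	fmt.Println(checkTotalLen([]string{"ab", "cd"}, 4))
	fmt.Println(checkTotalLen([]string{"ab", "cde"}, 4))
	fmt.Println(checkTotalLen([]string{"ab"}, maxTotalLen))
}
```

Returning an error lets the caller (for example a compaction) log and handle the failure rather than crash the process.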

This also raises an issue: compactions will be unable to make
progress when both of the following conditions hold:

* multiple string blocks are to be merged to a single block and
* the total length of all strings exceeds the maximum block size that
  snappy will encode (0xffffffff)

The observable effect of this is errors in the logs indicating a
compaction failure.

Fixes #13687
2019-04-29 13:29:41 -07:00
Edd Robinson 9403c1ec8e Ensure error strings are not capitalised (ST1005) 2018-11-30 10:54:24 +00:00
Stuart Carnie c21336af0a fix(encoding): Improve array string encoding perf a little more
Encode the compressed data at the start of the internal buffer. This
ensures the returned slice retains the buffer's entire capacity, so it
is available for subsequent use.

When we pool / reuse string buffers, this will help considerably.
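A minimal sketch of why the output position matters, assuming a buffer that holds both the staged raw input and the compressed output. `fakeCompress` is a stand-in that only copies bytes (the real code uses snappy), and `encodeAtStart` / `encodeAtEnd` are hypothetical names used only to contrast the two layouts.

```go
package main

import "fmt"

// fakeCompress stands in for a real compressor such as snappy. It copies
// src into dst and returns the written prefix of dst.
func fakeCompress(dst, src []byte) []byte {
	n := copy(dst, src)
	return dst[:n]
}

// grow returns buf extended to its full capacity, reallocating if it cannot
// hold need bytes.
func grow(buf []byte, need int) []byte {
	if cap(buf) < need {
		buf = make([]byte, need)
	}
	return buf[:cap(buf)]
}

// encodeAtStart stages raw in the tail of buf and writes the output at the
// head. The result starts at buf[0], so it keeps the buffer's full capacity.
func encodeAtStart(buf, raw []byte) []byte {
	buf = grow(buf, 2*len(raw))
	tail := buf[len(buf)-len(raw):]
	copy(tail, raw)
	return fakeCompress(buf[:len(buf)-len(raw)], tail)
}

// encodeAtEnd is the contrasting layout: raw at the head, output after it.
// The result starts len(raw) bytes in, so that much capacity is lost.
func encodeAtEnd(buf, raw []byte) []byte {
	buf = grow(buf, 2*len(raw))
	head := buf[:len(raw)]
	copy(head, raw)
	return fakeCompress(buf[len(raw):], head)
}

func main() {
	raw := []byte("hello")
	a := encodeAtStart(make([]byte, 0, 64), raw)
	b := encodeAtEnd(make([]byte, 0, 64), raw)
	fmt.Println(string(a), cap(a))
	fmt.Println(string(b), cap(b))
}
```

A slice's capacity runs from its first element to the end of the backing array, so only a result that begins at index 0 can be handed back to a pool without shrinking the reusable space.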

Improvements over previous commit:

```
name                        old time/op    new time/op    delta
EncodeStrings/10/batch-8       542ns ± 1%     355ns ± 2%   -34.53%  (p=0.008 n=5+5)
EncodeStrings/100/batch-8     5.29µs ± 1%    3.58µs ± 2%   -32.20%  (p=0.008 n=5+5)
EncodeStrings/1000/batch-8    48.6µs ± 0%    36.2µs ± 2%   -25.40%  (p=0.008 n=5+5)

name                        old alloc/op   new alloc/op   delta
EncodeStrings/10/batch-8        704B ± 0%        0B       -100.00%  (p=0.008 n=5+5)
EncodeStrings/100/batch-8     9.47kB ± 0%    0.00kB       -100.00%  (p=0.008 n=5+5)
EncodeStrings/1000/batch-8    90.1kB ± 0%     0.0kB       -100.00%  (p=0.008 n=5+5)

name                        old allocs/op  new allocs/op  delta
EncodeStrings/10/batch-8        0.00           0.00           ~     (all equal)
EncodeStrings/100/batch-8       1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
EncodeStrings/1000/batch-8      1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
```
2018-11-01 18:59:20 +00:00
Edd Robinson d8b5f9d432 Batch oriented string encoders
This commit adds a tsm1 function for encoding a batch of strings into a
provided buffer. The new function also shares the buffer between the
input data and the snappy encoded output, reducing allocations.
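To make the batch idea concrete, here is a small sketch of packing a whole slice of strings into a caller-provided buffer in one call. It assumes a uvarint length prefix before each string and omits the snappy step entirely; `packStrings` and `unpackStrings` are hypothetical names, not the tsm1 API.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// packStrings is a hypothetical batch encoder. It writes every string in src
// into dst as a uvarint length followed by the bytes, reusing dst's backing
// array when it is large enough.
func packStrings(dst []byte, src []string) []byte {
	// Worst-case size: a maximal varint prefix plus the bytes of each string.
	sz := 0
	for _, s := range src {
		sz += binary.MaxVarintLen64 + len(s)
	}
	if cap(dst) < sz {
		dst = make([]byte, sz)
	}
	dst = dst[:sz]

	n := 0
	for _, s := range src {
		n += binary.PutUvarint(dst[n:], uint64(len(s)))
		n += copy(dst[n:], s)
	}
	return dst[:n]
}

// unpackStrings reverses packStrings and reports corrupt input as an error.
func unpackStrings(b []byte) ([]string, error) {
	var out []string
	for len(b) > 0 {
		l, n := binary.Uvarint(b)
		if n <= 0 || uint64(len(b)-n) < l {
			return nil, fmt.Errorf("corrupt input")
		}
		out = append(out, string(b[n:n+int(l)]))
		b = b[n+int(l):]
	}
	return out, nil
}

func main() {
	buf := make([]byte, 0, 256) // caller-owned buffer, reusable across calls
	enc := packStrings(buf, []string{"cpu", "mem", "disk"})
	dec, err := unpackStrings(enc)
	fmt.Println(len(enc), dec, err)
}
```

Because the caller owns the buffer, repeated calls can reuse it and avoid a fresh allocation per batch, which is the effect the allocs/op column below measures.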

The following benchmarks compare the performance of the existing
iterator-based encoders and the new batch-oriented encoders using
randomly generated strings.

name                old time/op    new time/op    delta
EncodeStrings/10      2.14µs ± 4%    1.42µs ± 4%   -33.56%  (p=0.000 n=10+10)
EncodeStrings/100     12.7µs ± 3%    10.9µs ± 2%   -14.46%  (p=0.000 n=10+10)
EncodeStrings/1000     132µs ± 2%     114µs ± 2%   -13.88%  (p=0.000 n=10+9)

name                old alloc/op   new alloc/op   delta
EncodeStrings/10        657B ± 0%      704B ± 0%    +7.15%  (p=0.000 n=10+10)
EncodeStrings/100     6.14kB ± 0%    9.47kB ± 0%   +54.14%  (p=0.000 n=10+10)
EncodeStrings/1000    61.4kB ± 0%    90.1kB ± 0%   +46.66%  (p=0.000 n=10+10)

name                old allocs/op  new allocs/op  delta
EncodeStrings/10        3.00 ± 0%      0.00       -100.00%  (p=0.000 n=10+10)
EncodeStrings/100       3.00 ± 0%      1.00 ± 0%   -66.67%  (p=0.000 n=10+10)
EncodeStrings/1000      3.00 ± 0%      1.00 ± 0%   -66.67%  (p=0.000 n=10+10)
2018-11-01 18:59:19 +00:00
Edd Robinson d7a4b814d4 Rename string batch decoders 2018-11-01 18:59:19 +00:00
Edd Robinson 074f263e08 Initial import of tsm1.Engine 2018-10-01 12:08:37 +01:00