Encode the compressed data at the start of the internal buffer. This
ensures the returned slice retains the buffer's entire capacity, which
remains available for subsequent use.
When we pool / reuse string buffers, this will help considerably.
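The effect on capacity can be seen in a minimal sketch. Both helper
functions are hypothetical stand-ins for the encoder (a plain copy takes
the place of compression); only the slicing behaviour is the point.

```go
package main

import "fmt"

// encodeAtStart stands in for the compressor (hypothetical): it writes its
// output at the beginning of buf and returns buf[:n], so the result keeps
// the full capacity of the backing array.
func encodeAtStart(buf, payload []byte) []byte {
	n := copy(buf[:cap(buf)], payload)
	return buf[:n]
}

// encodeAtOffset writes the output off bytes into the buffer and returns a
// sub-slice; the capacity before off is unreachable from the result.
func encodeAtOffset(buf, payload []byte, off int) []byte {
	full := buf[:cap(buf)]
	n := copy(full[off:], payload)
	return full[off : off+n]
}

func main() {
	buf := make([]byte, 0, 64)
	a := encodeAtStart(buf, []byte("abc"))
	b := encodeAtOffset(buf, []byte("abc"), 16)
	fmt.Println(len(a), cap(a)) // 3 64
	fmt.Println(len(b), cap(b)) // 3 48
}
```

A caller that puts `a[:0]` back into a pool gets all 64 bytes on the next
use, whereas `b[:0]` would only ever offer 48.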
Improvements over previous commit:
```
name old time/op new time/op delta
EncodeStrings/10/batch-8 542ns ± 1% 355ns ± 2% -34.53% (p=0.008 n=5+5)
EncodeStrings/100/batch-8 5.29µs ± 1% 3.58µs ± 2% -32.20% (p=0.008 n=5+5)
EncodeStrings/1000/batch-8 48.6µs ± 0% 36.2µs ± 2% -25.40% (p=0.008 n=5+5)
name old alloc/op new alloc/op delta
EncodeStrings/10/batch-8 704B ± 0% 0B -100.00% (p=0.008 n=5+5)
EncodeStrings/100/batch-8 9.47kB ± 0% 0.00kB -100.00% (p=0.008 n=5+5)
EncodeStrings/1000/batch-8 90.1kB ± 0% 0.0kB -100.00% (p=0.008 n=5+5)
name old allocs/op new allocs/op delta
EncodeStrings/10/batch-8 0.00 0.00 ~ (all equal)
EncodeStrings/100/batch-8 1.00 ± 0% 0.00 -100.00% (p=0.008 n=5+5)
EncodeStrings/1000/batch-8 1.00 ± 0% 0.00 -100.00% (p=0.008 n=5+5)
```
This commit adds a tsm1 function for encoding a batch of booleans into a
provided buffer.
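As a rough illustration of the batch-into-buffer shape, the sketch below
packs booleans one bit per value into a caller-supplied slice. The name
`packBools` and the bit layout are assumptions for the example; this is
not the tsm1 wire format.

```go
package main

import "fmt"

// packBools is a hypothetical batch encoder: it packs src one bit per value
// (most significant bit first) into dst, allocating only when dst's
// capacity is too small.
func packBools(dst []byte, src []bool) []byte {
	n := (len(src) + 7) / 8
	if cap(dst) < n {
		dst = make([]byte, n)
	} else {
		dst = dst[:n]
		for i := range dst {
			dst[i] = 0 // clear stale bits left by a previous use
		}
	}
	for i, v := range src {
		if v {
			dst[i>>3] |= 0x80 >> uint(i&7)
		}
	}
	return dst
}

func main() {
	buf := make([]byte, 0, 16)
	out := packBools(buf, []bool{true, false, true, true})
	fmt.Printf("%08b\n", out[0]) // 10110000
}
```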
The following benchmarks compare the performance of the existing
iterator-based encoders and the new batch-oriented encoders using
randomly generated sets of booleans.
This commit adds a tsm1 function for encoding a batch of strings into a
provided buffer. The new function also shares the buffer between the
input data and the snappy encoded output, reducing allocations.
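One way to share a single buffer between the raw input and the compressed
output is sketched below. Everything here is an assumption made for
illustration: `stubCompress` merely copies bytes in place of snappy, and
`maxOut` is a hypothetical callback giving the worst-case output size.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// stubCompress stands in for a real compressor such as snappy (assumption):
// it copies src into dst unchanged and returns the written prefix of dst.
func stubCompress(dst, src []byte) []byte {
	n := copy(dst, src)
	return dst[:n]
}

// encodeStrings lays one buffer out as [output region | raw input region].
func encodeStrings(buf []byte, src []string, maxOut func(rawSz int) int) []byte {
	rawSz := 0
	for _, s := range src {
		rawSz += binary.MaxVarintLen32 + len(s)
	}
	outSz := maxOut(rawSz)
	if total := outSz + rawSz; cap(buf) < total {
		buf = make([]byte, total)
	} else {
		buf = buf[:total]
	}

	// Write length-prefixed strings into the tail of the buffer.
	raw := buf[outSz:]
	n := 0
	for _, s := range src {
		n += binary.PutUvarint(raw[n:], uint64(len(s)))
		n += copy(raw[n:], s)
	}

	// Compress from the tail into the head; the result starts at buf[0].
	return stubCompress(buf[:outSz], raw[:n])
}

func main() {
	identity := func(rawSz int) int { return rawSz }
	out := encodeStrings(nil, []string{"cpu", "mem"}, identity)
	fmt.Println(len(out), cap(out)) // 8 32
}
```

Because the two regions never overlap, the compressor can read and write
within the same allocation without a second scratch buffer.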
The following benchmarks compare the performance of the existing
iterator-based encoders and the new batch-oriented encoders using
randomly generated strings.
```
name                old time/op    new time/op    delta
EncodeStrings/10      2.14µs ± 4%    1.42µs ± 4%   -33.56%  (p=0.000 n=10+10)
EncodeStrings/100     12.7µs ± 3%    10.9µs ± 2%   -14.46%  (p=0.000 n=10+10)
EncodeStrings/1000     132µs ± 2%     114µs ± 2%   -13.88%  (p=0.000 n=10+9)

name                old alloc/op   new alloc/op   delta
EncodeStrings/10        657B ± 0%      704B ± 0%    +7.15%  (p=0.000 n=10+10)
EncodeStrings/100     6.14kB ± 0%    9.47kB ± 0%   +54.14%  (p=0.000 n=10+10)
EncodeStrings/1000    61.4kB ± 0%    90.1kB ± 0%   +46.66%  (p=0.000 n=10+10)

name                old allocs/op  new allocs/op  delta
EncodeStrings/10        3.00 ± 0%      0.00       -100.00%  (p=0.000 n=10+10)
EncodeStrings/100       3.00 ± 0%      1.00 ± 0%   -66.67%  (p=0.000 n=10+10)
EncodeStrings/1000      3.00 ± 0%      1.00 ± 0%   -66.67%  (p=0.000 n=10+10)
```
This commit adds a tsm1 function for encoding a batch of floats into a
buffer. Further, it replaces the `bitstream` library used in the
existing encoders (and all the current decoders) with inlined bit
expressions within the encoder, significantly reducing the function call
overhead for larger batches.
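The kind of inlined bit expression meant here can be sketched as follows.
An XOR-based float scheme is assumed for the example, and `xorWindow` and
`putBits` are hypothetical names; the sketch shows shifts and masks
replacing per-bit function calls, not the actual tsm1 encoder.

```go
package main

import (
	"fmt"
	"math"
	"math/bits"
)

// xorWindow returns the XOR of the two values' IEEE 754 bit patterns along
// with the number of leading and trailing zero bits in that XOR.
func xorWindow(prev, cur float64) (delta uint64, leading, trailing int) {
	delta = math.Float64bits(prev) ^ math.Float64bits(cur)
	return delta, bits.LeadingZeros64(delta), bits.TrailingZeros64(delta)
}

// putBits appends the low n bits of v to acc, which already holds used bits
// at its high end. The caller must ensure used+n <= 64.
func putBits(acc uint64, used uint, v uint64, n uint) (uint64, uint) {
	acc |= (v & (1<<n - 1)) << (64 - used - n)
	return acc, used + n
}

func main() {
	// Identical values give delta == 0 and need a separate branch (not shown).
	delta, lead, trail := xorWindow(1.0, 2.0)
	sig := 64 - lead - trail // number of meaningful bits in the XOR
	acc, used := putBits(0, 0, delta>>uint(trail), uint(sig))
	fmt.Println(lead, trail, sig)      // 1 52 11
	fmt.Printf("%#x %d\n", acc, used) // 0xffe0000000000000 11
}
```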
The following benchmarks compare the performance of the existing
iterator-based encoders and the new batch-oriented encoders. They look
at a sequential input slice and a randomly generated input slice.
```
name                   old time/op    new time/op    delta
EncodeFloats/10_seq      1.14µs ± 3%    0.24µs ± 3%  -78.94%  (p=0.000 n=10+10)
EncodeFloats/10_ran      1.69µs ± 2%    0.21µs ± 3%  -87.43%  (p=0.000 n=10+10)
EncodeFloats/100_seq     7.07µs ± 1%    1.72µs ± 1%  -75.62%  (p=0.000 n=7+9)
EncodeFloats/100_ran     15.8µs ± 4%     1.8µs ± 1%  -88.60%  (p=0.000 n=10+9)
EncodeFloats/1000_seq    50.2µs ± 3%    16.2µs ± 2%  -67.66%  (p=0.000 n=10+10)
EncodeFloats/1000_ran     174µs ± 2%      16µs ± 2%  -90.77%  (p=0.000 n=10+10)

name                   old alloc/op   new alloc/op   delta
EncodeFloats/10_seq       0.00B          0.00B          ~     (all equal)
EncodeFloats/10_ran       0.00B          0.00B          ~     (all equal)
EncodeFloats/100_seq      0.00B          0.00B          ~     (all equal)
EncodeFloats/100_ran      0.00B          0.00B          ~     (all equal)
EncodeFloats/1000_seq     0.00B          0.00B          ~     (all equal)
EncodeFloats/1000_ran     0.00B          0.00B          ~     (all equal)

name                   old allocs/op  new allocs/op  delta
EncodeFloats/10_seq        0.00           0.00          ~     (all equal)
EncodeFloats/10_ran        0.00           0.00          ~     (all equal)
EncodeFloats/100_seq       0.00           0.00          ~     (all equal)
EncodeFloats/100_ran       0.00           0.00          ~     (all equal)
EncodeFloats/1000_seq      0.00           0.00          ~     (all equal)
EncodeFloats/1000_ran      0.00           0.00          ~     (all equal)
```
This commit deletes most of the code to service reads from influxdb
and pulls it in from platform instead.
Of note, the models.Tag and models.Tags types are now aliases to the
platform models.Tag and models.Tags types. Additionally, many types
in the tsdb package relating to cursors are also aliases to the same
types in the platform cursors package.
This updates the platform and flux repos to the current master in the
Gopkg.lock.
This commit fixes an issue with the series file compaction process
where tombstones were lost after compaction and series existence
checks were not correct. This commit also fixes some smaller flushing
issues within the series file that were mainly related to testing.
If there was an error after the cache had been snapshotted to one or
more TSM files, but before the cache and WAL were cleaned up, then the
cache would be repeatedly snapshotted, generating duplicate level 1 TSM
files.
This commit attempts to clean those files up by removing the temporary
TSM file(s). The snapshot will be retried.
Since all tag sets are materialised to strings before this method
returns, a large number of allocations can be avoided by carefully
reusing buffers and containers.
This commit reduces allocations by about 75%, which can be very
significant for high cardinality workloads.
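The buffer-reuse idea is illustrated below with hypothetical types and
names. One scratch slice is truncated and refilled for every tag set, so
the only per-key allocation left is the final conversion to a string.

```go
package main

import "fmt"

// tag is a hypothetical key/value pair used only for this sketch.
type tag struct{ Key, Value string }

// tagSetKeys materialises each tag set as a "k=v,k=v" string while reusing
// a single scratch buffer across all sets.
func tagSetKeys(sets [][]tag) []string {
	keys := make([]string, 0, len(sets)) // sized once up front
	var buf []byte                       // scratch buffer reused per set
	for _, set := range sets {
		buf = buf[:0] // keep capacity, drop contents
		for i, t := range set {
			if i > 0 {
				buf = append(buf, ',')
			}
			buf = append(buf, t.Key...)
			buf = append(buf, '=')
			buf = append(buf, t.Value...)
		}
		keys = append(keys, string(buf))
	}
	return keys
}

func main() {
	keys := tagSetKeys([][]tag{
		{{"host", "a"}, {"region", "west"}},
		{{"host", "b"}},
	})
	fmt.Println(keys) // [host=a,region=west host=b]
}
```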
The benchmark results shown below are for a benchmark that asks for all
series keys matching `tag5=value0`. There are 100K matching series keys.
```
benchmark                                       old ns/op     new ns/op     delta
BenchmarkIndexSet_TagSets/1M_series/inmem-8     10959963      11144345      +1.68%
BenchmarkIndexSet_TagSets/1M_series/tsi1-8      23632757      18768888      -20.58%
BenchmarkIndexSet_TagSets/1M_series/inmem-8     10496303      10380551      -1.10%
BenchmarkIndexSet_TagSets/1M_series/tsi1-8      24344359      19020234      -21.87%
BenchmarkIndexSet_TagSets/1M_series/inmem-8     10359864      10818296      +4.43%
BenchmarkIndexSet_TagSets/1M_series/tsi1-8      23453357      19027445      -18.87%
BenchmarkIndexSet_TagSets/1M_series/inmem-8     10479519      10400619      -0.75%
BenchmarkIndexSet_TagSets/1M_series/tsi1-8      26364965      19023749      -27.84%
BenchmarkIndexSet_TagSets/1M_series/inmem-8     10437794      10557066      +1.14%
BenchmarkIndexSet_TagSets/1M_series/tsi1-8      23126946      19196955      -16.99%

benchmark                                       old allocs    new allocs    delta
BenchmarkIndexSet_TagSets/1M_series/inmem-8     51            51            +0.00%
BenchmarkIndexSet_TagSets/1M_series/tsi1-8      80067         20071         -74.93%
BenchmarkIndexSet_TagSets/1M_series/inmem-8     51            51            +0.00%
BenchmarkIndexSet_TagSets/1M_series/tsi1-8      80067         20071         -74.93%
BenchmarkIndexSet_TagSets/1M_series/inmem-8     51            51            +0.00%
BenchmarkIndexSet_TagSets/1M_series/tsi1-8      80067         20071         -74.93%
BenchmarkIndexSet_TagSets/1M_series/inmem-8     51            51            +0.00%
BenchmarkIndexSet_TagSets/1M_series/tsi1-8      80067         20071         -74.93%
BenchmarkIndexSet_TagSets/1M_series/inmem-8     51            51            +0.00%
BenchmarkIndexSet_TagSets/1M_series/tsi1-8      80067         20071         -74.93%

benchmark                                       old bytes     new bytes     delta
BenchmarkIndexSet_TagSets/1M_series/inmem-8     3556728       3556728       +0.00%
BenchmarkIndexSet_TagSets/1M_series/tsi1-8      12677328      5157992       -59.31%
BenchmarkIndexSet_TagSets/1M_series/inmem-8     3556728       3556728       +0.00%
BenchmarkIndexSet_TagSets/1M_series/tsi1-8      12677328      5157992       -59.31%
BenchmarkIndexSet_TagSets/1M_series/inmem-8     3556728       3556728       +0.00%
BenchmarkIndexSet_TagSets/1M_series/tsi1-8      12677328      5157992       -59.31%
BenchmarkIndexSet_TagSets/1M_series/inmem-8     3556728       3556728       +0.00%
BenchmarkIndexSet_TagSets/1M_series/tsi1-8      12677328      5157992       -59.31%
BenchmarkIndexSet_TagSets/1M_series/inmem-8     3556728       3556728       +0.00%
BenchmarkIndexSet_TagSets/1M_series/tsi1-8      12677328      5157992       -59.31%
```
This commit enables the copy-on-write feature of the SeriesIDSets, such
that we can make immutable clones of underlying bitmaps efficiently. If
the original bitmap is later modified, a copy is made first, so the
clone is not affected.
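Copy-on-write semantics can be sketched with a small set type. `cowSet`
is hypothetical and uses a map rather than a bitmap purely to keep the
example self-contained; it is also not safe for concurrent use.

```go
package main

import "fmt"

// cowSet is a hypothetical copy-on-write set of series IDs. Clones share
// the backing map until one side writes to it.
type cowSet struct {
	ids    map[uint64]struct{}
	shared bool // true while ids may be referenced by another cowSet
}

func newCowSet() *cowSet { return &cowSet{ids: make(map[uint64]struct{})} }

// Clone is O(1): it shares the map and marks both sides as shared.
func (s *cowSet) Clone() *cowSet {
	s.shared = true
	return &cowSet{ids: s.ids, shared: true}
}

// Add copies the map first if it is shared, so the other side is unaffected.
func (s *cowSet) Add(id uint64) {
	if s.shared {
		cp := make(map[uint64]struct{}, len(s.ids)+1)
		for k := range s.ids {
			cp[k] = struct{}{}
		}
		s.ids, s.shared = cp, false
	}
	s.ids[id] = struct{}{}
}

// Contains reports whether id is in the set.
func (s *cowSet) Contains(id uint64) bool {
	_, ok := s.ids[id]
	return ok
}

func main() {
	orig := newCowSet()
	orig.Add(1)
	clone := orig.Clone()
	orig.Add(2) // triggers the copy; clone keeps the old contents
	fmt.Println(orig.Contains(2), clone.Contains(2)) // true false
}
```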
This commit ensures that cached bitset results at the Index level are
updated whenever new series ids are created that would belong in those
bitsets.
For example, if we have a cached bitset for the tuple {mem, region,
west}, and we add the series `mem,host=prod,region=west`, then we would
update the cached bitset for {mem, region, west} with the series id of
the newly written series.
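The update rule in this example can be sketched as follows. The `index`
and `cacheKey` types are hypothetical, and a map of ids stands in for the
bitset; only tuples that are already cached are touched.

```go
package main

import "fmt"

// cacheKey identifies a cached result for one (measurement, tag key, tag
// value) tuple. All names in this sketch are hypothetical.
type cacheKey struct{ Name, Key, Value string }

// index holds cached id sets keyed by tuple.
type index struct {
	cache map[cacheKey]map[uint64]struct{}
}

// addSeries updates only the sets that are already cached; tuples that were
// never cached are left alone.
func (ix *index) addSeries(id uint64, name string, tags map[string]string) {
	for k, v := range tags {
		if set, ok := ix.cache[cacheKey{name, k, v}]; ok {
			set[id] = struct{}{}
		}
	}
}

func main() {
	west := cacheKey{"mem", "region", "west"}
	ix := &index{cache: map[cacheKey]map[uint64]struct{}{
		west: {1: {}},
	}}
	ix.addSeries(2, "mem", map[string]string{"host": "prod", "region": "west"})
	fmt.Println(len(ix.cache[west]), len(ix.cache)) // 2 1
}
```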