The tsdb/tsm1 package was one of the test suites that took the longest
to run in platform with go test -short. The rule of thumb on the Go
project is that short mode should skip any individual test that takes
longer than one second. This change skips two such tests, and it
eliminates a string concatenation loop in two other tests, so that they
report completion in "0.00s" rather than about 0.94s, on my machine.
These cumulative changes take `go test -short ./tsdb/tsm1` from about 14
seconds to about 7 seconds on my machine.
And add a test to cover that.
The data race would look roughly like:
```
WARNING: DATA RACE
Write at 0x00c000024e18 by goroutine 8:
github.com/RoaringBitmap/roaring.(*roaringArray).markAllAsNeedingCopyOnWrite()
/Users/mr/go/pkg/mod/github.com/!roaring!bitmap/roaring@v0.4.16/roaringarray.go:881 +0x6b
github.com/RoaringBitmap/roaring.(*roaringArray).clone()
/Users/mr/go/pkg/mod/github.com/!roaring!bitmap/roaring@v0.4.16/roaringarray.go:266 +0x808
github.com/RoaringBitmap/roaring.(*Bitmap).Clone()
/Users/mr/go/pkg/mod/github.com/!roaring!bitmap/roaring@v0.4.16/roaring.go:385 +0x58
github.com/influxdata/platform/tsdb.(*SeriesIDSet).CloneNoLock()
/Users/mr/go/src/github.com/influxdata/platform/tsdb/series_set.go:229 +0x73
github.com/influxdata/platform/tsdb.(*SeriesIDSet).Clone()
Previous write at 0x00c000024e18 by goroutine 7:
github.com/RoaringBitmap/roaring.(*roaringArray).markAllAsNeedingCopyOnWrite()
/Users/mr/go/pkg/mod/github.com/!roaring!bitmap/roaring@v0.4.16/roaringarray.go:881 +0x6b
github.com/RoaringBitmap/roaring.(*roaringArray).clone()
/Users/mr/go/pkg/mod/github.com/!roaring!bitmap/roaring@v0.4.16/roaringarray.go:266 +0x808
github.com/RoaringBitmap/roaring.(*Bitmap).Clone()
/Users/mr/go/pkg/mod/github.com/!roaring!bitmap/roaring@v0.4.16/roaring.go:385 +0x58
github.com/influxdata/platform/tsdb.(*SeriesIDSet).CloneNoLock()
/Users/mr/go/src/github.com/influxdata/platform/tsdb/series_set.go:229 +0x73
github.com/influxdata/platform/tsdb.(*SeriesIDSet).Clone()
/Users/mr/go/src/github.com/influxdata/platform/tsdb/series_set.go:223 +0x7b
```
We were passing a non-nil tsm1.Log containing a nil *tsm1.WAL which
would cause a panic when it was attempted to be used. Instead, always
pass a non-nil WAL.
We change the storage engine code to not pass in a nil WAL, and
additionally add a defensive check to change any nil WALs into a
NopWAL.
- Add some documentation.
- Move compaction planner to an option instead of config.
The latter fits with the general theme of having config be things
that can be specified in a toml, and everything else being an
option.
Encode the compressed data at the start internal buffer. This ensures
the returned slice maintains the entire capacity and is available for
subsequent use.
When we pool / reuse string buffers, this will help considerably.
Improvements over previous commit:
```
name old time/op new time/op delta
EncodeStrings/10/batch-8 542ns ± 1% 355ns ± 2% -34.53% (p=0.008 n=5+5)
EncodeStrings/100/batch-8 5.29µs ± 1% 3.58µs ± 2% -32.20% (p=0.008 n=5+5)
EncodeStrings/1000/batch-8 48.6µs ± 0% 36.2µs ± 2% -25.40% (p=0.008 n=5+5)
name old alloc/op new alloc/op delta
EncodeStrings/10/batch-8 704B ± 0% 0B -100.00% (p=0.008 n=5+5)
EncodeStrings/100/batch-8 9.47kB ± 0% 0.00kB -100.00% (p=0.008 n=5+5)
EncodeStrings/1000/batch-8 90.1kB ± 0% 0.0kB -100.00% (p=0.008 n=5+5)
name old allocs/op new allocs/op delta
EncodeStrings/10/batch-8 0.00 0.00 ~ (all equal)
EncodeStrings/100/batch-8 1.00 ± 0% 0.00 -100.00% (p=0.008 n=5+5)
EncodeStrings/1000/batch-8 1.00 ± 0% 0.00 -100.00% (p=0.008 n=5+5)
```
This commit adds a tsm1 function for encoding a batch of booleans into a
provided buffer.
The following benchmarks compare the performance of the existing
iterator based encoders, and the new batch oriented encoders using
randomly generated sets of booleans.
This commit adds a tsm1 function for encoding a batch of strings into a
provided buffer. The new function also shares the buffer between the
input data and the snappy encoded output, reducing allocations.
The following benchmarks compare the performance of the existing
iterator based encoders, and the new batch oriented encoders using
randomly generated strings.
name old time/op new time/op delta
EncodeStrings/10 2.14µs ± 4% 1.42µs ± 4% -33.56% (p=0.000 n=10+10)
EncodeStrings/100 12.7µs ± 3% 10.9µs ± 2% -14.46% (p=0.000 n=10+10)
EncodeStrings/1000 132µs ± 2% 114µs ± 2% -13.88% (p=0.000 n=10+9)
name old alloc/op new alloc/op delta
EncodeStrings/10 657B ± 0% 704B ± 0% +7.15% (p=0.000 n=10+10)
EncodeStrings/100 6.14kB ± 0% 9.47kB ± 0% +54.14% (p=0.000 n=10+10)
EncodeStrings/1000 61.4kB ± 0% 90.1kB ± 0% +46.66% (p=0.000 n=10+10)
name old allocs/op new allocs/op delta
EncodeStrings/10 3.00 ± 0% 0.00 -100.00% (p=0.000 n=10+10)
EncodeStrings/100 3.00 ± 0% 1.00 ± 0% -66.67% (p=0.000 n=10+10)
EncodeStrings/1000 3.00 ± 0% 1.00 ± 0% -66.67% (p=0.000 n=10+10)
This commit adds a tsm1 function for encoding a batch of floats into a
buffer. Further, it replaces the `bitstream` library used in the
existing encoders (and all the current decoders) with inlined bit
expressions within the encoder, significantly reducing the function call
overhead for larger batches.
The following benchmarks compare the performance of the existing
iterator based encoders, and the new batch oriented encoders. They look
at a sequential input slice and a randomly generated input slice.
name old time/op new time/op delta
EncodeFloats/10_seq 1.14µs ± 3% 0.24µs ± 3% -78.94% (p=0.000 n=10+10)
EncodeFloats/10_ran 1.69µs ± 2% 0.21µs ± 3% -87.43% (p=0.000 n=10+10)
EncodeFloats/100_seq 7.07µs ± 1% 1.72µs ± 1% -75.62% (p=0.000 n=7+9)
EncodeFloats/100_ran 15.8µs ± 4% 1.8µs ± 1% -88.60% (p=0.000 n=10+9)
EncodeFloats/1000_seq 50.2µs ± 3% 16.2µs ± 2% -67.66% (p=0.000 n=10+10)
EncodeFloats/1000_ran 174µs ± 2% 16µs ± 2% -90.77% (p=0.000 n=10+10)
name old alloc/op new alloc/op delta
EncodeFloats/10_seq 0.00B 0.00B ~ (all equal)
EncodeFloats/10_ran 0.00B 0.00B ~ (all equal)
EncodeFloats/100_seq 0.00B 0.00B ~ (all equal)
EncodeFloats/100_ran 0.00B 0.00B ~ (all equal)
EncodeFloats/1000_seq 0.00B 0.00B ~ (all equal)
EncodeFloats/1000_ran 0.00B 0.00B ~ (all equal)
name old allocs/op new allocs/op delta
EncodeFloats/10_seq 0.00 0.00 ~ (all equal)
EncodeFloats/10_ran 0.00 0.00 ~ (all equal)
EncodeFloats/100_seq 0.00 0.00 ~ (all equal)
EncodeFloats/100_ran 0.00 0.00 ~ (all equal)
EncodeFloats/1000_seq 0.00 0.00 ~ (all equal)
EncodeFloats/1000_ran 0.00 0.00 ~ (all equal)
This keeps file compatability by just writing out zeros for the
sizes and offsets. Perhaps it's ok to just nuke everything and
remove the data.
It also keeps the hll package because it seems generally useful
even if it's not currently being used.
These are the log messages that get printed immediately when starting
the application for the first time. This fixes the messages to conform
to the logging style guide.