Encode the compressed data at the start of the internal buffer. This ensures
the returned slice retains the buffer's entire capacity and is available for
subsequent reuse.
When we pool/reuse string buffers, this will help considerably.
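Roughly, the approach looks like the sketch below (the function and names are illustrative, not the actual tsm1 API): the encoder appends into the front of the caller-provided buffer, so the returned slice shares that buffer's backing array and keeps its full capacity for pooling and reuse.
```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeInto is a hypothetical illustration, not the tsm1 API: it appends
// encoded output to buf[:0], so the returned slice shares buf's backing
// array and keeps its full capacity for later reuse (e.g. via a pool).
func encodeInto(values []string, buf []byte) []byte {
	out := buf[:0] // start writing at the beginning of the provided buffer
	var tmp [binary.MaxVarintLen64]byte
	for _, v := range values {
		n := binary.PutUvarint(tmp[:], uint64(len(v))) // toy length-prefix scheme
		out = append(out, tmp[:n]...)
		out = append(out, v...)
	}
	return out
}

func main() {
	buf := make([]byte, 0, 1024)
	enc := encodeInto([]string{"cpu", "mem"}, buf)
	fmt.Println(len(enc), cap(enc)) // cap is still 1024: nothing was reallocated
}
```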
Improvements over previous commit:
```
name                         old time/op    new time/op    delta
EncodeStrings/10/batch-8       542ns ± 1%     355ns ± 2%    -34.53%  (p=0.008 n=5+5)
EncodeStrings/100/batch-8     5.29µs ± 1%    3.58µs ± 2%    -32.20%  (p=0.008 n=5+5)
EncodeStrings/1000/batch-8    48.6µs ± 0%    36.2µs ± 2%    -25.40%  (p=0.008 n=5+5)

name                         old alloc/op   new alloc/op   delta
EncodeStrings/10/batch-8        704B ± 0%        0B        -100.00%  (p=0.008 n=5+5)
EncodeStrings/100/batch-8     9.47kB ± 0%     0.00kB       -100.00%  (p=0.008 n=5+5)
EncodeStrings/1000/batch-8    90.1kB ± 0%      0.0kB       -100.00%  (p=0.008 n=5+5)

name                         old allocs/op  new allocs/op  delta
EncodeStrings/10/batch-8        0.00           0.00            ~     (all equal)
EncodeStrings/100/batch-8       1.00 ± 0%      0.00        -100.00%  (p=0.008 n=5+5)
EncodeStrings/1000/batch-8      1.00 ± 0%      0.00        -100.00%  (p=0.008 n=5+5)
```
This commit adds a tsm1 function for encoding a batch of booleans into a
provided buffer.
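A minimal sketch of the batch-oriented approach (names and on-disk layout here are illustrative, not the actual tsm1 encoding):
```go
// booleanArrayEncode is an illustrative sketch, not the actual tsm1
// implementation: it bit-packs a batch of booleans into the provided buffer,
// so a sufficiently large buffer means zero allocations and no per-value
// iterator calls.
func booleanArrayEncode(src []bool, buf []byte) []byte {
	sz := (len(src) + 7) / 8 // one bit per value (the real format also stores a header and count)
	if cap(buf) < sz {
		buf = make([]byte, sz)
	}
	b := buf[:sz]
	for i := range b {
		b[i] = 0 // clear stale bits from a reused buffer
	}
	for i, v := range src {
		if v {
			b[i>>3] |= 128 >> uint(i&7)
		}
	}
	return b
}
```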
The following benchmarks compare the performance of the existing
iterator-based encoders and the new batch-oriented encoders, using
randomly generated sets of booleans.
This commit adds a tsm1 function for encoding a batch of strings into a
provided buffer. The new function also shares the buffer between the
input data and the snappy-encoded output, reducing allocations.
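A simplified sketch of the idea (names and layout are illustrative, and for clarity this version uses two caller-provided buffers rather than sharing a single one):
```go
package sketch

import (
	"encoding/binary"

	"github.com/golang/snappy"
)

// stringBatchEncode packs the batch as length-prefixed bytes into scratch,
// then snappy-compresses the packed bytes into dst. Both steps reuse
// caller-provided buffers, so a warm caller incurs no allocations.
func stringBatchEncode(src []string, scratch, dst []byte) []byte {
	packed := scratch[:0]
	var tmp [binary.MaxVarintLen64]byte
	for _, s := range src {
		n := binary.PutUvarint(tmp[:], uint64(len(s)))
		packed = append(packed, tmp[:n]...)
		packed = append(packed, s...)
	}
	// snappy.Encode writes into dst when it is large enough and only
	// allocates a new slice otherwise.
	return snappy.Encode(dst, packed)
}
```
Sizing dst with snappy.MaxEncodedLen(len(packed)) lets snappy.Encode write into the provided slice instead of allocating a new one.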
The following benchmarks compare the performance of the existing
iterator-based encoders and the new batch-oriented encoders, using
randomly generated strings.
```
name                old time/op    new time/op    delta
EncodeStrings/10    2.14µs ± 4%    1.42µs ± 4%    -33.56%  (p=0.000 n=10+10)
EncodeStrings/100   12.7µs ± 3%    10.9µs ± 2%    -14.46%  (p=0.000 n=10+10)
EncodeStrings/1000   132µs ± 2%     114µs ± 2%    -13.88%  (p=0.000 n=10+9)

name                old alloc/op   new alloc/op   delta
EncodeStrings/10      657B ± 0%      704B ± 0%     +7.15%  (p=0.000 n=10+10)
EncodeStrings/100   6.14kB ± 0%    9.47kB ± 0%    +54.14%  (p=0.000 n=10+10)
EncodeStrings/1000  61.4kB ± 0%    90.1kB ± 0%    +46.66%  (p=0.000 n=10+10)

name                old allocs/op  new allocs/op  delta
EncodeStrings/10      3.00 ± 0%      0.00        -100.00%  (p=0.000 n=10+10)
EncodeStrings/100     3.00 ± 0%      1.00 ± 0%    -66.67%  (p=0.000 n=10+10)
EncodeStrings/1000    3.00 ± 0%      1.00 ± 0%    -66.67%  (p=0.000 n=10+10)
```
This commit adds a tsm1 function for encoding a batch of floats into a
buffer. Further, it replaces the `bitstream` library used in the
existing encoders (and all the current decoders) with inlined bit
expressions within the encoder, significantly reducing the function call
overhead for larger batches.
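The sketch below illustrates the bit plumbing involved (it is not the actual tsm1 float encoding; the toy format and names are invented for the example):
```go
package sketch

import (
	"encoding/binary"
	"math"
	"math/bits"
)

// floatBatchEncode shows the bit plumbing only: the partially filled output
// word lives in locals and bits are emitted with shift/mask expressions. The
// write helper is a closure here for brevity; the real encoder inlines these
// expressions at each write site.
func floatBatchEncode(src []float64, buf []byte) []byte {
	out := buf[:0]
	var acc uint64   // staged output bits, filled from the high end
	free := uint(64) // unused bits remaining in acc

	flush := func() {
		var w [8]byte
		binary.BigEndian.PutUint64(w[:], acc)
		out = append(out, w[:]...)
		acc, free = 0, 64
	}
	writeBits := func(v uint64, n uint) { // n must be >= 1
		v &= ^uint64(0) >> (64 - n) // keep only the low n bits
		if n <= free {
			acc |= v << (free - n)
			free -= n
		} else {
			rem := n - free
			acc |= v >> rem
			flush()
			acc = v << (64 - rem)
			free = 64 - rem
		}
		if free == 0 {
			flush()
		}
	}

	prev := uint64(0)
	for _, f := range src {
		cur := math.Float64bits(f)
		delta := cur ^ prev
		prev = cur

		// Toy variable-length scheme (not the real format): a 6-bit
		// leading-zero count followed by the remaining significant bits.
		lz := uint(bits.LeadingZeros64(delta))
		if lz > 63 {
			lz = 63 // keep at least one significant bit
		}
		writeBits(uint64(lz), 6)
		writeBits(delta, 64-lz)
	}
	if free < 64 {
		flush() // pad and flush the trailing partial word
	}
	return out
}
```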
The following benchmarks compare the performance of the existing
iterator-based encoders and the new batch-oriented encoders. They cover
both a sequential input slice and a randomly generated input slice.
```
name                   old time/op    new time/op    delta
EncodeFloats/10_seq    1.14µs ± 3%    0.24µs ± 3%    -78.94%  (p=0.000 n=10+10)
EncodeFloats/10_ran    1.69µs ± 2%    0.21µs ± 3%    -87.43%  (p=0.000 n=10+10)
EncodeFloats/100_seq   7.07µs ± 1%    1.72µs ± 1%    -75.62%  (p=0.000 n=7+9)
EncodeFloats/100_ran   15.8µs ± 4%     1.8µs ± 1%    -88.60%  (p=0.000 n=10+9)
EncodeFloats/1000_seq  50.2µs ± 3%    16.2µs ± 2%    -67.66%  (p=0.000 n=10+10)
EncodeFloats/1000_ran   174µs ± 2%      16µs ± 2%    -90.77%  (p=0.000 n=10+10)

name                   old alloc/op   new alloc/op   delta
EncodeFloats/10_seq     0.00B          0.00B          ~       (all equal)
EncodeFloats/10_ran     0.00B          0.00B          ~       (all equal)
EncodeFloats/100_seq    0.00B          0.00B          ~       (all equal)
EncodeFloats/100_ran    0.00B          0.00B          ~       (all equal)
EncodeFloats/1000_seq   0.00B          0.00B          ~       (all equal)
EncodeFloats/1000_ran   0.00B          0.00B          ~       (all equal)

name                   old allocs/op  new allocs/op  delta
EncodeFloats/10_seq     0.00           0.00           ~       (all equal)
EncodeFloats/10_ran     0.00           0.00           ~       (all equal)
EncodeFloats/100_seq    0.00           0.00           ~       (all equal)
EncodeFloats/100_ran    0.00           0.00           ~       (all equal)
EncodeFloats/1000_seq   0.00           0.00           ~       (all equal)
EncodeFloats/1000_ran   0.00           0.00           ~       (all equal)
```
This keeps file compatibility by just writing out zeros for the
sizes and offsets. Perhaps it's ok to just nuke everything and
remove the data.
It also keeps the hll package because it seems generally useful
even if it's not currently being used.
These are the log messages that get printed immediately when starting
the application for the first time. This fixes the messages to conform
to the logging style guide.
Type conflicts should be rare, but when they do happen, printing out the
string name should save developers a couple minutes of digging compared
to looking up which numeric value means which type.
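As a sketch of the kind of change involved (the constants and message below are hypothetical stand-ins, not the actual influxdb identifiers):
```go
package sketch

import "fmt"

// Hypothetical stand-ins, not the actual influxdb identifiers or messages.
type fieldType int

const (
	fieldFloat fieldType = iota
	fieldInteger
	fieldUnsigned
	fieldBoolean
	fieldString
)

func (t fieldType) String() string {
	return [...]string{"float", "integer", "unsigned", "boolean", "string"}[t]
}

// conflictErr reads as "existing type float, incoming type string"
// rather than forcing a lookup of what "0" and "4" mean.
func conflictErr(existing, incoming fieldType) error {
	return fmt.Errorf("field type conflict: existing type %s, incoming type %s", existing, incoming)
}
```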
This commit removes the remaining bits of the fields index. Doing so
would have required updating the buildCursor method on the engine.
It turns out that code was statically dead, so delete it and anything
that depended on it. Additionally, delete anything reported by
the `unused` tool in the tsdb package.
The generate commands have been modified to take advantage of the new
functionality in Go 1.11 that allows `go run` to execute a package
instead of individual files.
This functionality combined with Go modules allows us to execute a
package directly out of our pinned dependencies rather than accidentally
picking up another binary outside of the build environment.
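For example, a generate directive can now reference a package path that `go run` resolves from the pinned module dependencies (the paths and flags below are illustrative, not the exact directives in the repository):
```go
// Before Go 1.11 a directive had to point at explicit source files (or at a
// binary the Makefile was responsible for installing first):
//go:generate go run ../../tools/tmpl/main.go -data=@values.tmpldata values.gen.go.tmpl

// With Go 1.11 and modules, `go run` can execute a package path directly,
// resolved and pinned by the module system:
//go:generate go run github.com/benbjohnson/tmpl -data=@values.tmpldata values.gen.go.tmpl
```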
This also simplifies the Makefiles because they are no longer
responsible for installing the correct tooling; the Go command
takes care of that logic. It also means the Makefiles that perform
file generation can now be invoked from their respective subdirectories,
so they are self-contained within those directories rather than relying on
values in the top-level Makefile.
It is now possible to generate all files within this project by using:
`go generate ./...`
Or the Makefile can continue to be used.
This commit also copies over the special copy of `tmpl` that the storage
engine uses within the influxdb repository. It had never been copied over,
so using `go generate` on these packages did not work.