Encode the compressed data at the start internal buffer. This ensures
the returned slice maintains the entire capacity and is available for
subsequent use.
When we pool / reuse string buffers, this will help considerably.
Improvements over previous commit:
```
name old time/op new time/op delta
EncodeStrings/10/batch-8 542ns ± 1% 355ns ± 2% -34.53% (p=0.008 n=5+5)
EncodeStrings/100/batch-8 5.29µs ± 1% 3.58µs ± 2% -32.20% (p=0.008 n=5+5)
EncodeStrings/1000/batch-8 48.6µs ± 0% 36.2µs ± 2% -25.40% (p=0.008 n=5+5)
name old alloc/op new alloc/op delta
EncodeStrings/10/batch-8 704B ± 0% 0B -100.00% (p=0.008 n=5+5)
EncodeStrings/100/batch-8 9.47kB ± 0% 0.00kB -100.00% (p=0.008 n=5+5)
EncodeStrings/1000/batch-8 90.1kB ± 0% 0.0kB -100.00% (p=0.008 n=5+5)
name old allocs/op new allocs/op delta
EncodeStrings/10/batch-8 0.00 0.00 ~ (all equal)
EncodeStrings/100/batch-8 1.00 ± 0% 0.00 -100.00% (p=0.008 n=5+5)
EncodeStrings/1000/batch-8 1.00 ± 0% 0.00 -100.00% (p=0.008 n=5+5)
```
This commit adds a tsm1 function for encoding a batch of strings into a
provided buffer. The new function also shares the buffer between the
input data and the snappy encoded output, reducing allocations.
The following benchmarks compare the performance of the existing
iterator based encoders, and the new batch oriented encoders using
randomly generated strings.
name old time/op new time/op delta
EncodeStrings/10 2.14µs ± 4% 1.42µs ± 4% -33.56% (p=0.000 n=10+10)
EncodeStrings/100 12.7µs ± 3% 10.9µs ± 2% -14.46% (p=0.000 n=10+10)
EncodeStrings/1000 132µs ± 2% 114µs ± 2% -13.88% (p=0.000 n=10+9)
name old alloc/op new alloc/op delta
EncodeStrings/10 657B ± 0% 704B ± 0% +7.15% (p=0.000 n=10+10)
EncodeStrings/100 6.14kB ± 0% 9.47kB ± 0% +54.14% (p=0.000 n=10+10)
EncodeStrings/1000 61.4kB ± 0% 90.1kB ± 0% +46.66% (p=0.000 n=10+10)
name old allocs/op new allocs/op delta
EncodeStrings/10 3.00 ± 0% 0.00 -100.00% (p=0.000 n=10+10)
EncodeStrings/100 3.00 ± 0% 1.00 ± 0% -66.67% (p=0.000 n=10+10)
EncodeStrings/1000 3.00 ± 0% 1.00 ± 0% -66.67% (p=0.000 n=10+10)