This fixes a regression in the Cache introduced in ca40c1ad3c where
not all the values in a cache entry would be removed. Previously,
calling Exclude did not require the values to be sorted. The change
in ca40c1ad3c relies on the values being sorted, so it was possible
for FindRange to find the wrong indexes and leave behind data that
should have been deleted.
Fixes #9161
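A minimal sketch of the sorted-input precondition, with hypothetical
findRange/exclude helpers standing in for the real Cache code:

    package cache

    import "sort"

    // Value is a stand-in for the engine's timestamped values.
    type Value struct {
        UnixNano int64
        V        float64
    }

    // findRange returns the half-open index range [i, j) of values whose
    // timestamps fall within [min, max]. The binary searches only work
    // when vs is sorted by time.
    func findRange(vs []Value, min, max int64) (int, int) {
        i := sort.Search(len(vs), func(k int) bool { return vs[k].UnixNano >= min })
        j := sort.Search(len(vs), func(k int) bool { return vs[k].UnixNano > max })
        return i, j
    }

    // exclude removes values in [min, max]. Sorting up front restores the
    // precondition that findRange depends on; without it, the searches can
    // return wrong indexes and leave data behind.
    func exclude(vs []Value, min, max int64) []Value {
        sort.Slice(vs, func(a, b int) bool { return vs[a].UnixNano < vs[b].UnixNano })
        i, j := findRange(vs, min, max)
        return append(vs[:i], vs[j:]...)
    }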
* introduced the UnsignedValue type (a sketch of the encoding approach follows below)
* leveraged the existing int64 compression algorithms (RLE, Simple8b)
* TSM and WAL can read and write UnsignedValue
* compaction is aware of UnsignedValue
* added unsigned support to models, cursors, and writing points
NOTE: there is no support for creating unsigned points yet, as the line
protocol has not been modified.
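A minimal sketch of how unsigned values can ride on the int64 encoders;
the UnsignedValue layout and helper names here are assumptions, not the
actual tsm1 code:

    // UnsignedValue pairs a timestamp with an unsigned payload.
    type UnsignedValue struct {
        unixnano int64
        value    uint64
    }

    // toInt64 reinterprets each uint64 as an int64 so the existing integer
    // compression path (RLE, Simple8b) can be reused unchanged; the cast
    // is bit-preserving and reversible.
    func toInt64(vs []UnsignedValue) []int64 {
        out := make([]int64, len(vs))
        for i, v := range vs {
            out[i] = int64(v.value)
        }
        return out
    }

    // fromInt64 reverses the cast after the int64 decoder has run.
    func fromInt64(ts, raw []int64) []UnsignedValue {
        vs := make([]UnsignedValue, len(raw))
        for i := range raw {
            vs[i] = UnsignedValue{unixnano: ts[i], value: uint64(raw[i])}
        }
        return vs
    }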
The previous version was very inefficient because the benchmarks used
to optimize it had a bug. This version always allocates a new slice,
but runs in O(n).
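A hedged sketch of the O(n) shape described above, reusing the Value
type from the earlier sketch; the exact operation being optimized is
not named in the source, so an exclude-style filter is assumed:

    // excludeLinear drops values inside [min, max] in a single pass. It
    // always allocates one new slice, but never rescans or re-sorts the
    // input.
    func excludeLinear(vs []Value, min, max int64) []Value {
        out := make([]Value, 0, len(vs))
        for _, v := range vs {
            if v.UnixNano >= min && v.UnixNano <= max {
                continue
            }
            out = append(out, v)
        }
        return out
    }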
This switches compactions to use typed values (FloatValues) instead of
the generic Values type. It avoids a number of allocations where each
value must be converted from a specific type to an interface{}.
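A small illustration of the boxing cost; GenericValue and the sum
helpers are hypothetical, but the allocation difference is the point:

    // GenericValue mimics the generic path: reading the payload goes
    // through interface{}, which boxes the float64 (an allocation).
    type GenericValue interface {
        UnixNano() int64
        Value() interface{}
    }

    // FloatValue is the typed equivalent; its payload is a plain float64.
    type FloatValue struct {
        unixnano int64
        value    float64
    }

    func (v FloatValue) UnixNano() int64    { return v.unixnano }
    func (v FloatValue) Value() interface{} { return v.value } // boxes

    // sumGeneric forces every payload through an interface and a type
    // assertion, allocating along the way.
    func sumGeneric(vs []GenericValue) float64 {
        var total float64
        for _, v := range vs {
            total += v.Value().(float64)
        }
        return total
    }

    // sumTyped iterates the concrete slice directly: no interface
    // conversions, no per-value allocations.
    func sumTyped(vs []FloatValue) float64 {
        var total float64
        for _, v := range vs {
            total += v.value
        }
        return total
    }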
Deduplicate is called from various places in the engine and can create
a lot of garbage. It first creates a map and adds each value to it in
order (1st alloc). It then creates a new slice (2nd alloc) and appends
everything from the map to the slice. Finally, it sorts the new slice
(3rd alloc). This switches the algorithm to a stable sort that reuses
the existing slice, avoiding those allocations.
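A minimal sketch of the stable-sort-and-reuse approach, using the Value
type from the first sketch; last-write-wins on duplicate timestamps is
an assumption about the intended semantics:

    // deduplicate sorts in place with a stable sort, then compacts
    // duplicates into the front of the same backing array, so no new
    // map or slice is allocated.
    func deduplicate(vs []Value) []Value {
        sort.SliceStable(vs, func(i, j int) bool {
            return vs[i].UnixNano < vs[j].UnixNano
        })
        out := vs[:0] // reuse the existing backing array
        for i := range vs {
            if len(out) > 0 && out[len(out)-1].UnixNano == vs[i].UnixNano {
                out[len(out)-1] = vs[i] // later value wins
                continue
            }
            out = append(out, vs[i])
        }
        return out
    }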
Due to a bug in compactions, it's possible for some blocks to contain
duplicate points. If those blocks are decoded and re-compacted, an
assertion panic could trigger.
We now dedup those blocks if necessary to remove the duplicate points
and avoid the panic.
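A hedged sketch of the guard: a decoded block is only re-deduplicated
when its timestamps show it is needed, so well-formed blocks stay a
no-op (building on the deduplicate sketch above):

    // maybeDeduplicate scans for out-of-order or repeated timestamps
    // and only pays for a full dedup when one is found.
    func maybeDeduplicate(vs []Value) []Value {
        for i := 1; i < len(vs); i++ {
            if vs[i].UnixNano <= vs[i-1].UnixNano {
                return deduplicate(vs)
            }
        }
        return vs // already sorted and unique
    }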
This fixes a pathological query condition caused by a problematic
structuring of TSM files based on how points were written. The
condition can occur when there are multiple TSM files and a large
number of points are written into the past. The earlier TSM files
must also have points both in the past and close to the present,
causing their time range to eclipse the later files.
When this condition occurs, some queries can spend an excessive amount
of time merging all the overlapping blocks.
The fix was to constrain the window of overlapping blocks based on
the first one we ran into. There was also a simple case in Merge
where we could skip the binary search path and just append the two
inputs.
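A minimal sketch of that fast path, again on the Value type from the
first sketch; the fallback here is a plain two-pointer merge, not the
engine's actual block merge:

    // merge appends directly when the inputs do not overlap in time and
    // only falls back to a full merge when they do.
    func merge(a, b []Value) []Value {
        if len(a) == 0 {
            return b
        }
        if len(b) == 0 {
            return a
        }
        // Fast path: a ends before b begins, so appending preserves order.
        if a[len(a)-1].UnixNano < b[0].UnixNano {
            return append(a, b...)
        }
        // Slow path: standard two-pointer merge (the real code also
        // constrains the overlap window and deduplicates).
        out := make([]Value, 0, len(a)+len(b))
        for len(a) > 0 && len(b) > 0 {
            if a[0].UnixNano <= b[0].UnixNano {
                out = append(out, a[0])
                a = a[1:]
            } else {
                out = append(out, b[0])
                b = b[1:]
            }
        }
        out = append(out, a...)
        return append(out, b...)
    }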