Stuart Carnie
31df76e1e9
refactor(tsm1): Add TimeRangeMaxTimeIterator
...
This commit introduces a new API for finding the maximum
timestamp of a series when iterating over the keys in a
set of TSM files.
This API will be used to determine the field type of a single
field key by selecting the series with the maximum timestamp.
It also refactors the common functionality for iterating
TSM keys into `timeRangeBlockReader`, which is shared
between `TimeRangeIterator` and `TimeRangeMaxTimeIterator`.
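The selection rule described above (take the field type from the series with the maximum timestamp) can be sketched as follows. This is a minimal illustration, not the tsm1 API: `seriesInfo` and `fieldTypeByMaxTime` are hypothetical names, and the per-series maximum timestamps are assumed to have been collected already.

```go
package main

import "fmt"

// seriesInfo is a hypothetical summary of one series sharing a field key.
type seriesInfo struct {
	Key       string
	FieldType string // e.g. "float" or "integer"
	MaxTime   int64  // maximum timestamp seen for the series
}

// fieldTypeByMaxTime returns the field type of the series with the
// largest MaxTime. ok is false when no series are given.
func fieldTypeByMaxTime(series []seriesInfo) (typ string, ok bool) {
	var best int64
	for i, s := range series {
		if i == 0 || s.MaxTime > best {
			best, typ, ok = s.MaxTime, s.FieldType, true
		}
	}
	return typ, ok
}

func main() {
	series := []seriesInfo{
		{Key: "cpu,host=a#usage", FieldType: "integer", MaxTime: 100},
		{Key: "cpu,host=b#usage", FieldType: "float", MaxTime: 250},
	}
	typ, _ := fieldTypeByMaxTime(series)
	fmt.Println(typ) // prints "float"
}
```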
2020-04-08 16:05:19 -07:00
Jeff Wendling
4096f93891
tsm1: implement reading and writing predicates in tombstone files
2019-05-01 13:40:40 -06:00
Jeff Wendling
dcf797f111
tsm1: basic predicate implementation at index layer
...
Only wires it up. No tests, no tombstone tracking, nothing.
2019-05-01 13:40:40 -06:00
Stuart Carnie
d3790aa072
feat: Teach storage engine how to find tag values for a given key
...
The TagValues API will perform a linear scan if there is no predicate;
otherwise, it will use the index to find a list of candidate series
keys.
TagValues expects the predicate to be transformed such that
`_measurement` and `_field` are remapped to `\x00` and `\xff`
respectively.
There is one TODO marked to analyze the predicate for a
`\x00 = '<measurement>'` pattern. If found, the predicate can be
eliminated and the scan can fall back to a linear prefix scan over the
combined org, bucket and measurement.
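The remapping that TagValues expects can be sketched as a small helper. `remapTagKey` is a hypothetical name used only for illustration; the real transformation operates on a predicate tree rather than on bare strings.

```go
package main

import "fmt"

// remapTagKey maps the user-facing tag keys to the reserved keys
// named in the commit message and leaves all other keys untouched.
func remapTagKey(key string) string {
	switch key {
	case "_measurement":
		return "\x00"
	case "_field":
		return "\xff"
	default:
		return key
	}
}

func main() {
	for _, k := range []string{"_measurement", "_field", "host"} {
		fmt.Printf("%q -> %q\n", k, remapTagKey(k))
	}
}
```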
2019-04-18 16:19:18 -07:00
Stuart Carnie
35e0094a28
feat: TimeRangeIterator for checking if keys have data in a TSM file
...
The TimeRangeIterator permits linear or random index scans and
can answer whether the current key has data for the specified time
interval, considering any tombstones.
When there are no tombstones, there are opportunities to skip
decoding blocks, specifically when the queried time interval
overlaps any boundary of the TSM index entries.
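One reading of this optimization, sketched under assumptions: each index entry's minimum and maximum times are timestamps of real points in the block, so if the queried interval contains either boundary and there are no tombstones, the key has data and no block needs decoding. `indexEntry`, `answer` and `check` are hypothetical names, and the interval is treated as inclusive.

```go
package main

import "fmt"

// indexEntry is a hypothetical stand-in for a TSM index entry.
type indexEntry struct {
	MinTime, MaxTime int64
}

type answer int

const (
	noData     answer = iota // no entry overlaps the interval
	hasData                  // decided from the index alone
	mustDecode               // interval lies strictly inside a block
)

// check assumes there are no tombstones for the key.
func check(entries []indexEntry, min, max int64) answer {
	result := noData
	for _, e := range entries {
		if e.MaxTime < min || e.MinTime > max {
			continue // entry does not overlap the interval
		}
		minInside := e.MinTime >= min && e.MinTime <= max
		maxInside := e.MaxTime >= min && e.MaxTime <= max
		if minInside || maxInside {
			return hasData // a block boundary is a real point
		}
		result = mustDecode // only decoding can tell
	}
	return result
}

func main() {
	entries := []indexEntry{{10, 20}, {30, 40}}
	fmt.Println(check(entries, 15, 35) == hasData)
	fmt.Println(check(entries, 12, 18) == mustDecode)
	fmt.Println(check(entries, 21, 29) == noData)
}
```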
2019-04-18 16:19:18 -07:00
Jeff Wendling
0a85e3b0dd
tsm1: add initial index cleanup to DeletePrefix
2019-01-08 16:32:43 -07:00
Jeff Wendling
f712828016
tsm1: refactor and rename some methods
2019-01-08 14:52:30 -07:00
Jeff Wendling
8744a82665
tsm1: add DeletePrefix to the reader
2019-01-07 21:11:49 -07:00
Jeff Wendling
f65b0933f6
tsm1: move code around into smaller files and add tests
2019-01-07 21:11:49 -07:00
Jeff Wendling
fed3154506
tsm1: DeletePrefix on the indirectIndex
2019-01-07 21:08:32 -07:00
Jeff Wendling
ad5352926f
tsm1: log when error reading entries for tsm key
2019-01-07 11:00:35 -07:00
Jeff Wendling
9cdefa8e4f
tsm1: fix staticcheck and refactor closure out
2019-01-07 11:00:35 -07:00
Jeff Wendling
1ffcd77342
tsm1: fix remaining issues and add small benchmarks
...
- notice when keys are deleted during iteration and return an error
- make sure all the consumers check the error
- add some benchmarks for small indexes to compare
- allow concurrent readers to flag deletes
benchmarks against base:
name old time/op new time/op delta
IndirectIndex_UnmarshalBinary-8 70.0ms ±17% 71.0ms ±12% ~ (p=1.000 n=8+8)
IndirectIndex_DeleteRangeLast-8 1.48µs ± 1% 0.28µs ± 5% -81.29% (p=0.000 n=8+7)
IndirectIndex_DeleteRangeFull/Large-8 786ms ± 1% 363ms ± 3% -53.89% (p=0.000 n=7+8)
IndirectIndex_DeleteRangeFull/Small-8 2.37ms ± 0% 1.14ms ± 3% -52.02% (p=0.000 n=7+8)
IndirectIndex_DeleteRangeFull_Covered/Large-8 384ms ± 2% 188ms ± 3% -51.04% (p=0.000 n=8+8)
IndirectIndex_DeleteRangeFull_Covered/Small-8 470µs ± 1% 190µs ± 1% -59.71% (p=0.000 n=8+7)
IndirectIndex_Delete/Large-8 74.0ms ± 1% 128.7ms ± 1% +73.80% (p=0.001 n=7+7)
IndirectIndex_Delete/Small-8 142µs ± 1% 130µs ± 1% -8.24% (p=0.000 n=8+8)
name old alloc/op new alloc/op delta
IndirectIndex_UnmarshalBinary-8 11.6MB ± 0% 11.7MB ± 0% +0.02% (p=0.000 n=8+7)
IndirectIndex_DeleteRangeLast-8 3.26kB ± 0% 0.00kB ±NaN% -100.00% (p=0.000 n=8+8)
IndirectIndex_DeleteRangeFull/Large-8 233MB ± 0% 161MB ± 0% -30.75% (p=0.000 n=8+8)
IndirectIndex_DeleteRangeFull/Small-8 2.13MB ± 0% 1.40MB ± 0% -34.53% (p=0.000 n=8+8)
IndirectIndex_DeleteRangeFull_Covered/Large-8 12.4MB ± 0% 0.4MB ± 0% -96.82% (p=0.002 n=7+8)
IndirectIndex_DeleteRangeFull_Covered/Small-8 120kB ± 0% 0kB ± 0% -99.89% (p=0.000 n=8+8)
IndirectIndex_Delete/Large-8 4.54kB ± 0% 0.21kB ± 0% -95.26% (p=0.000 n=8+8)
IndirectIndex_Delete/Small-8 80.0B ± 0% 0.0B ±NaN% -100.00% (p=0.000 n=8+8)
name old allocs/op new allocs/op delta
IndirectIndex_UnmarshalBinary-8 35.0 ± 0% 42.0 ± 0% +20.00% (p=0.000 n=8+8)
IndirectIndex_DeleteRangeLast-8 3.00 ± 0% 0.00 ±NaN% -100.00% (p=0.000 n=8+8)
IndirectIndex_DeleteRangeFull/Large-8 1.53M ± 0% 0.52M ± 0% -65.98% (p=0.000 n=8+8)
IndirectIndex_DeleteRangeFull/Small-8 15.2k ± 0% 5.2k ± 0% -65.97% (p=0.000 n=8+8)
IndirectIndex_DeleteRangeFull_Covered/Large-8 620 ± 0% 124 ± 0% -80.00% (p=0.002 n=7+8)
IndirectIndex_DeleteRangeFull_Covered/Small-8 10.0 ± 0% 2.0 ± 0% -80.00% (p=0.000 n=8+8)
IndirectIndex_Delete/Large-8 246 ± 0% 1 ± 0% -99.59% (p=0.000 n=8+8)
IndirectIndex_Delete/Small-8 4.00 ± 0% 0.00 ±NaN% -100.00% (p=0.000 n=8+8)
2019-01-07 11:00:35 -07:00
Jeff Wendling
14cf01911e
tsm1: change TSMFile to use an iterator style api
2019-01-07 11:00:35 -07:00
Jeff Wendling
917584b054
tsm1: use readerOffsetsIterator for deletes
...
This reduces the number of disk hits at some CPU cost on some benchmarks. Notably, the
DeleteRangeFull_Covered and Delete benchmarks both went to approximately zero page faults,
meaning they read from the index file linearly.
name old time/op new time/op delta
IndirectIndex_UnmarshalBinary-8 68.8ms ±10% 63.1ms ±16% -8.28% (p=0.021 n=8+8)
IndirectIndex_Entries-8 9.09µs ± 3% 9.62µs ± 1% +5.84% (p=0.000 n=8+7)
IndirectIndex_ReadEntries-8 5.86µs ± 1% 6.15µs ± 3% +5.03% (p=0.000 n=8+8)
IndirectIndex_DeleteRangeLast-8 562ns ± 6% 308ns ± 2% -45.25% (p=0.000 n=8+8)
IndirectIndex_DeleteRangeFull-8 363ms ±10% 376ms ± 5% ~ (p=0.054 n=8+7)
IndirectIndex_DeleteRangeFull_Covered-8 574ms ± 2% 746ms ± 0% +30.01% (p=0.000 n=8+7)
IndirectIndex_Delete-8 51.2ms ± 0% 88.2ms ± 0% +72.38% (p=0.000 n=8+7)
name old alloc/op new alloc/op delta
IndirectIndex_UnmarshalBinary-8 11.7MB ± 0% 11.7MB ± 0% ~ (all samples are equal)
IndirectIndex_Entries-8 32.8kB ± 0% 32.8kB ± 0% ~ (all samples are equal)
IndirectIndex_ReadEntries-8 0.00B ±NaN% 0.00B ±NaN% ~ (all samples are equal)
IndirectIndex_DeleteRangeLast-8 0.00B ±NaN% 0.00B ±NaN% ~ (all samples are equal)
IndirectIndex_DeleteRangeFull-8 162MB ± 0% 162MB ± 0% ~ (p=0.798 n=8+8)
IndirectIndex_DeleteRangeFull_Covered-8 82.4MB ± 0% 82.4MB ± 0% ~ (p=0.857 n=8+8)
IndirectIndex_Delete-8 4.01kB ± 0% 4.04kB ± 0% +0.90% (p=0.000 n=8+8)
name old allocs/op new allocs/op delta
IndirectIndex_UnmarshalBinary-8 42.0 ± 0% 42.0 ± 0% ~ (all samples are equal)
IndirectIndex_Entries-8 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal)
IndirectIndex_ReadEntries-8 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
IndirectIndex_DeleteRangeLast-8 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
IndirectIndex_DeleteRangeFull-8 522k ± 0% 522k ± 0% ~ (p=0.743 n=8+8)
IndirectIndex_DeleteRangeFull_Covered-8 3.31k ± 0% 3.31k ± 0% ~ (p=0.856 n=8+8)
IndirectIndex_Delete-8 123 ± 0% 123 ± 0% ~ (all samples are equal)
name old speed new speed delta
IndirectIndex_DeleteRangeFull-8 18.1MB/s ± 9% 17.5MB/s ± 7% ~ (p=0.105 n=8+8)
IndirectIndex_Delete-8 116MB/s ± 0% 0MB/s ± 0% -99.96% (p=0.000 n=8+8)
2019-01-07 11:00:35 -07:00
Jeff Wendling
6f5c94f3f7
tsm1: introduce readerOffsets to manage the offsets slice
...
It exposes an API that will clean up the bodies of many methods and
provide a safe abstraction around iteration that will be able to
handle reads with concurrent deletes.
Benchmarks are flat.
2019-01-07 11:00:35 -07:00
Jeff Wendling
f860305124
tsm1: keep first 8 bytes of each key in memory
...
Since most keys will share the first 8 bytes, we collapse them into
a slice containing partial sums of the counts. We can then binary search
into that slice to find the associated prefix for a given offset index.
Compressing in this way makes the overhead negligible and reduces
disk misses by about 30% in these benchmarks (500k series across 100 orgs).
name old time/op new time/op delta
IndirectIndex_UnmarshalBinary-8 67.5ms ± 1% 64.6ms ± 1% -4.33% (p=0.000 n=8+7)
IndirectIndex_Entries-8 9.41µs ± 2% 9.39µs ± 1% ~ (p=0.959 n=8+8)
IndirectIndex_ReadEntries-8 5.99µs ± 1% 6.07µs ± 1% +1.29% (p=0.001 n=8+8)
IndirectIndex_DeleteRangeLast-8 369ns ± 2% 566ns ± 1% +53.37% (p=0.000 n=8+8)
IndirectIndex_DeleteRangeFull-8 368ms ± 9% 369ms ± 2% ~ (p=0.232 n=8+7)
IndirectIndex_DeleteRangeFull_Covered-8 600ms ± 1% 618ms ± 0% +3.03% (p=0.000 n=8+7)
IndirectIndex_Delete-8 50.0ms ± 1% 47.6ms ± 9% ~ (p=0.463 n=7+8)
name old alloc/op new alloc/op delta
IndirectIndex_UnmarshalBinary-8 11.6MB ± 0% 11.7MB ± 0% +0.02% (p=0.000 n=8+7)
IndirectIndex_Entries-8 32.8kB ± 0% 32.8kB ± 0% ~ (all samples are equal)
IndirectIndex_ReadEntries-8 0.00B ±NaN% 0.00B ±NaN% ~ (all samples are equal)
IndirectIndex_DeleteRangeLast-8 0.00B ±NaN% 0.00B ±NaN% ~ (all samples are equal)
IndirectIndex_DeleteRangeFull-8 162MB ± 0% 162MB ± 0% ~ (p=0.382 n=8+8)
IndirectIndex_DeleteRangeFull_Covered-8 82.4MB ± 0% 82.4MB ± 0% ~ (p=0.776 n=8+8)
IndirectIndex_Delete-8 4.01kB ± 0% 4.01kB ± 0% ~ (all samples are equal)
name old allocs/op new allocs/op delta
IndirectIndex_UnmarshalBinary-8 35.0 ± 0% 42.0 ± 0% +20.00% (p=0.000 n=8+8)
IndirectIndex_Entries-8 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal)
IndirectIndex_ReadEntries-8 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
IndirectIndex_DeleteRangeLast-8 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
IndirectIndex_DeleteRangeFull-8 522k ± 0% 522k ± 0% ~ (p=0.382 n=8+8)
IndirectIndex_DeleteRangeFull_Covered-8 3.31k ± 0% 3.31k ± 0% ~ (p=0.457 n=8+8)
IndirectIndex_Delete-8 123 ± 0% 123 ± 0% ~ (all samples are equal)
name old speed new speed delta
IndirectIndex_DeleteRangeFull-8 24.7MB/s ±10% 17.8MB/s ± 2% -28.18% (p=0.000 n=8+7)
IndirectIndex_DeleteRangeFull_Covered-8 14.2MB/s ± 1% 9.6MB/s ± 0% -32.30% (p=0.000 n=8+7)
IndirectIndex_Delete-8 171MB/s ± 1% 126MB/s ±10% -26.35% (p=0.000 n=7+8)
IndirectIndex_DeleteRangeLast went from 17 page faults, or ~180GB/sec at 369ns/op
to zero page faults. So even though it got 50% slower, it was actually I/O bound
and no longer is.
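The prefix compression described in this commit can be sketched as follows. This is an illustration under assumptions, not the actual layout: keys are taken to be sorted, and `prefixRun`, `buildRuns` and `lookup` are hypothetical names. Each run stores one 8-byte prefix and the cumulative count of keys up to the end of the run, so a binary search over the counts finds the prefix for any offset index.

```go
package main

import (
	"fmt"
	"sort"
)

// prefixRun says that keys with index in [previous End, End) share Prefix.
type prefixRun struct {
	Prefix [8]byte
	End    int // partial sum of key counts up to and including this run
}

// prefixOf copies the first 8 bytes of key; shorter keys are zero padded.
func prefixOf(key []byte) (p [8]byte) {
	copy(p[:], key)
	return p
}

// buildRuns collapses adjacent keys with equal prefixes into runs.
func buildRuns(keys [][]byte) []prefixRun {
	var runs []prefixRun
	for i, k := range keys {
		p := prefixOf(k)
		if n := len(runs); n > 0 && runs[n-1].Prefix == p {
			runs[n-1].End = i + 1
			continue
		}
		runs = append(runs, prefixRun{Prefix: p, End: i + 1})
	}
	return runs
}

// lookup binary searches the partial sums for the run containing index i.
func lookup(runs []prefixRun, i int) ([8]byte, bool) {
	j := sort.Search(len(runs), func(j int) bool { return runs[j].End > i })
	if i < 0 || j == len(runs) {
		return [8]byte{}, false
	}
	return runs[j].Prefix, true
}

func main() {
	keys := [][]byte{
		[]byte("org00001/cpu"),
		[]byte("org00001/mem"),
		[]byte("org00002/cpu"),
	}
	runs := buildRuns(keys)
	p, _ := lookup(runs, 2)
	fmt.Println(len(runs), string(p[:])) // prints "2 org00002"
}
```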
2019-01-07 11:00:35 -07:00
Jeff Wendling
0becfc6239
tsm1: add helper to track page faults in index
...
Since the methods inline and dead code is eliminated, it has no runtime
overhead in the benchmarks when disabled.
benchmark recorded faults
BenchmarkIndirectIndex_Entries-8 11
BenchmarkIndirectIndex_ReadEntries-8 11
BenchmarkIndirectIndex_DeleteRangeLast-8 17
BenchmarkIndirectIndex_DeleteRangeFull-8 2218
BenchmarkIndirectIndex_Delete-8 2084
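A helper of this kind can be sketched with a compile-time constant guard. Everything here is an assumption for illustration (`trackFaults`, `faultTracker`, `touch`, the 4096-byte page size, and counting distinct pages); the point is only that a constant `false` guard lets the compiler drop the tracking body.

```go
package main

import "fmt"

// trackFaults is a compile-time switch. While it is false, the body of
// touch after the guard is dead code that the compiler can eliminate.
const trackFaults = false

// pageSize is an assumed page size for this sketch.
const pageSize = 4096

// faultTracker records which pages of a mapped region were touched.
type faultTracker struct {
	pages map[int]struct{}
}

// touch marks every page overlapped by [offset, offset+length).
func (f *faultTracker) touch(offset, length int) {
	if !trackFaults || length <= 0 {
		return
	}
	if f.pages == nil {
		f.pages = make(map[int]struct{})
	}
	for p := offset / pageSize; p <= (offset+length-1)/pageSize; p++ {
		f.pages[p] = struct{}{}
	}
}

// count returns the number of distinct pages touched.
func (f *faultTracker) count() int { return len(f.pages) }

func main() {
	var f faultTracker
	f.touch(0, 2*pageSize)
	fmt.Println(f.count()) // prints 0 while trackFaults is false
}
```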
2019-01-07 11:00:35 -07:00
Jeff Wendling
91e820a9d8
tsm1: fix multiple issues with DeleteRange
...
1. Correctly acquires locks
2. Seeks for discontiguous key ranges (like delete ["aaa", "zzz"])
3. Is precise about deleting a key when it contains no data
name old time/op new time/op delta
IndirectIndex_UnmarshalBinary-8 67.3ms ± 1% 63.2ms ±15% ~ (p=0.463 n=7+8)
IndirectIndex_Entries-8 9.14µs ± 1% 9.01µs ± 0% -1.40% (p=0.004 n=8+7)
IndirectIndex_ReadEntries-8 5.83µs ± 1% 5.68µs ± 2% -2.62% (p=0.000 n=8+8)
IndirectIndex_DeleteRangeLast-8 283ns ± 2% 191ns ± 1% -32.37% (p=0.000 n=8+7)
IndirectIndex_DeleteRangeFull-8 612ms ± 1% 361ms ± 1% -41.02% (p=0.000 n=8+8)
IndirectIndex_Delete-8 49.0ms ± 1% 49.8ms ± 1% +1.80% (p=0.001 n=7+8)
name old alloc/op new alloc/op delta
IndirectIndex_UnmarshalBinary-8 11.6MB ± 0% 11.6MB ± 0% ~ (all samples are equal)
IndirectIndex_Entries-8 32.8kB ± 0% 32.8kB ± 0% ~ (all samples are equal)
IndirectIndex_ReadEntries-8 0.00B ±NaN% 0.00B ±NaN% ~ (all samples are equal)
IndirectIndex_DeleteRangeLast-8 64.0B ± 0% 0.0B ±NaN% -100.00% (p=0.000 n=8+8)
IndirectIndex_DeleteRangeFull-8 168MB ± 0% 162MB ± 0% -3.71% (p=0.000 n=8+8)
IndirectIndex_Delete-8 3.94kB ± 0% 3.94kB ± 0% ~ (all samples are equal)
name old allocs/op new allocs/op delta
IndirectIndex_UnmarshalBinary-8 35.0 ± 0% 35.0 ± 0% ~ (all samples are equal)
IndirectIndex_Entries-8 1.00 ± 0% 1.00 ± 0% ~ (all samples are equal)
IndirectIndex_ReadEntries-8 0.00 ±NaN% 0.00 ±NaN% ~ (all samples are equal)
IndirectIndex_DeleteRangeLast-8 2.00 ± 0% 0.00 ±NaN% -100.00% (p=0.000 n=8+8)
IndirectIndex_DeleteRangeFull-8 1.04M ± 0% 0.52M ± 0% -49.77% (p=0.000 n=8+8)
IndirectIndex_Delete-8 123 ± 0% 123 ± 0% ~ (all samples are equal)
2019-01-07 11:00:35 -07:00
Jeff Wendling
d40c3e662f
tsm1: use uint32 key for tombstones
...
rough, noisy benchmarks.
benchmark old ns/op new ns/op delta
BenchmarkIndirectIndex_UnmarshalBinary-8 62462250 67710057 +8.40%
BenchmarkIndirectIndex_Entries-8 9601 9239 -3.77%
BenchmarkIndirectIndex_ReadEntries-8 5984 5964 -0.33%
BenchmarkIndirectIndex_DeleteRangeLast-8 314 317 +0.96%
BenchmarkIndirectIndex_DeleteRangeFull-8 813838165 615346992 -24.39%
BenchmarkIndirectIndex_Delete-8 52079181 52906315 +1.59%
benchmark old allocs new allocs delta
BenchmarkIndirectIndex_UnmarshalBinary-8 35 35 +0.00%
BenchmarkIndirectIndex_Entries-8 1 1 +0.00%
BenchmarkIndirectIndex_ReadEntries-8 0 0 +0.00%
BenchmarkIndirectIndex_DeleteRangeLast-8 2 2 +0.00%
BenchmarkIndirectIndex_DeleteRangeFull-8 1532670 1038932 -32.21%
BenchmarkIndirectIndex_Delete-8 123 123 +0.00%
benchmark old bytes new bytes delta
BenchmarkIndirectIndex_UnmarshalBinary-8 11648760 11648760 +0.00%
BenchmarkIndirectIndex_Entries-8 32768 32768 +0.00%
BenchmarkIndirectIndex_ReadEntries-8 1 1 +0.00%
BenchmarkIndirectIndex_DeleteRangeLast-8 64 64 +0.00%
BenchmarkIndirectIndex_DeleteRangeFull-8 232738960 168112352 -27.77%
BenchmarkIndirectIndex_Delete-8 3936 3936 +0.00%
2019-01-07 11:00:35 -07:00
Jeff Wendling
ffd35ce1aa
tsm1: use a uint32 for offsets globally
...
benchmarks are flat.
2019-01-07 11:00:35 -07:00
Jeff Wendling
7a7a4b6d58
tsm1: remove offsets from mmap
...
benchmark old ns/op new ns/op delta
BenchmarkIndirectIndex_UnmarshalBinary-8 74525387 66439305 -10.85%
BenchmarkIndirectIndex_Entries-8 8892 9200 +3.46%
BenchmarkIndirectIndex_ReadEntries-8 5816 5691 -2.15%
BenchmarkIndirectIndex_DeleteRangeLast-8 1550 311 -79.94%
BenchmarkIndirectIndex_DeleteRangeFull-8 773649708 767030277 -0.86%
BenchmarkIndirectIndex_Delete-8 79755991 52015903 -34.78%
benchmark old allocs new allocs delta
BenchmarkIndirectIndex_UnmarshalBinary-8 35 35 +0.00%
BenchmarkIndirectIndex_Entries-8 1 1 +0.00%
BenchmarkIndirectIndex_ReadEntries-8 0 0 +0.00%
BenchmarkIndirectIndex_DeleteRangeLast-8 3 2 -33.33%
BenchmarkIndirectIndex_DeleteRangeFull-8 1532589 1532344 -0.02%
BenchmarkIndirectIndex_Delete-8 246 123 -50.00%
benchmark old bytes new bytes delta
BenchmarkIndirectIndex_UnmarshalBinary-8 11648760 11648760 +0.00%
BenchmarkIndirectIndex_Entries-8 32768 32768 +0.00%
BenchmarkIndirectIndex_ReadEntries-8 1 1 +0.00%
BenchmarkIndirectIndex_DeleteRangeLast-8 3264 64 -98.04%
BenchmarkIndirectIndex_DeleteRangeFull-8 232710448 232624208 -0.04%
BenchmarkIndirectIndex_Delete-8 4432 3936 -11.19%
2019-01-07 11:00:35 -07:00
Jeff Wendling
04605eb266
tsm1: speed up deleterange for large keys
...
Rather than scanning from the first key in the index, do a binary
search to the first key in the range. This changes the cost of deleting
the largest key from O(N) to O(log N).
benchmark old ns/op new ns/op delta
BenchmarkIndirectIndex_DeleteRangeFull-8 17884166763 738717473 -95.87%
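The binary search can be sketched with the standard library, assuming the keys are held as a sorted slice of byte slices. `firstKeyAtOrAfter` is a hypothetical name; the real index works on offsets into a mapped file rather than on an in-memory slice.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// firstKeyAtOrAfter returns the index of the first key >= min in the
// sorted keys, or len(keys) if every key is smaller than min.
func firstKeyAtOrAfter(keys [][]byte, min []byte) int {
	return sort.Search(len(keys), func(i int) bool {
		return bytes.Compare(keys[i], min) >= 0
	})
}

func main() {
	keys := [][]byte{[]byte("aaa"), []byte("mmm"), []byte("zzz")}
	// Deleting the largest key starts at its index instead of at 0.
	fmt.Println(firstKeyAtOrAfter(keys, []byte("zzz"))) // prints 2
}
```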
2018-12-14 10:06:24 -07:00
Jeff Wendling
0d411023f2
config: clean up
...
- Breaks the weird cycle that existed with the EngineOptions
- Removes a bunch of useless parameters
- Moves around a bunch of defaults
2018-11-08 11:39:36 -07:00
Jacob Marble
b6a1c0e9c7
storage: MeasurementStats.ReadFrom requires ByteReader
2018-10-19 14:16:20 -07:00
Ben Johnson
68450681ef
Add TSM1 measurement stats.
...
When a TSM file is written, this commit also generates a `.tss` stats
file alongside it that contains size stats for all measurements within
the TSM file. These files can be combined to generate stats for
all measurements across all TSM files.
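Combining per-file stats can be sketched as a sum over maps keyed by measurement name. The type and method below are hypothetical and ignore the on-disk `.tss` encoding entirely; they only show the aggregation step.

```go
package main

import "fmt"

// measurementStats maps a measurement name to its size in bytes.
type measurementStats map[string]int

// add merges other into s by summing sizes per measurement.
func (s measurementStats) add(other measurementStats) {
	for name, size := range other {
		s[name] += size
	}
}

func main() {
	// Stats as they might be loaded from two separate stats files.
	fileA := measurementStats{"cpu": 100, "mem": 40}
	fileB := measurementStats{"cpu": 25}

	total := measurementStats{}
	total.add(fileA)
	total.add(fileB)
	fmt.Println(total["cpu"], total["mem"]) // prints "125 40"
}
```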
2018-10-08 10:43:53 -06:00
Edd Robinson
074f263e08
Initial import of tsm1.Engine
2018-10-01 12:08:37 +01:00