27 Commits (b25b4021c3b09b52db56af4ea42fca69182bcbde)

Author SHA1 Message Date
Stuart Carnie 31df76e1e9
refactor(tsm1): Add TimeRangeMaxTimeIterator
This commit introduces a new API for finding the maximum
timestamp of a series when iterating over the keys in a
set of TSM files.

This API will be used to determine the field type of a single
field key by selecting the series with the maximum timestamp.

It also refactors the common functionality for iterating
TSM keys into `timeRangeBlockReader`, which is shared
between `TimeRangeIterator` and `TimeRangeMaxTimeIterator`.
2020-04-08 16:05:19 -07:00
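The selection rule described in 31df76e1e9 (pick the field type of the series with the maximum timestamp) can be sketched as follows. This is a minimal illustration, not the `TimeRangeMaxTimeIterator` API: the `seriesInfo` type, its fields, and `fieldTypeByMaxTime` are hypothetical names introduced here.

```go
package main

import "fmt"

// seriesInfo is a hypothetical summary of one series for a field key:
// the type of its stored values and the largest timestamp it contains.
type seriesInfo struct {
	Key       string
	FieldType string // e.g. "float" or "integer"
	MaxTime   int64  // nanoseconds since the epoch
}

// fieldTypeByMaxTime returns the field type of the series with the
// greatest MaxTime. ok is false when no series are supplied.
func fieldTypeByMaxTime(series []seriesInfo) (fieldType string, ok bool) {
	var best int64
	for i, s := range series {
		if i == 0 || s.MaxTime > best {
			best, fieldType, ok = s.MaxTime, s.FieldType, true
		}
	}
	return fieldType, ok
}

func main() {
	series := []seriesInfo{
		{Key: "cpu,host=a#usage", FieldType: "integer", MaxTime: 100},
		{Key: "cpu,host=b#usage", FieldType: "float", MaxTime: 250},
	}
	ft, ok := fieldTypeByMaxTime(series)
	fmt.Println(ft, ok)
}
```

When two series disagree on the type, the most recently written one wins under this rule.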
Jeff Wendling 4096f93891 tsm1: implement reading and writing predicates in tombstone files 2019-05-01 13:40:40 -06:00
Jeff Wendling dcf797f111 tsm1: basic predicate implementation at index layer
Only wires it up. No tests, no tombstone tracking, nothing.
2019-05-01 13:40:40 -06:00
Stuart Carnie d3790aa072
feat: Teach storage engine how to find tag values for a given key
The TagValues API will perform a linear scan if there is no predicate;
otherwise, it will use the index to find a list of candidate series
keys.

TagValues expects the predicate to be transformed such that
`_measurement` and `_field` are remapped to `\x00` and `\xff`
respectively.

There is one TODO marked to analyze the predicate for a
`\x00 = '<measurement>'` pattern. If found, the predicate can be
eliminated and the call can fall back to a linear prefix scan by
combining the org, bucket and measurement.
2019-04-18 16:19:18 -07:00
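The remapping that d3790aa072 expects of callers can be sketched as below. Only the two mappings (`_measurement` to `\x00`, `_field` to `\xff`) come from the commit message; the helper name `remapTagKey` and the constant names are assumptions made for illustration, and a real predicate would be an expression tree rather than a bare key.

```go
package main

import "fmt"

// Reserved tag keys named in the commit message.
const (
	measurementTagKey = "\x00"
	fieldTagKey       = "\xff"
)

// remapTagKey is a hypothetical helper that rewrites the user-facing
// names to the reserved keys; every other key passes through unchanged.
func remapTagKey(key string) string {
	switch key {
	case "_measurement":
		return measurementTagKey
	case "_field":
		return fieldTagKey
	default:
		return key
	}
}

func main() {
	for _, k := range []string{"_measurement", "_field", "host"} {
		fmt.Printf("%q -> %q\n", k, remapTagKey(k))
	}
}
```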
Stuart Carnie 35e0094a28
feat: TimeRangeIterator for checking if keys have data in a TSM file
The TimeRangeIterator permits linear or random index scans and
can answer whether the current key has data for the specified time
interval, considering any tombstones.

When there are no tombstones there are some opportunities for
optimization to skip decoding blocks, specifically when the
queried time interval overlaps any boundaries of the TSM index entries.
2019-04-18 16:19:18 -07:00
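The block-skipping optimization mentioned in 35e0094a28 can be sketched like this. It assumes that an index entry's minimum and maximum times are timestamps of real points in the block and that no tombstones apply; `indexEntry`, `classify`, and the result names are hypothetical and do not reflect the actual `TimeRangeIterator` code.

```go
package main

import "fmt"

// indexEntry is a hypothetical stand-in for a TSM index entry.
// Assumption: MinTime and MaxTime are timestamps of actual points.
type indexEntry struct {
	MinTime, MaxTime int64
}

type result int

const (
	noData     result = iota // the interval misses the block entirely
	hasData                  // a block boundary lies inside the interval
	mustDecode               // the interval is strictly inside the block
)

// classify decides whether the inclusive interval [min, max] can be
// answered from the index entry alone, with no tombstones present.
func classify(e indexEntry, min, max int64) result {
	if max < e.MinTime || min > e.MaxTime {
		return noData
	}
	if (min <= e.MinTime && e.MinTime <= max) ||
		(min <= e.MaxTime && e.MaxTime <= max) {
		return hasData
	}
	return mustDecode
}

func main() {
	e := indexEntry{MinTime: 10, MaxTime: 20}
	fmt.Println(classify(e, 0, 5), classify(e, 15, 30), classify(e, 12, 18))
}
```

Only the `mustDecode` case needs the block's timestamps, since the points between the boundaries are unknown until decoded.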
Jeff Wendling 0a85e3b0dd tsm1: add initial index cleanup to DeletePrefix 2019-01-08 16:32:43 -07:00
Jeff Wendling f712828016 tsm1: refactor and rename some methods 2019-01-08 14:52:30 -07:00
Jeff Wendling 8744a82665 tsm1: add DeletePrefix to the reader 2019-01-07 21:11:49 -07:00
Jeff Wendling f65b0933f6 tsm1: move code around into smaller files and add tests 2019-01-07 21:11:49 -07:00
Jeff Wendling fed3154506 tsm1: DeletePrefix on the indirectIndex 2019-01-07 21:08:32 -07:00
Jeff Wendling ad5352926f tsm1: log when error reading entries for tsm key 2019-01-07 11:00:35 -07:00
Jeff Wendling 9cdefa8e4f tsm1: fix staticcheck and refactor closure out 2019-01-07 11:00:35 -07:00
Jeff Wendling 1ffcd77342 tsm1: fix remaining issues and add small benchmarks
- notice when keys are deleted during iteration and return an error
- make sure all the consumers check the error
- add some benchmarks for small indexes to compare
- allow concurrent readers to flag deletes

benchmarks against base:

name                                           old time/op    new time/op    delta
IndirectIndex_UnmarshalBinary-8                  70.0ms ±17%    71.0ms ±12%      ~     (p=1.000 n=8+8)
IndirectIndex_DeleteRangeLast-8                  1.48µs ± 1%    0.28µs ± 5%   -81.29%  (p=0.000 n=8+7)
IndirectIndex_DeleteRangeFull/Large-8             786ms ± 1%     363ms ± 3%   -53.89%  (p=0.000 n=7+8)
IndirectIndex_DeleteRangeFull/Small-8            2.37ms ± 0%    1.14ms ± 3%   -52.02%  (p=0.000 n=7+8)
IndirectIndex_DeleteRangeFull_Covered/Large-8     384ms ± 2%     188ms ± 3%   -51.04%  (p=0.000 n=8+8)
IndirectIndex_DeleteRangeFull_Covered/Small-8     470µs ± 1%     190µs ± 1%   -59.71%  (p=0.000 n=8+7)
IndirectIndex_Delete/Large-8                     74.0ms ± 1%   128.7ms ± 1%   +73.80%  (p=0.001 n=7+7)
IndirectIndex_Delete/Small-8                      142µs ± 1%     130µs ± 1%    -8.24%  (p=0.000 n=8+8)

name                                           old alloc/op   new alloc/op   delta
IndirectIndex_UnmarshalBinary-8                  11.6MB ± 0%    11.7MB ± 0%    +0.02%  (p=0.000 n=8+7)
IndirectIndex_DeleteRangeLast-8                  3.26kB ± 0%   0.00kB ±NaN%  -100.00%  (p=0.000 n=8+8)
IndirectIndex_DeleteRangeFull/Large-8             233MB ± 0%     161MB ± 0%   -30.75%  (p=0.000 n=8+8)
IndirectIndex_DeleteRangeFull/Small-8            2.13MB ± 0%    1.40MB ± 0%   -34.53%  (p=0.000 n=8+8)
IndirectIndex_DeleteRangeFull_Covered/Large-8    12.4MB ± 0%     0.4MB ± 0%   -96.82%  (p=0.002 n=7+8)
IndirectIndex_DeleteRangeFull_Covered/Small-8     120kB ± 0%       0kB ± 0%   -99.89%  (p=0.000 n=8+8)
IndirectIndex_Delete/Large-8                     4.54kB ± 0%    0.21kB ± 0%   -95.26%  (p=0.000 n=8+8)
IndirectIndex_Delete/Small-8                      80.0B ± 0%     0.0B ±NaN%  -100.00%  (p=0.000 n=8+8)

name                                           old allocs/op  new allocs/op  delta
IndirectIndex_UnmarshalBinary-8                    35.0 ± 0%      42.0 ± 0%   +20.00%  (p=0.000 n=8+8)
IndirectIndex_DeleteRangeLast-8                    3.00 ± 0%     0.00 ±NaN%  -100.00%  (p=0.000 n=8+8)
IndirectIndex_DeleteRangeFull/Large-8             1.53M ± 0%     0.52M ± 0%   -65.98%  (p=0.000 n=8+8)
IndirectIndex_DeleteRangeFull/Small-8             15.2k ± 0%      5.2k ± 0%   -65.97%  (p=0.000 n=8+8)
IndirectIndex_DeleteRangeFull_Covered/Large-8       620 ± 0%       124 ± 0%   -80.00%  (p=0.002 n=7+8)
IndirectIndex_DeleteRangeFull_Covered/Small-8      10.0 ± 0%       2.0 ± 0%   -80.00%  (p=0.000 n=8+8)
IndirectIndex_Delete/Large-8                        246 ± 0%         1 ± 0%   -99.59%  (p=0.000 n=8+8)
IndirectIndex_Delete/Small-8                       4.00 ± 0%     0.00 ±NaN%  -100.00%  (p=0.000 n=8+8)
2019-01-07 11:00:35 -07:00
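The first two bullets of 1ffcd77342 (notice deletes during iteration, and have consumers check the error) can be illustrated with a generation counter. This is a single-goroutine sketch under assumed names (`index`, `iterator`, `errKeysDeleted`); it does not show the locking or atomics that concurrent readers would need, and it is not the commit's implementation.

```go
package main

import (
	"errors"
	"fmt"
)

var errKeysDeleted = errors.New("keys deleted during iteration")

// index is a hypothetical key index; gen is bumped on every delete.
type index struct {
	keys []string
	gen  uint64
}

// iterator remembers the generation it started with.
type iterator struct {
	idx *index
	pos int
	gen uint64
	err error
}

func (ix *index) Iterator() *iterator {
	return &iterator{idx: ix, pos: -1, gen: ix.gen}
}

// Delete removes key, if present, and invalidates open iterators.
func (ix *index) Delete(key string) {
	for i, k := range ix.keys {
		if k == key {
			ix.keys = append(ix.keys[:i], ix.keys[i+1:]...)
			ix.gen++
			return
		}
	}
}

// Next advances the iterator; it stops with an error once a delete
// has happened since the iterator was created.
func (it *iterator) Next() bool {
	if it.err != nil {
		return false
	}
	if it.gen != it.idx.gen {
		it.err = errKeysDeleted
		return false
	}
	it.pos++
	return it.pos < len(it.idx.keys)
}

func (it *iterator) Key() string { return it.idx.keys[it.pos] }
func (it *iterator) Err() error  { return it.err }

func main() {
	ix := &index{keys: []string{"a", "b", "c"}}
	it := ix.Iterator()
	for it.Next() {
		fmt.Println(it.Key())
		ix.Delete("b") // simulate a delete in the middle of iteration
	}
	if err := it.Err(); err != nil { // consumers must check the error
		fmt.Println("iteration stopped:", err)
	}
}
```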
Jeff Wendling 14cf01911e tsm1: change TSMFile to use an iterator style api 2019-01-07 11:00:35 -07:00
Jeff Wendling 917584b054 tsm1: use readerOffsetsIterator for deletes
This reduces the number of disk hits at some cost in CPU on some benchmarks. Notably, the
DeleteRangeFull_Covered and Delete benchmarks both went to approximately zero page faults,
meaning they read from the index file linearly.

name                                     old time/op    new time/op    delta
IndirectIndex_UnmarshalBinary-8            68.8ms ±10%    63.1ms ±16%   -8.28%          (p=0.021 n=8+8)
IndirectIndex_Entries-8                    9.09µs ± 3%    9.62µs ± 1%   +5.84%          (p=0.000 n=8+7)
IndirectIndex_ReadEntries-8                5.86µs ± 1%    6.15µs ± 3%   +5.03%          (p=0.000 n=8+8)
IndirectIndex_DeleteRangeLast-8             562ns ± 6%     308ns ± 2%  -45.25%          (p=0.000 n=8+8)
IndirectIndex_DeleteRangeFull-8             363ms ±10%     376ms ± 5%     ~             (p=0.054 n=8+7)
IndirectIndex_DeleteRangeFull_Covered-8     574ms ± 2%     746ms ± 0%  +30.01%          (p=0.000 n=8+7)
IndirectIndex_Delete-8                     51.2ms ± 0%    88.2ms ± 0%  +72.38%          (p=0.000 n=8+7)

name                                     old alloc/op   new alloc/op   delta
IndirectIndex_UnmarshalBinary-8            11.7MB ± 0%    11.7MB ± 0%     ~     (all samples are equal)
IndirectIndex_Entries-8                    32.8kB ± 0%    32.8kB ± 0%     ~     (all samples are equal)
IndirectIndex_ReadEntries-8                0.00B ±NaN%    0.00B ±NaN%     ~     (all samples are equal)
IndirectIndex_DeleteRangeLast-8            0.00B ±NaN%    0.00B ±NaN%     ~     (all samples are equal)
IndirectIndex_DeleteRangeFull-8             162MB ± 0%     162MB ± 0%     ~             (p=0.798 n=8+8)
IndirectIndex_DeleteRangeFull_Covered-8    82.4MB ± 0%    82.4MB ± 0%     ~             (p=0.857 n=8+8)
IndirectIndex_Delete-8                     4.01kB ± 0%    4.04kB ± 0%   +0.90%          (p=0.000 n=8+8)

name                                     old allocs/op  new allocs/op  delta
IndirectIndex_UnmarshalBinary-8              42.0 ± 0%      42.0 ± 0%     ~     (all samples are equal)
IndirectIndex_Entries-8                      1.00 ± 0%      1.00 ± 0%     ~     (all samples are equal)
IndirectIndex_ReadEntries-8                 0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
IndirectIndex_DeleteRangeLast-8             0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
IndirectIndex_DeleteRangeFull-8              522k ± 0%      522k ± 0%     ~             (p=0.743 n=8+8)
IndirectIndex_DeleteRangeFull_Covered-8     3.31k ± 0%     3.31k ± 0%     ~             (p=0.856 n=8+8)
IndirectIndex_Delete-8                        123 ± 0%       123 ± 0%     ~     (all samples are equal)

name                                     old speed      new speed      delta
IndirectIndex_DeleteRangeFull-8          18.1MB/s ± 9%  17.5MB/s ± 7%     ~             (p=0.105 n=8+8)
IndirectIndex_Delete-8                    116MB/s ± 0%     0MB/s ± 0%  -99.96%          (p=0.000 n=8+8)
2019-01-07 11:00:35 -07:00
Jeff Wendling 6f5c94f3f7 tsm1: introduce readerOffsets to manage the offsets slice
It exposes an API that will clean up the bodies of many methods and
provide a safe abstraction around iteration that will be able to
handle reads with concurrent deletes.

Benchmarks are flat.
2019-01-07 11:00:35 -07:00
Jeff Wendling f860305124 tsm1: keep first 8 bytes of each key in memory
Since most keys will share the first 8 bytes, we collapse them into
a slice containing partial sums of the counts. We can then binary search
into that slice to find the associated prefix for a given offset index.
Compressing in this way causes the overhead to be negligible and reduces
disk misses by about 30% in these benchmarks (500k series across 100 orgs).
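
The partial-sum lookup described above can be sketched as follows, assuming sorted keys. `prefixEntry`, `buildPrefixes`, and `prefixFor` are hypothetical names; the commit's actual data layout is not shown in this log.

```go
package main

import (
	"fmt"
	"sort"
)

// prefixEntry records one run of keys that share an 8-byte prefix.
// total is the number of keys up to and including this run, so the
// totals form a partial-sum sequence over the run lengths.
type prefixEntry struct {
	prefix [8]byte
	total  int
}

// buildPrefixes collapses sorted keys into one entry per distinct
// leading 8 bytes. Shorter keys are zero-padded.
func buildPrefixes(keys []string) []prefixEntry {
	var out []prefixEntry
	for i, k := range keys {
		var p [8]byte
		copy(p[:], k)
		if n := len(out); n > 0 && out[n-1].prefix == p {
			out[n-1].total = i + 1
			continue
		}
		out = append(out, prefixEntry{prefix: p, total: i + 1})
	}
	return out
}

// prefixFor binary searches the partial sums for the run containing
// the key at offset index idx.
func prefixFor(entries []prefixEntry, idx int) ([8]byte, bool) {
	if idx < 0 {
		return [8]byte{}, false
	}
	i := sort.Search(len(entries), func(i int) bool {
		return entries[i].total > idx
	})
	if i == len(entries) {
		return [8]byte{}, false
	}
	return entries[i].prefix, true
}

func main() {
	keys := []string{"org00001cpu", "org00001mem", "org00002cpu"}
	entries := buildPrefixes(keys)
	p, _ := prefixFor(entries, 2)
	fmt.Println(len(entries), string(p[:]))
}
```

Memory use scales with the number of distinct prefixes rather than the number of keys.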

name                                     old time/op    new time/op    delta
IndirectIndex_UnmarshalBinary-8            67.5ms ± 1%    64.6ms ± 1%   -4.33%          (p=0.000 n=8+7)
IndirectIndex_Entries-8                    9.41µs ± 2%    9.39µs ± 1%     ~             (p=0.959 n=8+8)
IndirectIndex_ReadEntries-8                5.99µs ± 1%    6.07µs ± 1%   +1.29%          (p=0.001 n=8+8)
IndirectIndex_DeleteRangeLast-8             369ns ± 2%     566ns ± 1%  +53.37%          (p=0.000 n=8+8)
IndirectIndex_DeleteRangeFull-8             368ms ± 9%     369ms ± 2%     ~             (p=0.232 n=8+7)
IndirectIndex_DeleteRangeFull_Covered-8     600ms ± 1%     618ms ± 0%   +3.03%          (p=0.000 n=8+7)
IndirectIndex_Delete-8                     50.0ms ± 1%    47.6ms ± 9%     ~             (p=0.463 n=7+8)

name                                     old alloc/op   new alloc/op   delta
IndirectIndex_UnmarshalBinary-8            11.6MB ± 0%    11.7MB ± 0%   +0.02%          (p=0.000 n=8+7)
IndirectIndex_Entries-8                    32.8kB ± 0%    32.8kB ± 0%     ~     (all samples are equal)
IndirectIndex_ReadEntries-8                0.00B ±NaN%    0.00B ±NaN%     ~     (all samples are equal)
IndirectIndex_DeleteRangeLast-8            0.00B ±NaN%    0.00B ±NaN%     ~     (all samples are equal)
IndirectIndex_DeleteRangeFull-8             162MB ± 0%     162MB ± 0%     ~             (p=0.382 n=8+8)
IndirectIndex_DeleteRangeFull_Covered-8    82.4MB ± 0%    82.4MB ± 0%     ~             (p=0.776 n=8+8)
IndirectIndex_Delete-8                     4.01kB ± 0%    4.01kB ± 0%     ~     (all samples are equal)

name                                     old allocs/op  new allocs/op  delta
IndirectIndex_UnmarshalBinary-8              35.0 ± 0%      42.0 ± 0%  +20.00%          (p=0.000 n=8+8)
IndirectIndex_Entries-8                      1.00 ± 0%      1.00 ± 0%     ~     (all samples are equal)
IndirectIndex_ReadEntries-8                 0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
IndirectIndex_DeleteRangeLast-8             0.00 ±NaN%     0.00 ±NaN%     ~     (all samples are equal)
IndirectIndex_DeleteRangeFull-8              522k ± 0%      522k ± 0%     ~             (p=0.382 n=8+8)
IndirectIndex_DeleteRangeFull_Covered-8     3.31k ± 0%     3.31k ± 0%     ~             (p=0.457 n=8+8)
IndirectIndex_Delete-8                        123 ± 0%       123 ± 0%     ~     (all samples are equal)

name                                     old speed      new speed      delta
IndirectIndex_DeleteRangeFull-8          24.7MB/s ±10%  17.8MB/s ± 2%  -28.18%          (p=0.000 n=8+7)
IndirectIndex_DeleteRangeFull_Covered-8  14.2MB/s ± 1%   9.6MB/s ± 0%  -32.30%          (p=0.000 n=8+7)
IndirectIndex_Delete-8                    171MB/s ± 1%   126MB/s ±10%  -26.35%          (p=0.000 n=7+8)

IndirectIndex_DeleteRangeLast went from 17 page faults, or ~180GB/sec at 369ns/op
to zero page faults. So even though it got 50% slower, it was actually I/O bound
and no longer is.
2019-01-07 11:00:35 -07:00
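The "~180GB/sec" figure in f860305124 appears to follow from the fault count and the per-op time. The sketch below reproduces the arithmetic under the assumption of 4 KiB pages, which the commit message does not state; `impliedGBPerSec` is a name introduced here.

```go
package main

import "fmt"

// impliedGBPerSec returns the read rate implied by servicing `faults`
// page faults of `pageSize` bytes each within `nsPerOp` nanoseconds.
// Bytes per nanosecond equals gigabytes (10^9 bytes) per second.
func impliedGBPerSec(faults, pageSize int, nsPerOp float64) float64 {
	return float64(faults*pageSize) / nsPerOp
}

func main() {
	// 17 faults and 369 ns/op come from the commit message;
	// the 4096-byte page size is an assumption.
	fmt.Printf("%.1f GB/s\n", impliedGBPerSec(17, 4096, 369))
}
```

The result lands a little under 190 GB/s, far beyond what a disk can deliver, so the old timing could only hold with every page already in cache.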
Jeff Wendling 0becfc6239 tsm1: add helper to track page faults in index
Since the methods inline and dead code is eliminated, it has no runtime
overhead in the benchmarks when disabled.

benchmark                                  recorded faults
BenchmarkIndirectIndex_Entries-8           11
BenchmarkIndirectIndex_ReadEntries-8       11
BenchmarkIndirectIndex_DeleteRangeLast-8   17
BenchmarkIndirectIndex_DeleteRangeFull-8   2218
BenchmarkIndirectIndex_Delete-8            2084
2019-01-07 11:00:35 -07:00
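One way to get tracking with no overhead when disabled, as 0becfc6239 describes, is to gate the bookkeeping on a compile-time constant so the compiler can drop the branch. The sketch below is an assumption-laden illustration: `trackFaults`, `faultBuffer`, its fields, and the "page changed since last access" heuristic are all hypothetical, not the helper added by the commit.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// trackFaults is a compile-time switch. While it is false, the branch
// in access is dead code and can be eliminated by the compiler.
const trackFaults = false

const pageSize = 4096 // assumed page size

// faultBuffer wraps a byte slice (for example an mmapped index) and
// counts accesses that land on a different page than the previous one.
type faultBuffer struct {
	lastPage int64 // 64-bit fields first for atomic alignment
	faults   uint64
	b        []byte
}

// access returns b[start : start+length], recording a suspected
// fault first when tracking is enabled.
func (f *faultBuffer) access(start, length uint32) []byte {
	if trackFaults {
		page := int64(start) / pageSize
		if page != atomic.LoadInt64(&f.lastPage) {
			atomic.AddUint64(&f.faults, 1)
		}
		atomic.StoreInt64(&f.lastPage, page)
	}
	return f.b[start : start+length]
}

func main() {
	fb := &faultBuffer{b: make([]byte, 3*pageSize), lastPage: -1}
	_ = fb.access(0, 16)
	_ = fb.access(2*pageSize, 16)
	fmt.Println("recorded faults:", atomic.LoadUint64(&fb.faults))
}
```

Flipping `trackFaults` to `true` and rebuilding turns the counter on without touching any call sites.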
Jeff Wendling 91e820a9d8 tsm1: fix multiple issues with DeleteRange
1. Correctly acquires locks
2. Seeks for discontiguous key ranges (like delete ["aaa", "zzz"])
3. Is precise about deleting a key when it contains no data

name                             old time/op    new time/op    delta
IndirectIndex_UnmarshalBinary-8    67.3ms ± 1%    63.2ms ±15%      ~             (p=0.463 n=7+8)
IndirectIndex_Entries-8            9.14µs ± 1%    9.01µs ± 0%    -1.40%          (p=0.004 n=8+7)
IndirectIndex_ReadEntries-8        5.83µs ± 1%    5.68µs ± 2%    -2.62%          (p=0.000 n=8+8)
IndirectIndex_DeleteRangeLast-8     283ns ± 2%     191ns ± 1%   -32.37%          (p=0.000 n=8+7)
IndirectIndex_DeleteRangeFull-8     612ms ± 1%     361ms ± 1%   -41.02%          (p=0.000 n=8+8)
IndirectIndex_Delete-8             49.0ms ± 1%    49.8ms ± 1%    +1.80%          (p=0.001 n=7+8)

name                             old alloc/op   new alloc/op   delta
IndirectIndex_UnmarshalBinary-8    11.6MB ± 0%    11.6MB ± 0%      ~     (all samples are equal)
IndirectIndex_Entries-8            32.8kB ± 0%    32.8kB ± 0%      ~     (all samples are equal)
IndirectIndex_ReadEntries-8        0.00B ±NaN%    0.00B ±NaN%      ~     (all samples are equal)
IndirectIndex_DeleteRangeLast-8     64.0B ± 0%     0.0B ±NaN%  -100.00%          (p=0.000 n=8+8)
IndirectIndex_DeleteRangeFull-8     168MB ± 0%     162MB ± 0%    -3.71%          (p=0.000 n=8+8)
IndirectIndex_Delete-8             3.94kB ± 0%    3.94kB ± 0%      ~     (all samples are equal)

name                             old allocs/op  new allocs/op  delta
IndirectIndex_UnmarshalBinary-8      35.0 ± 0%      35.0 ± 0%      ~     (all samples are equal)
IndirectIndex_Entries-8              1.00 ± 0%      1.00 ± 0%      ~     (all samples are equal)
IndirectIndex_ReadEntries-8         0.00 ±NaN%     0.00 ±NaN%      ~     (all samples are equal)
IndirectIndex_DeleteRangeLast-8      2.00 ± 0%     0.00 ±NaN%  -100.00%          (p=0.000 n=8+8)
IndirectIndex_DeleteRangeFull-8     1.04M ± 0%     0.52M ± 0%   -49.77%          (p=0.000 n=8+8)
IndirectIndex_Delete-8                123 ± 0%       123 ± 0%      ~     (all samples are equal)
2019-01-07 11:00:35 -07:00
Jeff Wendling d40c3e662f tsm1: use uint32 key for tombstones
rough, noisy benchmarks.

benchmark                                    old ns/op     new ns/op     delta
BenchmarkIndirectIndex_UnmarshalBinary-8     62462250      67710057      +8.40%
BenchmarkIndirectIndex_Entries-8             9601          9239          -3.77%
BenchmarkIndirectIndex_ReadEntries-8         5984          5964          -0.33%
BenchmarkIndirectIndex_DeleteRangeLast-8     314           317           +0.96%
BenchmarkIndirectIndex_DeleteRangeFull-8     813838165     615346992     -24.39%
BenchmarkIndirectIndex_Delete-8              52079181      52906315      +1.59%

benchmark                                    old allocs     new allocs     delta
BenchmarkIndirectIndex_UnmarshalBinary-8     35             35             +0.00%
BenchmarkIndirectIndex_Entries-8             1              1              +0.00%
BenchmarkIndirectIndex_ReadEntries-8         0              0              +0.00%
BenchmarkIndirectIndex_DeleteRangeLast-8     2              2              +0.00%
BenchmarkIndirectIndex_DeleteRangeFull-8     1532670        1038932        -32.21%
BenchmarkIndirectIndex_Delete-8              123            123            +0.00%

benchmark                                    old bytes     new bytes     delta
BenchmarkIndirectIndex_UnmarshalBinary-8     11648760      11648760      +0.00%
BenchmarkIndirectIndex_Entries-8             32768         32768         +0.00%
BenchmarkIndirectIndex_ReadEntries-8         1             1             +0.00%
BenchmarkIndirectIndex_DeleteRangeLast-8     64            64            +0.00%
BenchmarkIndirectIndex_DeleteRangeFull-8     232738960     168112352     -27.77%
BenchmarkIndirectIndex_Delete-8              3936          3936          +0.00%
2019-01-07 11:00:35 -07:00
Jeff Wendling ffd35ce1aa tsm1: use a uint32 for offsets globally
benchmarks are flat.
2019-01-07 11:00:35 -07:00
Jeff Wendling 7a7a4b6d58 tsm1: remove offsets from mmap
benchmark                                    old ns/op     new ns/op     delta
BenchmarkIndirectIndex_UnmarshalBinary-8     74525387      66439305      -10.85%
BenchmarkIndirectIndex_Entries-8             8892          9200          +3.46%
BenchmarkIndirectIndex_ReadEntries-8         5816          5691          -2.15%
BenchmarkIndirectIndex_DeleteRangeLast-8     1550          311           -79.94%
BenchmarkIndirectIndex_DeleteRangeFull-8     773649708     767030277     -0.86%
BenchmarkIndirectIndex_Delete-8              79755991      52015903      -34.78%

benchmark                                    old allocs     new allocs     delta
BenchmarkIndirectIndex_UnmarshalBinary-8     35             35             +0.00%
BenchmarkIndirectIndex_Entries-8             1              1              +0.00%
BenchmarkIndirectIndex_ReadEntries-8         0              0              +0.00%
BenchmarkIndirectIndex_DeleteRangeLast-8     3              2              -33.33%
BenchmarkIndirectIndex_DeleteRangeFull-8     1532589        1532344        -0.02%
BenchmarkIndirectIndex_Delete-8              246            123            -50.00%

benchmark                                    old bytes     new bytes     delta
BenchmarkIndirectIndex_UnmarshalBinary-8     11648760      11648760      +0.00%
BenchmarkIndirectIndex_Entries-8             32768         32768         +0.00%
BenchmarkIndirectIndex_ReadEntries-8         1             1             +0.00%
BenchmarkIndirectIndex_DeleteRangeLast-8     3264          64            -98.04%
BenchmarkIndirectIndex_DeleteRangeFull-8     232710448     232624208     -0.04%
BenchmarkIndirectIndex_Delete-8              4432          3936          -11.19%
2019-01-07 11:00:35 -07:00
Jeff Wendling 04605eb266 tsm1: speed up deleterange for large keys
Rather than starting at the first key, do a binary search to the
first key in the range. This changes deleting the largest key from
O(N) to O(log N).

benchmark                                    old ns/op       new ns/op     delta
BenchmarkIndirectIndex_DeleteRangeFull-8     17884166763     738717473     -95.87%
2018-12-14 10:06:24 -07:00
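The change in 04605eb266 amounts to seeking to the start of the range by binary search instead of scanning from the front. A minimal sketch over a sorted slice of string keys, using the standard library's `sort.SearchStrings`; `keysInRange` is a hypothetical helper and the real index stores keys differently.

```go
package main

import (
	"fmt"
	"sort"
)

// keysInRange returns the keys k with min <= k <= max from a sorted
// slice. The start position is found in O(log N) by binary search;
// only the keys inside the range are then visited.
func keysInRange(keys []string, min, max string) []string {
	lo := sort.SearchStrings(keys, min) // first index with keys[i] >= min
	hi := lo
	for hi < len(keys) && keys[hi] <= max {
		hi++
	}
	return keys[lo:hi]
}

func main() {
	keys := []string{"aaa", "bbb", "ccc", "zzz"}
	fmt.Println(keysInRange(keys, "zzz", "zzz"))
}
```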
Jeff Wendling 0d411023f2 config: clean up
- Breaks the weird cycle that existed with the EngineOptions
- Removes a bunch of useless parameters
- Moves around a bunch of defaults
2018-11-08 11:39:36 -07:00
Jacob Marble b6a1c0e9c7 storage: MeasurementStats.ReadFrom requires ByteReader 2018-10-19 14:16:20 -07:00
Ben Johnson 68450681ef
Add TSM1 measurement stats.
This commit generates an additional `.tss` stats file alongside each
TSM file when it is written that contains size stats for all measurements
within the TSM file. These files can be combined to generate stats for
all measurements across all TSM files.
2018-10-08 10:43:53 -06:00
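Combining per-file stats, as 68450681ef describes, reduces to summing sizes per measurement. The sketch below models each `.tss` file's contents as an in-memory map; the `measurementStats` type and `combine` function are assumptions for illustration, and the on-disk format is not described in this log.

```go
package main

import "fmt"

// measurementStats maps a measurement name to its size in bytes
// within one TSM file (hypothetical in-memory form of a .tss file).
type measurementStats map[string]int

// combine sums the per-file stats into totals across all files.
func combine(files ...measurementStats) measurementStats {
	total := make(measurementStats)
	for _, f := range files {
		for name, size := range f {
			total[name] += size
		}
	}
	return total
}

func main() {
	a := measurementStats{"cpu": 100, "mem": 40}
	b := measurementStats{"cpu": 25, "disk": 10}
	all := combine(a, b)
	fmt.Println(all["cpu"], all["mem"], all["disk"])
}
```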
Edd Robinson 074f263e08 Initial import of tsm1.Engine 2018-10-01 12:08:37 +01:00