Commit Graph

461 Commits (db/wait-timeout-utility)

Author SHA1 Message Date
Edd Robinson 301ab71ba0 Remove copy-on-write when caching bitmaps
In the case of caching TSI bitmaps belonging to immutable .tsi files,
the underlying bitset data can be mmapped. It is possible, though rare,
for this data to be unmapped (e.g., via a TSI compaction) but for the
cached bitmap to be subsequently read. This leads to a segfault.

This only happens when copy-on-write is set to true on the roaring
bitmap, because in that case only the internal pointers are cloned.

This change will reduce the TSI cache performance by around 10%, which I
estimate amounts to only a few microseconds in typical cases.
2019-01-25 18:02:48 +00:00
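The hazard described above can be illustrated with a minimal Go sketch. It does not use the roaring library: `bitset`, `shallowClone`, and `deepClone` are hypothetical names, and clobbering a slice only stands in for unmapping (a real unmap would fault on access rather than return zeros).

```go
package main

import "fmt"

// bitset is a hypothetical stand-in for a bitmap whose words may live in
// mmapped memory. It is not the roaring.Bitmap type.
type bitset struct {
	words []uint64
}

// shallowClone copies only the slice header, so the clone still points at
// the original backing array (analogous to cloning internal pointers only).
func (b bitset) shallowClone() bitset { return bitset{words: b.words} }

// deepClone copies the data, so the clone survives changes to the original.
func (b bitset) deepClone() bitset {
	out := make([]uint64, len(b.words))
	copy(out, b.words)
	return bitset{words: out}
}

// contains reports whether bit i is set.
func (b bitset) contains(i uint) bool {
	w := i / 64
	return w < uint(len(b.words)) && b.words[w]&(uint64(1)<<(i%64)) != 0
}

func main() {
	backing := []uint64{0xA} // pretend this is mmapped file data (bits 1 and 3 set)
	orig := bitset{words: backing}
	shallow, deep := orig.shallowClone(), orig.deepClone()

	// Simulate the backing region going away by clobbering it.
	backing[0] = 0

	fmt.Println("shallow still sees bit 1:", shallow.contains(1))
	fmt.Println("deep still sees bit 1:", deep.contains(1))
}
```

The deep copy pays for an allocation and a copy on every clone, which is the kind of cost the commit message trades for safety.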
Edd Robinson efdddbb31a Allow TSI bitset cache size to be configured
This commit adds a config option to the tsdb Config allowing the size of
the bitset cache in the TSI index to be specified.

Setting the cache size to 0 will disable the cache.
2019-01-24 17:41:45 +00:00
Edd Robinson e20541d2ba Expose functional option for setting TSI cache size 2019-01-23 17:15:48 +00:00
Edd Robinson 3a055a6107 Fix cardinality estimation error
This commit fixes an error in the TSI index with estimating the
cardinality of series recently added and then removed.
2019-01-10 17:46:30 +00:00
Jeff Wendling 259f3fe6e5 tsdb: consider measurement drops per shard on inmem 2018-11-27 16:59:17 -07:00
Jeff Wendling 0a2f6191a6 tsdb: clean up fields index for every kind of delete
Before this, if you deleted everything with `delete where true`
for example, then you would be left with all of your measurements
in the fields index. That would cause ghost fields to reappear
if someone reinserted to the measurement.

This fixes that by making the innermost delete code check whether the
measurement was removed from the index and, if so, clean it up out of
the fields index.

Additionally, it fixes bugs in that cleanup code where if you had
a measurement like "m1" and "m10", when iterating over the cache
or file store, "m1" would match "m10" due to it only checking the
prefix. This change also checks that the character right after the
measurement is either a comma (because tags started) or the first
character of the field separator.
2018-11-27 16:12:06 -07:00
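A minimal Go sketch of the boundary check described above. The function name `keyBelongsTo` is hypothetical, and the separator value `#!~#` is an assumption made for illustration. The sketch also ignores escaped commas in measurement names.

```go
package main

import (
	"bytes"
	"fmt"
)

// fieldSeparator is assumed here to be the delimiter between a series key
// and a field name in a storage key.
const fieldSeparator = "#!~#"

// keyBelongsTo reports whether key starts with the measurement name followed
// immediately by a comma (tags begin) or by the first byte of the field
// separator. In this sketch a key with nothing after the measurement name is
// not treated as a match.
func keyBelongsTo(key, measurement []byte) bool {
	if !bytes.HasPrefix(key, measurement) {
		return false
	}
	if len(key) == len(measurement) {
		return false
	}
	next := key[len(measurement)]
	return next == ',' || next == fieldSeparator[0]
}

func main() {
	m1 := []byte("m1")
	for _, key := range []string{
		"m1,host=a#!~#value", // tags follow the measurement
		"m1#!~#value",        // no tags, field separator follows
		"m10,host=a#!~#value", // shares the prefix but is a different measurement
	} {
		fmt.Printf("%-22s %v\n", key, keyBelongsTo([]byte(key), m1))
	}
}
```

A plain prefix test would report all three keys as belonging to "m1"; the extra byte check rejects the "m10" key.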
Edd Robinson cade59e253 Fix panic in IndexSet
This commit fixes a panic where a concurrent removal of a shard and meta
query could cause a `nil` index to be added to the `IndexSet`.
2018-10-26 18:23:54 +01:00
ludweeg 5622355526 Simplify s[:] to s where s is a slice 2018-10-04 17:10:21 +03:00
Ben Johnson bdcbad3fc9 Fix append of possible nil iterator.
This commit updates an iterator list to ignore `nil` iterators.
Adding a `nil` caused the `SeriesIterators.Close()` to panic.
2018-10-02 13:19:21 -06:00
Ben Johnson 0d777ad423 Fix tsi1 sketch locking. 2018-09-26 17:01:47 -06:00
Edd Robinson 812ac6da25 PR feedback 2018-09-18 15:58:38 -07:00
Edd Robinson a15bdeef92 Fix megacheck 2018-09-18 15:58:38 -07:00
Edd Robinson 76237d80f2 Address PR feedback 2018-09-18 15:58:38 -07:00
Ben Johnson e651153f1c Add TagValueSeriesIDCache.Delete(). 2018-09-18 15:58:38 -07:00
Ben Johnson fcbc03240a Inline mutex into TagValueSeriesIDCache. 2018-09-18 15:58:38 -07:00
Edd Robinson bdc293abdd Tidy up 2018-09-18 15:58:38 -07:00
Edd Robinson d8af622333 Add benchmark for TagSets across indexes 2018-09-18 15:58:38 -07:00
Edd Robinson 5c88a1dd0e Fix locking on cache 2018-09-18 15:58:38 -07:00
Edd Robinson 6d12f5d323 Debug 2018-09-18 15:58:38 -07:00
Edd Robinson 8af7c133db Refactor cache 2018-09-18 15:58:38 -07:00
Edd Robinson 1ae716b64e Use copy-on-write when cloning bitmaps
This commit sets the copy-on-write feature of the SeriesIDSets, such
that we can make immutable clones of underlying bitmaps efficiently. If
the original bitmap is modified then a copy will be made, which won't
affect the clone.
2018-09-18 15:58:38 -07:00
Edd Robinson baf35f2138 Add benchmarks for cache and option to disable 2018-09-18 15:58:38 -07:00
Edd Robinson 3f6ef0ba22 Update cached bitset results with new series ids
This commit ensures that cached bitset results at the Index level are
updated whenever new series ids are created that would belong in those
bitsets.

For example, if we have a cached bitset for the tuple {mem, region,
west}, and we add the series mem,host=prod,region=west then we would
update the cached bitset for {mem, region, west} with the series id of
the newly written series.
2018-09-18 15:58:38 -07:00
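The {mem, region, west} example above can be sketched in Go as follows. The types `cacheKey` and `seriesIDCache` and the method `addSeries` are hypothetical, and a plain map stands in for the bitset. The sketch only illustrates the update rule and does not reproduce the index code.

```go
package main

import "fmt"

// cacheKey identifies a cached result for one (measurement, tag key, tag
// value) tuple.
type cacheKey struct{ measurement, tagKey, tagValue string }

// seriesIDCache maps each cached tuple to a set of series IDs. A map is used
// here in place of a bitset to keep the sketch self-contained.
type seriesIDCache map[cacheKey]map[uint64]struct{}

// addSeries adds id to every already-cached tuple that the new series
// matches. Tuples that are not cached are left alone.
func (c seriesIDCache) addSeries(measurement string, tags map[string]string, id uint64) {
	for k, v := range tags {
		if set, ok := c[cacheKey{measurement, k, v}]; ok {
			set[id] = struct{}{}
		}
	}
}

func main() {
	west := cacheKey{"mem", "region", "west"}
	cache := seriesIDCache{
		west: map[uint64]struct{}{1: {}, 2: {}},
	}

	// Writing the new series mem,host=prod,region=west with series ID 3.
	cache.addSeries("mem", map[string]string{"host": "prod", "region": "west"}, 3)

	_, cachedWest := cache[west][3]
	_, cachedHost := cache[cacheKey{"mem", "host", "prod"}]
	fmt.Println("ID 3 in {mem, region, west}:", cachedWest)
	fmt.Println("{mem, host, prod} created:", cachedHost)
}
```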
Edd Robinson 065d47e4f2 Return created series ids from LogFile insertion 2018-09-18 15:58:38 -07:00
Edd Robinson 52b5640a4a Add test for TagValueSeriesIDIterator 2018-09-18 15:58:38 -07:00
Edd Robinson 9fb301cf10 Add CreateSeriesListIfNotExists benchmark 2018-09-18 15:58:38 -07:00
Edd Robinson 2c4c79f110 Convert cache to LRU 2018-09-18 15:58:38 -07:00
Edd Robinson 2ae2157d02 debug 2018-09-18 15:58:38 -07:00
Edd Robinson 74b3d35e40 Basic cache 2018-09-18 15:58:38 -07:00
Edd Robinson 7d00a45ebf Don't allocate when reading tombstone SeriesID set 2018-09-18 15:58:38 -07:00
Edd Robinson ca07a38402 Add benchmark for TagValueSeriesIDIterator 2018-09-18 15:53:52 -07:00
Ben Johnson 88d006a18c Remove TSI1 HLL sketches from heap.
This commit removes the HLL sketches on each `tsi1.LogFile` and
`tsi1.IndexFile` and instead caches the data at the `tsi1.Index`
level. This reduces the heap size significantly for servers with
many TSI-enabled shards.
2018-09-12 08:48:40 -06:00
Stuart Carnie 4e91c8d33d Revert: Unmap LogFile on successful open
Resolves a panic when attempting to sort the `logMeasurements` slice,
which holds on to mmap'd data.
2018-09-04 15:16:09 -06:00
Edd Robinson 9970620ee0 Unmap LogFile on successful open
Since we append to the file itself, once we have read the file in, we
can be done with the mmap'd data.

Ideally we can rework UnmarshalBinary and do away with the mmap
completely. That is future work.
2018-08-23 17:24:22 +01:00
Edd Robinson dece5b847f Refactor index names 2018-08-21 14:32:30 +01:00
Edd Robinson a67f15fad4 Promote DropSeriesGlobal to Index interface 2018-08-20 17:57:16 +01:00
Edd Robinson 035b26cadd Refactor DropSeriesGlobal 2018-08-20 16:37:55 +01:00
Edd Robinson 19a4f1c9b0 Fix megacheck 2018-07-31 15:22:54 +01:00
Edd Robinson 61af08abde Fix megacheck 2018-07-31 15:03:54 +01:00
Ben Johnson 5612511a8f Use roaring.Bitmap.FromBuffer(), remove memory alignment. 2018-07-30 13:42:13 +00:00
Ben Johnson 66920a181a Add legacy tsi1 uvarint encoding test. 2018-07-27 15:43:14 +01:00
Ben Johnson 80d01325f8 Refactor file set tag value iterators to support series sets & tombstones. 2018-07-26 23:48:27 +01:00
Ben Johnson cb828f0187 Fix roaring dependency, minor PR fixes. 2018-07-26 09:32:43 +01:00
Ben Johnson fdfd038401 Add roaring bitmaps to TSI index files. 2018-07-24 17:59:23 +01:00
Jeff Wendling d979518135 inmem: use radix sort for series ids 2018-07-17 12:31:12 -06:00
Jacob Marble ffe54d0239 Revert "Resolve deadlock"
This reverts commit 681f22b078.
2018-07-09 22:05:54 -07:00
Edd Robinson ad388a8fd8 Address PR feedback 2018-07-09 11:51:48 +01:00
Edd Robinson 11bea138f8 Restrict buffer size 2018-07-09 11:51:48 +01:00
Edd Robinson 96ed566e6c Store series ID sets in LogFile as bitmaps
This commit swaps out map[uint64]struct{} implementations for roaring
bitmaps, which in turn improves memory usage and read performance.

The bitmap implementation is abstracted such that for low cardinality
sets a simple slice of ids is used, to reduce in-use memory.
2018-07-09 11:51:48 +01:00
Edd Robinson 13f896b9ff Buffer writing of .tsl file with 128K buffer 2018-07-09 11:51:48 +01:00
Edd Robinson 3cf20823e9 Allow LogFile buffer size to be changed
When adding many series using offline tooling, it's likely that every
series involves an entry being appended to a LogFile. Typically an entry
is 11 or 12 bytes, but the default bufio.Writer buffer size is only 4K.

This means by default a write of 10,000 new series would involve ~30
buffer flushes.

This commit makes the buffer configurable, and sets the value in
`buildtsi` such that it reflects the number of series being written to
the LogFile.
2018-07-09 11:51:48 +01:00
Edd Robinson 681af04815 Optionally disable buffer flushing/file syncing
When running offline tooling, flushing buffers and syncing files on
every write to a `LogFile` is not necessary. Were a hard exit
with data loss to occur, the tooling can simply be run again.
2018-07-09 11:51:15 +01:00
Jacob Marble b7d5e2ecdf Merge pull request #10050 from influxdata/jgm-delete-regex
Resolve deadlock deleting from many measurements concurrently
2018-07-08 17:01:33 -07:00
Jacob Marble 681f22b078 Resolve deadlock
TSI LogFile compactions occasionally race with insert and delete
operations because the index partition FileSet is retained needlessly by
the method that calls Partition.CheckLogFile.

In this change:
- TSI LogFile compaction respects enable/disable compactions
- Partition FileSet.Release before log compaction is triggered

An alternative to the second step is to handle log file compaction in a
new goroutine. Log file compaction errors would be logged and not
returned to the caller.

After this change, `DELETE FROM /regex/` does not deadlock; performance:
- 30s to delete 100 measurements
- 5m30s to delete 1000 measurements
2018-07-06 15:02:38 -07:00
Jacob Marble 2ac811e57e close objects without swallowing errors 2018-07-06 13:45:22 -07:00
Jacob Marble 0af22b5992 Partition receiver rename
Got tired of referring to Index Partitions as `i` instead of `p`.
2018-07-05 14:28:00 -07:00
Ben Johnson 979d790154 Implement bitset iterator 2018-07-05 09:01:22 -06:00
Ben Johnson fd5a2116d7 Flush/sync TSI1 WAL 2018-06-19 08:32:33 -06:00
Stuart Carnie e209a0a1f2 Restore "Performance optimization suggestions"
CLA confirmed
PR: https://github.com/influxdata/influxdb/pull/9836

This reverts commit 7215bad
2018-05-23 08:54:20 -07:00
Stuart Carnie 7215badfcd Revert "Performance optimization suggestions"
This reverts commit f82d53f75d.
2018-05-21 14:10:03 -07:00
chenjian.cj f82d53f75d Performance optimization suggestions 2018-05-21 13:30:32 -07:00
Jacob Marble 3f2ff742c0 Remove unused 'database' field 2018-05-18 09:22:43 -07:00
Jacob Marble 7f8b7af61e Cleanup index memory footprint counting code (#9828)
* Fix IndexSet.DedupeInmemIndexes

* Cleanup index memory footprint code
2018-05-15 11:25:19 -07:00
Jacob Marble 200fda999f remove unused function parameters 2018-05-14 09:10:21 -07:00
Jacob Marble 0763d1789e Get inmem index bytes without double-counting 2018-05-10 11:33:52 -07:00
Jacob Marble e2f9413c8a count slice memory use with len, not cap 2018-05-10 11:33:52 -07:00
Jacob Marble 87d73d405c tsdb/SeriesFile: remove unused function param 2018-05-04 11:22:12 -07:00
Jacob Marble 2dc2b97fb9 tsdb/index: Add Bytes() methods (#9794) 2018-05-04 08:47:05 -07:00
Jonathan A. Sternberg 6607c29a02 Merge pull request #9649 from influxdata/js-eval-functions-in-where
Allow math functions to be used in the condition
2018-05-02 08:29:08 -05:00
Jonathan A. Sternberg 10ed277e7a Merge pull request #9791 from influxdata/js-spread-stream-function
Optimize the spread function to process points iteratively instead of in batch
2018-05-01 15:08:34 -05:00
Jacob Marble fa24142467 tsdb/index/inmem: Fix megacheck issue 2018-04-30 10:25:07 -07:00
Jonathan A. Sternberg 9d049c4b62 Optimize the spread function to process points iteratively instead of in batch 2018-04-30 11:25:29 -05:00
Jacob Marble b23e32321c Remove unused code in tsdb/index/inmem 2018-04-26 13:19:01 -07:00
Jacob Marble 4282bf2744 Remove unused function parameter 2018-04-26 13:19:01 -07:00
Jacob Marble 1c63c4a3da Fix tsdb/index/inmem benchmark tests 2018-04-25 08:51:28 -07:00
Jacob Marble 10a7ffb647 Check for errors from binary.Uvarint when reading TSI logs (#9705)
* Check for errors from binary.Uvarint when reading TSI logs

* also check len(parsed) == len(input)

* wrap binary.Uvarint

* make uvarint() more generally useful/used
2018-04-12 09:59:56 -07:00
Jonathan A. Sternberg 1f9227e20c Allow math functions to be used in the condition 2018-04-10 10:55:34 -05:00
Ben Johnson 92d38414f2 Add adjustable TSI log file size.
This commit adds the `max-index-log-file-size` configuration flag so
that users can restrict the maximum size of log files before compaction.
The default limit was also lowered from `5MB` to `1MB`. The original
size was set before we partitioned the index so the change reflects this.
2018-04-02 11:47:59 -06:00
Jonathan A. Sternberg a6741aaf6c Simplify tsi1/log_file.go according to megacheck 2018-03-09 11:00:46 -06:00
Ben Johnson 8e62e8d3bd Fix TSI log file recovery. 2018-03-05 14:49:12 -07:00
Ben Johnson fb3187f62f Merge pull request #9496 from influxdata/bj-fix-series-key-replay-after-delete
Fix panic on tsi1 log replay of deleted series.
2018-02-28 08:37:07 -07:00
Jonathan A. Sternberg 87ac8ad385 Merge pull request #9491 from influxdata/js-9290-index-boolean-literals
Evaluate a true boolean literal when calculating tag sets
2018-02-28 09:14:24 -06:00
Ben Johnson 567a35d364 Fix panic on tsi1 log replay of deleted series. 2018-02-28 08:06:30 -07:00
Jonathan A. Sternberg 6baf354818 Evaluate a true boolean literal when calculating tag sets 2018-02-28 08:08:21 -06:00
Jason Wilder 2896d210af Skip creating cursors for series not in a shard
There was a check in inmem TagSets to see if a series was assigned
to a shard to prevent cursors for non-existent series getting created.
This check was lost during TSI development because inmem Series tracking
was removed and then replaced with bitsets. The bitsets were not
re-incorporated as before. This adds the functionality back using
the bitsets.
2018-02-27 21:23:59 -07:00
Ben Johnson fee6149791 Merge pull request #9489 from influxdata/bj-dumptsi-cardinality
Add dumptsi path error handling.
2018-02-27 09:15:03 -07:00
Ben Johnson b3fcc63a78 Add dumptsi path error handling. 2018-02-27 08:30:12 -07:00
Edd Robinson 96c0ecf618 Improve startup time of `inmem` index
This commit improves the startup time when using the `inmem` index by
ensuring that the series are created in the index and series file in
batches of 10000, rather than individually.

Fixes #9486.
2018-02-27 13:33:00 +00:00
Stuart Carnie a74d296200 use underscore vs period, fix doc comment, add database name to CQ 2018-02-26 10:08:43 -07:00
Stuart Carnie d135aecf02 Generate trace logs for a number of significant influx operations
* tsdb Store.Open traces all events related to opening files
    * op.name : tsdb.open
* retention policy shard deletions
    * op.name : retention.delete_check
* all TSM compaction strategies
    * op.name : tsm1.compact_group
* series file compactions
    * op.name : series_partition.compaction
* continuous query execution (if logging enabled)
    * op.name : continuous_querier.execute
* TSI log file compaction
    * op_name: index.tsi.compact_log_file
* TSI level compaction
    * op.name: index.tsi.compact_to_level
2018-02-21 15:08:49 -07:00
Jason Wilder eeb0b967f9 Don't create series one at a time when limits in place
When a max series per database limit was in place (or 0), we would create
series one at a time, which really hurts throughput. This creates them
in bulk, which is less accurate but more performant.
2018-02-15 10:43:39 -07:00
Jason Wilder 67e65e50ff Remove inmem lastModified time
This was added to prevent concurrent writes and deletes to the
same series. This is now handled by the bitsets for both tsi and
inmem. The time.Now() calls show up in profiles and are not needed.
2018-02-15 09:29:52 -07:00
Ben Johnson ed9c0576d4 Add series sketches, fix tombstones in index files. 2018-02-07 14:52:13 -07:00
Edd Robinson 0d164f3164 WIP - tsi integration sketches 2018-02-07 14:52:13 -07:00
Edd Robinson 7a55735562 Add option to set LogFile compaction size 2018-02-07 14:52:13 -07:00
Edd Robinson 544329380f Add empty series sketches back to tsi1 index
This commit adds initial empty sketches back to the tsi1 index, as well
as ensuring that ephemeral sketches in the index `LogFile` are updated
accordingly.

The commit also adds a test that verifies that the merged sketches at
the store level produce the correct results under writes, deletions and
re-opening of the store.

This commit does not provide working sketches for post-compaction on the
tsi1 index.
2018-02-07 14:52:13 -07:00
Jason Wilder 20d429c62b Use cached tags when applying series entries 2018-01-30 16:02:50 -07:00
Ben Johnson da8568d86c Remove unused field. 2018-01-30 10:34:29 -07:00
Ben Johnson a6d11585b3 Add TSI compaction interruption. 2018-01-30 10:34:17 -07:00
Ben Johnson 0652effb78 Interrupt TSI & Series File Compactions 2018-01-30 10:34:17 -07:00