Commit Graph

154 Commits (master)

Author SHA1 Message Date
Sam Arnold 4de89afd37
refactor: remove dead iterator code (#23887)
* fix: codegen without needing goimports

* refactor: remove dead code
2022-11-09 19:26:12 -05:00
davidby-influx 7ad612b0d7
fix: discard excessive errors (#22379) (#22391)
The tsmBatchKeyIterator discards excessive errors to avoid
out-of-memory crashes when compacting very corrupt files.
Any error beyond DefaultMaxSavedErrors (100) will be
discarded instead of appended to the error slice.

closes https://github.com/influxdata/influxdb/issues/22328

(cherry picked from commit e53f75e06d)

closes https://github.com/influxdata/influxdb/issues/22381
2021-09-03 14:57:36 -07:00
davidby-influx 9923d2e8d5
fix: avoid compaction queue stats flutter (#22235)
When the compaction planner runs, if it cannot acquire
a lock on the files it plans to compact, it returns a
nil list of compaction groups. This, in turn, sets the
engine statistics for compactions queues to zero,
which is incorrect. Instead, use the length of pending
files which would have been returned.

closes https://github.com/influxdata/influxdb/issues/22138

(cherry picked from commit 7d3efe1e9e)

closes https://github.com/influxdata/influxdb/issues/22141
2021-08-17 14:03:54 -07:00
davidby-influx a78729b2ff
chore: add logging to compaction (#21707) (#21900)
Compaction logging will generate intermediate information on
volume of data written and output files created, as well as
improve some of the anti-entropy messages related to compaction.

Closes https://github.com/influxdata/influxdb/issues/21704

(cherry picked from commit 73bdb2860e)

Closes https://github.com/influxdata/influxdb/issues/21706
2021-07-21 09:43:21 -07:00
Yun Zhao 2116332950
fix(tsm1): fix calculation of tsmFullCompactionQueue statistic (#20897)
Co-authored-by: zhaoyun.248 <zhaoyun.248@bytedance.com>
2021-06-04 10:26:37 -04:00
sans 7dcaf5c639
fix: typos (#19734) 2020-10-13 09:50:32 -07:00
Stuart Carnie dee8977d2c
chore: move v2/v1/tsdb → v2/tsdb 2020-08-26 10:46:47 -07:00
Mark Rushakoff f2898d1992 Wipe out workspace in preparation for v2 merge
"Knock knock."

"Who's there?"

"InfluxDB Veet."

...
2019-01-11 10:38:50 -08:00
Ben Johnson 40db64d0b9
Limit force-full and cold compaction size.
This commit limits the number of files that can be compacted in
a single group when forcing a full compaction or when a shard
becomes cold. This is to prevent too many files being compacted
at the same time.
2018-12-05 10:18:56 -07:00
Edd Robinson be662a5853 Fix TSM index maxtime modification 2018-10-29 15:44:31 +00:00
Edd Robinson 5054d6fae4 Address PR feedback 2018-10-16 13:37:49 +01:00
Edd Robinson 09da18c08e Add TSM batch key iterator
The batch focussed TSM key iterator iterates TSM blocks, decoding and
merging blocks where appropriate using the the batch focussed
approaches.
2018-10-16 12:08:12 +01:00
Jeff Wendling 1c0e49e002 tsm1: ensure all written tsm files are fsynced
we were asserting to an *os.File in order to call Sync, but in some
cases the file handle has been wrapped, for example with limiting.
instead, assert to minimal interfaces for the functionality we need
and attempt to add some robustness in the code that creates the
writers by using a stronger interface with a Sync method.

fixes #9991
2018-06-25 11:36:22 -06:00
Jacob Marble bb313765e4 tsdb/tsm1: Clean up TSM filename format/parse 2018-05-29 09:57:48 -07:00
Jeff Wendling 1a8931af42
Merge pull request #9841 from influxdata/jmw-ensure-no-race-conditions
tsm1: ensure some race conditions are impossible
2018-05-16 11:56:10 -06:00
Jeff Wendling 7d2bb19b74 tsm1: ensure some race conditions are impossible
The InUse call on TSMFiles is inherently racy in the presence of
Ref calls outside of the file store mutex. In addition, we return
some TSMFiles to callers without them being Ref'd which might allow
them to be closed from underneath. While I believe it is the case
that it would be impossible, as the only thing that gets a handle
externally is compaction, and compaction enforces that only one
handle exists at a time, and thus is only deleted once after the
compaction is done with it, it's not very obvious or enforced.

Instead, always return a TSMFile with a Ref call under the read
lock, and require that no one else calls Ref. That way, it cannot
transition to referenced if the InUse call returns false under the
write lock.

The CreateSnapshot method was racy in a number of ways in the presence
of multiple calls or compactions: it did not take references to the
TSMFiles, and the temporary directory it creates could have been
shared with concurrent CreateSnapshot calls. In addition, the
files slice could have been concurrently mutated during a compaction
as well.

Instead, under the write lock, make a local copy of the state for
the compaction, including Ref calls (write locks are implicitly
read locks). Then, there is no need for a lock at all afterward.

Add some comments to explain these issues at the call sites of InUse,
and document that the Files method that returns the slice unprotected
is only for tests.
2018-05-14 19:45:42 -06:00
Ben Johnson 35a64dee99
Inject tsm file naming. 2018-05-14 10:46:38 -06:00
Jason Wilder 0eb6564e79 Add extension point to swap out the compaction planner 2018-03-21 15:51:00 -06:00
hpbieker c892bf15a1 Fix missing sorting of blocks when compacting. 2018-01-03 10:21:11 +01:00
Jason Wilder 2d85ff1d09 Adjust compaction planning
Increase level 1 min criteria, fix only fast compactions getting run,
and fix very large generations getting included in optimize plans.
2017-12-14 22:41:34 -07:00
Jason Wilder 749c9d2483 Rate limit disk IO when writing TSM files
This limits the disk IO for writing TSM files during compactions
and snapshots.  This helps reduce the spiky IO patterns on SSDs and
when compactions run very quickly.
2017-12-14 22:02:32 -07:00
Jason Wilder 7dc5327a0a Adjust snapshot concurrency by latency
This changes the approach to adjusting the amount of concurrency
used for snapshotting to be based on the snapshot latency vs
cardinality.  The cardinality approach could use too much concurrency
and increase the number of level 1 TSM files too quickly which incurs
more disk IO.

The latency model seems to adjust better to different workloads.
2017-12-13 13:17:56 -07:00
Jason Wilder 9f2a422039 Use disk based TSM index more selectively
The disk based temp index for writing a TSM file was used for
compactions other than snapshot compactions.  That meant it was
used even for smaller compactiont that would not use much memory.
An unintended side-effect of this is higher disk IO when copying
the index to the final file.

This switches when to use the index based on the estimated size of
the new index that will be written.  This isn't exact, but seems to
work kick in at higher cardinality and larger compactions when it
is necessary to avoid OOMs.
2017-12-06 13:45:43 -07:00
Jason Wilder 0a85ce2b73 Schedule compactions less aggressively
This runs the scheduler every 5s instead of every 1s as well as reduces
the scope of a level 1 plan.
2017-12-06 13:45:43 -07:00
Jason Wilder 9c1d7d00a9 Switch O_SYNC to periodic fsync
O_SYNC was added with writing TSM files to fix an issue where the
final fsync at the end cause the process to stall.  This ends up
increase disk util to much so this change switches to use multiple
fsyncs while writing the TSM file instead of O_SYNC or one large
one at the end.
2017-12-06 09:35:24 -07:00
Jason Wilder b6096414c2 Fix compactions aborting early
If there were many individual deletes to a series that ended up
deleting every value in the block and the tombstone timestamps
were not contigous, it was possible for the TSMKeyIterator to
return false for Next incorrectly.  This causes the compaction to
drop any remaining data in the file.

Normally, if all the data is deleted via tombstones, we remove the
whole key from the TSM index.  In this case, we're not able to determine
that the key is fully deleted until the block is decode and tombstones
are applied.

This changes the TSMKeyIterator to detect this condition and continue
to the next key instead of aborting.
2017-11-30 14:38:09 -07:00
Edd Robinson 12a2ff7fac Add support for TSI shard streaming and shard size
This commit firstly ensures that a shard's size on disk is accurately
reported when using the tsi1 index, by including the on-disk size of the
tsi1 index in the calculation.

Secondly, this commit add support for shard streaming/copying when using
the tsi1 index. Prior to this, a tsi1 index would not be correctly
restored when streaming shards.
2017-11-28 15:57:02 +00:00
Jason Wilder 97e0d496a6 Add capability to force a full compaction
This adds the capability to the engine to force a full compaction
to be scheduled.  When called, it snapshots any data in the cache,
aborts running compactions and prevents level plans from returning
level plans.
2017-11-15 07:14:27 -07:00
Jason Wilder 9f102adabe Abort BlockIterator iteration if deletes detected
This fixes a potential bug where the BlockIterator would skip blocks
if the underlying TSMReader had deletes on it concurrently.  This
could possibly occur due to changes in 91eb9de3 that now use the
existing TSMReaders from the FileStore instead of creating new ones
during compaction.
2017-10-18 18:16:37 -06:00
lrita 2f0aa4a420 remove duplicated code in cacheKeyIterator.encode() 2017-10-13 20:39:15 +08:00
Jason Wilder 00a403f60e Reduce allocation in tsmKeyIterator.Next
This reuses some intermediate buffers and structs while compacting
files.
2017-10-04 17:35:56 -06:00
Jason Wilder 06226d6fd3 Handle orphan lower level TSM files during full planning
Some files seem to get orphan behind higher levels.  This causes
the compactions to get blocked as the lowere level files will not
get picked up by their lower level planners.  This allows the full
plan to identify them and pull them into their plans.
2017-10-04 08:13:14 -06:00
Jason Wilder 0c0505881f Remove multiple file skipping for full compaction planning
This check doesn't make sense for high cardinality data as the files
typically get big and sparse very quickly.  This causes a lot of extra
disk space to be used which is taken up by large indexes and sparse
data.
2017-10-03 10:48:14 -06:00
Jason Wilder 4ff4ba0841 Use first file in generation for level
With higher cardinality or larger series keys, the files can roll
over early which causes them to take longer to be compacted by higher
levels.  This causes larger disk usage and higher numbers of tsm files
at times.
2017-10-03 10:48:14 -06:00
Jason Wilder 2c5006fccc Rework snapshotting concurrency
This switches the thresholds that are used for writing snapshots
concurrently.  This scales better than the prior model.
2017-10-03 10:48:14 -06:00
Jason Wilder 70817350b7 Ensure temp index files are cleaned up on error 2017-10-03 10:48:14 -06:00
Jason Wilder ae821f4e2d Rework compaction scheduling
This changes the compaction scheduling to better utilize the available
cores that are free.  Previously, a level was planned in its own goroutine
and would kick off a number of compactions groups.  The problem with this
model was that if there were 4 groups, and 3 completed quickly, the planning
would be blocked for that level until the last group finished.  If the compactions
at the prior level are running more quickly, a large backlog could accumlate.

This now moves the planning to a single goroutine that plans each level in
succession and starts as many groups as it can.  When one group finishes,
the planning will start the next group for the level.
2017-10-03 10:48:13 -06:00
Jason Wilder 1610ae5727 Don't return tsm files part of a compaction plan 2017-10-03 10:48:13 -06:00
Jason Wilder 122a74c692 Use synchronous IO for wal and tsm writing
The fysncs due to large writes when writing to TSM files and the
WAL can eventually cause large pauses.  Since we already buffer
writes, using synchronous IO reduces fsync latency by ensuring
the individiual writes hit disk.  This spreads out the latecncy
across multiple writes better.
2017-09-25 12:44:57 -06:00
Jason Wilder db204f3eb7 Default concurrent compactions to 50% of available cores 2017-09-21 12:48:11 -06:00
Jason Wilder 796de3dcea Reduce encoder pool checkout contention
With higher cardinalities, the encoder pools where become a bottleneck.
This changes the snapshot compactions ot checkout one encoder of each
type and re-use it while writing the snapshots as opposed to repeatedly
checking it out and in.
2017-09-19 15:27:26 -06:00
Jason Wilder 391a6288c6 Write parallel snapshot for higher cardinalities 2017-09-19 15:27:26 -06:00
Jason Wilder ddeba2c86b Split large snapshots and write concurrently 2017-09-19 15:27:25 -06:00
Jason Wilder 91eb9de341 Use existing TSMReader from file store during compactions
Compactions would create their own TSMReaders for simplicity. With
very high cardinality compactions, creating the reader and indirectIndex
can start to use a significant amount of memory.

This changes the compactions to use a reader that is already allocated
and managed by the FileStore.
2017-09-11 15:29:25 -06:00
Jason Wilder a8d9eeef36 Reduce lock contention when deleting high cardinality series
Deleting high cardinality series could take a very long time, cause
write timeouts as well as dead lock the process.  This fixes these
issue to by changing the approach for cleaning up the indexes and
reducing lock contention.

The prior approach delete each series and updated every index (inmem)
during the delete.  This was very slow and cause the index to be locked
while it items in a slice were removed one by one.  This has been changed
to mark series as deleted and then rebuild the index asynchronously which
speeds up the process.

There was also a dead lock that could occur when deleing the field set.
Deleting the field set held a write lock and the function it invoked under
the lock could try to take a read lock on the field set.  This would then
deadlock.  This approach was also very slow and caused time out for writes.
It now uses faster approach that checks for the existing of the measurment
in the cache and filestore which does not take write locks.
2017-09-07 11:36:02 -06:00
Jason Wilder e265d150be Fix leaking tmp file when large compaction aborted
If a large compaction was running and was aborted. It could would leave
some tmp files around for files that it had fully written.  The current
active file was cleaned up, but already completed ones would not.  This
would occur when a TSM file needed to rollover due to size.
2017-08-21 17:04:57 -06:00
Jason Wilder 61b13eb12b Fix partiallyRead logic
The partiallyRead func didn't account for the initial values and would
return true for blocks that had not been read at all.  This causes a
slower path during compactions that forces a block to be decoded when
it could just be merged as is without decoded.  This causes compactions
to consume more CPU and run slower at times.
2017-08-14 16:44:32 -06:00
Jason Wilder 778000435a Conver all keys from string to []byte in TSM engine
This switches all the interfaces that take string series key to
take a []byte.  This eliminates many small allocations where we
convert between to two repeatedly.  Eventually, this change should
propogate futher up the stack.
2017-07-28 11:00:50 -06:00
Jason Wilder 18a02d50d7 Interrupt in progress TSM compactions
When snapshots and compactions are disabled, the check to see if
the compaction should be aborted occurs in between writing to the
next TSM file.  If a large compaction is running, it might take
a while for the file to be finished writing causing long delays.

This now interrupts compactions while iterating over the blocks to
write which allows them to abort immediately.
2017-07-27 15:58:56 -06:00
Stuart Carnie eec80692c4 Taught tsm1 storage engine how to read and write uint64 values
* introduced UnsignedValue type
  * leveraged existing int64 compression algorithms (RLE, Simple 8B)
* tsm and WAL can read and write UnsignedValue
* compaction is aware of UnsignedValue
* unsigned support to model, cursors and write points

NOTE: there is no support to create unsigned points, as the line
protocol has not been modified.
2017-07-24 09:03:22 -07:00