This PR adds error handling for a branch that previously did not
report OS file removal errors. This is part of EAR #5819.
(cherry picked from commit 306a184a8d)
* fix: prevent retention service from hanging (#25055)
Fix an issue that can cause the retention service to hang waiting on a
`Shard.Close` call. When this occurs, no other shards will be deleted
by the retention service. This is usually noticed as an increase in
disk usage because old shards are not cleaned up.
The fix adds two new methods to `Store`, `SetShardNewReadersBlocked`
and `InUse`. `InUse` can be used to poll if a shard has active readers,
which the retention service uses to skip over in-use shards to prevent
the service from hanging. `SetShardNewReadersBlocked` determines if
new read access may be granted to a shard. This is required to prevent
race conditions around the use of `InUse` and the deletion of shards.
If the retention service skips over a shard because it is in-use, the
shard will be checked again the next time the retention service is run.
It can be deleted on subsequent checks if it is no longer in-use. If
a shard is stuck in-use, the retention service will not be able to
delete it, which can be observed in the logs for manual
intervention. Other shards can still be deleted by the retention service
even if a shard is stuck with readers.
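A minimal sketch of how a retention pass could combine the two calls; the `store` interface and helper below are illustrative stand-ins for the description above, not the actual engine API or signatures.

```go
package retention

import "log"

// store is an illustrative subset of the methods described above; the real
// Store has different signatures and many more methods.
type store interface {
	SetShardNewReadersBlocked(shardID uint64, blocked bool) error
	InUse(shardID uint64) (bool, error)
	DeleteShard(shardID uint64) error
}

// deleteExpiredShard blocks new readers first so a reader cannot slip in
// between the InUse check and the delete. In-use shards are skipped and
// retried on the next retention pass.
func deleteExpiredShard(s store, id uint64) {
	if err := s.SetShardNewReadersBlocked(id, true); err != nil {
		log.Printf("retention: cannot block new readers for shard %d: %v", id, err)
		return
	}
	inUse, err := s.InUse(id)
	if err != nil || inUse {
		_ = s.SetShardNewReadersBlocked(id, false) // give the shard back; retry next run
		log.Printf("retention: skipping in-use shard %d", id)
		return
	}
	if err := s.DeleteShard(id); err != nil {
		log.Printf("retention: failed to delete shard %d: %v", id, err)
	}
}
```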
This is a port of ad68ec8 from master-1.x to main-2.x.
closes: #25076
(cherry picked from commit b4bd607eef)
When the compaction planner runs, if it cannot acquire
a lock on the files it plans to compact, it returns a
nil list of compaction groups. This, in turn, sets the
engine statistics for compaction queues to zero,
which is incorrect. Instead, use the length of the pending
files that would have been returned.
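In spirit, the change is along these lines (a rough sketch; the names are invented, not the planner's real code):

```go
package sketch

// planLevel is a simplified stand-in for one planner step: when the files
// are already locked by another plan, return no groups but still report
// the pending count so the queue statistic is not reset to zero.
func planLevel(pending []string, tryLock func([]string) bool) (groups [][]string, queued int64) {
	queued = int64(len(pending)) // queue depth reflects the files waiting, not the groups returned
	if !tryLock(pending) {
		return nil, queued // previously this path effectively reported 0
	}
	groups = append(groups, pending) // the real planner splits pending into proper groups
	return groups, queued
}
```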
closes https://github.com/influxdata/influxdb/issues/22138
(cherry picked from commit 7d3efe1e9e)
closes https://github.com/influxdata/influxdb/issues/22141
This commit limits the number of files that can be compacted in
a single group when forcing a full compaction or when a shard
becomes cold. This is to prevent too many files being compacted
at the same time.
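Roughly, capping a group amounts to chunking the candidate files, as in this sketch (the limit value and function name are placeholders):

```go
package sketch

// splitIntoGroups caps how many files land in a single compaction group
// when forcing a full compaction or compacting a cold shard.
func splitIntoGroups(files []string, maxPerGroup int) [][]string {
	var groups [][]string
	for len(files) > 0 {
		n := maxPerGroup
		if n > len(files) {
			n = len(files)
		}
		groups = append(groups, files[:n])
		files = files[n:]
	}
	return groups
}
```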
We were asserting to an *os.File in order to call Sync, but in some
cases the file handle has been wrapped, for example with rate limiting.
Instead, assert to minimal interfaces for the functionality we need
and attempt to add some robustness in the code that creates the
writers by using a stronger interface with a Sync method.
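For illustration, the shape of the change is roughly this (a sketch, not the exact code):

```go
package sketch

import (
	"io"
	"os"
)

// syncer is the minimal behavior we actually need from a file handle.
type syncer interface {
	Sync() error
}

// syncWriteCloser is the stronger interface the writer constructors can
// require up front so wrapped handles still work.
type syncWriteCloser interface {
	io.WriteCloser
	syncer
}

var _ syncWriteCloser = (*os.File)(nil) // *os.File still satisfies it

// syncIfPossible replaces the old w.(*os.File) assertion: a wrapped
// handle (e.g. a rate-limited writer) can still expose Sync.
func syncIfPossible(w io.Writer) error {
	if s, ok := w.(syncer); ok {
		return s.Sync()
	}
	return nil
}
```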
fixes #9991
The InUse call on TSMFiles is inherently racy in the presence of
Ref calls outside of the file store mutex. In addition, we return
some TSMFiles to callers without them being Ref'd which might allow
them to be closed out from underneath the caller. I believe this is
impossible in practice: the only thing that gets a handle externally
is compaction, and compaction enforces that only one handle exists at
a time, so a file is only deleted once the compaction is done with it.
Still, that invariant is neither obvious nor enforced.
Instead, always return a TSMFile with a Ref call under the read
lock, and require that no one else calls Ref. That way, it cannot
transition to referenced if the InUse call returns false under the
write lock.
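A simplified sketch of that discipline (types pared down for illustration; the real FileStore and TSMFile are much larger):

```go
package sketch

import "sync"

// tsmFile is a pared-down stand-in for the real TSMFile interface.
type tsmFile interface {
	Ref()
	Unref()
	InUse() bool
}

type fileStore struct {
	mu    sync.RWMutex
	files []tsmFile
}

// Acquire hands out files with Ref already taken while the read lock is
// held; callers must Unref and must not call Ref themselves. An InUse
// check made under the write lock therefore cannot race with a new Ref.
func (fs *fileStore) Acquire() []tsmFile {
	fs.mu.RLock()
	defer fs.mu.RUnlock()
	out := make([]tsmFile, 0, len(fs.files))
	for _, f := range fs.files {
		f.Ref()
		out = append(out, f)
	}
	return out
}
```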
The CreateSnapshot method was racy in a number of ways in the presence
of multiple calls or compactions: it did not take references to the
TSMFiles, and the temporary directory it creates could have been
shared with concurrent CreateSnapshot calls. In addition, the
files slice could have been concurrently mutated during a compaction
as well.
Instead, under the write lock, make a local copy of the state for
the compaction, including Ref calls (write locks are implicitly
read locks). Then, there is no need for a lock at all afterward.
Add some comments to explain these issues at the call sites of InUse,
and document that the Files method that returns the slice unprotected
is only for tests.
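Sketched out, the snapshot path described above looks roughly like this (reusing the simplified `fileStore`/`tsmFile` types from the previous sketch; the file linking is elided):

```go
package sketch

import "os"

// createSnapshotSketch copies the state and takes references under the
// write lock (which also excludes readers); afterwards no locking is
// needed while the snapshot works on its local copy.
func (fs *fileStore) createSnapshotSketch() ([]tsmFile, string, error) {
	fs.mu.Lock()
	files := make([]tsmFile, len(fs.files))
	copy(files, fs.files)
	for _, f := range files {
		f.Ref()
	}
	fs.mu.Unlock()

	// A unique directory per call avoids sharing a temp dir between
	// concurrent CreateSnapshot calls.
	dir, err := os.MkdirTemp("", "snapshot-")
	if err != nil {
		for _, f := range files {
			f.Unref()
		}
		return nil, "", err
	}
	// ... hard-link or copy the snapshotted files into dir ...
	return files, dir, nil
}
```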
This limits the disk IO for writing TSM files during compactions
and snapshots. This helps reduce the spiky IO patterns on SSDs and
when compactions run very quickly.
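One way to express that kind of throttling (illustrative only; the engine's own limiter may be shaped differently):

```go
package sketch

import (
	"context"
	"io"

	"golang.org/x/time/rate"
)

// limitedWriter throttles writes to roughly lim bytes per second, smoothing
// the bursty IO produced by fast compactions and snapshots.
type limitedWriter struct {
	w   io.Writer
	lim *rate.Limiter
}

func (lw *limitedWriter) Write(p []byte) (int, error) {
	written := 0
	for len(p) > 0 {
		n := len(p)
		if b := lw.lim.Burst(); n > b {
			n = b // WaitN cannot request more than the burst size
		}
		if err := lw.lim.WaitN(context.Background(), n); err != nil {
			return written, err
		}
		m, err := lw.w.Write(p[:n])
		written += m
		if err != nil {
			return written, err
		}
		p = p[n:]
	}
	return written, nil
}
```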
This changes the approach to adjusting the amount of concurrency
used for snapshotting to be based on the snapshot latency vs
cardinality. The cardinality approach could use too much concurrency
and increase the number of level 1 TSM files too quickly which incurs
more disk IO.
The latency model seems to adjust better to different workloads.
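A toy version of a latency-driven adjustment (the threshold and bounds are invented purely for illustration, not the real tuning):

```go
package sketch

import "time"

// nextSnapshotConcurrency nudges concurrency up while snapshots are slow
// and back down once they are fast again, instead of keying off series
// cardinality.
func nextSnapshotConcurrency(current int, lastSnapshot time.Duration) int {
	const slow = 30 * time.Second
	switch {
	case lastSnapshot > slow && current < 4:
		return current + 1
	case lastSnapshot < slow/4 && current > 1:
		return current - 1
	default:
		return current
	}
}
```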
The disk based temp index for writing a TSM file was used for
compactions other than snapshot compactions. That meant it was
used even for smaller compactions that would not use much memory.
An unintended side-effect of this is higher disk IO when copying
the index to the final file.
This switches when to use the index based on the estimated size of
the new index that will be written. This isn't exact, but it seems to
kick in at higher cardinality and larger compactions, where it
is necessary to avoid OOMs.
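The decision ends up being a size check along these lines (the threshold and constructor names are invented placeholders):

```go
package sketch

// indexWriter is a stand-in for the TSM index writer interface.
type indexWriter interface {
	Add(key []byte, blockOffset int64) error
}

func newMemoryIndexWriter() indexWriter { return nil } // placeholder
func newDiskIndexWriter() indexWriter   { return nil } // placeholder

// chooseIndexWriter picks the disk-backed index only when the estimated
// index size is large enough that keeping it in memory risks an OOM.
func chooseIndexWriter(estimatedIndexBytes int64) indexWriter {
	const diskIndexThreshold = 64 << 20 // illustrative cutoff, not the real value
	if estimatedIndexBytes >= diskIndexThreshold {
		return newDiskIndexWriter()
	}
	return newMemoryIndexWriter()
}
```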
O_SYNC was added when writing TSM files to fix an issue where the
final fsync at the end caused the process to stall. That ended up
increasing disk utilization too much, so this change switches to
issuing multiple fsyncs while writing the TSM file instead of using
O_SYNC or one large fsync at the end.
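The write path then looks something like this sketch (the interval is invented for illustration):

```go
package sketch

import "os"

// writeWithPeriodicSync fsyncs every few megabytes written instead of
// opening the file with O_SYNC or issuing one large fsync at the end.
func writeWithPeriodicSync(f *os.File, data []byte) error {
	const syncEvery = 4 << 20 // ~4 MiB between fsyncs; illustrative only
	sinceSync := 0
	for len(data) > 0 {
		n := len(data)
		if n > syncEvery {
			n = syncEvery
		}
		if _, err := f.Write(data[:n]); err != nil {
			return err
		}
		data = data[n:]
		sinceSync += n
		if sinceSync >= syncEvery {
			if err := f.Sync(); err != nil {
				return err
			}
			sinceSync = 0
		}
	}
	return f.Sync()
}
```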
If there were many individual deletes to a series that ended up
deleting every value in the block and the tombstone timestamps
were not contiguous, it was possible for the TSMKeyIterator to
return false for Next incorrectly. This causes the compaction to
drop any remaining data in the file.
Normally, if all the data is deleted via tombstones, we remove the
whole key from the TSM index. In this case, we're not able to determine
that the key is fully deleted until the block is decoded and tombstones
are applied.
This changes the TSMKeyIterator to detect this condition and continue
to the next key instead of aborting.
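In spirit, the fix is the "keep scanning" loop in this pared-down iterator (the real TSMKeyIterator is far more involved; all names below are simplified stand-ins):

```go
package sketch

// value and keyIterator are simplified stand-ins for the TSM types.
type value struct{ unixNano int64 }

type keyIterator struct {
	keys    [][]byte
	decode  func(key []byte) []value // decodes blocks with tombstones applied
	current []value
}

// Next keeps scanning past keys whose blocks are entirely tombstoned,
// rather than returning false and dropping the rest of the file.
func (k *keyIterator) Next() bool {
	for len(k.keys) > 0 {
		key := k.keys[0]
		k.keys = k.keys[1:]
		if vals := k.decode(key); len(vals) > 0 {
			k.current = vals
			return true
		}
		// fully deleted key: continue to the next key instead of aborting
	}
	return false
}
```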
This commit firstly ensures that a shard's size on disk is accurately
reported when using the tsi1 index, by including the on-disk size of the
tsi1 index in the calculation.
Secondly, this commit adds support for shard streaming/copying when using
the tsi1 index. Prior to this, a tsi1 index would not be correctly
restored when streaming shards.
This adds the capability to the engine to force a full compaction
to be scheduled. When called, it snapshots any data in the cache,
aborts running compactions, and prevents the level planners from
returning plans.
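As a rough outline only (every name below is an invented stand-in for the real engine internals):

```go
package sketch

// engine is a stand-in; function fields are used here purely to keep the
// sketch self-contained.
type engine struct {
	snapshotCache          func() error
	abortCompactions       func()
	setLevelPlanningPaused func(bool)
	scheduleFullPlan       func()
}

// ForceFullCompaction mirrors the sequence described above: flush the
// cache, stop in-flight compactions, keep the level planners quiet, and
// let the full planner take over.
func (e *engine) ForceFullCompaction() error {
	if err := e.snapshotCache(); err != nil {
		return err
	}
	e.abortCompactions()
	e.setLevelPlanningPaused(true)
	e.scheduleFullPlan()
	return nil
}
```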
This fixes a potential bug where the BlockIterator would skip blocks
if the underlying TSMReader had deletes on it concurrently. This
could possibly occur due to changes in 91eb9de3 that now use the
existing TSMReaders from the FileStore instead of creating new ones
during compaction.
Some files seem to get orphaned behind higher levels. This causes
compactions to get blocked, as the lower level files will not get
picked up by their level planners. This allows the full plan to
identify them and pull them into its plans.
This check doesn't make sense for high cardinality data as the files
typically get big and sparse very quickly. This causes a lot of extra
disk space to be used which is taken up by large indexes and sparse
data.
With higher cardinality or larger series keys, the files can roll
over early which causes them to take longer to be compacted by higher
levels. This causes larger disk usage and higher numbers of tsm files
at times.
This changes the compaction scheduling to better utilize the available
cores that are free. Previously, a level was planned in its own goroutine
and would kick off a number of compactions groups. The problem with this
model was that if there were 4 groups, and 3 completed quickly, the planning
would be blocked for that level until the last group finished. If the compactions
at the prior level are running more quickly, a large backlog could accumulate.
This now moves the planning to a single goroutine that plans each level in
succession and starts as many groups as it can. When one group finishes,
the planning will start the next group for the level.
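The shape of that scheduler is roughly a single planning loop feeding a bounded worker pool, as in this sketch (all names invented):

```go
package sketch

import "sync"

// runCompactions plans every level in one goroutine and starts as many
// groups as the token pool allows; a finishing group frees its slot
// immediately, so later levels are not blocked on the slowest group.
func runCompactions(plan func() [][]string, run func([]string), maxConcurrent int) {
	tokens := make(chan struct{}, maxConcurrent)
	var wg sync.WaitGroup
	for _, group := range plan() {
		tokens <- struct{}{} // blocks only when all slots are busy
		wg.Add(1)
		go func(g []string) {
			defer wg.Done()
			defer func() { <-tokens }()
			run(g)
		}(group)
	}
	wg.Wait()
}
```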
The fsyncs due to large writes when writing to TSM files and the
WAL can eventually cause large pauses. Since we already buffer
writes, using synchronous IO reduces fsync latency by ensuring
the individual writes hit disk. This spreads the latency out
across multiple writes better.
With higher cardinalities, the encoder pools were becoming a bottleneck.
This changes the snapshot compactions to check out one encoder of each
type and re-use it while writing the snapshots, as opposed to repeatedly
checking it out and back in.
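The change amounts to hoisting the pool checkout out of the per-block loop, roughly like this (the encoder type here is a trivial placeholder, not the real float encoder):

```go
package sketch

import "sync"

// encoder is a pared-down stand-in for the per-type value encoders.
type encoder interface {
	Reset()
	Encode(vals []float64) ([]byte, error)
}

type nopEncoder struct{ buf []byte }

func (e *nopEncoder) Reset()                                { e.buf = e.buf[:0] }
func (e *nopEncoder) Encode(vals []float64) ([]byte, error) { return e.buf, nil } // placeholder

var floatEncoders = sync.Pool{New: func() any { return &nopEncoder{} }}

// writeSnapshotBlocks checks one encoder out for the whole snapshot and
// reuses it, instead of a Get/Put round-trip per block.
func writeSnapshotBlocks(blocks [][]float64, write func([]byte) error) error {
	enc := floatEncoders.Get().(encoder)
	defer floatEncoders.Put(enc)
	for _, vals := range blocks {
		enc.Reset()
		b, err := enc.Encode(vals)
		if err != nil {
			return err
		}
		if err := write(b); err != nil {
			return err
		}
	}
	return nil
}
```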
Compactions would create their own TSMReaders for simplicity. With
very high cardinality compactions, creating the reader and indirectIndex
can start to use a significant amount of memory.
This changes the compactions to use a reader that is already allocated
and managed by the FileStore.