influxdb

Commit Graph

Author	SHA1	Message	Date
Geoffrey Wossum	1fbe319080	fix: reduce excessive CPU usage during compaction planning (#26432 ) Co-authored-by: devanbenz <devandbenz@gmail.com>	2025-05-27 16:55:20 -05:00
Geoffrey Wossum	66f4dbeaad	fix: limit number of concurrent optimized compactions (#26319 ) Limit number of concurrent optimized compactions so that level compactions do not get starved. Starved level compactions result in a sudden increase in disk usage. Add [data] max-concurrent-optimized-compactions for configuring maximum number of concurrent optimized compactions. Default value is 1. Co-authored-by: davidby-influx <dbyrne@influxdata.com> Co-authored-by: devanbenz <devandbenz@gmail.com> Closes: #26315	2025-05-06 15:42:39 -05:00
WeblWabl	96e44cac73	fix: PlanOptimize is running too frequently (#26211 ) PlanOptimize is being checked far too frequently. This PR is the simplest change that can be made in order to ensure that PlanOptimize is not being ran too much. To alleviate the frequency I've added a lastWrite parameter to PlanOptimize and added an additional test that mocks the edge cause out in the wild that led to this PR. Previously in test cases for PlanOptimize I was not checked to see if certain cases would be picked up by Plan I've adjusted a few of the existing test cases after modifying Plan and PlanOptimize to have the same lastWrite time.	2025-04-08 12:22:29 -05:00
WeblWabl	d8bcbd894c	feat: Add CompactPointsPerBlock config opt (#26100 ) * feat: Add CompactPointsPerBlock config opt This PR adds an additional parameter for influxd CompactPointsPerBlock. It adjusts the DefaultAggressiveMaxPointsPerBlock to 10,000. We had discovered that with the points per block set to 100,000 compacted TSM files were increasing. After modifying the points per block to 10,000 we noticed that the file sizes decreased. The value has been set as a parameter that can be adjusted by administrators this allows there to be some tuning if compression problems are encountered.	2025-03-05 14:59:06 -06:00
davidby-influx	2ab5aad52e	chore: add logging to Filestore.purger (#26089 ) Also fixes error type checks in TestCompactor_CompactFull_InProgress	2025-03-05 11:46:07 -08:00
davidby-influx	800970490a	fix: move aside TSM file on errBlockRead (#25839 ) The error type check for errBlockRead was incorrect, and bad TSM files were not being moved aside when that error was encountered. Use errors.Join, errors.Is, and errors.As to correctly unwrap multiple errors. Closes https://github.com/influxdata/influxdb/issues/25838	2025-01-22 10:46:31 -08:00
WeblWabl	f04105bede	feat: Modify optimized compaction to cover edge cases (#25594 ) * feat: Modify optimized compaction to cover edge cases This PR changes the algorithm for compaction to account for the following cases that were not previously accounted for: - Many generations with a groupsize over 2 GB - Single generation with many files and a groupsize under 2 GB - Where groupsize is the total size of the TSM files in said shard directory. - shards that may have over a 2 GB group size but many fragmented files (under 2 GB and under aggressive point per block count) closes https://github.com/influxdata/influxdb/issues/25666	2025-01-14 14:51:09 -06:00
davidby-influx	e974165d25	fix: do not leak file handles from Compactor.write (#25725 ) There are a number of code paths in Compactor.write which on error can lead to leaked file handles to temporary files. This, in turn, prevents the removal of the temporary files until InfluxDB is rebooted, releasing the file handles. closes https://github.com/influxdata/influxdb/issues/25724	2025-01-03 14:43:41 -08:00
Geoffrey Wossum	b4bd607eef	fix: prevent retention service from hanging (#25055 ) * fix: prevent retention service from hanging Fix issue that can cause the retention service to hang waiting on a `Shard.Close` call. When this occurs, no other shards will be deleted by the retention service. This is usually noticed as an increase in disk usage because old shards are not cleaned up. The fix adds to new methods to `Store`, `SetShardNewReadersBlocked` and `InUse`. `InUse` can be used to poll if a shard has active readers, which the retention service uses to skip over in-use shards to prevent the service from hanging. `SetShardNewReadersBlocked` determines if new read access may be granted to a shard. This is required to prevent race conditions around the use of `InUse` and the deletion of shards. If the retention service skips over a shard because it is in-use, the shard will be checked again the next time the retention service is run. It can be deleted on subsequent checks if it is no longer in-use. If the shards is stuck in-use, the retention service will not be able to delete the shards, which can be observed in the logs for manual intervention. Other shards can still be deleted by the retention service even if a shard is stuck with readers. closes: #25054	2024-06-13 11:07:17 -05:00
Sam Arnold	9e9f1be574	fix: remove dead iterator (#23888 )	2022-11-09 16:24:01 -05:00
davidby-influx	7d3efe1e9e	fix: avoid compaction queue stats flutter. (#22195 ) When the compaction planner runs, if it cannot acquire a lock on the files it plans to compact, it returns a nil list of compaction groups. This, in turn, sets the engine statistics for compactions queues to zero, which is incorrect. Instead, use the length of pending files which would have been returned. closes https://github.com/influxdata/influxdb/issues/22138	2021-08-16 09:21:07 -07:00
davidby-influx	73bdb2860e	chore: add logging to compaction (#21707 ) Compaction logging will generate intermediate information on volume of data written and output files created, as well as improve some of the anti-entropy messages related to compaction. This will also apply to `influx_tools compact` Closes https://github.com/influxdata/influxdb/issues/21704	2021-06-16 15:28:44 -07:00
Ayan George	c0e391935d	fix(tsdb): address staticcheck warning SA4006 (#18127 ) This commit addresses staticcheck warning SA4006 "this value of VAR is never used" by removing unused variables	2020-05-18 13:11:44 -04:00
Ben Johnson	3f59d4be8e	fix(tsdb): Fix error lists	2020-04-01 14:54:39 -06:00
tmgordeeva	f1d26652e9	fix(storage): skip TSM files with block read errors (#15885 ) * fix(storage): skip TSM files with block read errors When we find a bad TSM file during compaction, propagate the error up and move the bad file aside. The engine will disregard the file so the next compaction will not hit the same error.	2019-12-13 15:05:39 -08:00
Stuart Carnie	86734e7fcd	fix(storage): Don't panic when length of source slice is too large StringArrayEncodeAll will panic if the total length of strings contained in the src slice is > 0xffffffff. This change adds a unit test to replicate the issue and an associated fix to return an error. This also raises an issue that compactions will be unable to make progress under the following condition: * multiple string blocks are to be merged to a single block and * the total length of all strings exceeds the maximum block size that snappy will encode (0xffffffff) The observable effect of this is errors in the logs indicating a compaction failure. Fixes #13687	2019-06-04 17:05:01 -07:00
Ben Johnson	2dd913d71b	Revert "Limit force-full and cold compaction size." This reverts commit `40db64d0b9`.	2019-02-11 11:07:44 -07:00
Ben Johnson	40db64d0b9	Limit force-full and cold compaction size. This commit limits the number of files that can be compacted in a single group when forcing a full compaction or when a shard becomes cold. This is to prevent too many files being compacted at the same time.	2018-12-05 10:18:56 -07:00
Stuart Carnie	5d083887a5	chore: Compactor test which replicates issue #10465 Due to an encoding bug with simple8b, it is possible that the MaxTime for a TSM index entry does not match the last encoded timestamp.	2018-11-14 09:13:13 -07:00
Jacob Marble	0dc5393441	tsm/cache: Remove unused function parameter	2018-06-13 15:22:37 -07:00
Jacob Marble	bb313765e4	tsdb/tsm1: Clean up TSM filename format/parse	2018-05-29 09:57:48 -07:00
Jeff Wendling	ce565965a4	tsdb: avoid nil checks on the observer this avoids nil panics in the case that someone eventually forgets.	2018-05-23 13:15:41 -06:00
Jeff Wendling	1a8931af42	Merge pull request #9841 from influxdata/jmw-ensure-no-race-conditions tsm1: ensure some race conditions are impossible	2018-05-16 11:56:10 -06:00
Jeff Wendling	7d2bb19b74	tsm1: ensure some race conditions are impossible The InUse call on TSMFiles is inherently racy in the presence of Ref calls outside of the file store mutex. In addition, we return some TSMFiles to callers without them being Ref'd which might allow them to be closed from underneath. While I believe it is the case that it would be impossible, as the only thing that gets a handle externally is compaction, and compaction enforces that only one handle exists at a time, and thus is only deleted once after the compaction is done with it, it's not very obvious or enforced. Instead, always return a TSMFile with a Ref call under the read lock, and require that no one else calls Ref. That way, it cannot transition to referenced if the InUse call returns false under the write lock. The CreateSnapshot method was racy in a number of ways in the presence of multiple calls or compactions: it did not take references to the TSMFiles, and the temporary directory it creates could have been shared with concurrent CreateSnapshot calls. In addition, the files slice could have been concurrently mutated during a compaction as well. Instead, under the write lock, make a local copy of the state for the compaction, including Ref calls (write locks are implicitly read locks). Then, there is no need for a lock at all afterward. Add some comments to explain these issues at the call sites of InUse, and document that the Files method that returns the slice unprotected is only for tests.	2018-05-14 19:45:42 -06:00
Ben Johnson	35a64dee99	Inject tsm file naming.	2018-05-14 10:46:38 -06:00
Stuart Carnie	0f6e6fb9ef	Merge pull request #9192 from influxdata/sgc-writer ensure tsmWriter#Write returns ErrMaxBlocksExceeded	2018-02-01 15:39:01 -07:00
Hans Petter Bieker	7a273ccdb5	Fixed issue where compacting did not sort when block are unsorted and overlapping.	2018-01-04 15:25:26 +01:00
hpbieker	ee185e18b7	Added unit test TestCompactor_Compact_UnsortedBlocks.	2018-01-03 09:42:36 +01:00
Jason Wilder	2d85ff1d09	Adjust compaction planning Increase level 1 min criteria, fix only fast compactions getting run, and fix very large generations getting included in optimize plans.	2017-12-14 22:41:34 -07:00
Jason Wilder	7dc5327a0a	Adjust snapshot concurrency by latency This changes the approach to adjusting the amount of concurrency used for snapshotting to be based on the snapshot latency vs cardinality. The cardinality approach could use too much concurrency and increase the number of level 1 TSM files too quickly which incurs more disk IO. The latency model seems to adjust better to different workloads.	2017-12-13 13:17:56 -07:00
Jason Wilder	0a85ce2b73	Schedule compactions less aggressively This runs the scheduler every 5s instead of every 1s as well as reduces the scope of a level 1 plan.	2017-12-06 13:45:43 -07:00
Stuart Carnie	fffd123646	update unit test	2017-12-01 12:19:28 -07:00
Jason Wilder	b6096414c2	Fix compactions aborting early If there were many individual deletes to a series that ended up deleting every value in the block and the tombstone timestamps were not contigous, it was possible for the TSMKeyIterator to return false for Next incorrectly. This causes the compaction to drop any remaining data in the file. Normally, if all the data is deleted via tombstones, we remove the whole key from the TSM index. In this case, we're not able to determine that the key is fully deleted until the block is decode and tombstones are applied. This changes the TSMKeyIterator to detect this condition and continue to the next key instead of aborting.	2017-11-30 14:38:09 -07:00
Jason Wilder	97e0d496a6	Add capability to force a full compaction This adds the capability to the engine to force a full compaction to be scheduled. When called, it snapshots any data in the cache, aborts running compactions and prevents level plans from returning level plans.	2017-11-15 07:14:27 -07:00
Jason Wilder	17bae05370	Allow buffering tombstones before writing to disk	2017-11-13 08:48:03 -07:00
Jason Wilder	06226d6fd3	Handle orphan lower level TSM files during full planning Some files seem to get orphan behind higher levels. This causes the compactions to get blocked as the lowere level files will not get picked up by their lower level planners. This allows the full plan to identify them and pull them into their plans.	2017-10-04 08:13:14 -06:00
Jason Wilder	0c0505881f	Remove multiple file skipping for full compaction planning This check doesn't make sense for high cardinality data as the files typically get big and sparse very quickly. This causes a lot of extra disk space to be used which is taken up by large indexes and sparse data.	2017-10-03 10:48:14 -06:00
Jason Wilder	4ff4ba0841	Use first file in generation for level With higher cardinality or larger series keys, the files can roll over early which causes them to take longer to be compacted by higher levels. This causes larger disk usage and higher numbers of tsm files at times.	2017-10-03 10:48:14 -06:00
Jason Wilder	1610ae5727	Don't return tsm files part of a compaction plan	2017-10-03 10:48:13 -06:00
Jason Wilder	ddeba2c86b	Split large snapshots and write concurrently	2017-09-19 15:27:25 -06:00
Jason Wilder	7388eb9499	Use disk when writing TSM index	2017-09-11 15:29:25 -06:00
Jason Wilder	d3e832b462	Use offheap memory for indirect index offsets slice	2017-09-11 15:29:25 -06:00
Jason Wilder	91eb9de341	Use existing TSMReader from file store during compactions Compactions would create their own TSMReaders for simplicity. With very high cardinality compactions, creating the reader and indirectIndex can start to use a significant amount of memory. This changes the compactions to use a reader that is already allocated and managed by the FileStore.	2017-09-11 15:29:25 -06:00
Jason Wilder	778000435a	Conver all keys from string to []byte in TSM engine This switches all the interfaces that take string series key to take a []byte. This eliminates many small allocations where we convert between to two repeatedly. Eventually, this change should propogate futher up the stack.	2017-07-28 11:00:50 -06:00
Jason Wilder	18a02d50d7	Interrupt in progress TSM compactions When snapshots and compactions are disabled, the check to see if the compaction should be aborted occurs in between writing to the next TSM file. If a large compaction is running, it might take a while for the file to be finished writing causing long delays. This now interrupts compactions while iterating over the blocks to write which allows them to abort immediately.	2017-07-27 15:58:56 -06:00
Jason Wilder	4d002bb370	Limit concurrent compactions within a shard This changes full compactions within a shard to run sequentially instead of running all the compaction groups in parallel. Normally, there is only 1 full compaction group to run. At times, there could be several which causes instability if they are all running concurrently as they tie up a cpu for long periods of time. Level compactions are also capped to a max of 4 concurrently running for each level in a shard. This prevents sudden spikes in CPU and disk usage due to a large backlog of tsm files at a given level.	2017-05-12 14:05:24 -06:00
Jason Wilder	3d1c0cd981	Don't return compaction plans for files already part of a plan The compactor prevents the same file from being compacted by different compaction runs, but it can result in warning errors in the logs that are confusing. This adds compaction plan tracking to the planner so that files are only part of one plan at a given time.	2017-05-03 16:31:57 -06:00
Jason Wilder	675d7c9d65	Merge branch '1.2' into jw-merge12	2017-03-06 11:09:05 -07:00
Jason Wilder	eab012ef61	Fix points missing after compaction If blocks containing overlapping ranges of time where partially recombined, it was possible for the some points to get dropped during compactions. This occurred because the window of time of the points we need to merge did not account for the partial blocks created from a prior merge. Fixes #8084	2017-03-06 10:17:11 -07:00
Edd Robinson	292b30b82b	Fix subtle bugs and remove dead code from tsdb	2017-01-17 09:47:34 -08:00

1 2 3

103 Commits (db/shards_persisting_chaos_testing)