influxdb

Commit Graph

Author	SHA1	Message	Date
Geoffrey Wossum	66f4dbeaad	fix: limit number of concurrent optimized compactions (#26319 ) Limit number of concurrent optimized compactions so that level compactions do not get starved. Starved level compactions result in a sudden increase in disk usage. Add [data] max-concurrent-optimized-compactions for configuring maximum number of concurrent optimized compactions. Default value is 1. Co-authored-by: davidby-influx <dbyrne@influxdata.com> Co-authored-by: devanbenz <devandbenz@gmail.com> Closes: #26315	2025-05-06 15:42:39 -05:00
WeblWabl	96e44cac73	fix: PlanOptimize is running too frequently (#26211 ) PlanOptimize is being checked far too frequently. This PR is the simplest change that can be made in order to ensure that PlanOptimize is not being ran too much. To alleviate the frequency I've added a lastWrite parameter to PlanOptimize and added an additional test that mocks the edge cause out in the wild that led to this PR. Previously in test cases for PlanOptimize I was not checked to see if certain cases would be picked up by Plan I've adjusted a few of the existing test cases after modifying Plan and PlanOptimize to have the same lastWrite time.	2025-04-08 12:22:29 -05:00
WeblWabl	d8bcbd894c	feat: Add CompactPointsPerBlock config opt (#26100 ) * feat: Add CompactPointsPerBlock config opt This PR adds an additional parameter for influxd CompactPointsPerBlock. It adjusts the DefaultAggressiveMaxPointsPerBlock to 10,000. We had discovered that with the points per block set to 100,000 compacted TSM files were increasing. After modifying the points per block to 10,000 we noticed that the file sizes decreased. The value has been set as a parameter that can be adjusted by administrators this allows there to be some tuning if compression problems are encountered.	2025-03-05 14:59:06 -06:00
davidby-influx	2ab5aad52e	chore: add logging to Filestore.purger (#26089 ) Also fixes error type checks in TestCompactor_CompactFull_InProgress	2025-03-05 11:46:07 -08:00
davidby-influx	1efb8dad43	fix: remove temp files on error in Compactor.writeNewFiles (#26074 ) Compactor.writeNewFiles should delete temporary files created on iterations before an error halts the compaction. closes https://github.com/influxdata/influxdb/issues/26073	2025-02-27 08:17:48 -08:00
davidby-influx	ba95c9b0f0	fix: ensure temp files removed on failed compaction (#26070 ) Add more robust temporary file removal on a failed compaction. Don't halt on a failed removal, and don't assume a failed compaction won't generate temporary files. closes https://github.com/influxdata/influxdb/issues/26068	2025-02-26 13:17:17 -08:00
davidby-influx	083b679b56	fix: ensure fields in memory match on disk A field could be created in memory but not saved to disk if a later field in that point was invalid (type conflict, too big) Ensure that if a field is created, it is saved.	2025-02-24 13:53:40 -08:00
davidby-influx	5f576331d3	chore: refactor field creation for maintainability Address review comments in the port work of the field creation. Also fixes one bug in returning the wrong error.	2025-02-18 14:00:11 -08:00
davidby-influx	5a20a835a5	fix: lock MeasurementFields while validating (#25998 ) There was a window where a race between writes with differing types for the same field were being validated. Lock the MeasurementFields struct during field validation to avoid this. closes https://github.com/influxdata/influxdb/issues/23756	2025-02-13 11:33:34 -08:00
WeblWabl	4ad5e2aba7	feat: Add error join for file writing in snapshots (#26004 ) This PR adds an error join to help with handling multiple errors from snapshot file writers.	2025-02-12 15:06:43 -06:00
WeblWabl	306a184a8d	feat: Add error joins/returns (#25996 ) This pr adds err handling for branch that did not specify os file removal errors previously. This is part of EAR #5819.	2025-02-11 12:15:25 -06:00
davidby-influx	800970490a	fix: move aside TSM file on errBlockRead (#25839 ) The error type check for errBlockRead was incorrect, and bad TSM files were not being moved aside when that error was encountered. Use errors.Join, errors.Is, and errors.As to correctly unwrap multiple errors. Closes https://github.com/influxdata/influxdb/issues/25838	2025-01-22 10:46:31 -08:00
WeblWabl	f04105bede	feat: Modify optimized compaction to cover edge cases (#25594 ) * feat: Modify optimized compaction to cover edge cases This PR changes the algorithm for compaction to account for the following cases that were not previously accounted for: - Many generations with a groupsize over 2 GB - Single generation with many files and a groupsize under 2 GB - Where groupsize is the total size of the TSM files in said shard directory. - shards that may have over a 2 GB group size but many fragmented files (under 2 GB and under aggressive point per block count) closes https://github.com/influxdata/influxdb/issues/25666	2025-01-14 14:51:09 -06:00
davidby-influx	e974165d25	fix: do not leak file handles from Compactor.write (#25725 ) There are a number of code paths in Compactor.write which on error can lead to leaked file handles to temporary files. This, in turn, prevents the removal of the temporary files until InfluxDB is rebooted, releasing the file handles. closes https://github.com/influxdata/influxdb/issues/25724	2025-01-03 14:43:41 -08:00
WeblWabl	45a8227ad6	fix(influxd): update xxhash, avoid stringtoslicebyte in cache (#578 ) (#25622 ) (#25624 ) * fix(influxd): update xxhash, avoid stringtoslicebyte in cache (#578) * fix(influxd): update xxhash, avoid stringtoslicebyte in cache This commit does 3 things: * it updates xxhash from v1 to v2; v2 includes a assembly arm version of Sum64 * it changes the cache storer to write with a string key instead of a byte slice. The cache only reads the key which WriteMulti already has as a string so we can avoid a host of allocations when converting back and forth from immutable strings to mutable byte slices. This includes updating the cache ring and ring partition to write with a string key * it updates the xxhash for finding the cache ring partition to use Sum64String which uses unsafe pointers to directly use a string as a byte slice since it only reads the string. Note: this now uses an assembly version because of the v2 xxhash update. Go 1.22 included new compiler ability to recognize calls of Method([]byte(myString)) and not make a copy but from looking at the call sites, I'm not sure the compiler would recognize it as the conversion to a byte slice was happening several calls earlier. That's what this change set does. If we are uncomfortable with any of these, we can do fewer of them (for example, not upgrade xxhash; and/or not use the specialized Sum64String, etc). For the performance issue in maz-rr, I see converting string keys to byte slices taking between 3-5% of cpu usage on both the primary and secondary. So while this pr doesn't address directly the increased cpu usage on the secondary, it makes cpu usage less on both which still feels like a win. I believe these changes are easier to review that switching to a byte slice pool that is likely needed in other places as the compiler provides nearly all of the correctness checks we need (we are relying also on xxhash v2 being correct). * helps #550 * chore: fix tests/lint * chore: don't use assembly version; should inline This 2 line change causes xxhash to use a purego Sum64 implementation which allows the compiler to see that Sum64 only read the byte slice input which them means is can skip the string to byte slice allocation and since it can skip that, it should inline all the calls to getPartitionStringKey and Sum64 avoiding 1 call to Sum64String which isn't inlined. * chore: update ci build file the ci build doesn't use the make file!!! * chore: revert "chore: update ci build file" This reverts commit 94be66fde03e0bbe18004aab25c0e19051406de2. * chore: revert "chore: don't use assembly version; should inline" This reverts commit 67d8d06c02e17e91ba643a2991e30a49308a5283. (cherry picked from commit 1d334c679ca025645ed93518b7832ae676499cd2) * feat: need to update go sum --------- Co-authored-by: Phil Bracikowski <13472206+philjb@users.noreply.github.com> (cherry picked from commit `06ab224516`)	2024-12-06 16:05:03 -06:00
Shiwen Cheng	1bc0eb4795	fix(tsm1): Fix data race of seriesKeys in deleteSeriesRange (#25268 ) Add an RWMutex to allow safe concurrent access in deleteSeriesRange	2024-09-27 16:36:27 -07:00
Geoffrey Wossum	23008e5286	chore: improve error messages and logging during shard opening (#25314 ) * chore: improve error messages and logging during shard opening	2024-09-12 15:11:56 -05:00
Geoffrey Wossum	b4bd607eef	fix: prevent retention service from hanging (#25055 ) * fix: prevent retention service from hanging Fix issue that can cause the retention service to hang waiting on a `Shard.Close` call. When this occurs, no other shards will be deleted by the retention service. This is usually noticed as an increase in disk usage because old shards are not cleaned up. The fix adds to new methods to `Store`, `SetShardNewReadersBlocked` and `InUse`. `InUse` can be used to poll if a shard has active readers, which the retention service uses to skip over in-use shards to prevent the service from hanging. `SetShardNewReadersBlocked` determines if new read access may be granted to a shard. This is required to prevent race conditions around the use of `InUse` and the deletion of shards. If the retention service skips over a shard because it is in-use, the shard will be checked again the next time the retention service is run. It can be deleted on subsequent checks if it is no longer in-use. If the shards is stuck in-use, the retention service will not be able to delete the shards, which can be observed in the logs for manual intervention. Other shards can still be deleted by the retention service even if a shard is stuck with readers. closes: #25054	2024-06-13 11:07:17 -05:00
davidby-influx	82cbdb5478	fix: ensure TSMBatchKeyIterator and FileStore close all TSMReaders (#24957 ) Do not let errors on closing a TSMReader prevent other closes.	2024-05-06 09:59:30 -07:00
davidby-influx	fe6c64b21e	fix: return and respect cursor errors (#24791 ) ArrayCursors were ignoring errors, which led to panics when nil cursors were operated on. This fix passes errors back up the stack and uses them to enforce healthy cursor creation. Closes https://github.com/influxdata/influxdb/issues/24789 --------- Co-authored-by: Stuart Carnie <stuart.carnie@gmail.com>	2024-03-25 17:22:33 -07:00
davidby-influx	2dc3dcb3d1	fix: do not escape CSV output (#24311 ) CSV output is incorrectly escaped. Add a boolean flag to tag output functions to prevent this. closes https://github.com/influxdata/influxdb/issues/24309	2023-06-29 12:00:41 -07:00
Brandon Pfeifer	e484c4d871	chore: upgrade Go to v1.19.3 (1.x) (#23941 ) * chore: upgrade Go to 1.19.3 This re-runs ./generate.sh and ./checkfmt.sh to format and update source code (this is primarily responsible for the huge diff.) * fix: update tests to reflect sorting algorithm change	2022-11-28 12:15:47 -05:00
Sam Arnold	9e9f1be574	fix: remove dead iterator (#23888 )	2022-11-09 16:24:01 -05:00
davidby-influx	80c10c8c04	feat: optimize saving changes to fields.idx (#23701 ) Instead of writing out the complete fields.idx file when it changes, write out incremental changes that will be applied to the file on close and startup. closes https://github.com/influxdata/influxdb/issues/23653	2022-09-14 13:14:09 -07:00
davidby-influx	ec412f793b	fix: do not rename files on mmap failure (#23396 ) If NewTSMReader() fails because mmap fails, do not rename the file, because the error is probably caused by vm.max_map_count being too low closes https://github.com/influxdata/influxdb/issues/23172	2022-06-07 08:37:00 -07:00
Dane Strandboge	0574163566	build: upgrade to go1.18 (#23250 )	2022-03-31 16:17:57 -05:00
davidby-influx	d9b9e86db9	fix: extend snapshot copy to filesystems that cannot link (#22703 ) If os.Link fails with syscall.ENOTSUP, then the file system does not support links, and we must make copies to snapshot files for backup. We also automatically make copies instead of link on Windows, because although it makes links, their semantics are different from Linux. closes https://github.com/influxdata/influxdb/issues/16739	2021-10-21 12:53:26 -07:00
Sam Arnold	611a4370a2	feat: show measurements database and retention policy wildcards (#22388 ) * feat: show measurements database and retention policy wildcards Closes #3318 * chore: run formatter	2021-10-05 09:07:25 -04:00
davidby-influx	3702fe8e76	fix: for Windows, copy snapshot files being backed up (#22551 ) On Windows, make copies of files for snapshots, because Go does not support the FILE_SHARE_DELETE flag which allows files (and links) to be deleted while open. This causes temporary directories to be left behind after backups. closes https://github.com/influxdata/influxdb/issues/16289	2021-09-22 10:56:17 -07:00
davidby-influx	e53f75e06d	fix: discard excessive errors (#22379 ) The tsmBatchKeyIterator discards excessive errors to avoid out-of-memory crashes when compacting very corrupt files. Any error beyond DefaultMaxSavedErrors (100) will be discarded instead of appended to the error slice. closes https://github.com/influxdata/influxdb/issues/22328	2021-09-03 09:11:05 -07:00
Sam Arnold	38de69cc1c	fix: flux error properly read by cloud (#22348 )	2021-08-31 17:43:12 -04:00
Tristan Su	e5f6894037	fix(tsm): check write-ahead-log size (#18991 )	2021-08-24 11:44:01 -04:00
davidby-influx	7d3efe1e9e	fix: avoid compaction queue stats flutter. (#22195 ) When the compaction planner runs, if it cannot acquire a lock on the files it plans to compact, it returns a nil list of compaction groups. This, in turn, sets the engine statistics for compactions queues to zero, which is incorrect. Instead, use the length of pending files which would have been returned. closes https://github.com/influxdata/influxdb/issues/22138	2021-08-16 09:21:07 -07:00
Sam Arnold	b64c2c3dcf	fix: tsi index should compact old or too-large log files (#21943 ) * fix: tsi index should compact old log files that are too large * chore: run automated formatter * chore: update changelog * fix: review comments	2021-07-26 17:40:15 -04:00
davidby-influx	73bdb2860e	chore: add logging to compaction (#21707 ) Compaction logging will generate intermediate information on volume of data written and output files created, as well as improve some of the anti-entropy messages related to compaction. This will also apply to `influx_tools compact` Closes https://github.com/influxdata/influxdb/issues/21704	2021-06-16 15:28:44 -07:00
davidby-influx	bce6553459	fix: Do not close connection twice in DigestWithOptions (#21659 ) tsm1.DigestWithOptions closes its network connection twice. This may cause broken pipe errors on concurrent invocations of the same procedure, by closing a reused i/o descriptor. This fix also captures errors from TSM file closures, which were previously ignored. Closes https://github.com/influxdata/influxdb/issues/21656	2021-06-10 12:41:42 -07:00
davidby-influx	f64be286be	fix: avoid rewriting fields.idx unnecessarily (#21592 ) Under heavy write load creating new fields and measurements the rewrite of the fields.idx file is a bottleneck. This enhancement combines multiple writes into a single one and shares any error return value with all of the combined invocations. MeasurementFieldSet and the new MeasurementFieldSetWriter must both now be explicitly closed. Closes #21577	2021-06-04 09:21:33 -07:00
davidby-influx	c8da9bafbf	chore(ae): add more logging (#21381 ) (#21452 ) tsdb.Engine.IsIdle and tsdb.Engine.Digest now return a reason string for why the engine & shard are not idle. Callers can then use this string for logging, if desired. The returned reason does not allocate memory, so the caller may want to add the shard ID and path for more information in the log. This is intended to be used in calls from the anti-entropy service in Enterprise. (cherry picked from commit `bf45841359`) fixes https://github.com/influxdata/influxdb/issues/21448	2021-05-11 09:46:45 -07:00
Sam Arnold	8edf7a4e2f	fix(storage): cursor requests are [start, stop] instead of [start, stop) (#21347 ) * fix: backport tsdb fix for window pushdowns From https://github.com/influxdata/influxdb/pull/19855 * fix(storage): cursor requests are [start, stop] instead of [start, stop) The cursors were previously [start, stop) to be consistent with how flux requests data, but the underlying storage file store was [start, stop] because that's how influxql read data. This reverts back the cursor behavior so that it is now [start, stop] everywhere and the conversion from [start, stop) to [start, stop] is performed when doing the cursor request to get the next cursor. cherry-pick from #21318 Co-authored-by: Sam Arnold <sarnold@influxdata.com> (cherry picked from commit `7766672797`) * chore: fix formatting Co-authored-by: Jonathan A. Sternberg <jonathan@influxdata.com>	2021-04-30 15:26:31 -04:00
davidby-influx	7f300dc248	fix: Anti-Entropy loops endlessly with empty shard (#21275 ) The anti-entropy service will loop trying to copy an empty shard to a data node missing that shard. This fix is one of two changes that correctly create an empty shard on a new node. This fix will set the LastModified date of an empty shard directory to the modification time of that directory, instead of to the Unix epoch. Fixes: https://github.com/influxdata/influxdb/issues/21273	2021-04-23 09:06:03 -07:00
Daniel Moran	333cff1b15	fix(tsdb): exclude the stop time from the array cursor (#21139 ) This is a backport of #14262 to the 1.x storage engine. This also ports the table tests that existed with the pre-beta version of the storage engine to the one that is now used in the production version. A few of the tests are skipped. These are portions of the storage engine that have not been ported over. They should be unskipped when that functionality is ported over. Co-authored-by: Jonathan A. Sternberg <jonathan@influxdata.com>	2021-04-06 14:50:07 -04:00
Daniel Moran	3eb4fdaf33	fix(tsm1): fix data race when accessing tombstone stats (#20903 )	2021-03-09 15:20:40 -05:00
Sam Arnold	de491dab97	refactor: Remove unused function add and unused variable keysHint (#20803 ) (#20805 ) (cherry picked from commit `1068d1de6f`)	2021-02-25 15:15:32 -05:00
Daniel Moran	e85a10248d	fix(tsm1): fix data race and validation in cache ring (#20802 ) Co-authored-by: Yun Zhao <zhaoyun2316@gmail.com> Co-authored-by: Yun Zhao <zhaoyun2316@gmail.com>	2021-02-24 17:17:08 -05:00
Sam Arnold	de1a0eb2a9	feat: use count_hll for 'show series cardinality' queries (#20745 ) Closes: https://github.com/influxdata/influxdb/issues/20614 Also fix nil pointer for seriesKey iterator Fix for bug in: https://github.com/influxdata/influxdb/issues/20543 Also add a test for ingress metrics	2021-02-10 16:00:16 -05:00
Sam Arnold	903b8cd0ea	feat(query): Hyper log log operators in influxql (#20603 ) * feat(query): hyper log log counting in query engine In addition to helping with normal queries, this can improve the 'SHOW CARDINALITY' meta-queries: time influx -database mydb -execute 'select count_hll(sum_hll(_seriesKey)) from big' name: big time count_hll ---- --------- 0 200767781 influx -database mydb -execute 0.06s user 0.12s system 0% cpu 8:49.99 total	2021-02-08 08:38:14 -05:00
Sam Arnold	21823db00b	feat: series creation ingress metrics (#20700 ) After turning this on and testing locally, note the 'seriesCreated' metric "localStore": {"name":"localStore","tags":null,"values":{"pointsWritten":2987,"seriesCreated":58,"valuesWritten":23754}}, "ingress": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"cq","rp":"monitor"},"values":{"pointsWritten":2,"seriesCreated":1,"valuesWritten":4}}, "ingress:1": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"database","rp":"monitor"},"values":{"pointsWritten":2,"seriesCreated":2,"valuesWritten":4}}, "ingress:2": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"httpd","rp":"monitor"},"values":{"pointsWritten":2,"seriesCreated":1,"valuesWritten":46}}, "ingress:3": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"ingress","rp":"monitor"},"values":{"pointsWritten":14,"seriesCreated":14,"valuesWritten":42}}, "ingress:4": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"localStore","rp":"monitor"},"values":{"pointsWritten":2,"seriesCreated":1,"valuesWritten":6}}, "ingress:5": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"queryExecutor","rp":"monitor"},"values":{"pointsWritten":2,"seriesCreated":1,"valuesWritten":10}}, "ingress:6": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"runtime","rp":"monitor"},"values":{"pointsWritten":2,"seriesCreated":1,"valuesWritten":30}}, "ingress:7": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"shard","rp":"monitor"},"values":{"pointsWritten":2,"seriesCreated":2,"valuesWritten":22}}, "ingress:8": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"subscriber","rp":"monitor"},"values":{"pointsWritten":2,"seriesCreated":1,"valuesWritten":6}}, "ingress:9": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"tsm1_cache","rp":"monitor"},"values":{"pointsWritten":2,"seriesCreated":2,"valuesWritten":18}}, "ingress:10": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"tsm1_engine","rp":"monitor"},"values":{"pointsWritten":2,"seriesCreated":2,"valuesWritten":58}}, "ingress:11": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"tsm1_filestore","rp":"monitor"},"values":{"pointsWritten":2,"seriesCreated":2,"valuesWritten":4}}, "ingress:12": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"tsm1_wal","rp":"monitor"},"values":{"pointsWritten":2,"seriesCreated":2,"valuesWritten":8}}, "ingress:13": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"write","rp":"monitor"},"values":{"pointsWritten":2,"seriesCreated":1,"valuesWritten":18}}, "ingress:14": {"name":"ingress","tags":{"db":"telegraf","login":"_systemuser_unknown","measurement":"cpu","rp":"autogen"},"values":{"pointsWritten":1342,"seriesCreated":13,"valuesWritten":13420}}, "ingress:15": {"name":"ingress","tags":{"db":"telegraf","login":"_systemuser_unknown","measurement":"disk","rp":"autogen"},"values":{"pointsWritten":642,"seriesCreated":6,"valuesWritten":4494}}, "ingress:16": {"name":"ingress","tags":{"db":"telegraf","login":"_systemuser_unknown","measurement":"diskio","rp":"autogen"},"values":{"pointsWritten":214,"seriesCreated":2,"valuesWritten":2354}}, "ingress:17": {"name":"ingress","tags":{"db":"telegraf","login":"_systemuser_unknown","measurement":"mem","rp":"autogen"},"values":{"pointsWritten":107,"seriesCreated":1,"valuesWritten":963}}, "ingress:18": {"name":"ingress","tags":{"db":"telegraf","login":"_systemuser_unknown","measurement":"processes","rp":"autogen"},"values":{"pointsWritten":107,"seriesCreated":1,"valuesWritten":856}}, "ingress:19": {"name":"ingress","tags":{"db":"telegraf","login":"_systemuser_unknown","measurement":"swap","rp":"autogen"},"values":{"pointsWritten":214,"seriesCreated":1,"valuesWritten":642}}, "ingress:20": {"name":"ingress","tags":{"db":"telegraf","login":"_systemuser_unknown","measurement":"system","rp":"autogen"},"values":{"pointsWritten":321,"seriesCreated":1,"valuesWritten":749}}, Closes: https://github.com/influxdata/influxdb/issues/20613	2021-02-05 14:52:43 -04:00
Sam Arnold	eb92c997cd	feat: Ingress metrics by measurement Partial implementation of https://github.com/influxdata/influxdb/issues/20612 Implements per-measurement points written metric. Next step: Also support per-login.	2021-02-02 15:58:28 -05:00
Sam Arnold	6795ec6c01	refactor: do not use context value anti-pattern Extending the context instead of fixing the API breaks type safety. For tracking the number of points / values written, it is much clearer to pass an explicit tracker.	2021-02-01 14:34:11 -05:00
Sam Arnold	8a16bf0531	chore: run goimports -w ./	2021-01-29 11:40:02 -05:00

1 2 3 4 5 ...

1464 Commits (db/wait-timeout-utility)