If NewTSMReader() fails because mmap fails, do not rename the file;
the error is most likely caused by vm.max_map_count being too low
rather than by a problem with the file itself.
Closes: #25337
(cherry picked from commit ec412f793b)
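A minimal sketch of the check this implies, assuming the mmap failure
surfaces as ENOMEM; the helper name is hypothetical, not the actual
influxdb code:

```go
package tsm1sketch

import (
	"errors"
	"syscall"
)

// isMmapResourceError reports whether an open error looks like an mmap
// resource failure rather than a corrupt file. On Linux, mmap returns
// ENOMEM when vm.max_map_count is exhausted, so in that case the TSM file
// should be left in place instead of being renamed out of the way.
func isMmapResourceError(err error) bool {
	return errors.Is(err, syscall.ENOMEM)
}
```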
* fix: prevent retention service from hanging (#25055)
Fix issue that can cause the retention service to hang waiting on a
`Shard.Close` call. When this occurs, no other shards will be deleted
by the retention service. This is usually noticed as an increase in
disk usage because old shards are not cleaned up.
The fix adds two new methods to `Store`: `SetShardNewReadersBlocked`
and `InUse`. `InUse` can be used to poll whether a shard has active
readers; the retention service uses it to skip over in-use shards so
that it does not hang. `SetShardNewReadersBlocked` controls whether new
read access may be granted to a shard. This is required to prevent race
conditions between the `InUse` check and the deletion of shards.
If the retention service skips over a shard because it is in use, the
shard will be checked again the next time the retention service runs
and can be deleted on a subsequent check once it is no longer in use.
If a shard is stuck in use, the retention service will not be able to
delete it, which can be observed in the logs for manual intervention.
Other shards can still be deleted by the retention service even if one
shard is stuck with readers.
This is a port of ad68ec8 from master-1.x to main-2.x.
closes: #25076
(cherry picked from commit b4bd607eef)
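A sketch of the ordering this describes in the retention loop; the
interface and signatures below are assumptions for illustration, not
the exact `tsdb.Store` API:

```go
package retentionsketch

// shardStore captures only the calls discussed above; real signatures may differ.
type shardStore interface {
	SetShardNewReadersBlocked(shardID uint64, blocked bool) error
	InUse(shardID uint64) (bool, error)
	DeleteShard(shardID uint64) error
}

// tryDelete blocks new readers, then deletes the shard only if no readers are
// active. A busy shard is left alone and rechecked on the next retention pass.
func tryDelete(s shardStore, shardID uint64) (deleted bool, err error) {
	if err := s.SetShardNewReadersBlocked(shardID, true); err != nil {
		return false, err
	}
	busy, err := s.InUse(shardID)
	if err != nil || busy {
		// Unblock readers so normal queries continue; try again next run.
		s.SetShardNewReadersBlocked(shardID, false)
		return false, err
	}
	return true, s.DeleteShard(shardID)
}
```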
If there was an error after the cache was snapshotted to one or more
TSM files, but before the cache and WAL were cleaned up, then the cache
would be repeatedly snapshotted, generating duplicate level 1 TSM
files.
This commit attempts to clean those files up by removing the temporary
TSM file(s). The snapshot will be retried.
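A rough sketch of the clean-up-and-retry shape described here; the
writer interface and names are assumptions:

```go
package tsm1sketch

import "os"

// snapshotWriter stands in for the code that snapshots the cache to
// temporary TSM files; it reports the paths it wrote even on error.
type snapshotWriter interface {
	WriteSnapshot() (tmpFiles []string, err error)
}

// snapshotOnce removes any temporary TSM files left behind by a failed
// snapshot so that the retried snapshot does not accumulate duplicate
// level 1 TSM files.
func snapshotOnce(w snapshotWriter) error {
	tmpFiles, err := w.WriteSnapshot()
	if err != nil {
		for _, f := range tmpFiles {
			os.Remove(f) // best effort; the snapshot will be retried
		}
	}
	return err
}
```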
* introduced tmpl from Arrow, which allows existing templates to be
reused with additional command-line properties to control output.
* duplicated the suite of ReadFloatBlock tests for ReadFloatArrayBlock
* only the float data type is tested, as the Read APIs are generated
from a single template.
multiple users have attempted to run influxdb in a docker container
with a windows host and a volume mounted from windows. that causes
problems because it apparently uses samba/cifs, which does not support
fsync on directories. with this patchset, if an fsync on a directory
returns EINVAL, as appears to happen on samba/cifs, the error is
ignored. this should help.
fixes #9833.
fixes #9630.
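a sketch of the tolerant directory fsync this describes (illustrative,
not the actual patch):

```go
package fsutil

import (
	"errors"
	"os"
	"syscall"
)

// syncDir fsyncs a directory but ignores EINVAL, which filesystems such as
// samba/cifs (e.g. a windows volume mounted into a docker container) return
// because they do not support fsync on directories.
func syncDir(dir string) error {
	fd, err := os.Open(dir)
	if err != nil {
		return err
	}
	defer fd.Close()
	if err := fd.Sync(); err != nil && !errors.Is(err, syscall.EINVAL) {
		return err
	}
	return nil
}
```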
This commit restricts the number of TSM1 files that can be opened
concurrently across the entire `tsdb.Store`. There is currently a
limit for the number of shards that can be opened concurrently;
however, that limit does not help when the number of CPU cores is
higher than the number of shards. Because TSM1 files have a 2GB limit
and there is no limit on the number of files per shard, extremely
large shards (1TB+) can load thousands of files simultaneously.
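A minimal sketch of a store-wide limit, assuming a plain semaphore; the
actual implementation may differ:

```go
package tsm1sketch

// fileOpenLimiter bounds how many TSM files may be opened at once across the
// whole store, independently of the per-shard open limit.
type fileOpenLimiter chan struct{}

func newFileOpenLimiter(n int) fileOpenLimiter { return make(fileOpenLimiter, n) }

// openTSM holds a slot while one file is opened, so even a 1TB+ shard with
// thousands of files cannot open them all simultaneously.
func (l fileOpenLimiter) openTSM(path string, open func(string) error) error {
	l <- struct{}{}        // acquire a slot
	defer func() { <-l }() // release it when the open finishes
	return open(path)
}
```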
callers can always ensure that the observer set on the engine options
is appropriate for that shard id. this simplifies the api and reduces
the chance of bugs due to mixing up shard ids.
just adds an interface for hooks that fire when these files come and
go. the hooks run before the action is taken, so that if a hook
returns an error there are no consistency problems.
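a sketch of what such a hook interface could look like; the method
names are modelled on the description above, not copied from the code:

```go
package tsm1sketch

// fileObserver is notified before TSM/WAL files are created or removed.
// Because the hooks run before the action is taken, a hook that returns an
// error leaves the file store untouched, avoiding consistency problems.
type fileObserver interface {
	FileFinishing(path string) error // about to make a new file durable
	FileUnlinking(path string) error // about to remove an existing file
}
```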
The InUse call on TSMFiles is inherently racy in the presence of
Ref calls outside of the file store mutex. In addition, we return
some TSMFiles to callers without Ref'ing them, which might allow them
to be closed out from under the caller. This is probably impossible in
practice, because the only thing that takes a handle externally is
compaction, which enforces that only one handle exists at a time and
only deletes the file once the compaction is done with it, but that is
neither obvious nor enforced.
Instead, always return a TSMFile with a Ref call under the read
lock, and require that no one else calls Ref. That way, it cannot
transition to referenced if the InUse call returns false under the
write lock.
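A sketch of the pattern being described: the reference is taken while
the read lock is still held, so a file cannot become referenced after
InUse reports false under the write lock (the types here are
simplified stand-ins):

```go
package tsm1sketch

import (
	"sync"
	"sync/atomic"
)

type tsmFile struct{ refs int64 } // simplified stand-in for TSMFile

func (f *tsmFile) Ref()        { atomic.AddInt64(&f.refs, 1) }
func (f *tsmFile) Unref()      { atomic.AddInt64(&f.refs, -1) }
func (f *tsmFile) InUse() bool { return atomic.LoadInt64(&f.refs) > 0 }

type fileStore struct {
	mu    sync.RWMutex
	files []*tsmFile
}

// acquire returns the current files with a Ref already taken under the read
// lock. Callers must Unref each file when finished and must not call Ref
// themselves.
func (s *fileStore) acquire() []*tsmFile {
	s.mu.RLock()
	defer s.mu.RUnlock()
	files := make([]*tsmFile, len(s.files))
	copy(files, s.files)
	for _, f := range files {
		f.Ref()
	}
	return files
}
```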
The CreateSnapshot method was racy in a number of ways in the presence
of multiple calls or compactions: it did not take references to the
TSMFiles, and the temporary directory it creates could have been
shared with concurrent CreateSnapshot calls. The files slice could
also have been concurrently mutated during a compaction.
Instead, under the write lock, make a local copy of the state for
the compaction, including Ref calls (write locks are implicitly
read locks). Then, there is no need for a lock at all afterward.
Add some comments to explain these issues at the call sites of InUse,
and document that the Files method that returns the slice unprotected
is only for tests.
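A sketch of the copy-under-write-lock idea, reusing the fileStore and
tsmFile stand-ins from the previous sketch:

```go
// snapshotFiles takes the per-call copy that CreateSnapshot works from:
// built under the write lock, with each file Ref'd, so no lock (and no
// shared temporary state) is needed afterwards.
func (s *fileStore) snapshotFiles() []*tsmFile {
	s.mu.Lock() // the write lock also excludes anyone taking new Refs
	defer s.mu.Unlock()
	files := make([]*tsmFile, len(s.files))
	copy(files, s.files)
	for _, f := range files {
		f.Ref() // keep the file alive until the snapshot releases it
	}
	return files
}
```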
Re-open the last WAL segment instead of creating a new one. This fixes
an issue where the last modified time of the WAL would change on
restart. It also avoids a lot of file IO churn on restart.
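A sketch of reopening the last segment for appending rather than
creating a fresh one (path handling and flags are illustrative):

```go
package walsketch

import "os"

// openLastSegment reopens an existing WAL segment in append mode instead of
// creating a brand-new segment, so a restart does not churn out new files
// or needlessly touch the WAL's on-disk state.
func openLastSegment(path string) (*os.File, error) {
	return os.OpenFile(path, os.O_RDWR|os.O_APPEND, 0o666)
}
```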
This commit firstly ensures that a shard's size on disk is accurately
reported when using the tsi1 index, by including the on-disk size of the
tsi1 index in the calculation.
Secondly, this commit adds support for shard streaming/copying when using
the tsi1 index. Prior to this, a tsi1 index would not be correctly
restored when streaming shards.
* replaces coordinating goroutines with a single k-way heap merge iterator
* removes contention from sending keys across buffered channels
startup time drops from 46s to 28s when iterating 1MM keys across 14 shards
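A sketch of a single k-way heap merge over per-shard sorted key
iterators, replacing the coordinating goroutines and per-key channel
sends (the iterator shape is an assumption):

```go
package mergesketch

import "container/heap"

// keyIterator yields keys from one shard in ascending order.
type keyIterator interface {
	Next() (key string, ok bool)
}

type heapItem struct {
	key string
	it  keyIterator
}

type keyHeap []heapItem

func (h keyHeap) Len() int            { return len(h) }
func (h keyHeap) Less(i, j int) bool  { return h[i].key < h[j].key }
func (h keyHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *keyHeap) Push(x interface{}) { *h = append(*h, x.(heapItem)) }
func (h *keyHeap) Pop() interface{} {
	old := *h
	x := old[len(old)-1]
	*h = old[:len(old)-1]
	return x
}

// mergeKeys emits the smallest current key across all iterators from a single
// goroutine, so there is no channel send (and no contention) per key.
func mergeKeys(its []keyIterator, emit func(string)) {
	h := &keyHeap{}
	for _, it := range its {
		if k, ok := it.Next(); ok {
			*h = append(*h, heapItem{key: k, it: it})
		}
	}
	heap.Init(h)
	for h.Len() > 0 {
		top := (*h)[0]
		emit(top.key)
		if k, ok := top.it.Next(); ok {
			(*h)[0].key = k
			heap.Fix(h, 0)
		} else {
			heap.Pop(h)
		}
	}
}
```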