influxdb

Commit Graph

Author	SHA1	Message	Date
davidby-influx	2ab5aad52e	chore: add logging to Filestore.purger (#26089) Also fixes error type checks in TestCompactor_CompactFull_InProgress	2025-03-05 11:46:07 -08:00
Geoffrey Wossum	23008e5286	chore: improve error messages and logging during shard opening (#25314)	2024-09-12 15:11:56 -05:00
Geoffrey Wossum	b4bd607eef	fix: prevent retention service from hanging (#25055) Fix issue that can cause the retention service to hang waiting on a `Shard.Close` call. When this occurs, no other shards will be deleted by the retention service. This is usually noticed as an increase in disk usage because old shards are not cleaned up. The fix adds two new methods to `Store`, `SetShardNewReadersBlocked` and `InUse`. `InUse` can be used to poll if a shard has active readers, which the retention service uses to skip over in-use shards to prevent the service from hanging. `SetShardNewReadersBlocked` determines if new read access may be granted to a shard. This is required to prevent race conditions around the use of `InUse` and the deletion of shards. If the retention service skips over a shard because it is in-use, the shard will be checked again the next time the retention service is run. It can be deleted on subsequent checks if it is no longer in-use. If a shard is stuck in-use, the retention service will not be able to delete it, which can be observed in the logs for manual intervention. Other shards can still be deleted by the retention service even if a shard is stuck with readers. closes: #25054	2024-06-13 11:07:17 -05:00
davidby-influx	82cbdb5478	fix: ensure TSMBatchKeyIterator and FileStore close all TSMReaders (#24957) Do not let errors on closing a TSMReader prevent other closes.	2024-05-06 09:59:30 -07:00
davidby-influx	ec412f793b	fix: do not rename files on mmap failure (#23396) If NewTSMReader() fails because mmap fails, do not rename the file, because the error is probably caused by vm.max_map_count being too low closes https://github.com/influxdata/influxdb/issues/23172	2022-06-07 08:37:00 -07:00
Dane Strandboge	0574163566	build: upgrade to go1.18 (#23250)	2022-03-31 16:17:57 -05:00
davidby-influx	d9b9e86db9	fix: extend snapshot copy to filesystems that cannot link (#22703) If os.Link fails with syscall.ENOTSUP, then the file system does not support links, and we must make copies to snapshot files for backup. We also automatically make copies instead of link on Windows, because although it makes links, their semantics are different from Linux. closes https://github.com/influxdata/influxdb/issues/16739	2021-10-21 12:53:26 -07:00
davidby-influx	3702fe8e76	fix: for Windows, copy snapshot files being backed up (#22551) On Windows, make copies of files for snapshots, because Go does not support the FILE_SHARE_DELETE flag which allows files (and links) to be deleted while open. This causes temporary directories to be left behind after backups. closes https://github.com/influxdata/influxdb/issues/16289	2021-09-22 10:56:17 -07:00
davidby-influx	7f300dc248	fix: Anti-Entropy loops endlessly with empty shard (#21275) The anti-entropy service will loop trying to copy an empty shard to a data node missing that shard. This fix is one of two changes that correctly create an empty shard on a new node. This fix will set the LastModified date of an empty shard directory to the modification time of that directory, instead of to the Unix epoch. Fixes: https://github.com/influxdata/influxdb/issues/21273	2021-04-23 09:06:03 -07:00
Daniel Moran	3eb4fdaf33	fix(tsm1): fix data race when accessing tombstone stats (#20903)	2021-03-09 15:20:40 -05:00
Ayan George	a9d02e7ab7	fix: Handle snapshot related errors (#18710) When applied this patch will: * log snapshot directory removal errors Prior to this patch, errors when removing temporary snapshot directories happen silently. This patch ensures that errors are logged when os.RemoveAll() fails. * refactor tsm1: Declare error value in condition Saves a line of code and limits the scope of an error value. * refactor tsm1: Add MakeSnapshotLinks() This commit adds (*FileStore).MakeSnapshotLinks(). The code in this function was originally part of CreateSnapshot(). That code was hoisted out and into MakeSnapshotLinks() because there are two points of failure that require cleanup -- we have to delete a temporary directory on failure. Placing the code in one function allows us to check its returned error value and perform cleanup in only one place. In short, we hoisted code out of CreateSnapshot() to simplify error handling. On error, we remove any directories we created.	2020-06-25 10:05:04 -04:00
Ayan George	a0f2e0c21a	fix(tsm1): Fix temp directory search bug (#17685) * fix: verify precision parameter in write requests This change updates the HTTP endpoints that service v1 and v2 writes to verify the values passed in the precision parameter. * fix(tsm1): Fix temp directory search bug The original code's intention is to scan a directory for the directory with the highest value when converted to an integer. So directories may be in the form: 0.tmp 1.tmp 2.tmp 30.tmp ... 100.tmp The loop should scan the directory, strip the basename and extension from the file name to leave just a number, then store the highest number it finds. Before this patch, there is a bug that has the code only store the highest value if there is an error converting the numeric value into an integer. This patch primarily fixes that logic. In addition, this patch will save an indent level by inverting logic in two places: Instead of checking if a file is a directory and has a suffix of ".tmp", it is probably better to test if a file is NOT a directory OR does NOT have an extension of ".tmp" then continue. Also, instead of testing if len(ss) == 2, we can test if len(ss) != 2 and continue if so. Both of these save an indent level and keep our "happy path" to the left. Finally, this patch will use string concatenation instead of calling fmt.Sprintf() to add periods to "tmp" and "tsm" extensions. Co-authored-by: David Norton <dgnorton@gmail.com>	2020-04-15 10:29:46 -04:00
elbehery	042128b948	fix(tsdb): Replace panic with error while de/encoding corrupt data fixes #17440 While encoding or decoding corrupt data, the current behaviour is to `panic`. This commit replaces the `panic` with an `error` to be propagated up to the calling `iterator`. To avoid overwriting other errors, iterators now wrap a `TSMErrors` which contains ALL the encountered errors. TSMErrors itself implements `Error()`; the returned string contains all the error messages, separated by a "," delimiter.	2020-04-01 20:51:11 +02:00
elbehery	a4bb1083f2	fix(storage): Renaming corrupt data files fails fixes #14107	2019-10-28 17:32:58 +01:00
Ben Wells	e9bada090f	Fix misspelling identified by misspell	2019-02-03 20:27:43 +00:00
Ben Johnson	844b7ef9bf	Merge pull request #10299 from influxdata/bj-tsm1-panic-fix Fix TSM1 panic on reader error.	2018-10-10 08:12:17 -06:00
Edd Robinson	d649d5928b	Cleanup failed TSM snapshot If there was an error after the cache has been snapshotted to one or more TSM files, but before the cache and WAL are cleaned up, then the cache would be repeatedly snapshotted, generating duplicate level 1 TSM files. This commit attempts to clean those files up by removing the temporary TSM file(s). The snapshot will be retried.	2018-10-03 16:34:54 +01:00
Ben Johnson	da2dfa495e	Fix TSM1 panic on reader error. This commit fixes an error check so that a `nil` TSM reader does not cause a panic.	2018-09-24 08:54:28 -06:00
Edd Robinson	996bb9bfa6	Wire in mmap advise hint to TSMReader	2018-08-03 16:27:39 +01:00
Stuart Carnie	3632df77a6	feat(tsm1): Add Read<type>ArrayBlock APIs to FileStore * introduced tmpl from Arrow, which allows existing templates to be reused with additional command-line properties to control output. * duplicated suite of ReadFloatBlock tests for ReadFloatArrayBlock * only the float data type is tested as the Read APIs are generated from a single template.	2018-07-16 08:55:37 -07:00
Stuart Carnie	790639d728	feat(tsm1): Add Read<Type>ArrayBlock APIs to TSMReader and mmapAccessor	2018-07-16 08:55:37 -07:00
Jeff Wendling	e6aec771b0	fix(tsdb): attempt to work on docker on windows multiple users have attempted to run influxdb in a docker container with a windows host and a volume mounted from windows. that causes problems because it apparently uses samba/cifs which does not support fsync on directories. this patchset will, if it receives an EINVAL on directory fsync, as is what appears to happen on samba/cifs, then it will ignore it. this should help. fixes #9833. fixes #9630.	2018-06-01 14:57:18 -06:00
Jacob Marble	9a7b652a1c	TSM: OpenLimiter must not be nil	2018-05-31 13:43:16 -07:00
Ben Johnson	cec2a2d988	Merge pull request #9918 from influxdata/bj-tsm-open-limiter TSM1 Open Limiter	2018-05-30 13:13:14 -06:00
Jacob Marble	bb313765e4	tsdb/tsm1: Clean up TSM filename format/parse	2018-05-29 09:57:48 -07:00
Ben Johnson	d3e3b05a49	Add tsm1 open limiter This commit restricts the number of TSM1 files that can be opened concurrently across the entire `tsdb.Store`. There is currently a limit for the number of shards that can be opened concurrently, however, this limit does not help when the number of CPU cores is higher than the number of shards. Because TSM1 files have a 2GB limit and there is no limit on the number of files per shard, extremely large shards (1TB+) can load 1,000s of files simultaneously.	2018-05-29 10:21:53 -06:00
Jeff Wendling	ce565965a4	tsdb: avoid nil checks on the observer this avoids nil panics in the case that someone eventually forgets.	2018-05-23 13:15:41 -06:00
Jeff Wendling	8ad515b387	tsdb: remove the shard id again callers can always ensure that the observer set on the engine options is appropriate for that shard id. this simplifies the api and reduces the chance of bugs due to mixing up shard ids.	2018-05-23 13:04:54 -06:00
Jeff Wendling	15ae0bd98d	tsdb: observe tombstone files as well	2018-05-22 22:07:16 -06:00
Jeff Wendling	eb4bf651e5	tsdb: add shard number to the observer an observer may want to know what shard the file is part of. this way, they don't have to rely on brittle file path parsing.	2018-05-18 18:15:44 -06:00
Jeff Wendling	6320316fd4	Merge pull request #9852 from influxdata/jmw-tsm-notifications file store: send notifications about new/deleted tsm files.	2018-05-18 11:29:34 -06:00
Jeff Wendling	27040d6f31	file store: send notifications about new/deleted tsm files. just adds some interface for hooks about when these files come and go. we do them before the action is taken so that if the hook has an error, it doesn't have any consistency problems.	2018-05-17 12:19:58 -06:00
Jacob Marble	c119f9a846	Close TSMReaders from FileStore.Close after releasing FileStore mutex	2018-05-17 09:12:36 -07:00
Jeff Wendling	1a8931af42	Merge pull request #9841 from influxdata/jmw-ensure-no-race-conditions tsm1: ensure some race conditions are impossible	2018-05-16 11:56:10 -06:00
Jeff Wendling	7d2bb19b74	tsm1: ensure some race conditions are impossible The InUse call on TSMFiles is inherently racy in the presence of Ref calls outside of the file store mutex. In addition, we return some TSMFiles to callers without them being Ref'd which might allow them to be closed from underneath. While I believe it is the case that it would be impossible, as the only thing that gets a handle externally is compaction, and compaction enforces that only one handle exists at a time, and thus is only deleted once after the compaction is done with it, it's not very obvious or enforced. Instead, always return a TSMFile with a Ref call under the read lock, and require that no one else calls Ref. That way, it cannot transition to referenced if the InUse call returns false under the write lock. The CreateSnapshot method was racy in a number of ways in the presence of multiple calls or compactions: it did not take references to the TSMFiles, and the temporary directory it creates could have been shared with concurrent CreateSnapshot calls. In addition, the files slice could have been concurrently mutated during a compaction as well. Instead, under the write lock, make a local copy of the state for the compaction, including Ref calls (write locks are implicitly read locks). Then, there is no need for a lock at all afterward. Add some comments to explain these issues at the call sites of InUse, and document that the Files method that returns the slice unprotected is only for tests.	2018-05-14 19:45:42 -06:00
Ben Johnson	35a64dee99	Inject tsm file naming.	2018-05-14 10:46:38 -06:00
Jason Wilder	ec3f5c353c	Fix panic in FileStore.walkKeys If a TSM file is replaced while walkKeys is running, a panic could occur because the mmap has been unmapped.	2018-04-30 17:26:23 -06:00
Ben Johnson	f459d87325	Merge pull request #9785 from influxdata/rename-bad-tsm-file Rename & log corrupt tsm files on load	2018-04-30 15:37:45 -06:00
Jacob Marble	7de2dcd3d9	TSM: TSMReader.Close blocks until reads complete	2018-04-30 13:46:03 -07:00
Stuart Carnie	e0ae9c5a2d	tsm1: Replace goroutine `merge` with k-way merge Previously replaced WalkKeys implementation for a considerable improvement to startup time	2018-04-30 07:57:55 -07:00
Ben Johnson	108fa09439	Rename corrupt tsm files on load.	2018-04-27 14:27:44 -06:00
Jonathan A. Sternberg	d38413a849	Merge pull request #9454 from influxdata/js-structured-logging Update logging calls to take advantage of structured logging	2018-02-21 09:14:40 -06:00
Jason Wilder	f7279b57f3	Re-open last WAL segment Re-open the last wal segment instead of creating a new one. This fixes an issue where the last modified time of the WAL would change on restart. It also avoids a lot of IO file churn on restart.	2018-02-20 14:24:04 -07:00
Jonathan A. Sternberg	2bbd96768d	Update logging calls to take advantage of structured logging Includes a style guide that details the basics of how to log.	2018-02-20 10:04:19 -06:00
Edd Robinson	90903fa6ed	Remove unused code/cleanup engine package	2018-01-20 13:56:45 +00:00
Adam	af2918a193	fix file_store path bug that affects windows users (#9219)	2017-12-11 17:31:33 -05:00
Edd Robinson	12a2ff7fac	Add support for TSI shard streaming and shard size This commit firstly ensures that a shard's size on disk is accurately reported when using the tsi1 index, by including the on-disk size of the tsi1 index in the calculation. Secondly, this commit adds support for shard streaming/copying when using the tsi1 index. Prior to this, a tsi1 index would not be correctly restored when streaming shards.	2017-11-28 15:57:02 +00:00
Stuart Carnie	e1ec331048	improve startup performance * replaces coordinating goroutines for single k-way heap merge iterator * removes contention sending keys across buffered channels startup time from 46s -> 28s for iterating 1MM keys across 14 shards	2017-11-27 12:44:58 -07:00
Jason Wilder	02dbe6dbd3	Fix KeyCursor not returning remaining blocks If the first block that needs to be read was partially deleted such that the trailing end has no values, it was possible for the query cursor to end early. This was caused by the KeyCursor.ReadFloatBlock returning no values instead of checking the remaining blocks.	2017-11-16 15:23:34 -07:00
Jason Wilder	80cd5e63af	Optimize DeleteSeriesRange This removes more allocations and speeds up some critical sections.	2017-11-13 09:02:10 -07:00
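
The temp directory scan described in commit a0f2e0c21a can be sketched roughly as below. This is a minimal illustration of the logic the commit message describes, not the code from the repository: the `entry` type and the `highestTempID` function are hypothetical names introduced here, and the directory listing is passed in as a slice rather than read from disk. It skips anything that is not a directory or does not end in ".tmp", skips names that do not split into exactly two parts, and records a number only when the conversion succeeds.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// entry is a hypothetical stand-in for a directory listing entry.
type entry struct {
	name  string
	isDir bool
}

// highestTempID returns the largest N among directories named "N.tmp",
// or -1 if there are none. Hypothetical helper for illustration only.
func highestTempID(entries []entry) int {
	const tmpExt = "." + "tmp" // plain concatenation, as the commit message suggests

	highest := -1
	for _, e := range entries {
		// Inverted check: bail out early to keep the happy path to the left.
		if !e.isDir || !strings.HasSuffix(e.name, tmpExt) {
			continue
		}

		ss := strings.Split(e.name, ".")
		if len(ss) != 2 {
			continue
		}

		n, err := strconv.Atoi(ss[0])
		if err != nil {
			// Not a number: skip it. Per the commit message, the bug was
			// storing the value only when this conversion failed.
			continue
		}
		if n > highest {
			highest = n
		}
	}
	return highest
}

func main() {
	entries := []entry{
		{name: "0.tmp", isDir: true},
		{name: "1.tmp", isDir: true},
		{name: "2.tmp", isDir: true},
		{name: "30.tmp", isDir: true},
		{name: "100.tmp", isDir: true},
	}
	fmt.Println(highestTempID(entries)) // prints 100
}
```

Comparing the parsed integers rather than the names matters here: a string comparison would rank "30.tmp" above "100.tmp".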

171 Commits (db/wait-timeout-utility)