Commit Graph

123 Commits (db/panic-at-the-cursor)

Geoffrey Wossum 96bade409e
feat: add option to flush WAL on shutdown (#25444)
* feat: add option to flush WAL on shutdown

Add `--storage-wal-flush-on-shutdown` to flush WAL on database shutdown.
On successful shutdown, all WAL data will be committed to TSM files and the
WAL directories will not contain any .wal files.

Closes: #25422
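
A hedged sketch of what a shutdown-time flush could look like. The flag name comes from the commit; the Engine type, fields, and method names below are illustrative assumptions, not the actual influxdb code:

    package tsm1sketch

    import (
        "os"
        "path/filepath"
    )

    type Engine struct {
        FlushOnShutdown bool   // wired from --storage-wal-flush-on-shutdown
        WALDir          string // directory holding *.wal segment files
    }

    // snapshot stands in for writing the in-memory cache out as TSM files.
    func (e *Engine) snapshot() error { return nil }

    func (e *Engine) Close() error {
        if !e.FlushOnShutdown {
            return nil
        }
        // Commit all WAL data to TSM files first...
        if err := e.snapshot(); err != nil {
            return err
        }
        // ...then remove the now-redundant WAL segments so the directory is
        // empty after a clean shutdown.
        segments, err := filepath.Glob(filepath.Join(e.WALDir, "*.wal"))
        if err != nil {
            return err
        }
        for _, seg := range segments {
            if err := os.Remove(seg); err != nil {
                return err
            }
        }
        return nil
    }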
2024-10-10 15:27:54 -05:00
Geoffrey Wossum da9615fdc3
chore: improve error messages and logging during shard opening (#25331)
Ported from master-1.x.

(cherry picked from commit 23008e5286)

Closes: #25328
2024-09-13 16:59:17 -05:00
Phil Bracikowski 5d801119c5
feat(tsm1/wal): encapsulate expiring WAL files in FileDisposer (#24611)
* feat(tsm1/wal): encapsulate expiring WAL files in FileDisposer

This changeset introduces an interface extension point named
FileDisposer to control what to do with WAL files when they are no
longer needed. Currently, the only implementation is to delete the file
which is the existing behavior.

* chore: accumulate errors

Since we're here, capture the previously ignored fs errors and pass up a
combined error (which the only callers log).
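
A minimal sketch of that extension point, with assumed method shape (the interface name comes from the commit, the rest is illustrative):

    package walsketch

    import "os"

    // FileDisposer decides what happens to a WAL segment once it is no
    // longer needed by the engine.
    type FileDisposer interface {
        Dispose(path string) error
    }

    // DeleteFileDisposer preserves the existing behavior: delete the file.
    type DeleteFileDisposer struct{}

    func (DeleteFileDisposer) Dispose(path string) error {
        return os.Remove(path)
    }

The "accumulate errors" bullet suggests callers collect the per-file errors into one combined error (e.g. with errors.Join) and log it, rather than dropping them.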
2024-01-31 12:46:46 -08:00
Eng Zer Jun 903d30d658
test: use `T.TempDir` to create temporary test directory (#23258)
* test: use `T.TempDir` to create temporary test directory

This commit replaces `os.MkdirTemp` with `t.TempDir` in tests. The
directory created by `t.TempDir` is automatically removed when the test
and all its subtests complete.

Prior to this commit, a temporary directory created using `os.MkdirTemp`
needed to be removed manually by calling `os.RemoveAll`, which was omitted
in some tests. The error-handling boilerplate, e.g.
	defer func() {
		if err := os.RemoveAll(dir); err != nil {
			t.Fatal(err)
		}
	}()
is also tedious, but `t.TempDir` handles this for us nicely.

Reference: https://pkg.go.dev/testing#T.TempDir
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
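
For illustration, a minimal before/after of this change (test names are made up, not from the PR):

    package example

    import (
        "os"
        "testing"
    )

    // Before: manual creation and cleanup, easy to forget or get wrong.
    func TestWithMkdirTemp(t *testing.T) {
        dir, err := os.MkdirTemp("", "wal-test")
        if err != nil {
            t.Fatal(err)
        }
        defer func() {
            if err := os.RemoveAll(dir); err != nil {
                t.Fatal(err)
            }
        }()
        _ = dir // ... test body ...
    }

    // After: the testing package creates and removes the directory for us.
    func TestWithTempDir(t *testing.T) {
        dir := t.TempDir() // removed automatically when the test completes
        _ = dir            // ... test body ...
    }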

* test: fix failing TestSendWrite on Windows

=== FAIL: replications/internal TestSendWrite (0.29s)
    logger.go:130: 2022-06-23T13:00:54.290Z	DEBUG	Created new durable queue for replication stream	{"id": "0000000000000001", "path": "C:\\Users\\circleci\\AppData\\Local\\Temp\\TestSendWrite1627281409\\001\\replicationq\\0000000000000001"}
    logger.go:130: 2022-06-23T13:00:54.457Z	ERROR	Error in replication stream	{"replication_id": "0000000000000001", "error": "remote timeout", "retries": 1}
    testing.go:1090: TempDir RemoveAll cleanup: remove C:\Users\circleci\AppData\Local\Temp\TestSendWrite1627281409\001\replicationq\0000000000000001\1: The process cannot access the file because it is being used by another process.

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>

* test: fix failing TestStore_BadShard on Windows

=== FAIL: tsdb TestStore_BadShard (0.09s)
    logger.go:130: 2022-06-23T12:18:21.827Z	INFO	Using data dir	{"service": "store", "path": "C:\\Users\\circleci\\AppData\\Local\\Temp\\TestStore_BadShard1363295568\\001"}
    logger.go:130: 2022-06-23T12:18:21.827Z	INFO	Compaction settings	{"service": "store", "max_concurrent_compactions": 2, "throughput_bytes_per_second": 50331648, "throughput_bytes_per_second_burst": 50331648}
    logger.go:130: 2022-06-23T12:18:21.828Z	INFO	Open store (start)	{"service": "store", "op_name": "tsdb_open", "op_event": "start"}
    logger.go:130: 2022-06-23T12:18:21.828Z	INFO	Open store (end)	{"service": "store", "op_name": "tsdb_open", "op_event": "end", "op_elapsed": "77.3µs"}
    testing.go:1090: TempDir RemoveAll cleanup: remove C:\Users\circleci\AppData\Local\Temp\TestStore_BadShard1363295568\002\data\db0\rp0\1\index\0\L0-00000001.tsl: The process cannot access the file because it is being used by another process.

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>

* test: fix failing TestPartition_PrependLogFile_Write_Fail and TestPartition_Compact_Write_Fail on Windows

=== FAIL: tsdb/index/tsi1 TestPartition_PrependLogFile_Write_Fail/write_MANIFEST (0.06s)
    testing.go:1090: TempDir RemoveAll cleanup: remove C:\Users\circleci\AppData\Local\Temp\TestPartition_PrependLogFile_Write_Failwrite_MANIFEST656030081\002\0\L0-00000003.tsl: The process cannot access the file because it is being used by another process.
    --- FAIL: TestPartition_PrependLogFile_Write_Fail/write_MANIFEST (0.06s)

=== FAIL: tsdb/index/tsi1 TestPartition_Compact_Write_Fail/write_MANIFEST (0.08s)
    testing.go:1090: TempDir RemoveAll cleanup: remove C:\Users\circleci\AppData\Local\Temp\TestPartition_Compact_Write_Failwrite_MANIFEST3398667527\002\0\L0-00000003.tsl: The process cannot access the file because it is being used by another process.
    --- FAIL: TestPartition_Compact_Write_Fail/write_MANIFEST (0.08s)

We must close the open file descriptor otherwise the temporary file
cannot be cleaned up on Windows.

Fixes: 619eb1cae6 ("fix: restore in-memory Manifest on write error")
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
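
The pattern behind these Windows fixes, as a sketch (file name borrowed from the log above, test name assumed): close every handle before the test returns so `t.TempDir`'s RemoveAll cleanup can delete the directory.

    package example

    import (
        "os"
        "path/filepath"
        "testing"
    )

    func TestCleanupOnWindows(t *testing.T) {
        dir := t.TempDir()

        f, err := os.Create(filepath.Join(dir, "L0-00000001.tsl"))
        if err != nil {
            t.Fatal(err)
        }
        // On Windows an open handle keeps the file locked; without this Close
        // the TempDir RemoveAll cleanup fails with "being used by another
        // process".
        defer f.Close()

        // ... test body that uses f ...
    }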

* test: fix failing TestReplicationStartMissingQueue on Windows

=== FAIL: TestReplicationStartMissingQueue (1.60s)
    logger.go:130: 2023-03-17T10:42:07.269Z	DEBUG	Created new durable queue for replication stream	{"id": "0000000000000001", "path": "C:\\Users\\circleci\\AppData\\Local\\Temp\\TestReplicationStartMissingQueue76668607\\001\\replicationq\\0000000000000001"}
    logger.go:130: 2023-03-17T10:42:07.305Z	INFO	Opened replication stream	{"id": "0000000000000001", "path": "C:\\Users\\circleci\\AppData\\Local\\Temp\\TestReplicationStartMissingQueue76668607\\001\\replicationq\\0000000000000001"}
    testing.go:1206: TempDir RemoveAll cleanup: remove C:\Users\circleci\AppData\Local\Temp\TestReplicationStartMissingQueue76668607\001\replicationq\0000000000000001\1: The process cannot access the file because it is being used by another process.

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>

* test: update TestWAL_DiskSize

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>

* test: fix failing TestWAL_DiskSize on Windows

=== FAIL: tsdb/engine/tsm1 TestWAL_DiskSize (2.65s)
    testing.go:1206: TempDir RemoveAll cleanup: remove C:\Users\circleci\AppData\Local\Temp\TestWAL_DiskSize2736073801\001\_00006.wal: The process cannot access the file because it is being used by another process.

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>

---------

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
2023-03-21 16:22:11 -04:00
Sam Arnold b970e359dc
feat: remaining storage metrics from OSS engine (#22938)
* fix: simplify disk size tracking

* refactor: EngineTags in tsdb package

* fix: fewer compaction buckets and dead code removal

* feat: shard metrics

* chore: formatting

* feat: tsdb store metrics

* feat: retention check metrics

* chore: fix go vet

* fix: review comments
2021-12-02 09:01:46 -05:00
Sam Arnold edb21abe91
feat: metrics for wal subsystem (#22918)
https://github.com/influxdata/influxdb/issues/20026
2021-11-23 12:17:52 -05:00
Daniel Moran d747e7ec4e
feat: add config parameters to toggle WAL concurrency and timeouts (#21621)
* feat: add context parameter to Take() method on fixed limiter
* refactor: plumb context through to uses of Take()
* test: update tests to pass context as needed
* feat: add config toggles for setting WAL write concurrency & timeout
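
A hedged sketch of the limiter shape described above; the real limiter lives in the influxdb codebase and may differ, but the idea is that Take blocks until a slot frees up or the context is cancelled, which is what makes the new timeout toggle enforceable.

    package limiter

    import "context"

    // Fixed is a simple counting semaphore with a fixed number of slots.
    type Fixed chan struct{}

    func NewFixed(limit int) Fixed { return make(Fixed, limit) }

    // Take acquires a slot, or gives up when ctx is cancelled or times out.
    func (l Fixed) Take(ctx context.Context) error {
        select {
        case l <- struct{}{}:
            return nil
        case <-ctx.Done():
            return ctx.Err()
        }
    }

    func (l Fixed) Release() { <-l }

Callers would wrap the WAL write in Take/Release with a context derived from the configured timeout.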
2021-06-09 11:03:53 -04:00
Yun Zhao ce536037dc
fix(tsm1): limit concurrent WAL encodings to reduce memory pressure under heavy write load (#20814)
Co-authored-by: zhaoyun.248 <zhaoyun.248@bytedance.com>
2021-06-03 16:11:36 -04:00
Yun Zhao 265c1f311e
fix(tsm1): fix wal's totalOldDiskSize statistics (#20811) 2021-03-03 15:20:24 -05:00
Tristan Su 1a00f2f123
fix(tsm): should not check write-ahead-log size against default size (#20585)
It should check against the locally saved SegmentSize instead of the
default constant DefaultSegmentSize.
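
Reduced to a sketch (struct and field names are assumptions, not the actual tsm1 code), the fix is to compare against the WAL's own configured size:

    package walsketch

    const DefaultSegmentSize = 10 * 1024 * 1024

    type segmentWriter struct{ size int }

    type WAL struct {
        SegmentSize          int // per-WAL configured size, may differ from the default
        currentSegmentWriter *segmentWriter
    }

    func (w *WAL) shouldRollSegment() bool {
        // Compare against the WAL's own SegmentSize, not DefaultSegmentSize,
        // so a locally configured segment size is honored.
        return w.currentSegmentWriter.size > w.SegmentSize
    }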
2021-02-10 10:32:53 -05:00
Stuart Carnie dee8977d2c
chore: move v2/v1/tsdb → v2/tsdb 2020-08-26 10:46:47 -07:00
Mark Rushakoff f2898d1992 Wipe out workspace in preparation for v2 merge
"Knock knock."

"Who's there?"

"InfluxDB Veet."

...
2019-01-11 10:38:50 -08:00
Edd Robinson 0fc7643d59 Fix data race in WAL
This commit fixes a data race in the WAL, which can occur when writes
and deletes are being executed concurrently. The WAL uses a buffer pool
of `[]byte` when reading the WAL. WAL entries are unmarshaled into these
buffers and passed along to the relevant methods handling the different
types of entry (write, delete etc).

In the case of deletes, the keys that need to be deleted were being
stored for later processing; however, these keys were part of the backing
array of the initial buffer from the pool. As such, those keys could be
overwritten at a future time when handling other parts of the WAL.
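
The usual fix for this class of bug, sketched with assumed names: copy any keys that must outlive the WAL entry out of the pooled buffer before it is reused.

    package walsketch

    // collectDeleteKeys extracts series keys from a pooled WAL-entry buffer.
    // Because buf is returned to the pool and reused for later entries, each
    // key must be copied out rather than retained as a sub-slice of buf.
    func collectDeleteKeys(buf []byte, offsets [][2]int) [][]byte {
        keys := make([][]byte, 0, len(offsets))
        for _, off := range offsets {
            k := make([]byte, off[1]-off[0])
            copy(k, buf[off[0]:off[1]]) // safe: no longer aliases the pool buffer
            keys = append(keys, k)
        }
        return keys
    }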
2018-03-15 12:51:30 +00:00
Jonathan A. Sternberg d38413a849
Merge pull request #9454 from influxdata/js-structured-logging
Update logging calls to take advantage of structured logging
2018-02-21 09:14:40 -06:00
Jason Wilder f7279b57f3 Re-open last WAL segment
Re-open the last wal segment instead of creating a new one.  This fixes
an issue where the last modified time of the WAL would change on
restart.  It also avoids a lot of file IO churn on restart.
2018-02-20 14:24:04 -07:00
Jonathan A. Sternberg 2bbd96768d Update logging calls to take advantage of structured logging
Includes a style guide that details the basics of how to log.
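
Illustratively (zap is the logger InfluxDB uses; the call sites below are made up), the style guide's direction is a fixed message plus typed fields rather than interpolated strings:

    package example

    import "go.uber.org/zap"

    func logShardOpen(log *zap.Logger, path string, elapsedMS int64) {
        // Unstructured (discouraged): interpolating values into the message.
        log.Sugar().Infof("opened shard at %s in %dms", path, elapsedMS)

        // Structured (preferred): a fixed message plus typed key/value fields.
        log.Info("Opened shard",
            zap.String("path", path),
            zap.Int64("elapsed_ms", elapsedMS))
    }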
2018-02-20 10:04:19 -06:00
Jason Wilder 3299e549aa Increase WAL write buffer size
The default of 4096 results in writes to the WAL still requiring multiple
IOs.  We had previously bumped this to 1M, but that was too high when
there are many shards.  Increasing to around 16k reduces the IOs to
one or two for the workloads tested.  We may want to make this
configurable in the future.
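
In sketch form (surrounding names assumed), the buffer in question is the bufio writer wrapped around the WAL segment file:

    package walsketch

    import (
        "bufio"
        "os"
    )

    const walWriteBufferSize = 16 * 1024 // ~16k: one or two IOs per WAL write

    func newSegmentWriter(f *os.File) *bufio.Writer {
        // A 4096-byte default still needed multiple IOs per WAL write, and
        // 1M wasted memory with many shards; 16k was the tested middle ground.
        return bufio.NewWriterSize(f, walWriteBufferSize)
    }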
2018-01-31 13:55:32 -07:00
Joe LeGasse 68e20c4f80 wal: update lastWriteTime behavior 2018-01-16 21:22:24 -05:00
Jonathan A. Sternberg 0b7c56bcd8 Update the zap logger dependency
The previous sha was taken from a revision on a devel branch that I
thought would remain in the tree after it was merged. That
revision was rebased away and the API was changed for the logger.

This updates the usage of the logger and adds a simple package for
constructing the base logger.

The 1.0 version of zap changed the format of the default console logger
so this change moves over to this new logger instead of attempting to
retain backwards compatibility with the old format.
2017-11-10 16:27:16 -06:00
Jason Wilder fb7135ddc8 Fix corrupted wal segment panic on 32 bit systems 2017-10-16 09:41:20 -06:00
Jason Wilder f668b0cc3f Only use O_SYNC for tsm file writing
Doing this for the WAL reduces throughput quite a bit.
2017-10-03 10:48:13 -06:00
Jason Wilder 122a74c692 Use synchronous IO for wal and tsm writing
The fsyncs due to large writes when writing to TSM files and the
WAL can eventually cause large pauses.  Since we already buffer
writes, using synchronous IO reduces fsync latency by ensuring
the individual writes hit disk.  This spreads out the latency
across multiple writes better.
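
Mechanically this amounts to opening the file with O_SYNC, sketched below; per the commit above, this was later kept only for TSM file writing because it cost too much WAL throughput.

    package tsmsketch

    import "os"

    // openSync opens a file for writing with O_SYNC, so every write is
    // durable before the call returns. Combined with buffered writes, this
    // spreads the fsync cost across writes instead of one large pause later.
    func openSync(path string) (*os.File, error) {
        return os.OpenFile(path, os.O_CREATE|os.O_WRONLY|os.O_SYNC, 0666)
    }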
2017-09-25 12:44:57 -06:00
Jason Wilder 78922f9821 Set rc to nil when closing WALSegmentReader 2017-09-08 14:55:02 -06:00
Jason Wilder 5581f8b4ae Re-use WALSegmentReaders at startup 2017-09-07 12:56:17 -06:00
Jason Wilder 778000435a Convert all keys from string to []byte in TSM engine
This switches all the interfaces that take a string series key to
take a []byte.  This eliminates many small allocations where we
convert between the two repeatedly.  Eventually, this change should
propagate further up the stack.
2017-07-28 11:00:50 -06:00
Stuart Carnie eec80692c4 Taught tsm1 storage engine how to read and write uint64 values
* introduced UnsignedValue type
  * leveraged existing int64 compression algorithms (RLE, Simple 8B)
* tsm and WAL can read and write UnsignedValue
* compaction is aware of UnsignedValue
* unsigned support to model, cursors and write points

NOTE: there is no support to create unsigned points, as the line
protocol has not been modified.
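
A rough sketch of the value shape being introduced (field and method names are assumptions, not the actual tsm1 type):

    package tsm1sketch

    // UnsignedValue pairs a timestamp with a uint64 field value. Encoding can
    // reuse the existing int64 machinery (RLE, Simple 8B) by reinterpreting
    // the bits, which is what lets the WAL and compaction handle it with
    // little new code.
    type UnsignedValue struct {
        unixnano int64
        value    uint64
    }

    func (v UnsignedValue) UnixNano() int64 { return v.unixnano }
    func (v UnsignedValue) Value() uint64   { return v.value }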
2017-07-24 09:03:22 -07:00
Jason Wilder e9370e0b86 Fix indefinite hang in WAL.writeToLog
There was a race between the WAL writeToLog and scheduleSync which could
lead to a writing goroutine blocking indefinitely on its syncErr channel.

The issue was that the clearing of the syncCount happened after the
wal was unlocked.  If a goroutine was able to lock, write, and call scheduleSync
before the existing scheduleSync goroutine returned and ran the defer to
clear the syncCount, then a new scheduleSync goroutine would not get started.
This left the writing goroutine blocked with nothing to signal it.

While in this state, an RLock on the engine was held.  If a Lock was requested
on the engine during this time, all future writes and queries would block waiting
on the blocked wal writer.

The fix is to move the atomic clearing of syncCount before the Lock is released.
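
Reduced to its essence (heavily simplified; the real writeToLog/scheduleSync carry much more state), the fix is an ordering change:

    package walsketch

    import (
        "sync"
        "sync/atomic"
    )

    type wal struct {
        mu        sync.Mutex
        syncCount uint64 // non-zero while a sync goroutine is running
    }

    // syncLoopExit shows the corrected shutdown ordering for the sync
    // goroutine. Clearing syncCount while the lock is still held means any
    // writer that subsequently acquires the lock sees syncCount == 0 and
    // starts a fresh sync goroutine, so no writer is left waiting with
    // nothing scheduled to signal it.
    func (l *wal) syncLoopExit() {
        l.mu.Lock()
        atomic.StoreUint64(&l.syncCount, 0) // clear BEFORE releasing the lock
        l.mu.Unlock()
    }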
2017-07-07 13:31:52 -06:00
Stuart Carnie c863923e68 cache MarshalSize 2017-05-12 14:05:25 -06:00
Stuart Carnie 0151afe31c check size and allocate once 2017-05-12 14:05:25 -06:00
Stuart Carnie 096d6f65b4 explicit sizes 2017-05-12 14:05:24 -06:00
Jason Wilder 503d41a08f Add LimitedBytePool for wal buffers
This pool was previously a pool.Bytes to avoid repetitive allocations.
It was recently switched to a sync.Pool because pool.Bytes held onto
very large buffers at times which were never released.  sync.Pool is
showing up in allocation profiles quite frequently.

This switches the pool to a new pool that limits how many buffers are
in the pool as well as the max size of each buffer in the pool.  This
provides better bounds on allocations.
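
A minimal sketch of a pool with those bounds, using assumed names: a buffered channel caps how many buffers are retained, and a max size keeps oversized buffers out of the pool entirely.

    package poolsketch

    // LimitedBytes keeps at most cap(p.pool) buffers, each no larger than
    // maxSize, so the pool's worst-case memory is bounded.
    type LimitedBytes struct {
        pool    chan []byte
        maxSize int
    }

    func NewLimitedBytes(capacity int, maxSize int) *LimitedBytes {
        return &LimitedBytes{pool: make(chan []byte, capacity), maxSize: maxSize}
    }

    func (p *LimitedBytes) Get(sz int) []byte {
        select {
        case b := <-p.pool:
            if cap(b) >= sz {
                return b[:sz]
            }
        default:
        }
        return make([]byte, sz)
    }

    func (p *LimitedBytes) Put(b []byte) {
        if cap(b) > p.maxSize {
            return // too big: let the GC reclaim it instead of pooling it
        }
        select {
        case p.pool <- b:
        default: // pool full: drop the buffer
        }
    }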
2017-05-11 11:27:00 -06:00
Jason Wilder e102fcca9c Use buffer writer for wal segments 2017-05-10 11:42:32 -06:00
Jason Wilder 88848a9426 Remove per shard monitor goroutine
The monitor goroutine ran for each shard and updated disk stats
as well as logged cardinality warnings.  This goroutine has been
removed by making the disk stats more lightweight and callable
directly from Statistics, and by moving the logging to the tsdb.Store.  The
latter allows one goroutine to handle all shards.
2017-05-03 16:31:57 -06:00
Jason Wilder 137d0c0d09 Rename WAL.WritePoints to WAL.WriteMulti
To match Cache.WriteMulti
2017-04-28 13:20:55 -06:00
Jason Wilder d88604f6f2 Move repetitive loop checks outside of values loop 2017-04-20 13:45:04 -06:00
Jason Wilder 888689f5d3 Move values loop under type switch
All the values read must be of the same type, so repeatedly using
the type switch is confusing and less efficient.
2017-04-20 13:39:49 -06:00
Jason Wilder b0988511bf Use fixed size array instead of slice 2017-04-20 13:38:33 -06:00
Jason Wilder da6bdfdda8 Use bufio.Reader when reading wal segments
Reduces disk IO due to small reads.
2017-04-20 13:33:42 -06:00
Jason Wilder 8e9cbd7ffc Simplify WALSegmentReader.UnmarshalBinary
There were two loops over nvals which created some extra allocations
that could be replaced with a simple slice capacity and append.
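
Illustratively, the simplification is the usual allocate-once-with-capacity-then-append shape:

    package walsketch

    // decodeValues sketches the single-loop form: allocate once with the
    // known capacity, then append, instead of looping over nvals twice.
    func decodeValues(nvals int, next func() int64) []int64 {
        values := make([]int64, 0, nvals)
        for i := 0; i < nvals; i++ {
            values = append(values, next())
        }
        return values
    }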
2017-04-20 13:33:42 -06:00
Jason Wilder ef65ee77f4 Switch WAL byte pools to sync/pool
The current bytes.Pool will hold onto byte slices indefinitely. Large
writes can cause the pool to hold onto very large buffers over time.
Testing with sync.Pool seems to perform similarly now, so using a sync.Pool
will allow these buffers to be GC'd when necessary.
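
For contrast with the LimitedBytePool change above, the sync.Pool shape looks roughly like this (illustrative, not the exact code); buffers left idle in the pool can be reclaimed by the GC:

    package poolsketch

    import "sync"

    // bytesPool hands out byte slices that the GC may reclaim when idle,
    // unlike a plain free list that pins them forever.
    var bytesPool = sync.Pool{
        New: func() interface{} { return make([]byte, 0, 4096) },
    }

    func getBuf() []byte  { return bytesPool.Get().([]byte)[:0] }
    func putBuf(b []byte) { bytesPool.Put(b) }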
2017-04-20 12:28:42 -06:00
Jason Wilder d7c5dd0a3e Reduce wal sync goroutine churn
Under high write load, the sync goroutine would start up and end
very frequently.  Starting a new goroutine so frequently adds a small
amount of latency which causes writes to take longer and sometimes time out.

This changes the goroutine to loop until there are no more waiters, which
reduces the churn and latency.
2017-04-20 12:28:34 -06:00
Jason Wilder aa9925621b Fix deadlock in wal
If the sync waiters channel was full, it would block sending to the
channel while holding the wal write lock.  The sync goroutine would
then be stuck acquiring the write lock and could not drain the channel.

This increases the buffer to 1024, which would require a very high write
load to fill, and returns an error if the channel is full to prevent
the blocking.
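
The non-blocking half of the fix, in sketch form (names assumed): while the write lock is held, never block on the waiters channel; if it is somehow full, return an error instead of deadlocking.

    package walsketch

    import "errors"

    var ErrWALBacklog = errors.New("wal: sync waiters channel full")

    // queueSyncWaiter registers a writer waiting for the next fsync. It is
    // called with the WAL write lock held, so it must never block: if the
    // large (1024-slot) buffer is somehow full, fail fast rather than
    // deadlock against the sync goroutine that needs the lock to drain it.
    func queueSyncWaiter(syncWaiters chan chan error, done chan error) error {
        select {
        case syncWaiters <- done:
            return nil
        default:
            return ErrWALBacklog
        }
    }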
2017-04-19 11:33:13 -06:00
Jason Wilder c443e639b0 Fix 32bit alignment issue in wal.sync 2017-03-22 11:21:29 -06:00
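
The diff isn't shown here, but this class of fix generally comes down to Go's sync/atomic alignment rule: 64-bit values accessed atomically must be 64-bit aligned, and on 32-bit platforms only the first word of an allocated struct is guaranteed that alignment. A hedged sketch, not the commit's exact change:

    package walsketch

    import "sync/atomic"

    type walStats struct {
        // Keep atomically accessed 64-bit counters first: on 32-bit
        // platforms a misaligned atomic.AddInt64 panics.
        oldSegmentBytes int64

        path string
    }

    func (s *walStats) addOldSegmentBytes(n int64) {
        atomic.AddInt64(&s.oldSegmentBytes, n)
    }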
Jason Wilder 8f7b251afd Merge branch 'master' into jw-tsi 2017-03-20 17:17:26 -06:00
Jason Wilder e9eb925170 Coalesce multiple WAL fsyncs
Fsyncs to the WAL can cause higher IO with lots of small writes or
slower disks.  This reworks the previous wal fsyncing to remove the
extra goroutine and remove the hard-coded 100ms delay.  Writes to
the wal still maintain the invariant that they do not return to the
caller until the write is fsync'd.

This also adds a new config option, wal-fsync-delay (default 0s),
which can be increased if a delay is desired.  This is somewhat useful
for systems with slower disks, but the current default works well as
is.
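
A reduced sketch of the coalescing idea with the configurable delay (names assumed; the real WAL code carries more state): writers queue a result channel, one fsync runs after the delay, and the result fans back out so each write still returns only after its data is durable.

    package walsketch

    import (
        "os"
        "time"
    )

    // syncOnce waits up to fsyncDelay (0 means sync immediately), performs a
    // single fsync, and reports the result to every writer queued so far, so
    // many logical writes share one physical fsync.
    func syncOnce(f *os.File, fsyncDelay time.Duration, waiters <-chan chan error) {
        if fsyncDelay > 0 {
            time.Sleep(fsyncDelay)
        }
        err := f.Sync()
        for {
            select {
            case w := <-waiters:
                w <- err // each blocked writer now returns with the fsync result
            default:
                return
            }
        }
    }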
2017-03-15 16:31:03 -06:00
Ben Johnson 358b1e0b05
Merge remote-tracking branch 'upstream/master' into tsi 2017-03-15 10:13:32 -06:00
Mark Rushakoff 601cbcd084 Merge branch '1.2' into mr-merge-12 2017-02-17 16:14:22 -08:00
Jonathan A. Sternberg 2fe48d6781 Rename zap import back to github.com/uber-go/zap
They rebased a revision we were previously relying upon that allowed us
to use the vanity name so we are reverting back to an older version with
the old import path.
2017-02-17 17:17:22 -06:00
Jason Wilder 93a9d01643 Increase default waiting WAL writes 2017-02-06 11:48:51 -07:00
Jason Wilder 38a649fc40 Batch multiple WAL fsyncs
Every write to the WAL currently runs an fsync before returning.  When
there are a lot of concurrent writes, this can cause the WAL to bottleneck
write throughput since fsyncs are very expensive.

This changes the writeToLog to fsync on an interval to allow multiple fsync
calls to be batched up into one.  The writeToLog behavior is the same in that
it won't return until an fsync has been performed.
2017-02-06 11:48:45 -07:00