* test: use `T.TempDir` to create temporary test directory
This commit replaces `os.MkdirTemp` with `t.TempDir` in tests. The
directory created by `t.TempDir` is automatically removed when the test
and all its subtests complete.
Prior to this commit, a temporary directory created using `os.MkdirTemp`
had to be removed manually by calling `os.RemoveAll`, which was omitted
in some tests. The error handling boilerplate, e.g.
	defer func() {
		if err := os.RemoveAll(dir); err != nil {
			t.Fatal(err)
		}
	}()
is also tedious, but `t.TempDir` handles this for us nicely.
Reference: https://pkg.go.dev/testing#T.TempDir
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
* test: fix failing TestSendWrite on Windows
=== FAIL: replications/internal TestSendWrite (0.29s)
logger.go:130: 2022-06-23T13:00:54.290Z DEBUG Created new durable queue for replication stream {"id": "0000000000000001", "path": "C:\\Users\\circleci\\AppData\\Local\\Temp\\TestSendWrite1627281409\\001\\replicationq\\0000000000000001"}
logger.go:130: 2022-06-23T13:00:54.457Z ERROR Error in replication stream {"replication_id": "0000000000000001", "error": "remote timeout", "retries": 1}
testing.go:1090: TempDir RemoveAll cleanup: remove C:\Users\circleci\AppData\Local\Temp\TestSendWrite1627281409\001\replicationq\0000000000000001\1: The process cannot access the file because it is being used by another process.
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
* test: fix failing TestStore_BadShard on Windows
=== FAIL: tsdb TestStore_BadShard (0.09s)
logger.go:130: 2022-06-23T12:18:21.827Z INFO Using data dir {"service": "store", "path": "C:\\Users\\circleci\\AppData\\Local\\Temp\\TestStore_BadShard1363295568\\001"}
logger.go:130: 2022-06-23T12:18:21.827Z INFO Compaction settings {"service": "store", "max_concurrent_compactions": 2, "throughput_bytes_per_second": 50331648, "throughput_bytes_per_second_burst": 50331648}
logger.go:130: 2022-06-23T12:18:21.828Z INFO Open store (start) {"service": "store", "op_name": "tsdb_open", "op_event": "start"}
logger.go:130: 2022-06-23T12:18:21.828Z INFO Open store (end) {"service": "store", "op_name": "tsdb_open", "op_event": "end", "op_elapsed": "77.3µs"}
testing.go:1090: TempDir RemoveAll cleanup: remove C:\Users\circleci\AppData\Local\Temp\TestStore_BadShard1363295568\002\data\db0\rp0\1\index\0\L0-00000001.tsl: The process cannot access the file because it is being used by another process.
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
* test: fix failing TestPartition_PrependLogFile_Write_Fail and TestPartition_Compact_Write_Fail on Windows
=== FAIL: tsdb/index/tsi1 TestPartition_PrependLogFile_Write_Fail/write_MANIFEST (0.06s)
testing.go:1090: TempDir RemoveAll cleanup: remove C:\Users\circleci\AppData\Local\Temp\TestPartition_PrependLogFile_Write_Failwrite_MANIFEST656030081\002\0\L0-00000003.tsl: The process cannot access the file because it is being used by another process.
--- FAIL: TestPartition_PrependLogFile_Write_Fail/write_MANIFEST (0.06s)
=== FAIL: tsdb/index/tsi1 TestPartition_Compact_Write_Fail/write_MANIFEST (0.08s)
testing.go:1090: TempDir RemoveAll cleanup: remove C:\Users\circleci\AppData\Local\Temp\TestPartition_Compact_Write_Failwrite_MANIFEST3398667527\002\0\L0-00000003.tsl: The process cannot access the file because it is being used by another process.
--- FAIL: TestPartition_Compact_Write_Fail/write_MANIFEST (0.08s)
We must close the open file descriptor; otherwise the temporary file
cannot be cleaned up on Windows.
Fixes: 619eb1cae6 ("fix: restore in-memory Manifest on write error")
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
* test: fix failing TestReplicationStartMissingQueue on Windows
=== FAIL: TestReplicationStartMissingQueue (1.60s)
logger.go:130: 2023-03-17T10:42:07.269Z DEBUG Created new durable queue for replication stream {"id": "0000000000000001", "path": "C:\\Users\\circleci\\AppData\\Local\\Temp\\TestReplicationStartMissingQueue76668607\\001\\replicationq\\0000000000000001"}
logger.go:130: 2023-03-17T10:42:07.305Z INFO Opened replication stream {"id": "0000000000000001", "path": "C:\\Users\\circleci\\AppData\\Local\\Temp\\TestReplicationStartMissingQueue76668607\\001\\replicationq\\0000000000000001"}
testing.go:1206: TempDir RemoveAll cleanup: remove C:\Users\circleci\AppData\Local\Temp\TestReplicationStartMissingQueue76668607\001\replicationq\0000000000000001\1: The process cannot access the file because it is being used by another process.
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
* test: update TestWAL_DiskSize
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
* test: fix failing TestWAL_DiskSize on Windows
=== FAIL: tsdb/engine/tsm1 TestWAL_DiskSize (2.65s)
testing.go:1206: TempDir RemoveAll cleanup: remove C:\Users\circleci\AppData\Local\Temp\TestWAL_DiskSize2736073801\001\_00006.wal: The process cannot access the file because it is being used by another process.
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
---------
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
* feat: add the ability to replicate based on bucket name rather than bucket id.
- This adds compatibility with 1.x replication targets
* fix: improve error checking and add tests
* fix: add additional constraint to replications table
* fix: use OR not AND for constraint
* feat: delete invalid replications on downgrade
* fix: should be less than 2.4
* test: add test around down migration and cleanup migration code
* fix: use nil instead of platform.ID(1) for better consistency
* fix: fix tests
* fix: fix tests
* feat: start work on remotes/replications phone home data
* feat: add remotes/replications phone home data (no tests)
* refactor: use erroring binary conversions
* style: gofmt
* refactor: improve some error handling
* style: cleanup
* feat: add tests
* refactor: just list remotes/replications rather than decrement
* chore: linting fix
Co-authored-by: DStrand1 <dstrandboge@influxdata.com>
Fixes a few issues:
* Flux needs to write to the replication service instead of to the engine directly.
* The replication service incorrectly had value receiver methods, which I think
  was just an accident. Pointer receivers make things easier to reason about. Also,
  with value receivers Flux was not picking up the replication config properly.
* The Flux to() function previously did not receive the org for internal
  writes. This was not necessary before, as the write path only needs the bucket
  ID at this level (after authentication). But now we need the org ID to look up
  replications properly.
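The receiver issue in the second item can be illustrated with a small sketch. The `service` type and its methods are hypothetical stand-ins, and this shows only one way a value receiver can lose state; the exact failure in the replication service is not spelled out above.

```go
package main

import "fmt"

// service is a hypothetical stand-in for a service that carries config state.
type service struct {
	enabled bool
}

// enableByValue has a value receiver: it mutates a copy, so the caller's
// service is left unchanged.
func (s service) enableByValue() { s.enabled = true }

// enableByPointer has a pointer receiver: it mutates the caller's service.
func (s *service) enableByPointer() { s.enabled = true }

func main() {
	var svc service

	svc.enableByValue()
	fmt.Println(svc.enabled) // false: the change was made to a copy

	svc.enableByPointer()
	fmt.Println(svc.enabled) // true: the change is visible to the caller
}
```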
Closes #23183
* fix: remove nats for scraper processing
Scrapers now use Go channels instead of NATS and interprocess communication.
This should fix #23085.
Additionally, found and fixed #23106.
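A rough sketch of handing work to an in-process consumer over a Go channel, in place of a message broker, is shown below. The `scrapeResult` type, the `startProcessor` function, and the target names are all hypothetical; the real scraper code is not shown in this message.

```go
package main

import (
	"fmt"
	"sync"
)

// scrapeResult is a hypothetical message type passed from scraper to processor.
type scrapeResult struct {
	Target string
	Body   string
}

// startProcessor starts one worker goroutine that drains in and passes each
// result to handle. The returned function blocks until the worker has exited,
// which happens once in is closed and fully drained.
func startProcessor(in <-chan scrapeResult, handle func(scrapeResult)) (wait func()) {
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		for r := range in {
			handle(r)
		}
	}()
	return wg.Wait
}

func main() {
	// Buffered so the publisher does not block on every send.
	results := make(chan scrapeResult, 8)

	var handled []string
	wait := startProcessor(results, func(r scrapeResult) {
		handled = append(handled, r.Target)
	})

	results <- scrapeResult{Target: "target-a", Body: "up 1"}
	results <- scrapeResult{Target: "target-b", Body: "up 0"}
	close(results) // tells the worker that no more results are coming
	wait()

	fmt.Println(len(handled), "results handled")
}
```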
* chore: fix formatting
* chore: fix static check and go.mod
* test: fix some flaky tests
* fix: mark NATS arguments as deprecated
* refactor: eliminate sqlite query in case of no configured replications
* refactor: updated write-related tests to reflect tracking of orgID and localBucket by the queue manager
* refactor: removed redundant trackedReplications field
* refactor: corrected slice init in GetReplications and added TestGetReplications
* refactor: eliminated tracked package and moved TrackedReplication struct to influxdb package via replication.go
* chore: ran make fmt
* fix: added closeRq function back in to address flaky tests
* refactor: small changes to queue manager test based on code review
* fix: advance replications queue after successful remote writes to prevent data duplication on errors
* fix: loop on sendwrite
* chore: remove flaky test
* chore: add TODO about future optimization
* feat: remote write function for replications
* chore: implement UpdateResponseInfo store method
* chore: only set gzip heading for non-empty requests
* fix: address review feedback
* feat: added replications queue management to launcher tasks
* refactor: separated sql logic into replications service rather than durable queue manager
* refactor: extended replications feature flag to launcher code and minor change to startup function param
* chore: added unit test coverage for replications server startup queue management
* refactor: made error messages reusable and factored out unnecessary string from queue management tests
* refactor: changed queue management error names to pass linter check
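One entry in the list above advances the replications queue only after a successful remote write. The sketch below illustrates that ordering with a hypothetical slice-backed `queue` and a `sendNext` function; neither is the actual durable queue API.

```go
package main

import (
	"errors"
	"fmt"
)

// errRemote is a hypothetical error used to simulate a failed remote write.
var errRemote = errors.New("remote timeout")

// queue is a hypothetical in-memory stand-in for a durable queue.
type queue struct {
	items [][]byte
}

// peek returns the head of the queue without removing it.
func (q *queue) peek() ([]byte, bool) {
	if len(q.items) == 0 {
		return nil, false
	}
	return q.items[0], true
}

// advance drops the head of the queue.
func (q *queue) advance() {
	if len(q.items) > 0 {
		q.items = q.items[1:]
	}
}

// sendNext passes the head item to send and advances the queue only when send
// succeeds. It reports whether an item was sent.
func sendNext(q *queue, send func([]byte) error) (bool, error) {
	item, ok := q.peek()
	if !ok {
		return false, nil
	}
	if err := send(item); err != nil {
		return false, err // the head stays in place for a later retry
	}
	q.advance()
	return true, nil
}

func main() {
	q := &queue{items: [][]byte{[]byte("batch-1"), []byte("batch-2")}}

	_, err := sendNext(q, func([]byte) error { return errRemote })
	fmt.Println(err, len(q.items)) // remote timeout 2

	_, err = sendNext(q, func([]byte) error { return nil })
	fmt.Println(err, len(q.items)) // <nil> 1
}
```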
Maintaining the current queue size in a SQL column would require
updating the DB on every queue operation. Avoid that contention by
instead looking up the current size on the in-memory durable queue
struct, which is already tracked & updated as data enters & leaves
the queue.
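The idea in the paragraph above can be sketched as a queue that keeps a running byte count in memory, guarded by a mutex. The `sizedQueue` type and its methods are hypothetical and only illustrate the bookkeeping, not the real durable queue.

```go
package main

import (
	"fmt"
	"sync"
)

// sizedQueue is a hypothetical stand-in for the in-memory durable queue
// struct. It tracks its size as data enters and leaves, so reading the size
// never touches the database.
type sizedQueue struct {
	mu    sync.Mutex
	items [][]byte
	bytes int64
}

// Append adds b to the tail and increases the tracked size.
func (q *sizedQueue) Append(b []byte) {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.items = append(q.items, b)
	q.bytes += int64(len(b))
}

// Advance drops the head, if any, and decreases the tracked size.
func (q *sizedQueue) Advance() {
	q.mu.Lock()
	defer q.mu.Unlock()
	if len(q.items) == 0 {
		return
	}
	q.bytes -= int64(len(q.items[0]))
	q.items = q.items[1:]
}

// Size returns the current size in bytes.
func (q *sizedQueue) Size() int64 {
	q.mu.Lock()
	defer q.mu.Unlock()
	return q.bytes
}

func main() {
	q := &sizedQueue{}
	q.Append([]byte("hello"))
	q.Append([]byte("hi"))
	q.Advance()
	fmt.Println(q.Size()) // 2
}
```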
* feat: sql down migrations
* refactor: different name for up migrations
* chore: update migrations ref in svc tests
* build: add lint step to verify sql migration names match
* feat: added durable queue management to replications service
* refactor: improved mapping of replication streams to durable queues
* refactor: modified replication stream durable queues to use user-specified engine path
* chore: generated test mocks for replications DurableQueueManager
* chore: add test coverage for replications durable queue manager
* refactor: made changes based on code review, added mutex to durableQueueManager, improved error logging
* chore: ran make fmt
* refactor: further improvements to error logging