
31 Commits (master)

Author SHA1 Message Date
Eng Zer Jun 903d30d658
test: use `T.TempDir` to create temporary test directory (#23258)
* test: use `T.TempDir` to create temporary test directory

This commit replaces `os.MkdirTemp` with `t.TempDir` in tests. The
directory created by `t.TempDir` is automatically removed when the test
and all its subtests complete.

Prior to this commit, a temporary directory created using `os.MkdirTemp`
had to be removed manually by calling `os.RemoveAll`, which some tests
omitted. The error-handling boilerplate, e.g.
	defer func() {
		if err := os.RemoveAll(dir); err != nil {
			t.Fatal(err)
		}
	}()
is also tedious, but `t.TempDir` handles this for us nicely.

Reference: https://pkg.go.dev/testing#T.TempDir
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>

* test: fix failing TestSendWrite on Windows

=== FAIL: replications/internal TestSendWrite (0.29s)
    logger.go:130: 2022-06-23T13:00:54.290Z	DEBUG	Created new durable queue for replication stream	{"id": "0000000000000001", "path": "C:\\Users\\circleci\\AppData\\Local\\Temp\\TestSendWrite1627281409\\001\\replicationq\\0000000000000001"}
    logger.go:130: 2022-06-23T13:00:54.457Z	ERROR	Error in replication stream	{"replication_id": "0000000000000001", "error": "remote timeout", "retries": 1}
    testing.go:1090: TempDir RemoveAll cleanup: remove C:\Users\circleci\AppData\Local\Temp\TestSendWrite1627281409\001\replicationq\0000000000000001\1: The process cannot access the file because it is being used by another process.

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>

* test: fix failing TestStore_BadShard on Windows

=== FAIL: tsdb TestStore_BadShard (0.09s)
    logger.go:130: 2022-06-23T12:18:21.827Z	INFO	Using data dir	{"service": "store", "path": "C:\\Users\\circleci\\AppData\\Local\\Temp\\TestStore_BadShard1363295568\\001"}
    logger.go:130: 2022-06-23T12:18:21.827Z	INFO	Compaction settings	{"service": "store", "max_concurrent_compactions": 2, "throughput_bytes_per_second": 50331648, "throughput_bytes_per_second_burst": 50331648}
    logger.go:130: 2022-06-23T12:18:21.828Z	INFO	Open store (start)	{"service": "store", "op_name": "tsdb_open", "op_event": "start"}
    logger.go:130: 2022-06-23T12:18:21.828Z	INFO	Open store (end)	{"service": "store", "op_name": "tsdb_open", "op_event": "end", "op_elapsed": "77.3µs"}
    testing.go:1090: TempDir RemoveAll cleanup: remove C:\Users\circleci\AppData\Local\Temp\TestStore_BadShard1363295568\002\data\db0\rp0\1\index\0\L0-00000001.tsl: The process cannot access the file because it is being used by another process.

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>

* test: fix failing TestPartition_PrependLogFile_Write_Fail and TestPartition_Compact_Write_Fail on Windows

=== FAIL: tsdb/index/tsi1 TestPartition_PrependLogFile_Write_Fail/write_MANIFEST (0.06s)
    testing.go:1090: TempDir RemoveAll cleanup: remove C:\Users\circleci\AppData\Local\Temp\TestPartition_PrependLogFile_Write_Failwrite_MANIFEST656030081\002\0\L0-00000003.tsl: The process cannot access the file because it is being used by another process.
    --- FAIL: TestPartition_PrependLogFile_Write_Fail/write_MANIFEST (0.06s)

=== FAIL: tsdb/index/tsi1 TestPartition_Compact_Write_Fail/write_MANIFEST (0.08s)
    testing.go:1090: TempDir RemoveAll cleanup: remove C:\Users\circleci\AppData\Local\Temp\TestPartition_Compact_Write_Failwrite_MANIFEST3398667527\002\0\L0-00000003.tsl: The process cannot access the file because it is being used by another process.
    --- FAIL: TestPartition_Compact_Write_Fail/write_MANIFEST (0.08s)

We must close the open file descriptor; otherwise the temporary file
cannot be removed during cleanup on Windows.

Fixes: 619eb1cae6 ("fix: restore in-memory Manifest on write error")
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>

* test: fix failing TestReplicationStartMissingQueue on Windows

=== FAIL: TestReplicationStartMissingQueue (1.60s)
    logger.go:130: 2023-03-17T10:42:07.269Z	DEBUG	Created new durable queue for replication stream	{"id": "0000000000000001", "path": "C:\\Users\\circleci\\AppData\\Local\\Temp\\TestReplicationStartMissingQueue76668607\\001\\replicationq\\0000000000000001"}
    logger.go:130: 2023-03-17T10:42:07.305Z	INFO	Opened replication stream	{"id": "0000000000000001", "path": "C:\\Users\\circleci\\AppData\\Local\\Temp\\TestReplicationStartMissingQueue76668607\\001\\replicationq\\0000000000000001"}
    testing.go:1206: TempDir RemoveAll cleanup: remove C:\Users\circleci\AppData\Local\Temp\TestReplicationStartMissingQueue76668607\001\replicationq\0000000000000001\1: The process cannot access the file because it is being used by another process.

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>

* test: update TestWAL_DiskSize

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>

* test: fix failing TestWAL_DiskSize on Windows

=== FAIL: tsdb/engine/tsm1 TestWAL_DiskSize (2.65s)
    testing.go:1206: TempDir RemoveAll cleanup: remove C:\Users\circleci\AppData\Local\Temp\TestWAL_DiskSize2736073801\001\_00006.wal: The process cannot access the file because it is being used by another process.

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>

---------

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
2023-03-21 16:22:11 -04:00
Jeffrey Smith II 77fd64a975
fix: handle replication missing queue (#24123)
* fix: replications should start up after backup/restore

* chore: refactor

* test: improve logging and handle test better
2023-03-09 13:10:53 -05:00
suitableZebraCaller ec7fdd3a58
fix: Show Replication Queue size and Replication TCP Errors (#23960)
* feat: Show remaining replication queue size

* fix: Show non-http related error messages

* fix: Show non-http related error messages with backoff

* fix: Updates for replication tests

* chore: formatting

* chore: formatting

* chore: formatting

* chore: formatting

* chore: lowercase json field

---------

Co-authored-by: Geoffrey <suitableZebraCaller@users.noreply.github.com>
Co-authored-by: Jeffrey Smith II <jeffreyssmith2nd@gmail.com>
2023-02-02 09:47:45 -05:00
Jeffrey Smith II f026d7bdaf
fix: Fixes migrating when a remote already exists (#23912)
* fix: handle migrating with already defined remotes

* test: add test to verify migrating already defined remotes

* fix: properly handle Up
2022-11-17 14:23:10 -05:00
Ole Kristian (Zee) 666cabb1f4
fix: fix wrong max age transformation from seconds (#23684)
* fix: fix wrong max age transformation from seconds

* refactor: clarify max age intent

* refactor: remove unnecessary duration
2022-11-16 16:18:43 -05:00
Dane Strandboge 6fc66acb0a
fix: do not require remoteOrgID in remote config/creation request (#23838) 2022-11-01 09:47:45 -05:00
Dane Strandboge 55b7d29e4f
fix: sql scan error on remote bucket id when replicating to 1.x (#23826) 2022-10-19 14:51:48 -05:00
Jeffrey Smith II 6f50e70960
feat: replicate based on bucket name rather than id (#23638)
* feat: add the ability to replicate based on bucket name rather than bucket id.

- This adds compatibility with 1.x replication targets

* fix: improve error checking and add tests

* fix: add additional constraint to replications table

* fix: use OR not AND for constraint

* feat: delete invalid replications on downgrade

* fix: should be less than 2.4

* test: add test around down migration and cleanup migration code

* fix: use nil instead of platform.ID(1) for better consistency

* fix: fix tests

* fix: fix tests
2022-08-18 14:21:59 -04:00
Dane Strandboge 9e556864a3
fix: replications remote write failure can deadlock remote writer (#23458) 2022-06-16 11:57:24 -05:00
Dane Strandboge 359fcc46b5
feat: add maximum age to replication queues (#23206)
Co-authored-by: Sam Arnold <sarnold@influxdata.com>
2022-03-25 13:06:05 -05:00
Sam Arnold e20b5e99a6
fix: remove nats for scraper processing (#23107)
* fix: remove nats for scraper processing

Scrapers now use go channels instead of NATS and interprocess communication.
This should fix #23085.

Additionally, found and fixed #23106.

* chore: fix formatting

* chore: fix static check and go.mod

* test: fix some flaky tests

* fix: mark NATS arguments as deprecated
2022-02-10 11:23:18 -05:00
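A minimal sketch of handing scrape results to a consumer over an in-process Go channel rather than a message broker. `ScrapeResult` and `runPipeline` are hypothetical names; the real scraper types are not shown in this log.

```go
package main

import (
	"fmt"
	"sync"
)

// ScrapeResult is a hypothetical payload a scraper hands to the writer.
type ScrapeResult struct {
	Target string
	Points int
}

// runPipeline starts one consumer goroutine that drains results from a
// buffered channel, feeds it the given results, and returns the total
// number of points the consumer processed.
func runPipeline(results []ScrapeResult) int {
	ch := make(chan ScrapeResult, 16)
	var wg sync.WaitGroup
	total := 0

	wg.Add(1)
	go func() {
		defer wg.Done()
		for r := range ch { // loop ends when ch is closed
			total += r.Points // stand-in for writing points to storage
		}
	}()

	for _, r := range results {
		ch <- r
	}
	close(ch)
	wg.Wait() // total is safe to read once the consumer has exited
	return total
}

func main() {
	fmt.Println(runPipeline([]ScrapeResult{{"a", 3}, {"b", 4}}))
}
```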
William Baker c1d384de19
test: fix flaky enqueue test (#23035) 2022-01-10 08:04:59 -08:00
mcfarlm3 60234964d0
refactor: replications local write optimization (#22993)
* refactor: eliminate sqlite query in case of no configured replications

* refactor: updated write-related tests to reflect tracking of orgID and localBucket by the queue manager

* refactor: removed redundant trackedReplications field

* refactor: corrected slice init in GetReplications and added TestGetReplications

* refactor: eliminated tracked package and moved TrackedReplication struct to influxdb package via replication.go

* chore: ran make fmt

* fix: added closeRq function back in to address flaky tests

* refactor: small changes to queue manager test based on code review
2021-12-15 12:32:46 -08:00
William Baker a7a5233432
feat: advance queue scanner periodically instead of every remote write (#22981) 2021-12-13 10:09:36 -06:00
William Baker e3ff434f81
test: fix flaky replications tests (#22973)
* fix: fix test and run 20 times

* fix: unfix and run test 20 times

* test: wait for rq run fn to return in tests
2021-12-08 14:48:25 -06:00
William Baker e5cbd279ee
fix: advance replications queue after successful remote writes (#22967)
* fix: advance replications queue after successful remote writes to prevent data duplication on errors

* fix: loop on sendwrite

* chore: remove flaky test

* chore: add TODO about future optimization
2021-12-08 12:52:46 -06:00
William Baker 6096ee2ad4
feat: replications metrics include failure to enqueue (#22962)
* feat: replications metrics include failure to enqueue
2021-12-02 14:42:55 -06:00
William Baker e4e16335f5
fix: replications remote writes do not block server shutdown (#22958)
* fix: replications remote writes do not block server shutdown

* fix: don't leak goroutine
2021-12-02 12:04:52 -06:00
William Baker 3460f1cc52
feat: replication remote writes do not block local writes (#22956)
* feat: replication remote writes do not block local writes
2021-12-01 15:37:10 -06:00
William Baker f05d0136f1
feat: metrics collection for replications remote writes (#22952)
* feat: metrics collection for replications remote writes

* fix: don't update metrics with 204 error code on successful writes
2021-12-01 12:41:24 -06:00
William Baker 9873ccd657
feat: remote write function for replications (#22942)
* feat: remote write function for replications

* chore: implement UpdateResponseInfo store method

* chore: only set gzip header for non-empty requests

* fix: address review feedback
2021-11-30 15:33:42 -06:00
William Baker f47d514225
refactor: move replications store functionality to separate package (#22923)
* refactor: move replications store functionality to separate package

* fix: make opening all repls on startup work right
2021-11-24 11:45:19 -06:00
William Baker 3a81166812
feat: added metrics collection for replications (#22906)
* feat: added metrics collection for replications

* fix: fixed panic when restarting

* fix: fix panic pt2

* chore: self-review fixes

* chore: simplify test
2021-11-22 11:40:03 -06:00
Dane Strandboge 6ee472725f
refactor: use remote write func in NewDurableQueueManager (#22888) 2021-11-19 11:31:10 -06:00
Dane Strandboge 40d9587ece
feat: add replications queue scanner (#22873)
Co-authored-by: mcfarlm3 <58636946+mcfarlm3@users.noreply.github.com>
2021-11-16 10:30:52 -06:00
Daniel Moran 6b56af3c3f
feat: mirror writes to registered replications (#22833) 2021-11-10 08:25:47 -05:00
mcfarlm3 cd0243d2b4
feat: added replications queue management to launcher tasks (#22820)
* feat: added replications queue management to launcher tasks

* refactor: separated sql logic into replications service rather than durable queue manager

* refactor: extended replications feature flag to launcher code and minor change to startup function param

* chore: added unit test coverage for replications server startup queue management

* refactor: made error messages reusable and factored out unnecessary string from queue management tests

* refactor: changed queue management error names to pass linter check
2021-11-09 11:32:07 -08:00
Daniel Moran 1aac92c5ee
refactor: remove replications.current_queue_size_bytes from sqlite (#22832)
Maintaining the current queue size in a SQL column would require
updating the DB on every queue operation. Avoid that contention by
instead looking up the current size on the in-memory durable queue
struct, which is already tracked & updated as data enters & leaves
the queue.
2021-11-05 14:35:12 -04:00
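A sketch of the approach described above: the queue keeps a running byte count under its own mutex, updated as data enters and leaves, so reporting the current size needs no database write or read. `durableQueue` here is a hypothetical in-memory simplification, not the actual durable queue type.

```go
package main

import (
	"fmt"
	"sync"
)

// durableQueue is a hypothetical stand-in that tracks its size in memory.
type durableQueue struct {
	mu    sync.Mutex
	items [][]byte
	size  int64 // bytes currently queued
}

func (q *durableQueue) Append(b []byte) {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.items = append(q.items, b)
	q.size += int64(len(b))
}

// Pop removes the oldest item, if any, and reports whether one was removed.
func (q *durableQueue) Pop() ([]byte, bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if len(q.items) == 0 {
		return nil, false
	}
	b := q.items[0]
	q.items = q.items[1:]
	q.size -= int64(len(b))
	return b, true
}

// CurrentSize can be read on demand instead of from a SQL column.
func (q *durableQueue) CurrentSize() int64 {
	q.mu.Lock()
	defer q.mu.Unlock()
	return q.size
}

func main() {
	q := &durableQueue{}
	q.Append([]byte("cpu value=1"))
	q.Append([]byte("mem used=42"))
	q.Pop()
	fmt.Println(q.CurrentSize())
}
```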
mcfarlm3 8825cd5d50
feat: replication apis durable queue management (#22719)
* feat: added durable queue management to replications service

* refactor: improved mapping of replication streams to durable queues

* refactor: modified replication stream durable queues to use user-specified engine path

* chore: generated test mocks for replications DurableQueueManager

* chore: add test coverage for replications durable queue manager

* refactor: made changes based on code review, added mutex to durableQueueManager, improved error logging

* chore: ran make fmt

* refactor: further improvements to error logging
2021-10-26 12:14:29 -07:00
Daniel Moran 7c19225bed
feat: implement replication validation (#22581) 2021-10-05 14:34:38 -04:00
Daniel Moran 12c8fd28d2
feat: implement metadata management for replications (#22302) 2021-09-01 12:01:41 -04:00