* feat(tsm1/wal): encapsulate expiring WAL files in FileDisposer
This changeset introduces an interface extension point named
FileDisposer to control what to do with WAL files when they are no
longer needed. Currently, the only implementation is to delete the file
which is the existing behavior.
* chore: accumulate errors
Since we're here, capture the previously ignored fs errors and pass up a
combined error (which the only callers log out).
* chore: download repository key to file
* fix: broken perf tests
Some perf tests had to be temporarily disabled. Work is
needed in the pref_tests repositories to make them work
again.
* fix(tsi1/partition/test): fix data race in test code
TestPartition_Compact_Write_Fail test was not locking the partition
before changing the value of MaxLogFileSize. This PR exports the mutex
of the partition to allow the test to access it and lock. Alternatives
require more changes such as a Setter method if we need to hide the
mutex.
* fixes#24042, for #24040
* chore: complete renaming of mutex in file and fix flux test
The flux test is another failing test because it was using a relative
time range.
Under certain circumstances, the retention service can fail to delete shards from
the store in a timely manner. When the shard groups are pruned based on age, this
leaves orphaned shard files on the disk. The retention service will then not attempt
to remove the obsolete shard files because the meta store does not know about them.
This can cause excessive disk space usage for some users.
This corrects that by requiring shards files be deleted before they can be removed
from the meta store.
fixes: #24529
(cherry picked from commit 7bd3f89d18)
closes https://github.com/influxdata/influxdb/issues/24545
Co-authored-by: Geoffrey Wossum <gwossum@influxdata.com>
HTTP 5XX errors were being returned incorrectly from
BoltDB errors that were actually bad requests, e.g.,
names that were too long for buckets, users, and
organizations. Map BoltDB errors to correct Influx
errors and return 4XX errors where appropriate. Also
add op codes to more errors
Some series files which are smaller than the standard
sizes cause SIGBUS in influx_inspect and influxd, because
entry iteration walks onto mapped memory not backed by the
the file. Avoid walking off the end of the file while
iterating series entries in oddly sized files.
closes https://github.com/influxdata/influxdb/issues/24508
Co-authored-by: Geoffrey Wossum <gwossum@influxdata.com>
(cherry picked from commit 969abf3da2)
closes https://github.com/influxdata/influxdb/issues/24511
To assist debugging of write failures
in Edge Data Replication, do not
write only the HTTP status code to
the log. Also include any messages
returned by the write recipient.
closes https://github.com/influxdata/influxdb/issues/24481
To allow rudimentary security auditing of logs,
add the authenticating ID and the user ID when
possible to the request logs. When a request is
authorized for V1 or V2 API, store the authorizer
object to be used by the logger up the call stack.
closes https://github.com/influxdata/influxdb/issues/24473
Correctly handle errors in converting MetricSlice
elements into model.Points. Add a test to verify
error handling.
(cherry picked from commit 19e5c0e1b7)
This updates the job logic so that workflow condition is evaluated
by CircleCI rather than the shell. This also uses the "aws-s3" orb
for uploading to S3 (rather than awscli).
The terraform shipped with snap (in the older version of Ubuntu)
only supported public key encryption with ssh-rsa. New versions
of Linux started deprecating ssh-rsa, so this version bump
is required.
This allows changelogs to be built from "non-release" tags. These
changelogs use "UNRELEASED" as the first section header. Commits
from these sections are eventually rolled into a proper "release"
tag (e.g v2.7.0).
* test: use `T.TempDir` to create temporary test directory
This commit replaces `os.MkdirTemp` with `t.TempDir` in tests. The
directory created by `t.TempDir` is automatically removed when the test
and all its subtests complete.
Prior to this commit, temporary directory created using `os.MkdirTemp`
needs to be removed manually by calling `os.RemoveAll`, which is omitted
in some tests. The error handling boilerplate e.g.
defer func() {
if err := os.RemoveAll(dir); err != nil {
t.Fatal(err)
}
}
is also tedious, but `t.TempDir` handles this for us nicely.
Reference: https://pkg.go.dev/testing#T.TempDir
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
* test: fix failing TestSendWrite on Windows
=== FAIL: replications/internal TestSendWrite (0.29s)
logger.go:130: 2022-06-23T13:00:54.290Z DEBUG Created new durable queue for replication stream {"id": "0000000000000001", "path": "C:\\Users\\circleci\\AppData\\Local\\Temp\\TestSendWrite1627281409\\001\\replicationq\\0000000000000001"}
logger.go:130: 2022-06-23T13:00:54.457Z ERROR Error in replication stream {"replication_id": "0000000000000001", "error": "remote timeout", "retries": 1}
testing.go:1090: TempDir RemoveAll cleanup: remove C:\Users\circleci\AppData\Local\Temp\TestSendWrite1627281409\001\replicationq\0000000000000001\1: The process cannot access the file because it is being used by another process.
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
* test: fix failing TestStore_BadShard on Windows
=== FAIL: tsdb TestStore_BadShard (0.09s)
logger.go:130: 2022-06-23T12:18:21.827Z INFO Using data dir {"service": "store", "path": "C:\\Users\\circleci\\AppData\\Local\\Temp\\TestStore_BadShard1363295568\\001"}
logger.go:130: 2022-06-23T12:18:21.827Z INFO Compaction settings {"service": "store", "max_concurrent_compactions": 2, "throughput_bytes_per_second": 50331648, "throughput_bytes_per_second_burst": 50331648}
logger.go:130: 2022-06-23T12:18:21.828Z INFO Open store (start) {"service": "store", "op_name": "tsdb_open", "op_event": "start"}
logger.go:130: 2022-06-23T12:18:21.828Z INFO Open store (end) {"service": "store", "op_name": "tsdb_open", "op_event": "end", "op_elapsed": "77.3µs"}
testing.go:1090: TempDir RemoveAll cleanup: remove C:\Users\circleci\AppData\Local\Temp\TestStore_BadShard1363295568\002\data\db0\rp0\1\index\0\L0-00000001.tsl: The process cannot access the file because it is being used by another process.
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
* test: fix failing TestPartition_PrependLogFile_Write_Fail and TestPartition_Compact_Write_Fail on Windows
=== FAIL: tsdb/index/tsi1 TestPartition_PrependLogFile_Write_Fail/write_MANIFEST (0.06s)
testing.go:1090: TempDir RemoveAll cleanup: remove C:\Users\circleci\AppData\Local\Temp\TestPartition_PrependLogFile_Write_Failwrite_MANIFEST656030081\002\0\L0-00000003.tsl: The process cannot access the file because it is being used by another process.
--- FAIL: TestPartition_PrependLogFile_Write_Fail/write_MANIFEST (0.06s)
=== FAIL: tsdb/index/tsi1 TestPartition_Compact_Write_Fail/write_MANIFEST (0.08s)
testing.go:1090: TempDir RemoveAll cleanup: remove C:\Users\circleci\AppData\Local\Temp\TestPartition_Compact_Write_Failwrite_MANIFEST3398667527\002\0\L0-00000003.tsl: The process cannot access the file because it is being used by another process.
--- FAIL: TestPartition_Compact_Write_Fail/write_MANIFEST (0.08s)
We must close the open file descriptor otherwise the temporary file
cannot be cleaned up on Windows.
Fixes: 619eb1cae6 ("fix: restore in-memory Manifest on write error")
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
* test: fix failing TestReplicationStartMissingQueue on Windows
=== FAIL: TestReplicationStartMissingQueue (1.60s)
logger.go:130: 2023-03-17T10:42:07.269Z DEBUG Created new durable queue for replication stream {"id": "0000000000000001", "path": "C:\\Users\\circleci\\AppData\\Local\\Temp\\TestReplicationStartMissingQueue76668607\\001\\replicationq\\0000000000000001"}
logger.go:130: 2023-03-17T10:42:07.305Z INFO Opened replication stream {"id": "0000000000000001", "path": "C:\\Users\\circleci\\AppData\\Local\\Temp\\TestReplicationStartMissingQueue76668607\\001\\replicationq\\0000000000000001"}
testing.go:1206: TempDir RemoveAll cleanup: remove C:\Users\circleci\AppData\Local\Temp\TestReplicationStartMissingQueue76668607\001\replicationq\0000000000000001\1: The process cannot access the file because it is being used by another process.
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
* test: update TestWAL_DiskSize
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
* test: fix failing TestWAL_DiskSize on Windows
=== FAIL: tsdb/engine/tsm1 TestWAL_DiskSize (2.65s)
testing.go:1206: TempDir RemoveAll cleanup: remove C:\Users\circleci\AppData\Local\Temp\TestWAL_DiskSize2736073801\001\_00006.wal: The process cannot access the file because it is being used by another process.
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
---------
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
* feat(monitor-ci/415): get oss-e2es working locally in UI repo
* chore: remove old, unused artifact directories
* fix: handle import cycle caused by trying to use the onboarding client
---------
Co-authored-by: Jeffrey Smith II <jsmith@influxdata.com>