Avoid converting times to int64 in the Task Scheduler
to preserve time zone information. This corrects a
failure after fall-back time changes that halts
every-type tasks.
closes https://github.com/influxdata/influxdb/issues/25110
* fix: prevent retention service from hanging (#25055)
Fix issue that can cause the retention service to hang waiting on a
`Shard.Close` call. When this occurs, no other shards will be deleted
by the retention service. This is usually noticed as an increase in
disk usage because old shards are not cleaned up.
The fix adds two new methods to `Store`, `SetShardNewReadersBlocked`
and `InUse`. `InUse` can be used to poll if a shard has active readers,
which the retention service uses to skip over in-use shards to prevent
the service from hanging. `SetShardNewReadersBlocked` determines if
new read access may be granted to a shard. This is required to prevent
race conditions around the use of `InUse` and the deletion of shards.
If the retention service skips over a shard because it is in-use, the
shard will be checked again the next time the retention service is run.
It can be deleted on subsequent checks if it is no longer in-use. If
a shard is stuck in-use, the retention service will not be able to
delete it, which can be observed in the logs for manual
intervention. Other shards can still be deleted by the retention service
even if a shard is stuck with readers.
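
The skip-if-in-use flow described above can be sketched with a toy
in-memory store. Only the method names `SetShardNewReadersBlocked` and
`InUse` come from this change; their signatures here, the `store` and
`shard` types, and the `Acquire`, `Release`, and `retentionPass` helpers
are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// shard is a toy stand-in that only tracks reader state.
type shard struct {
	readers int  // active readers
	blocked bool // true while new readers are refused
}

// store is a hypothetical in-memory model, not the real Store type.
type store struct {
	mu     sync.Mutex
	shards map[uint64]*shard
}

var errNoShard = errors.New("shard not found")

// Acquire registers a reader unless new readers are blocked (hypothetical helper).
func (s *store) Acquire(id uint64) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	sh, ok := s.shards[id]
	if !ok || sh.blocked {
		return false
	}
	sh.readers++
	return true
}

// Release drops one reader (hypothetical helper).
func (s *store) Release(id uint64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if sh, ok := s.shards[id]; ok && sh.readers > 0 {
		sh.readers--
	}
}

// SetShardNewReadersBlocked controls whether new readers may be admitted.
func (s *store) SetShardNewReadersBlocked(id uint64, blocked bool) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	sh, ok := s.shards[id]
	if !ok {
		return errNoShard
	}
	sh.blocked = blocked
	return nil
}

// InUse reports whether the shard has active readers.
func (s *store) InUse(id uint64) (bool, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	sh, ok := s.shards[id]
	if !ok {
		return false, errNoShard
	}
	return sh.readers > 0, nil
}

// retentionPass deletes expired shards without readers and skips the rest.
func retentionPass(s *store, expired []uint64) (deleted, skipped []uint64) {
	for _, id := range expired {
		// Block new readers first so the InUse answer cannot go stale.
		if err := s.SetShardNewReadersBlocked(id, true); err != nil {
			skipped = append(skipped, id)
			continue
		}
		if inUse, err := s.InUse(id); err != nil || inUse {
			// Leave the shard for the next pass and let readers back in.
			_ = s.SetShardNewReadersBlocked(id, false)
			skipped = append(skipped, id)
			continue
		}
		s.mu.Lock()
		delete(s.shards, id)
		s.mu.Unlock()
		deleted = append(deleted, id)
	}
	return deleted, skipped
}

func main() {
	s := &store{shards: map[uint64]*shard{1: {}, 2: {}}}
	s.Acquire(1) // shard 1 now has an active reader

	deleted, skipped := retentionPass(s, []uint64{1, 2})
	fmt.Println("first pass: ", deleted, skipped)

	s.Release(1)
	deleted, skipped = retentionPass(s, []uint64{1})
	fmt.Println("second pass:", deleted, skipped)
}
```

Blocking new readers before polling is what closes the race: without it,
a reader could attach between the `InUse` check and the delete.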
This is a port of ad68ec8 from master-1.x to main-2.x.
closes: #25076
(cherry picked from commit b4bd607eef)
Stacks and templates allow specifying file:// URLs. Add command line
option `--template-file-urls-disabled` to disable their use for people who don't require them.
* feat: update flux to latest head
Flux has updated some dependencies, including prometheus. Prometheus
has changed in some incompatible ways. Update the flux dependency
to a newer version with the updated prometheus dependency and apply
some small fixes to make everything build. This is in preparation
for a flux release later in the week.
The biggest change is in some tests that were using reflect.DeepEqual
to check the correctness of prometheus metrics. The internals of
these types have changed such that this is not a safe thing to do
anymore. The test now verifies the string representations, as
produced by String(), match.
* fix: update CI script
The scripts/ci/check-system-go-matches-go-mod.sh is failing because
newer go toolchains include the bugfix version in go.mod's go
directive. Update the script to check the major and minor versions
reported by both tools match.
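
The comparison rule can be sketched as follows. The real check is a shell
script; this Go version only illustrates the rule, and the helper names
and version strings are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// majorMinor trims an optional "go" prefix and keeps the first two
// dot-separated components, so "go1.21.5" becomes "1.21".
func majorMinor(v string) string {
	v = strings.TrimPrefix(strings.TrimSpace(v), "go")
	parts := strings.SplitN(v, ".", 3)
	if len(parts) < 2 {
		return v
	}
	return parts[0] + "." + parts[1]
}

// sameToolchain ignores the bugfix component on both sides.
func sameToolchain(systemGo, goMod string) bool {
	return majorMinor(systemGo) == majorMinor(goMod)
}

func main() {
	fmt.Println(sameToolchain("go1.21.5", "1.21.3"))
	fmt.Println(sameToolchain("go1.22.0", "1.21.3"))
}
```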
This adds locking to the load method and renames it to Reload(). This
method replaces the cached data from the underlying kv.Store and needs a
write lock. The restore API uses it, and concurrent writes into the
cached data during a restore may have been an issue.
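
A minimal sketch of the locking pattern, assuming a hypothetical `cache`
type whose `load` field stands in for reading from the underlying
kv.Store. Only the `Reload` name comes from this change.

```go
package main

import (
	"fmt"
	"sync"
)

// cache is a hypothetical stand-in for the cached data.
type cache struct {
	mu   sync.RWMutex
	data map[string]string
	load func() (map[string]string, error) // stand-in for the kv.Store read
}

// Get takes a read lock, so readers never observe a half-replaced map.
func (c *cache) Get(k string) (string, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	v, ok := c.data[k]
	return v, ok
}

// Put takes the write lock, like any other mutation.
func (c *cache) Put(k, v string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.data[k] = v
}

// Reload replaces the cached data wholesale, so it holds the write lock
// for the whole operation and cannot interleave with Put.
func (c *cache) Reload() error {
	c.mu.Lock()
	defer c.mu.Unlock()
	fresh, err := c.load()
	if err != nil {
		return err
	}
	c.data = fresh
	return nil
}

func main() {
	c := &cache{
		data: map[string]string{},
		load: func() (map[string]string, error) {
			return map[string]string{"bucket": "restored"}, nil
		},
	}
	c.Put("bucket", "stale")
	if err := c.Reload(); err != nil {
		fmt.Println("reload failed:", err)
		return
	}
	v, _ := c.Get("bucket")
	fmt.Println(v)
}
```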
* fixes #24895
* feat(tsm1/wal): encapsulate expiring WAL files in FileDisposer
This changeset introduces an interface extension point named
FileDisposer to control what to do with WAL files when they are no
longer needed. Currently, the only implementation is to delete the file
which is the existing behavior.
* chore: accumulate errors
Since we're here, capture the previously ignored fs errors and pass up a
combined error (which the only callers log).
* chore: download repository key to file
* fix: broken perf tests
Some perf tests had to be temporarily disabled. Work is
needed in the perf_tests repositories to make them work
again.
* fix(tsi1/partition/test): fix data race in test code
TestPartition_Compact_Write_Fail test was not locking the partition
before changing the value of MaxLogFileSize. This PR exports the mutex
of the partition to allow the test to access and lock it. Alternatives
require more changes such as a Setter method if we need to hide the
mutex.
* fixes #24042, for #24040
* chore: complete renaming of mutex in file and fix flux test
The flux test is another failing test because it was using a relative
time range.
Under certain circumstances, the retention service can fail to delete shards from
the store in a timely manner. When the shard groups are pruned based on age, this
leaves orphaned shard files on the disk. The retention service will then not attempt
to remove the obsolete shard files because the meta store does not know about them.
This can cause excessive disk space usage for some users.
This corrects that by requiring that shard files be deleted before the shards
can be removed from the meta store.
fixes: #24529
(cherry picked from commit 7bd3f89d18)
closes https://github.com/influxdata/influxdb/issues/24545
Co-authored-by: Geoffrey Wossum <gwossum@influxdata.com>
HTTP 5XX errors were being returned incorrectly from
BoltDB errors that were actually bad requests, e.g.,
names that were too long for buckets, users, and
organizations. Map BoltDB errors to correct Influx
errors and return 4XX errors where appropriate. Also
add op codes to more errors.
Some series files which are smaller than the standard
sizes cause SIGBUS in influx_inspect and influxd, because
entry iteration walks onto mapped memory not backed by
the file. Avoid walking off the end of the file while
iterating series entries in oddly sized files.
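
The bounds check can be sketched on a plain byte slice. The entry layout
used here (a 4-byte big-endian length followed by the payload, with a
zero length ending the data) is an assumption for illustration and is not
the real series file format.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// appendEntry writes one length-prefixed entry (hypothetical layout).
func appendEntry(buf []byte, payload string) []byte {
	var hdr [4]byte
	binary.BigEndian.PutUint32(hdr[:], uint32(len(payload)))
	buf = append(buf, hdr[:]...)
	return append(buf, payload...)
}

// readEntries stops as soon as a header or payload would extend past the
// end of data, instead of reading beyond the backing bytes.
func readEntries(data []byte) [][]byte {
	var out [][]byte
	for off := 0; off+4 <= len(data); {
		n := int(binary.BigEndian.Uint32(data[off:]))
		off += 4
		if n == 0 || n > len(data)-off {
			break // terminator, or payload not fully present
		}
		out = append(out, data[off:off+n])
		off += n
	}
	return out
}

func main() {
	data := appendEntry(nil, "cpu")
	data = appendEntry(data, "mem")

	truncated := data[:len(data)-2] // an oddly sized file

	fmt.Println(len(readEntries(data)), len(readEntries(truncated)))
}
```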
closes https://github.com/influxdata/influxdb/issues/24508
Co-authored-by: Geoffrey Wossum <gwossum@influxdata.com>
(cherry picked from commit 969abf3da2)
closes https://github.com/influxdata/influxdb/issues/24511
To assist debugging of write failures
in Edge Data Replication, do not
write only the HTTP status code to
the log. Also include any messages
returned by the write recipient.
closes https://github.com/influxdata/influxdb/issues/24481
To allow rudimentary security auditing of logs,
add the authenticating ID and the user ID when
possible to the request logs. When a request is
authorized for V1 or V2 API, store the authorizer
object to be used by the logger up the call stack.
closes https://github.com/influxdata/influxdb/issues/24473
Correctly handle errors in converting MetricSlice
elements into model.Points. Add a test to verify
error handling.
(cherry picked from commit 19e5c0e1b7)
This updates the job logic so that the workflow condition is evaluated
by CircleCI rather than the shell. This also uses the "aws-s3" orb
for uploading to S3 (rather than awscli).
The terraform shipped with snap (in the older version of Ubuntu)
only supported public key encryption with ssh-rsa. New versions
of Linux started deprecating ssh-rsa, so this version bump
is required.
This allows changelogs to be built from "non-release" tags. These
changelogs use "UNRELEASED" as the first section header. Commits
from these sections are eventually rolled into a proper "release"
tag (e.g. v2.7.0).