* fix(garbage collector): limit catalog update for files to delete
Impose a LIMIT of 1000 on flag_for_delete_by_retention so the garbage
collector's load on the catalog is bounded. 1000 matches the fixed
limit already used in another catalog DML statement.
* follow up to requests in #7562
* chore: add test for limit on update
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
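The shape of the change can be sketched as a bounded flagging query. This is a minimal illustration only: the function body, table, and column names below are hypothetical stand-ins, not the real catalog implementation; only the fixed LIMIT of 1000 comes from the commit above.

```rust
/// Fixed upper bound on rows touched per call, matching the limit
/// used elsewhere in the catalog (hypothetical constant name).
const MAX_FLAGGED_PER_CALL: usize = 1000;

/// Builds a hypothetical DML statement that flags at most
/// MAX_FLAGGED_PER_CALL expired files per garbage-collector pass.
fn flag_for_delete_by_retention_sql() -> String {
    format!(
        "UPDATE parquet_file \
         SET to_delete = now() \
         WHERE id IN (\
             SELECT id FROM parquet_file \
             WHERE to_delete IS NULL AND max_time < $1 \
             LIMIT {MAX_FLAGGED_PER_CALL}\
         )"
    )
}

fn main() {
    // The bounded LIMIT keeps each garbage-collector pass cheap,
    // regardless of how many files are eligible for deletion.
    let sql = flag_for_delete_by_retention_sql();
    println!("{sql}");
}
```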
* test: add tests on month and year date_bin
* fix: add IOX_COMPARE: uuid to get deterministic names for the output parquet files in the explain plans
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* refactor: move authz-addr flag into router-specific config
* refactor: move authz-addr flag into querier-specific config
* refactor: remove the global AuthzConfig, which is now redundant with the pushdown to individual configs. The env vars used universally are kept unchanged.
* chore: make errors lowercase, and use the required bool for the authz-addr flag
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Remove the MockNamespaceNameProvider::default implementation that
hardcodes all namespace names to "bananas"; I just tore some of my hair
out trying to figure out why I was getting bananas WHEN THERE SHOULD
HAVE BEEN NO BANANAS
NamespaceData::new doesn't take an Arc of the namespace name loader, but
PartitionData::new does. If they both take Arcs, it's easier to make
test constants and helpers they can share. Same deal for
TableData::new.
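The constructor symmetry described above can be sketched as follows. The types here are simplified, hypothetical stand-ins for the real NamespaceData / PartitionData (and TableData); the point is only that both constructors accepting an Arc lets tests share one provider instance.

```rust
use std::sync::Arc;

// Hypothetical stand-in for the namespace name loader trait.
trait NamespaceNameProvider: Send + Sync {
    fn name(&self) -> String;
}

struct TestProvider;
impl NamespaceNameProvider for TestProvider {
    fn name(&self) -> String {
        "test_ns".to_string()
    }
}

struct NamespaceData {
    provider: Arc<dyn NamespaceNameProvider>,
}

struct PartitionData {
    provider: Arc<dyn NamespaceNameProvider>,
}

impl NamespaceData {
    // Taking an Arc (rather than an owned provider) lets test
    // helpers hand the same provider to every constructor.
    fn new(provider: Arc<dyn NamespaceNameProvider>) -> Self {
        Self { provider }
    }
}

impl PartitionData {
    fn new(provider: Arc<dyn NamespaceNameProvider>) -> Self {
        Self { provider }
    }
}

fn main() {
    let provider: Arc<dyn NamespaceNameProvider> = Arc::new(TestProvider);
    // One shared provider serves both constructors.
    let ns = NamespaceData::new(Arc::clone(&provider));
    let p = PartitionData::new(provider);
    assert_eq!(ns.provider.name(), p.provider.name());
}
```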
* feat: trying to get concurrency within partition compaction
* feat: Create a stream then map and try collect
* feat: set compactor concurrency within single partition's plans
* chore: add log about plan concurrency
---------
Co-authored-by: Carol (Nichols || Goulding) <carol.nichols@gmail.com>
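The idea of running one partition's compaction plans with bounded concurrency can be sketched with the standard library alone. The real change uses async streams (map / try collect), so the worker-pool below is only a std-thread stand-in for that machinery, and the plan type and names are hypothetical.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

// Hypothetical bound on how many of a partition's plans run at once.
const PLAN_CONCURRENCY: usize = 4;

/// Executes `plans` with at most PLAN_CONCURRENCY running concurrently,
/// collecting every result (order not guaranteed). "Executing" a plan
/// is modelled here as a trivial transform.
fn execute_plans(plans: Vec<u64>) -> Vec<u64> {
    let (work_tx, work_rx) = mpsc::channel::<u64>();
    let work_rx = Arc::new(Mutex::new(work_rx));
    let (done_tx, done_rx) = mpsc::channel::<u64>();

    let n = plans.len();
    for plan in plans {
        work_tx.send(plan).unwrap();
    }
    drop(work_tx); // close the queue so workers exit once it drains

    let mut workers = Vec::new();
    for _ in 0..PLAN_CONCURRENCY {
        let rx = Arc::clone(&work_rx);
        let tx = done_tx.clone();
        workers.push(thread::spawn(move || loop {
            let plan = match rx.lock().unwrap().recv() {
                Ok(p) => p,
                Err(_) => break, // queue drained
            };
            tx.send(plan * 2).unwrap();
        }));
    }
    drop(done_tx);

    let results: Vec<u64> = done_rx.iter().take(n).collect();
    for w in workers {
        w.join().unwrap();
    }
    results
}

fn main() {
    let mut out = execute_plans(vec![1, 2, 3, 4, 5]);
    out.sort();
    assert_eq!(out, vec![2, 4, 6, 8, 10]);
}
```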
Tests that use the in-memory catalog create different shards, which
then produce old-style Parquet file paths, but in production everything
uses the transition shard now. To make the tests more like production,
only ever create and use the transition shard, and stop checking for
different shard IDs.
This commit bounds the time a write may take to be processed
after being added to the WAL, for increased reliability.
In ideal cases, anything that is added to the WAL is buffered /
completed, to prevent WAL replay materialising writes that never
completed in the first place (#7111). But in some cases (e.g. a catalog
outage) a write becomes blocked, stuck in a retry loop or
otherwise never making progress.
When a write is blocked, its data remains in RAM, and the
overhead of the spawned task retrying also consumes CPU. If all writes
are blocked for long enough, these unbounded writes keep accumulating
until a resource is exhausted (e.g. RAM -> OOM), causing an outage of
the ingester.
Instead, this change allows each write at most 15 seconds to complete
before it is cancelled and an error is returned to the user (if they're
still connected) to shed load. The error message "buffer apply request
timeout" was chosen to be uniquely identifying - systems have lots of
timeouts!
Normally the inner.apply() call at this point completes in microseconds,
so 15 seconds is more than long enough to avoid shedding legitimate
load.
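A std-library sketch of this bounded write follows. The real ingester wraps the async inner.apply() call in an async timeout; here a channel's recv_timeout plays that role, and the function name and constant are hypothetical stand-ins. Only the 15-second allowance and the error message come from the change above.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// The allowance from the change: generous, since the apply call
// normally completes in microseconds.
const WRITE_TIMEOUT: Duration = Duration::from_secs(15);

/// Runs `apply` on another thread and waits at most `timeout` for it
/// to complete, returning the uniquely identifying error otherwise.
fn apply_with_timeout<F>(apply: F, timeout: Duration) -> Result<(), String>
where
    F: FnOnce() + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        apply();
        // The receiver may be gone if we already timed out; ignore that.
        let _ = tx.send(());
    });
    rx.recv_timeout(timeout)
        .map_err(|_| "buffer apply request timeout".to_string())
}

fn main() {
    // A healthy write completes well within the allowance.
    assert!(apply_with_timeout(|| {}, WRITE_TIMEOUT).is_ok());

    // A stuck write never completes; the caller is unblocked with the
    // error after a (shortened, for the example) timeout instead.
    let err = apply_with_timeout(
        || thread::sleep(Duration::from_millis(200)),
        Duration::from_millis(20),
    )
    .unwrap_err();
    assert_eq!(err, "buffer apply request timeout");
}
```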
* Don't print an info message for each deleted file. This can be 1000s
at a time and many more in total.
* Even if there are more files to delete, sleep the interval to decrease
catalog load.
* part of influxdata/idpe#17451
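Both bullets can be sketched in one small loop: one summary log line per batch rather than per file, and an unconditional sleep between passes even when more files remain. The names, batch bookkeeping, and intervals below are hypothetical, illustrative only.

```rust
use std::time::{Duration, Instant};

/// One pass deletes up to `batch` files and reports whether more remain.
fn delete_batch(remaining: &mut u32, batch: u32) -> bool {
    let deleted = (*remaining).min(batch);
    *remaining -= deleted;
    // One summary line per batch, not one line per deleted file.
    println!("deleted {deleted} files, {remaining} remaining");
    *remaining > 0
}

/// Runs passes until nothing remains, sleeping `interval` after every
/// pass (even when more files remain) to spread out catalog load.
fn run_deleter(mut remaining: u32, batch: u32, interval: Duration) -> Duration {
    let start = Instant::now();
    loop {
        let more = delete_batch(&mut remaining, batch);
        // Sleep unconditionally, not only when the queue is empty.
        std::thread::sleep(interval);
        if !more {
            break;
        }
    }
    start.elapsed()
}

fn main() {
    // 2500 files at 1000 per pass = 3 passes, each followed by a sleep.
    let elapsed = run_deleter(2500, 1000, Duration::from_millis(10));
    assert!(elapsed >= Duration::from_millis(30));
}
```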