Allow the timeout to be referenced in docs, and increase it a bit - it was
a bit close to being too low for ingester shutdown persistence to
complete.
Similar to #8109.
This was once implemented by the RUB, but as it stands right now, no
chunk implements this anymore.
If we ever want to bring this back, we should use the output of
`QueryChunk::data` instead (i.e. use a data-based implementation instead
of a per-chunk one).
Closes #8096.
* refactor: convert projection mask earlier
* refactor: bundle projection schema calculation
Same as #8102 but for the projected schema. This now has nice side
effects:
1. there is no longer a per-chunk cache lookup
2. there is no longer ANY per-chunk async computation
3. we no longer need an early pruning stage for the chunks (we used
   to do that so we could throw away chunks before doing the more
   expensive part of the chunk creation)
This nicely streamlines and simplifies the code.
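As a rough illustration of the bundling (a minimal sketch, assuming arrow's `Schema::project`; the helper name is made up):
```rust
use std::sync::Arc;
use arrow::datatypes::Schema;

/// Compute the projected schema once for the whole table; every chunk
/// then reuses the same `Arc`'d schema instead of doing its own cache
/// lookup or async computation.
fn projected_schema(table_schema: &Schema, projection: &[usize]) -> Arc<Schema> {
    Arc::new(
        table_schema
            .project(projection)
            .expect("projection indices must be valid for the table schema"),
    )
}
```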
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
This interface was once specially implemented by the RUB. The only
actual implementation of it is within the querier, which just forwards it
to a simple schema scan. Lift this semantic to `iox_query_influxrpc`
instead so all the chunks can use it.
If we ever want to optimize this again, we should use `QueryChunk::data`
instead (i.e. instead of implementing it within the chunk it should use
the data method and do something smart based on that).
First half of #8096.
* feat: metrics for main tokio runtime
* feat: instrument executor tokio runtime
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Do not (ab)use per-chunk delete predicates for the retention policy.
Instead use a per-table predicate.
This makes the code way cleaner, since the scoping is correct (i.e.
delete predicates are a table-wide attribute, not a chunk-based one) and
it is consistent with the time predicates that the user provides (e.g. via
`WHERE time > x`).
It also allows us to remove delete predicates (in their current,
non-scalable form) from the query path. A potential future version would
likely not use per-chunk predicates (and "is processed" markers) but use
the timestamp / chunk order to determine which data the predicate
should be applied to.
Note that the lowering of the retention policy changed slightly from
```text
(time > (now() - retention)) AND (time < MAX)
```
to
```text
time > (now() - retention)
```
The `MAX` cut was just an artifact of the lowering and is unnecessary.
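For illustration, the new lowering in DataFusion expression terms (a sketch; the helper name is made up, `col`/`lit` are DataFusion's expression builders):
```rust
use datafusion::prelude::{col, lit, Expr};

/// Lower a retention policy into a single lower-bound predicate on
/// `time` (nanoseconds); there is no upper `MAX` bound anymore.
fn retention_predicate(now_ns: i64, retention_ns: i64) -> Expr {
    col("time").gt(lit(now_ns - retention_ns))
}
```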
Closes #7409.
Closes #7410.
* test: add regression test for high number of partition cache accesses
* refactor: bundle partition cache requests
Instead of accessing the partition cache for every single ingester
partition and parquet file, just collect all the partitions first and
request every partition only once. Since the cache system needs to do
some locking and some bookkeeping (e.g. for LRU), this alone should be a
minimal perf win (the cache is quite efficient, so this might not be
measurable). However, it also enables batching of catalog requests in
the future, see #8089.
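A minimal sketch of the bundling (illustrative types and names, not the real `PartitionCache` API):
```rust
use std::collections::{HashMap, HashSet};

/// Stand-in for one cache lookup (which involves locking and LRU
/// bookkeeping in the real cache system).
fn cache_lookup(partition_id: u64) -> String {
    format!("partition-{partition_id}")
}

/// Collect all partition IDs first (deduplicated), then request each
/// partition from the cache only once.
fn bundled_lookup(ids: impl IntoIterator<Item = u64>) -> HashMap<u64, String> {
    let unique: HashSet<u64> = ids.into_iter().collect();
    unique.into_iter().map(|id| (id, cache_lookup(id))).collect()
}
```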
* fix: typo
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Add the DERIVATIVE and NON_NEGATIVE_DERIVATIVE functions to influxql.
These are used to calculate derivatives over arbitrary time units.
The implementation is modeled after the DIFFERENCE and
NON_NEGATIVE_DIFFERENCE functions, with the difference that the unit
parameter is part of the configuration of the user-defined aggregator
function, and therefore there cannot be a single shared definition of the
function.
The NON_NEGATIVE_DIFFERENCE function implementation has been
refactored to be an arbitrary NON_NEGATIVE wrapper for any Accumulator
function.
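A minimal sketch of that wrapper idea (a simplified accumulator trait stands in for the real DataFusion `Accumulator`; all names here are illustrative):
```rust
/// Simplified stand-in for the real accumulator interface.
trait Accumulator {
    fn update(&mut self, value: f64);
    fn evaluate(&self) -> Option<f64>;
}

/// Generic NON_NEGATIVE wrapper: delegates to any inner accumulator
/// and replaces negative results with NULL (`None`).
struct NonNegative<A>(A);

impl<A: Accumulator> Accumulator for NonNegative<A> {
    fn update(&mut self, value: f64) {
        self.0.update(value);
    }

    fn evaluate(&self) -> Option<f64> {
        self.0.evaluate().filter(|v| *v >= 0.0)
    }
}
```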
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: provide convenience methods to create Scheduler, and keep the scheduler implementations crate private. External crates can only create a Scheduler based upon configs.
* feat: provide Scheduler as a component to compactor. Specifically, the scheduler configs are present within the compactor run config, and the scheduler is created within the compactor hardcoded components.
* feat: within the compactor ScheduledPartitionsSource, utilize the dyn Scheduler and Scheduler.get_jobs()
* feat: CompactionJob should be per partition, and have a uniqueness characteristic independent of the partition
* feat: keep compactor_scheduler separate from clap_blocks. Only interface is within ioxd_compactor where the CLI configs are transformed into ShardConfig and PartitionsSourceConfig.
* chore: make IdOnlyPartitionFilter pub(crate) only
* chore: update scheduler display to include any report information (a.k.a. shard_config, if present)
For #8089 I would like to request each partition only once. Since
internally we store both the sort key and the column ranges in one cache
value anyway, there is no reason to offer two different methods to look
them up.
This only changes the `PartitionCache` interface. The actual lookups are
still separate, but will be changed in a follow-up.
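Conceptually, the interface change looks like this (hypothetical signatures, not the actual code):
```rust
/// One cache value already holds both pieces of information.
struct CachedPartition {
    sort_key: Option<Vec<String>>,
    column_ranges: Vec<(String, (i64, i64))>,
}

trait PartitionCache {
    // before: separate `sort_key(id)` and `column_ranges(id)` methods
    // after: a single lookup returning the whole cached value
    fn get(&self, partition_id: u64) -> CachedPartition;
}
```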
We need to decode the ingester data in a serial fashion (since it is a
data stream). Cache access during that phase is costly since we cannot
parallelize it. To avoid that, we gather the column ranges AFTER
decoding and calculate the chunk statistics accordingly.
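Sketched with illustrative stand-in types (none of these are the real ingester types), the reordering looks like this:
```rust
use std::collections::HashMap;

struct Frame(u64);                  // one stream frame tagged with its partition id
struct Decoded { partition_id: u64 }
struct Chunk { partition_id: u64, time_range: (i64, i64) }

fn decode_then_annotate(stream: Vec<Frame>) -> Vec<Chunk> {
    // 1. serial decode: no cache access in this phase
    let decoded: Vec<Decoded> = stream
        .into_iter()
        .map(|Frame(partition_id)| Decoded { partition_id })
        .collect();

    // 2. gather the column ranges AFTER decoding, once per partition
    let ranges: HashMap<u64, (i64, i64)> = decoded
        .iter()
        .map(|d| (d.partition_id, (0, 100))) // stand-in for the cache lookup
        .collect();

    // 3. calculate the chunk statistics from the gathered ranges
    decoded
        .into_iter()
        .map(|d| Chunk {
            partition_id: d.partition_id,
            time_range: ranges[&d.partition_id],
        })
        .collect()
}
```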
This refactoring also removes the partition sort key from ingester
partitions since it is no longer required. It is a leftover of the old
physical query planning and was not flagged as "unused" only because
some test code still used it.
Required for #8089.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
As we've learned in #8048 and #8052, rustflags do NOT stack. Since we
only want to change one specific parameter (the debug feature), use the
env variable that cargo provides us.
**In contrast to the linked PRs, this only changes the test execution. Prod
builds remain untouched.**
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
The circuit breaker needs to act on concurrent requests to the same
ingester. To do that, it performs the following steps per request:
1. check current circuit state (if open, then exit here)
2. perform request (if closed or as a half-open test request)
3. change circuit state based on results
Only steps 1 and 3 hold locks, so that concurrent requests are possible.
This means that in the meantime, the circuit state might change. To
detect that, the circuit state has a generation counter.
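A minimal sketch of the generation-counter mechanism (illustrative types, not the real IOx code):
```rust
use std::sync::Mutex;

#[derive(Clone, Copy, PartialEq)]
enum State {
    Closed,
    Open,
    HalfOpen,
}

/// Circuit state plus a generation counter; the lock is held only
/// briefly in steps 1 and 3, never during the request itself.
struct Circuit {
    inner: Mutex<(State, u64)>,
}

impl Circuit {
    /// Step 1: observe state + generation, then release the lock.
    fn observe(&self) -> (State, u64) {
        *self.inner.lock().unwrap()
    }

    /// Step 3: apply the request outcome, but only if the state was
    /// not changed concurrently while the request was in flight.
    fn apply(&self, seen_generation: u64, ok: bool) {
        let mut guard = self.inner.lock().unwrap();
        let (state, generation) = &mut *guard;
        if *generation != seen_generation {
            // Someone else changed the state in the meantime (e.g. the
            // circuit broke and a half-open probe started); do not
            // assume we are still in the state observed in step 1.
            return;
        }
        *state = if ok { State::Closed } else { State::Open };
        *generation += 1;
    }
}
```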
The bug was an overly strong assumption on the generation counter /
state change. Namely that if we are in step 3 and the state is
"half-open", then nobody else could have changed the state in the
meantime, because for a single ingester there can only be one test
request for the half-open state. While the latter part of this is
correct, the former is wrong: we could have started in step 1 with a
closed circuit and ended in a half-open one, if the following sequence
happens:
1. request, blocks on upstream
2. circuit breaks
3. some time passes
4. a half-open requests starts, blocks on upstream
5. request from step 1 returns, finds itself confused
This fixes the assertion (both in the case that the request from step 1
succeeds and in the case that it fails).
Includes tests for the two scenarios (`test_late_failure_after_half_open`,
`test_late_ok_after_half_open`) and an additional one that I came up with
while thinking about the issue (`test_late_failure_after_recovery`, which was
passing on `main` but is still good to have).
Fixes #8065.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Moved TABLE2_ID and TABLE2_NAME to the top of the test module, even
though TABLE2_NAME is only used in one spot, to encourage use of the
constants if new tests are added to this file that need a table that's
different from the arbitrary table.
Replaced all occurrences of TableId::new(1234) with TABLE2_ID even
though TABLE2_ID is 1234321; the exact value doesn't matter, the
important property is that it does not equal ARBITRARY_TABLE_ID (which
is 4).
Replaced:
- PartitionId::new(0) with ARBITRARY_PARTITION_ID (which is actually 1)
- PartitionId::new(1) with PARTITION2_ID (actually 2)
- PartitionId::new(2) with PARTITION3_ID (actually 3)
So while the off-by-one is a bit confusing in this diff, in the long run
this will make the tests more understandable and consistent with other
tests.
So that when I change the type of PartitionIds to TransitionPartitionId,
I don't have to update all these places that just need an arbitrary
partition ID or related values.
These test constants probably didn't exist when these tests
were created.