For #8089 I would like to request each partition only once. Since
internally we store both the sort key and the column ranges in one cache
value anyways, there is no reason to offer two different methods to look
them up.
This only changes the `PartitionCache` interface. The actual lookups are
still separate, but will be changed in a follow-up.
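As a rough illustration of the interface change, a single lookup can hand back both values from one cache entry. This is a minimal sketch, not the real `PartitionCache`: the `CachedPartition` type, its field types, the synchronous `get` method, and the in-memory map are all assumptions made for the example.

```rust
use std::collections::HashMap;

/// Hypothetical cache value holding both pieces of partition metadata.
#[derive(Debug, Clone, PartialEq)]
struct CachedPartition {
    /// Sort key columns, if the partition has a sort key yet.
    sort_key: Option<Vec<String>>,
    /// Per-column (min, max) ranges; the types are simplified for the sketch.
    column_ranges: HashMap<String, (i64, i64)>,
}

/// Hypothetical, in-memory stand-in for the cache.
struct PartitionCache {
    entries: HashMap<i64, CachedPartition>,
}

impl PartitionCache {
    /// One lookup per partition returns the sort key and the column ranges
    /// together, instead of one method per value.
    fn get(&self, partition_id: i64) -> Option<&CachedPartition> {
        self.entries.get(&partition_id)
    }
}
```

A caller then reads `sort_key` and `column_ranges` from the same returned value, so each partition is requested only once.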
We need to decode the ingester data serially (since it is a data
stream). Cache access during that phase is costly since we cannot
parallelize it. To avoid that, we gather the column ranges AFTER
decoding and calculate the chunk statistics accordingly.
This refactoring also removes the partition sort key from ingester
partitions since it is not required anymore. It is a leftover of
the old physical query planning. It was not marked as "unused" since
it was used by some test code.
Required for #8089.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
As we've learned in #8048 and #8052, rustflags do NOT stack. Since we
only want to change one specific parameter (the debug feature), use the
env variable that cargo provides us.
**In contrast to the linked PRs, this only changes the test execution. Prod
builds remain untouched.**
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
The circuit breaker needs to act on concurrent requests to the same
ingester. To do that, it performs the following steps per request:
1. check current circuit state (if open, then exit here)
2. perform request (if closed or as a half-open test request)
3. change circuit state based on results
To allow concurrency, only steps 1 and 3 hold locks. This means that the
circuit state might change in the meantime. To detect that, the circuit
state has a generation counter.
The bug was an overly strong assumption about the generation counter /
state change: that if we are in step 3 and the state is "half-open",
then nobody else could have changed the state in the meantime, because
for a single ingester there can only be one test request for the
half-open state. While the latter part of this is correct, the former is
wrong: we could have started in step 1 with a closed circuit and ended
in a half-open one. This happens with the following sequence:
1. request, blocks on upstream
2. circuit breaks
3. some time passes
4. a half-open request starts, blocks on upstream
5. request from step 1 returns, finds itself confused
This change fixes the assertion, both when the request from step 1
succeeds and when it fails.
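The interplay between the three steps and the generation counter can be sketched as follows. This is a simplified illustration, not the actual implementation: the `Breaker` type, its method names, tripping on the first failure, and silently ignoring stale results are all assumptions made for the example.

```rust
use std::sync::Mutex;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Circuit {
    Closed,
    Open,
    HalfOpen,
}

struct State {
    circuit: Circuit,
    // Bumped on every state change so that late results can be detected.
    generation: u64,
}

/// Hypothetical breaker for a single ingester.
struct Breaker {
    state: Mutex<State>,
}

impl Breaker {
    fn new() -> Self {
        Self {
            state: Mutex::new(State { circuit: Circuit::Closed, generation: 0 }),
        }
    }

    fn circuit(&self) -> Circuit {
        self.state.lock().unwrap().circuit
    }

    /// Step 1 for a regular request: returns the generation that was
    /// observed, or `None` if the circuit is not closed.
    fn begin(&self) -> Option<u64> {
        let s = self.state.lock().unwrap();
        (s.circuit == Circuit::Closed).then_some(s.generation)
    }

    /// Step 1 for the single half-open test request. The sketch assumes the
    /// caller has already checked that the cool-down period has elapsed.
    fn begin_half_open_test(&self) -> Option<u64> {
        let mut s = self.state.lock().unwrap();
        if s.circuit != Circuit::Open {
            return None;
        }
        Self::transition(&mut s, Circuit::HalfOpen);
        Some(s.generation)
    }

    /// Step 3: apply the result, but only if the state has not changed since
    /// step 1. No lock is held while the request itself (step 2) runs.
    fn finish(&self, seen_generation: u64, ok: bool) {
        let mut s = self.state.lock().unwrap();
        if s.generation != seen_generation {
            // Late result from an older generation: the current state may be
            // anything, including half-open, so do not assert on it.
            return;
        }
        match (s.circuit, ok) {
            (Circuit::Closed, true) | (Circuit::Open, _) => {}
            (Circuit::Closed, false) | (Circuit::HalfOpen, false) => {
                Self::transition(&mut s, Circuit::Open)
            }
            (Circuit::HalfOpen, true) => Self::transition(&mut s, Circuit::Closed),
        }
    }

    fn transition(s: &mut State, next: Circuit) {
        s.circuit = next;
        s.generation += 1;
    }
}
```

In this sketch, a request that started while the circuit was closed and returns while a half-open test is in flight carries an outdated generation, so its result is dropped regardless of whether it succeeded or failed.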
Includes tests for the two scenarios (`test_late_failure_after_half_open`,
`test_late_ok_after_half_open`) and an additional one that I came up with
while thinking about the issue (`test_late_failure_after_recovery`, was
passing on `main` but still good to have).
Fixes #8065.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
We already have a method that adds some default metrics / instruments to
the metric registry. Use that for jemalloc as well. This makes it easier
to follow how metrics are set up for our prod binary.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* chore: adjust with_max_num_files_per_plan to more common setting
This significantly increases write amplification (see change in `written` at the conclusion of the cases)
* fix: compactor looping with unproductive compactions
* chore: formatting cleanup
* chore: fix typo in comment
* chore: add test case that compacts too many files at once
* fix: enforce max file count for compaction
* chore: insta churn from prior commit
---------
Co-authored-by: Dom <dom@itsallbroken.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* test: ensure that selectors check arg count
* feat: basic non-aggregates w/ InfluxQL selector functions
See #7533.
* refactor: clean up code
* feat: get more advanced cases to work
* docs: remove stale comments
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Use a protobuf "Any" to wrap the "ReadInfo" message in a DoGet
ticket. This will make it easier to add different ticket types in
the future, as appropriate. It also makes the comment speak the
truth.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
This is purely a movement of code, and not any definition of the interface methods yet. At best, it further solidifies the boundary between the partitions_source implementations that live within the scheduler and those that live within the compactor.
I figured out that the reason inserting `Option<PartitionHashId>` was
giving me a compiler error that `Encode` wasn't implemented was because
I only implemented `Encode` for `&PartitionHashId` and sqlx only
implements `Encode` for `Option<T: Encode>`, not `Option<T> where &T:
Encode`. Using `as_ref` makes this work and gets rid of the `match` that
created two different queries (one of which was wrong!)
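The trait-resolution issue can be reproduced without sqlx. The sketch below uses a toy `Encode` trait and a hypothetical `bind` helper; neither matches sqlx's real signatures, and only the shape of the impls (one for the reference type, one blanket impl for `Option<T: Encode>`) is taken from the description above.

```rust
/// Toy stand-in for an encoding trait; NOT sqlx's actual `Encode` signature.
trait Encode {
    fn encode(&self) -> Option<Vec<u8>>;
}

struct PartitionHashId(Vec<u8>);

// Only the reference type implements the trait...
impl Encode for &PartitionHashId {
    fn encode(&self) -> Option<Vec<u8>> {
        Some(self.0.clone())
    }
}

// ...and the blanket impl covers `Option<T>` only where `T` itself is `Encode`.
impl<T: Encode> Encode for Option<T> {
    fn encode(&self) -> Option<Vec<u8>> {
        self.as_ref().and_then(|value| value.encode())
    }
}

/// Hypothetical helper standing in for binding a query parameter.
fn bind<E: Encode>(value: E) -> Option<Vec<u8>> {
    value.encode()
}

/// `bind(id)` with `id: Option<PartitionHashId>` would not compile here,
/// because `PartitionHashId` (by value) is not `Encode`. `as_ref` turns it
/// into `Option<&PartitionHashId>`, which is.
fn bind_hash_id(id: &Option<PartitionHashId>) -> Option<Vec<u8>> {
    bind(id.as_ref())
}
```

With this shape, a single call handles both the `Some` and the `None` case, so no `match` with two separate queries is needed.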
Also add tests that we can insert Parquet file records for partitions
that don't have hash IDs to ensure we don't break ingest of new data for
old-style partitions.
This will hold the deterministic ID for partitions.
Until all existing partitions have this value, this is optional/nullable.
The row ID still exists and is used as the main foreign key in the
parquet_file and skipped_compaction tables.
The hash_id has a unique index so that we can look up records based on
it (if it's available).
If the parquet file record has a partition_hash_id value, use that to
generate the object storage path instead of the partition_id.
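The fallback can be sketched as below. The record type, its field types, the hex encoding of the hash, and the `<partition>/<object_store_id>.parquet` layout are all hypothetical and chosen only to show the selection logic; the real path format is not described here.

```rust
/// Hypothetical, trimmed-down Parquet file record.
struct ParquetFileRecord {
    /// Row ID of the partition; always present.
    partition_id: i64,
    /// Deterministic ID; absent for old-style partitions.
    partition_hash_id: Option<Vec<u8>>,
    object_store_id: String,
}

/// Prefer the hash ID when the record has one, otherwise fall back to the
/// partition row ID. The path layout itself is an assumption.
fn object_store_path(file: &ParquetFileRecord) -> String {
    let partition = match &file.partition_hash_id {
        Some(hash) => hash.iter().map(|b| format!("{b:02x}")).collect::<String>(),
        None => file.partition_id.to_string(),
    };
    format!("{partition}/{}.parquet", file.object_store_id)
}
```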
Apparently rustflag configs don't stack, so we need to re-specify the
whole list.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
- move `tokio_unstable` to cargo config, so we can use it throughout
our code (e.g. for #7982)
- disable incremental builds for prod docker builds. this was tried
before but got lost at some point because build params weren't passed
to docker correctly
- fix `CARGO_NET_GIT_FETCH_WITH_CLI` for docker builds (env wasn't
passed through)
The reconciler is a leftover from the Kafka-based write path. It doesn't
do anything anymore.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>