* feat: initial implementation of memory estimation for a compaction
* feat: estimate file sizes and take the right actions for the needed budget
* feat: run candidates in parallel
* fix: use the right name for the column field of the output struct
* feat: add metrics for estimated budgets
* chore: cleanup
* chore: Apply suggestions from code review
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
* fix: fix syntax after applying review suggestions
* refactor: convert a Vec to a VecDeque to work well with pop and push (see the sketch after this list)
* chore: remove max_concurrent_size_bytes and input_size_threshold_bytes
* chore: remove input_file_count_threshold
* test: add tests for estimate_arrow_bytes_for_file
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
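A minimal sketch of the budgeting flow described by the commits above. Only `estimate_arrow_bytes_for_file` and the `VecDeque` usage are named in the commits; `CompactionCandidate`, the bytes-per-value factor, and the budget handling are illustrative assumptions, not the actual compactor code.

```rust
use std::collections::VecDeque;

#[derive(Debug)]
struct CompactionCandidate {
    /// Row count from catalog statistics.
    row_count: u64,
    /// Column count from the table schema.
    column_count: u64,
}

/// Rough estimate of the Arrow memory needed to load one file:
/// rows x columns x an assumed average width per value.
fn estimate_arrow_bytes_for_file(c: &CompactionCandidate) -> u64 {
    const ASSUMED_BYTES_PER_VALUE: u64 = 8; // purely illustrative
    c.row_count * c.column_count * ASSUMED_BYTES_PER_VALUE
}

/// Pop candidates off the front of the queue while they fit in the budget;
/// a candidate that does not fit is pushed back for a later cycle.
fn select_within_budget(
    mut queue: VecDeque<CompactionCandidate>,
    budget_bytes: u64,
) -> (Vec<CompactionCandidate>, VecDeque<CompactionCandidate>) {
    let mut selected = Vec::new();
    let mut used = 0;

    while let Some(candidate) = queue.pop_front() {
        let estimate = estimate_arrow_bytes_for_file(&candidate);
        if used + estimate <= budget_bytes {
            used += estimate;
            selected.push(candidate);
        } else {
            // Does not fit right now; defer it instead of dropping it.
            queue.push_back(candidate);
            break;
        }
    }

    (selected, queue)
}

fn main() {
    let queue = VecDeque::from(vec![
        CompactionCandidate { row_count: 100_000, column_count: 10 },
        CompactionCandidate { row_count: 2_000_000, column_count: 50 },
    ]);
    let (selected, deferred) = select_within_budget(queue, 16 * 1024 * 1024);
    println!("selected {} file(s), deferred {}", selected.len(), deferred.len());
}
```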
Changes the compactor code to tolerate a SplitExec yielding an empty
partition (with no rows).
This raises a WARN, as the situation in which this is acceptable is very
rare and is more likely indicative of an opportunity to improve the
SplitExec usage (i.e., pruning out unnecessary split points).
Previously, attempting to serialise a stream of one or more RecordBatch
instances containing no rows (resulting in an empty file) caused the
parquet serialisation code to panic.
This changes that code path to raise an error instead, in order to support
the compactor making multiple splits at once, where the split points may
all fall within a single chunk:
      ──────────────── Time ───────────────▶

            │                        │
      ┌█████──────────────────────────█████┐
      │█████│        Chunk 1         │█████│
      └█████──────────────────────────█████┘
            │                        │
            │                        │
         Split T1                Split T2
In the example above, the chunk has an unusual distribution of write
timestamps over the time range it covers, with all data having a
timestamp before T1 or after T2. When running a SplitExec to slice this
chunk at T1 and T2, the middle of the resulting three subsets will
contain no rows. Because we store only the min/max timestamps in the
chunk statistics, it is unfortunately impossible to prune one of these
split points from the plan ahead of time.
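A minimal sketch (not the actual IOx code) of the two behaviours described above: an empty split partition is tolerated and reported, while a stream whose batches contain no rows at all becomes an error rather than a panic. `RecordBatchLike` and `SerialiseError` are stand-ins for Arrow's `RecordBatch` and the real error type.

```rust
/// Stand-in for Arrow's `RecordBatch`.
struct RecordBatchLike {
    num_rows: usize,
}

#[derive(Debug)]
enum SerialiseError {
    NoRows,
}

/// Serialise one split partition. Returns `Ok(None)` for an empty partition
/// (reported as a warning by the caller) and an error if every batch is empty.
fn serialise_partition(batches: &[RecordBatchLike]) -> Result<Option<Vec<u8>>, SerialiseError> {
    if batches.is_empty() {
        // Rare but acceptable: a SplitExec output with no batches at all.
        return Ok(None);
    }

    let total_rows: usize = batches.iter().map(|b| b.num_rows).sum();
    if total_rows == 0 {
        // Previously this situation panicked deep in the parquet writer;
        // now it surfaces as an error the caller can handle.
        return Err(SerialiseError::NoRows);
    }

    // ... real code would stream the batches into a parquet writer here ...
    Ok(Some(Vec::new()))
}

fn main() {
    match serialise_partition(&[RecordBatchLike { num_rows: 0 }]) {
        Ok(Some(_)) => println!("wrote file"),
        Ok(None) => eprintln!("WARN: empty partition, skipping"),
        Err(e) => eprintln!("serialisation error: {e:?}"),
    }
}
```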
1. Cache the converted schema instead of the catalog schema. This saves a
   bunch of memcopies during conversion.
2. Simplify the creation of new chunks: we now only need a `CachedTable`
   instead of a namespace and a table schema.
In an artificial benchmark, this removed around 10ms from the query
(although that was prior to #5467, which moved schema conversion one
level up). Still, I think this is the cleaner cache design.
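A minimal sketch of the caching idea in point 1 and the simplified chunk creation in point 2. Only the `CachedTable` name comes from the change itself; `CatalogSchema`, `ConvertedSchema`, and the conversion are illustrative stand-ins for the real catalog and Arrow schema types.

```rust
use std::sync::Arc;

/// Stand-in for the catalog representation of a schema.
struct CatalogSchema {
    columns: Vec<String>,
}

/// Stand-in for the converted (query-ready) schema.
struct ConvertedSchema {
    fields: Vec<String>,
}

fn convert(catalog: &CatalogSchema) -> ConvertedSchema {
    // In the real code this is the expensive catalog -> Arrow conversion.
    ConvertedSchema { fields: catalog.columns.clone() }
}

/// Cached per-table state: the conversion happens once, when the entry is built.
struct CachedTable {
    schema: Arc<ConvertedSchema>,
}

impl CachedTable {
    fn new(catalog: &CatalogSchema) -> Self {
        Self { schema: Arc::new(convert(catalog)) }
    }
}

struct Chunk {
    schema: Arc<ConvertedSchema>,
}

/// Creating a chunk is now a cheap `Arc` clone; no namespace or table schema
/// needs to be passed around.
fn new_chunk(table: &CachedTable) -> Chunk {
    Chunk { schema: Arc::clone(&table.schema) }
}

fn main() {
    let catalog = CatalogSchema { columns: vec!["time".to_string(), "value".to_string()] };
    let table = CachedTable::new(&catalog);
    let chunk_a = new_chunk(&table);
    let chunk_b = new_chunk(&table);
    println!("chunks share one schema: {}", Arc::ptr_eq(&chunk_a.schema, &chunk_b.schema));
}
```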
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* refactor: use a single timestamp in policy backend
Prior to this PR we had at least one `TimeProvider::now` call per GET
request (for caches that only used LRU) and up to three calls (caches with
LRU + refresh + TTL). Let's instead use a single timestamp that is
created by the policy backend itself (instead of by the policies). This has
the following consequences:
- **efficiency:** `SystemProvider::now` is not free. Even though under Linux
  it doesn't result in a syscall, it uses the stdlib time system, which
  also checks for monotonicity.
- **consistency:** All changes for a single trigger (e.g. a
  GET cache call) now use a single timestamp instead of slightly
  increasing ones. I argue this is the better semantics: it is simpler to
  understand and easier to debug.
In a (slightly artificial) local performance experiment, this shaves
off around 2ms per single-table SQL query. However, I expect there are
more degenerate cases (e.g. multi-table SQL queries or some InfluxRPC
requests that hit multiple tables).
The majority of this patch moves the `TimeProvider` from the policies
into the policy backend (see the sketch after this list).
* docs: explain `now` parameter
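Below is a minimal sketch of the "single timestamp per trigger" pattern described above, not the actual IOx cache code: the backend reads the clock once and hands the same `now` to every policy. The `Policy`, `TtlPolicy`, `LruPolicy`, and `PolicyBackend` types here are illustrative stand-ins.

```rust
use std::time::Instant;

trait Policy {
    /// `now` is supplied by the backend; policies must not read the clock themselves.
    fn on_get(&mut self, key: &str, now: Instant);
}

struct TtlPolicy;
struct LruPolicy;

impl Policy for TtlPolicy {
    fn on_get(&mut self, key: &str, now: Instant) {
        // ... expire entries older than the TTL relative to `now` ...
        let _ = (key, now);
    }
}

impl Policy for LruPolicy {
    fn on_get(&mut self, key: &str, now: Instant) {
        // ... record `now` as the last-used time for `key` ...
        let _ = (key, now);
    }
}

struct PolicyBackend {
    policies: Vec<Box<dyn Policy>>,
}

impl PolicyBackend {
    fn get(&mut self, key: &str) {
        // One clock read per trigger: every policy sees the same timestamp.
        let now = Instant::now();
        for policy in &mut self.policies {
            policy.on_get(key, now);
        }
    }
}

fn main() {
    let mut backend = PolicyBackend {
        policies: vec![Box::new(TtlPolicy), Box::new(LruPolicy)],
    };
    backend.get("namespace:my_db");
}
```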
* chore: Update datafusion pin
* chore: Update now that user is a reserved word
* chore: Update cargo.lock
* fix: update query for user function
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* fix: hoist repeated computation out of chunk creation
We have hundreds of chunks per table, so it is beneficial to do the
common work only once.
* chore: remove TableCache as it is no longer used
* fix: prune chunks both before and after metadata fetch
Fetching the metadata for all the chunks in a table is expensive,
especially when a narrow time-range query only needs a few of them
(see the sketch after this commit list).
* chore: fix clippy
* fix: fix up some last tests
* fix: review comments
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
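The two-phase pruning referenced above, as a minimal stand-alone sketch. `ChunkStub`, `ChunkWithMetadata`, and the plain min/max time predicate are illustrative; the real code prunes with full chunk statistics and query predicates.

```rust
#[derive(Debug)]
struct ChunkStub {
    min_time: i64,
    max_time: i64,
}

#[derive(Debug)]
struct ChunkWithMetadata {
    stub: ChunkStub,
    // ... column statistics, schema, etc. ...
}

fn overlaps(stub: &ChunkStub, range: (i64, i64)) -> bool {
    stub.max_time >= range.0 && stub.min_time <= range.1
}

/// Pretend this is the expensive per-chunk metadata fetch.
fn fetch_metadata(stub: ChunkStub) -> ChunkWithMetadata {
    ChunkWithMetadata { stub }
}

fn prune_and_fetch(chunks: Vec<ChunkStub>, range: (i64, i64)) -> Vec<ChunkWithMetadata> {
    chunks
        .into_iter()
        // First pass: prune on the cheap min/max times before paying for metadata.
        .filter(|c| overlaps(c, range))
        .map(fetch_metadata)
        // Second pass: prune again once the richer metadata is available (here the
        // same predicate stands in for full statistics-based pruning).
        .filter(|c| overlaps(&c.stub, range))
        .collect()
}

fn main() {
    let chunks = vec![
        ChunkStub { min_time: 0, max_time: 10 },
        ChunkStub { min_time: 100, max_time: 200 },
    ];
    let kept = prune_and_fetch(chunks, (150, 180));
    println!("kept {} chunk(s)", kept.len());
}
```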
Added /usr/local/rustup to the list of directories cached during builds.
This is where rustup installs the toolchain, so caching it saves the
download and install on every build.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
We currently only use the human-readable version string for the CLI
help, but for #5464 I want to use the Git hash and a process-time UUID.
This is the prep work for that.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
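As a rough, std-only illustration of the direction mentioned above (not the actual implementation): a structured version value could carry the human-readable version, a build-time Git hash, and an identifier generated once per process. The `GIT_HASH` environment variable and the pid-plus-timestamp identifier are assumptions; the real change is expected to use an actual UUID.

```rust
use std::process;
use std::time::{SystemTime, UNIX_EPOCH};

#[derive(Debug)]
struct VersionInfo {
    /// Human-readable version, e.g. for `--help` output.
    version: &'static str,
    /// Git revision baked in at build time (assumed env var name).
    git_hash: &'static str,
    /// Identifier generated once when the process starts (stand-in for a UUID).
    process_id: String,
}

fn version_info() -> VersionInfo {
    let started = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("clock before unix epoch")
        .as_nanos();
    VersionInfo {
        version: env!("CARGO_PKG_VERSION"),
        git_hash: option_env!("GIT_HASH").unwrap_or("unknown"),
        process_id: format!("{}-{}", process::id(), started),
    }
}

fn main() {
    println!("{:?}", version_info());
}
```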
Adds instrumentation to the low-level (post-aggregation) Kafka client,
capturing the approximate, uncompressed message size (calculated as the
sum of all `Record::approximate_size()` return values, ignoring the
largely static framing overhead).