* feat: add CompactRanges RoundInfo type
* chore: insta test updates for adding CompactRanges
* feat: simplify/improve ManySmallFiles logic, now that its problem set is simpler
* chore: insta test updates for ManySmallFiles improvement
* chore: upgrade files more aggressively
* chore: insta updates from more aggressive file upgrades
* chore: addressing review comments
* feat: teach compactor to use sort_key_ids instead of sort_key
* test: update the test output after confirming the reason for the changes with Joe
* chore: test changes and additions in preparation for functional changes
* feat: move vertical splitting to RoundInfo calculation, align splits to L1 files
* chore: insta test churn
* feat: detect non-linear data distribution in vertical splitting (see the first sketch after this list)
* chore: add tests for non-linear data distribution
* chore: insta churn
* chore: cleanup & comment additions
* chore: some variable renaming
* feat: add tracking of why bytes are written in simulator (see the second sketch after this list)
* chore: enable breakdown of why bytes are written in a few larger tests
* chore: enable writes breakdown in another test
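Since the commit titles above are terse, here is a minimal sketch of what detecting a non-linear data distribution could look like. Everything in it is an assumption for illustration (the function name, the `(min_time, max_time, bytes)` tuples, and the 0.25 threshold); it is not the compactor's actual code.

```rust
/// Hedged sketch, not IOx's implementation: treat each file's bytes as spread
/// evenly over its own time range, then check whether the byte-weighted
/// "center of mass" sits far from the midpoint of the overall time range.
/// `files` holds `(min_time, max_time, bytes)` triples; 0.25 is invented.
fn looks_non_linear(files: &[(i64, i64, u64)]) -> bool {
    let (min_t, max_t) = files
        .iter()
        .fold((i64::MAX, i64::MIN), |(lo, hi), &(a, b, _)| (lo.min(a), hi.max(b)));
    let total_bytes: u64 = files.iter().map(|&(_, _, b)| b).sum();
    if total_bytes == 0 || max_t <= min_t {
        return false; // empty input or degenerate time range: nothing to detect
    }
    // Byte-weighted average of each file's time midpoint.
    let weighted_mid = files
        .iter()
        .map(|&(a, b, bytes)| ((a as f64 + b as f64) / 2.0) * bytes as f64)
        .sum::<f64>()
        / total_bytes as f64;
    let range_mid = (min_t as f64 + max_t as f64) / 2.0;
    // If the center of mass is far off-center, evenly spaced split times would
    // produce badly unbalanced output files, so split by size instead.
    (weighted_mid - range_mid).abs() > 0.25 * (max_t - min_t) as f64
}
```

A detector along these lines lets the split-time calculation fall back to size-weighted split points when the even-time assumption does not hold.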
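And a sketch of the kind of per-reason byte accounting the simulator tracking implies; the enum variants and the struct are assumptions, not the simulator's actual types.

```rust
use std::collections::HashMap;

/// Illustrative write categories; the simulator's real breakdown may differ.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
enum WriteReason {
    CompactL0,     // bytes written while compacting L0 files
    CompactL1ToL2, // bytes written while merging L1s into L2s
    Split,         // bytes rewritten to split oversized files
}

#[derive(Default, Debug)]
struct WriteBreakdown {
    bytes_by_reason: HashMap<WriteReason, u64>,
}

impl WriteBreakdown {
    /// Attribute `bytes` of simulated output to `reason`.
    fn record(&mut self, reason: WriteReason, bytes: u64) {
        *self.bytes_by_reason.entry(reason).or_insert(0) += bytes;
    }
}
```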
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* catalog.get_in_skipped_compaction() should handle multiple partitions
* add the ability to perform transformations on sets of partitions (rather than filtering them one by one), starting with a transformation that removes skipped partitions in the scheduler (see the sketch after this list)
* move the env var and CLI flag setting for when to ignore skipped partitions to the scheduler config.
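A minimal sketch of the scheduler-side transformation, assuming illustrative types (`PartitionId`, the `HashSet` of skipped IDs) rather than the real catalog and scheduler interfaces: the skipped partitions reported by one batch `get_in_skipped_compaction()` call are removed from the candidate set in a single pass.

```rust
use std::collections::HashSet;

/// Illustrative stand-in for the catalog's partition ID type.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct PartitionId(i64);

/// Hedged sketch: transform the whole candidate set at once instead of
/// filtering partitions one by one.
fn remove_skipped(candidates: Vec<PartitionId>, skipped: &HashSet<PartitionId>) -> Vec<PartitionId> {
    candidates
        .into_iter()
        .filter(|id| !skipped.contains(id))
        .collect()
}
```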
* fix: selectively merge L1 to L2 when L0s still exist
* fix: avoid grouping files that undo previous splits
* chore: add test case for new fixes
* chore: insta test churn
* chore: lint cleanup
When a long-running query is in progress and the querier is shutting down, it might happen that the executor (= the thread pool and tokio executor responsible for the CPU-bound DataFusion execution) is shut down while the query is still running. From a "systems interaction" PoV I think this is totally fine, and I would like to avoid some weird ref-counting. Or in other words: if the system is shutting down, shut it down.
However, the error was treated as "internal", which is not useful. The client should instead be informed that its server is gone and that it is OK (and desired) to retry. So per <https://grpc.github.io/grpc/core/md_doc_statuscodes.html>, I think this should signal "unavailable".
This change wires the error code in such a way that the gRPC service
layer can properly inspect it and then changes the error mapping.
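A sketch of the mapping with `tonic`; the error type and variant names are assumptions, not the querier's real ones.

```rust
use tonic::Status;

/// Illustrative error type; the real querier error is richer.
#[derive(Debug)]
enum QueryError {
    /// The executor was shut down while the query was still running.
    ExecutorShutdown,
    /// Any other failure.
    Internal(String),
}

impl From<QueryError> for Status {
    fn from(e: QueryError) -> Self {
        match e {
            // The server is going away; per the gRPC status-code guidance,
            // "unavailable" tells the client it is safe (and desired) to retry.
            QueryError::ExecutorShutdown => Status::unavailable("executor shut down"),
            QueryError::Internal(msg) => Status::internal(msg),
        }
    }
}
```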
Ref https://github.com/influxdata/idpe/issues/17917.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* chore: adjust with_max_num_files_per_plan to a more common setting
This significantly increases write amplification (see the change in `written` at the conclusion of the test cases)
* fix: compactor looping with unproductive compactions
* chore: formatting cleanup
* chore: fix typo in comment
* chore: add test case that compacts too many files at once
* fix: enforce max file count for compaction (see the sketch after this list)
* chore: insta churn from prior commit
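A hedged sketch of what enforcing the max file count can look like in file selection; the names, the stub type, and the oldest-first policy are assumptions, not the compactor's actual code.

```rust
/// Illustrative file stub; the real type is the catalog's parquet file record.
struct FileMeta {
    min_time: i64,
}

/// Hedged sketch: cap how many files go into one compaction plan, deferring
/// the remainder to a later round so no single plan balloons.
fn limit_plan_files(mut files: Vec<FileMeta>, max_files: usize) -> (Vec<FileMeta>, Vec<FileMeta>) {
    // Take the oldest data first so each round still makes forward progress.
    files.sort_by_key(|f| f.min_time);
    let split_at = files.len().min(max_files);
    let deferred = files.split_off(split_at);
    (files, deferred)
}
```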
---------
Co-authored-by: Dom <dom@itsallbroken.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
This adds 4 small test cases intended to exercise how compaction decisions affect the final size of L1/L2 files.
The assumption is that when a steady stream of small L0 files is arriving, the compactor needs to keep rewriting L1s so they grow to a reasonable size instead of being left small.
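To make that assumption concrete, here is a sketch of the selection rule being probed; the function, the target size, and the inputs are illustrative, not the compactor's real logic.

```rust
/// Hedged illustration: with a steady stream of small L0s arriving, include
/// undersized L1s in the compaction so they are rewritten and grow toward a
/// target size, rather than accumulating many small L1s. Returns the indexes
/// of L1s worth rewriting together with the incoming L0 bytes.
fn l1s_to_rewrite(l1_sizes: &[u64], incoming_l0_bytes: u64, target_l1_bytes: u64) -> Vec<usize> {
    l1_sizes
        .iter()
        .enumerate()
        // An L1 still well under target after absorbing the L0s should grow.
        .filter(|&(_, &size)| size + incoming_l0_bytes <= target_l1_bytes)
        .map(|(i, _)| i)
        .collect()
}
```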
Nothing gets the partition ID out of the metadata anymore. The parts of the code
interacting with object storage that need the ID to create the object
store path were using the partition ID from the metadata out of
convenience, but I changed those places to pass in the partition ID in a
separate argument instead.
This will make the transition to deterministic partition IDs a bit
smoother.
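A minimal sketch of the refactor's shape, with illustrative stand-in types rather than IOx's real signatures: path construction takes the partition ID as an explicit argument instead of reading it out of the metadata.

```rust
/// Illustrative stand-ins for the real catalog/metadata types.
struct PartitionId(i64);
struct IoxMetadata {
    object_store_id: String,
    // ...the partition ID embedded here is no longer consulted for paths
}

/// Hedged sketch: the caller passes the partition ID explicitly, which keeps
/// path construction independent of the ID stored in the metadata and eases
/// the move to deterministic partition IDs.
fn object_store_path(partition_id: &PartitionId, meta: &IoxMetadata) -> String {
    format!("{}/{}.parquet", partition_id.0, meta.object_store_id)
}
```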
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>