* fix: Remove the max_compact_size knob and hardcode a multiple
Rather than panic if the user hasn't set this knob in a particular way,
set the max_compact_size to the minimum value we need by multiplying
max_desired_file_size_bytes by MIN_COMPACT_SIZE_MULTIPLE.
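For illustration, the derived value could look roughly like this in the config; the field names and the concrete multiple are assumptions, not the exact production code:

```rust
/// Illustrative value; the real constant lives in the compactor crate.
const MIN_COMPACT_SIZE_MULTIPLE: u64 = 3;

/// Hypothetical slice of the compactor config.
struct CompactorConfig {
    max_desired_file_size_bytes: u64,
}

impl CompactorConfig {
    /// Derive the compaction size cap instead of exposing it as a knob
    /// (and panicking when it is set inconsistently).
    fn max_compact_size_bytes(&self) -> u64 {
        self.max_desired_file_size_bytes * MIN_COMPACT_SIZE_MULTIPLE
    }
}

fn main() {
    let config = CompactorConfig {
        max_desired_file_size_bytes: 100 * 1024 * 1024, // 100 MiB
    };
    println!("max_compact_size_bytes = {}", config.max_compact_size_bytes());
}
```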
Fixes influxdata/idpe#17259.
* refactor: Move computation of max_compact_size_bytes into compactor config
* test: change test setups to reflect the purposes of the tests
---------
Co-authored-by: NGA-TRAN <nga-tran@live.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: initial implementation of the split
* feat: split many L0 files in groups and compact them into new and fewer L0 files (see the grouping sketch after this PR's notes)
* test: remove inappropriate AllAtOnce test
* refactor: move file classification for initial target to its own function
* fix: pop the branch from start to end
* chore: address review comments
* feat: support splitting to many L1 files
* feat: only add an extra round that compacts level-n files into the same level-n if those files plus the overlapping level-n-plus-1 files exceed the size limit
* chore: Apply suggestions from code review
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
* chore: final cleanup and address comments
* chore: run fmt
---------
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
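As a rough illustration of the L0 grouping idea above (the type, function name, and size-based grouping rule are assumptions, not the actual algorithm):

```rust
/// Hypothetical file metadata; the real type carries much more information.
#[derive(Debug, Clone)]
struct FileMeta {
    id: i64,
    size_bytes: u64,
}

/// Split a run of L0 files into groups whose combined size stays under
/// `max_compact_size`, so each group can be compacted into fewer, larger
/// L0 files on its own.
fn split_into_groups(files: Vec<FileMeta>, max_compact_size: u64) -> Vec<Vec<FileMeta>> {
    let mut groups = Vec::new();
    let mut current = Vec::new();
    let mut current_size = 0;

    for file in files {
        if !current.is_empty() && current_size + file.size_bytes > max_compact_size {
            groups.push(std::mem::take(&mut current));
            current_size = 0;
        }
        current_size += file.size_bytes;
        current.push(file);
    }
    if !current.is_empty() {
        groups.push(current);
    }
    groups
}

fn main() {
    let files = (0..5)
        .map(|id| FileMeta { id, size_bytes: 40 })
        .collect::<Vec<_>>();
    // With a 100-byte cap we get groups of at most two files each.
    for (i, group) in split_into_groups(files, 100).iter().enumerate() {
        println!("group {i}: {group:?}");
    }
}
```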
* refactor: move ParquetFileSimulator to compactor2_test_utils
* chore: Test with new algorithm + update display
* chore: Updates
* chore: Update setting to match prod
* test: allow testing the compactor w/o any real data
Things that are missing:
- output files have nondeterministic IDs which interferes w/ snapshot
testing. We should probably normalize the IDs somehow.
- time ranges of output files are not captured correctly (because the
mock sink doesn't know how to calculate them)
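One possible shape for the ID normalization mentioned above, sketched with made-up types; the real test utilities may do this differently:

```rust
use std::collections::HashMap;

/// Map nondeterministic parquet file IDs to stable, sequential numbers so
/// snapshot tests do not break on every run.
#[derive(Default)]
struct IdNormalizer {
    mapping: HashMap<i64, usize>,
}

impl IdNormalizer {
    fn normalize(&mut self, id: i64) -> usize {
        let next = self.mapping.len() + 1;
        *self.mapping.entry(id).or_insert(next)
    }
}

fn main() {
    let mut normalizer = IdNormalizer::default();
    // Whatever the real IDs are, the snapshot only ever sees 1, 2, 1, ...
    for id in [1042, 9813, 1042] {
        println!("file {id} -> #{}", normalizer.normalize(id));
    }
}
```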
* fix: Add output assertion
* fix: fmt
* docs: improve
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
* fix: fmt
---------
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
* feat: `PartitionRepo::list_ids`
* refactor: `CatalogPartitionsSource` => `CatalogToCompactPartitionsSource`
* feat: allow the compactor to process all known partitions
Closes #6648.
* docs: improve
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
---------
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Instead of looping, polling a fresh set of partitions, and constructing a
stream from that, use an endless stream. This helps w/ efficiency during
roll-overs since we can already start to process the next set of
partitions while the last ones from the previous round are still in
progress.
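A minimal sketch of the endless-stream idea using `futures::stream::unfold`; the source type and its method are placeholders, not the real compactor interfaces:

```rust
use futures::{stream, Stream, StreamExt};

/// Hypothetical source that returns the current set of partitions to compact.
struct PartitionsSource;

impl PartitionsSource {
    async fn fetch(&self) -> Vec<i64> {
        // In the real compactor this would query the catalog.
        vec![1, 2, 3]
    }
}

/// Turn the source into an endless stream: whenever the current batch is
/// exhausted, fetch the next one instead of waiting for a whole new loop
/// iteration to start.
fn endless_partitions(source: PartitionsSource) -> impl Stream<Item = i64> {
    stream::unfold(source, |source| async move {
        let batch = source.fetch().await;
        Some((stream::iter(batch), source))
    })
    .flatten()
}

fn main() {
    futures::executor::block_on(async {
        let mut partitions = Box::pin(endless_partitions(PartitionsSource));
        for _ in 0..5 {
            println!("next partition: {:?}", partitions.next().await);
        }
    });
}
```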
Closes #6750.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: partition filters for TargetLevel version and a complete test
* chore: Apply suggestions from code review
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
* chore: run fmt after applying review suggestions in git
---------
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* refactor: rename compact algo versions to reflect their actual work
* chore: Apply suggestions from code review
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
---------
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
I'm not saying we have to use this, but this is a demonstration of how easy
it would be to add sharding to the compaction tier, and it also acts as a
"backup / insurance" if we ever need it.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Add a rough "partition is too big" filter for now until we can deal with
such partitions properly (the framework allows that, but we need to set up
the proper divide-and-conquer components). This will hopefully prevent our
prod compactor from dying so often.
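The "too big" check can be pictured as a simple filter on the summed input size; this is a sketch with assumed names, not the actual component:

```rust
/// Hypothetical metadata for one input file of a partition.
struct FileMeta {
    size_bytes: u64,
}

/// Skip partitions whose total input size exceeds what a single compaction
/// run can safely hold in memory.
fn partition_fits(files: &[FileMeta], max_input_bytes: u64) -> bool {
    let total: u64 = files.iter().map(|f| f.size_bytes).sum();
    total <= max_input_bytes
}

fn main() {
    let files = vec![
        FileMeta { size_bytes: 300 << 20 }, // 300 MiB
        FileMeta { size_bytes: 900 << 20 }, // 900 MiB
    ];
    // With a 1 GiB cap this partition would be skipped for now.
    println!("fits: {}", partition_fits(&files, 1 << 30));
}
```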
Note that this is also duct-tape around two issues:
- DataFusion not accounting in-flight data all the time
- Our wide fan-out query plans (see https://github.com/influxdata/idpe/issues/16768#issuecomment-1387056833 )
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: introduce scratchpad store for compactor
Use an intermediate in-memory store (can be a disk later if we want) to
stage all inputs and outputs of the compaction. The reasons are:
- **fewer IO ops:** DataFusion's streaming IO requires slightly more
IO requests (at least 2 per file) due to the way it is optimized to
read as little as possible. It first reads the metadata and then
decides which content to fetch. In the compaction case this is (esp.
w/o delete predicates) EVERYTHING. So in contrast to the querier,
there is no advantage to this approach. On the contrary, it easily adds
100ms of latency to every single input file.
- **less traffic:** For divide&conquer partitions (i.e. when we need to
run multiple compaction steps to deal with them) it is kinda pointless
to upload an intermediate result just to download it again. The
scratchpad avoids that.
- **higher throughput:** We want to limit the number of concurrent
DataFusion jobs because we don't wanna blow up the whole process by
having too much in-flight arrow data at the same time. However, while
performing the actual computation we were also waiting for object store
IO, which limited our throughput substantially.
- **shadow mode:** De-coupling the stores in this way makes it easier to
implement #6645.
Note that we assume here that the input parquet files are WAY SMALLER
than the uncompressed Arrow data during compaction itself.
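Conceptually the scratchpad sits between the real object store and DataFusion. A toy sketch of that staging step, with heavily simplified interfaces and made-up names:

```rust
use std::collections::HashMap;

/// Toy stand-ins for the real object store and scratchpad.
#[derive(Default)]
struct ObjectStore {
    objects: HashMap<String, Vec<u8>>,
}

#[derive(Default)]
struct Scratchpad {
    staged: HashMap<String, Vec<u8>>,
}

impl Scratchpad {
    /// Stage all compaction inputs with one bulk copy instead of letting
    /// DataFusion issue many small reads against the remote store.
    fn load_inputs(&mut self, store: &ObjectStore, paths: &[&str]) {
        for path in paths {
            if let Some(bytes) = store.objects.get(*path) {
                self.staged.insert((*path).to_string(), bytes.clone());
            }
        }
    }

    /// Intermediate outputs stay in the scratchpad; only final results would
    /// be uploaded back to the real store.
    fn write_output(&mut self, path: &str, bytes: Vec<u8>) {
        self.staged.insert(path.to_string(), bytes);
    }
}

fn main() {
    let mut store = ObjectStore::default();
    store.objects.insert("p1/l0-1.parquet".into(), vec![0; 16]);
    store.objects.insert("p1/l0-2.parquet".into(), vec![0; 16]);

    let mut pad = Scratchpad::default();
    pad.load_inputs(&store, &["p1/l0-1.parquet", "p1/l0-2.parquet"]);
    pad.write_output("p1/l1-1.parquet", vec![0; 24]);
    println!("staged objects: {}", pad.staged.len());
}
```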
Closes #6650.
* fix: panic on shutdown
* refactor: remove shadow scratchpad (for now)
* refactor: make scratchpad safe to use
It seems that prod was hanging last night. This is pretty hard to debug
and in general we should protect the compactor against hanging /
malformed partitions that take forever. This is similar to the fact that
the querier also has a timeout for every query. Let's see if this shows
anything in prod (and if not it's still a desired safety net).
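A minimal sketch of such a per-partition timeout using `tokio::time::timeout`; the duration and function names are placeholders:

```rust
use std::time::Duration;
use tokio::time::timeout;

/// Placeholder for compacting a single partition.
async fn compact_partition(partition_id: i64) {
    // Real work would happen here; the sleep stands in for a slow partition.
    tokio::time::sleep(Duration::from_millis(50)).await;
    println!("compacted partition {partition_id}");
}

#[tokio::main]
async fn main() {
    // Abort any partition that takes longer than the configured limit so a
    // single malformed partition cannot hang the whole compactor.
    match timeout(Duration::from_secs(30), compact_partition(42)).await {
        Ok(()) => println!("done"),
        Err(_) => println!("partition 42 timed out and was skipped"),
    }
}
```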
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* chore: address review comment of previous PR
* refactor: execute compact plan
* refactor: we will now compact all L0 and L1 files of a partition and split them as needed
* chore: comments
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Sets up the crate and wires up the main binary. No tests yet, no algorithm
framework, just the bare minimum.
Also I decided not to offer a gRPC server in `compactor2` at the moment
and hence did not implement any handle/delegate infrastructure. We can add
this later if we need it. This also means compactor2 does NOT provide a
catalog service for now.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>