* feat: initial implementation of the split
* feat: split many L0 files in groups and compact them into new and fewer L0 files
* test: remove inappropriate AllAtOnce test
* refactor: move file classification for initial target to its own function
* fix: pop the branch from start to end
* chore: address review comments
* feat: support splitting to many L1 files
* feat: only add an extra round to compact level-n files into the same level n if those files plus the overlapping level-n-plus-1 files are over the limit
* chore: Apply suggestions from code review
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
* chore: final cleanup and address comments
* chore: run fmt
---------
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* refactor: Split layout tests into their own module
* feat: Add more tests, improve sizes to simulator run display more
* fix: Apply suggestions from code review
Co-authored-by: Nga Tran <nga-tran@live.com>
* fix: fix comment wording
* fix: reporting order of skipped compactions
* chore: Run cargo hakari tasks
* fix: revert changes to Cargo.lock
* fix: revert workspace hack change
---------
Co-authored-by: Nga Tran <nga-tran@live.com>
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
* refactor: move ParquetFileSimulator to compactor2_test_utils
* chore: Test with new algorithm + update display
* chore: Updates
* chore: Update setting to match prod
* refactor: extract `FileClassifer` component
Make the driver slightly smaller. Also makes the "all-in-one" mode
easier to understand.
* docs: add some
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* refactor: extract compactor2 test utils into `compactor2_test_utils` and integration test
* fix: Update compactor2/src/components/mod.rs
Co-authored-by: Marco Neumann <marco@crepererum.net>
---------
Co-authored-by: Marco Neumann <marco@crepererum.net>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
This commit adds initial support for "soft" namespace deletion, where
the actual records & data remain, but are no longer queryable /
writeable.
Soft deletion is eventually consistent - users can expect to continue
writing to and reading from a bucket after issuing a soft delete call,
until the various components either restart, or have their caches
flushed.
The components treat soft-deleted namespaces differently:
* router: ignore soft deleted namespaces
* ingester: accept soft deleted namespaces
* compactor: accept soft deleted namespaces
* querier: ignore soft deleted namespaces
* various gRPC services: ignore soft deleted namespaces
This ensures that the ingester & compactor do not see rows "vanishing"
from the database, and continue to make forward progress.
Writes for the deleted namespace that are buffered in the ingester will
be persisted as normal, allowing us to support "un-delete" operations
where the system is restored to the state at which the delete was
issued (rather than losing the buffered data).
Follow-on work is required to ensure GC drops the orphaned parquet files
after the configured GC time, and optimisations such as not compacting
parquet files from soft-deleted namespaces seem like a trivial win.
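The per-component behaviour listed above can be sketched as a filter that each component passes when it looks up namespaces. This is a minimal illustration only: `SoftDeleteFilter`, `Namespace`, `deleted_at` and `visible_namespaces` are hypothetical names chosen for this sketch and are not taken from the commit itself.

```rust
/// Hypothetical filter describing which namespaces a lookup should return.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SoftDeleteFilter {
    /// Return namespaces regardless of deletion state
    /// (the "accept" behaviour of the ingester and compactor).
    IncludeDeleted,
    /// Hide soft-deleted namespaces
    /// (the "ignore" behaviour of the router, querier and gRPC services).
    ExcludeDeleted,
}

/// Hypothetical namespace record; the row is kept after a soft delete.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Namespace {
    name: String,
    /// `Some(timestamp)` once a soft delete has been issued (assumed field).
    deleted_at: Option<i64>,
}

/// Return the namespaces visible under the given filter.
fn visible_namespaces(all: &[Namespace], filter: SoftDeleteFilter) -> Vec<&Namespace> {
    all.iter()
        .filter(|ns| match filter {
            SoftDeleteFilter::IncludeDeleted => true,
            SoftDeleteFilter::ExcludeDeleted => ns.deleted_at.is_none(),
        })
        .collect()
}
```

Under this assumption, a soft delete only sets the marker, so components using `IncludeDeleted` never see a row disappear, and an "un-delete" would amount to clearing the marker again.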
* refactor: `PartitionInfoSource`
Clean up the driver code a bit. There is certainly a good point in
having all these three sources (partition, table, namespace) separate,
but the driver doesn't really need to know that. In the end, it just
wants to have a `PartitionInfo` instance.
* docs: typo
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
---------
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
* refactor: introduce IR before creating actual DF plan
Let's have an IR that presents a machine-readable form of what the
output files may look like.
* docs: improve
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
* feat: also log plan type
---------
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
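As a rough illustration of the IR idea above (describe the output files first, build the actual DF plan later), one possible shape is sketched below. The type `PlanIr`, its variants, and the helper methods are assumptions made for this sketch, not the names used in the change.

```rust
/// Hypothetical IR: describes the expected output files without
/// constructing a query plan.
#[derive(Debug, Clone, PartialEq, Eq)]
enum PlanIr {
    /// Merge all input files into a single output file.
    Compact { input_file_ids: Vec<u64> },
    /// Merge all input files, then split the result at the given timestamps.
    Split {
        input_file_ids: Vec<u64>,
        split_times: Vec<i64>,
    },
}

impl PlanIr {
    /// Short label suitable for logging the plan type.
    fn plan_type(&self) -> &'static str {
        match self {
            PlanIr::Compact { .. } => "compact",
            PlanIr::Split { .. } => "split",
        }
    }

    /// Number of output files this plan is expected to produce:
    /// one for a compaction, one per time range for a split.
    fn expected_output_files(&self) -> usize {
        match self {
            PlanIr::Compact { .. } => 1,
            PlanIr::Split { split_times, .. } => split_times.len() + 1,
        }
    }
}
```

Because a value like this is plain data, tests and log statements can inspect it without executing anything.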
* test: allow testing the compactor w/o any real data
Things that are missing:
- output files have nondeterministic IDs which interferes w/ snapshot
testing. We should probably normalize the IDs somehow.
- time ranges of output files are not captured correctly (because the
mock sink doesn't know how to calculate them)
* fix: Add output assertion
* fix: fmt
* docs: improve
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
* fix: fmt
---------
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
* feat: `PartitionRepo::list_ids`
* refactor: `CatalogPartitionsSource` => `CatalogToCompactPartitionsSource`
* feat: allow the compactor to process all known partitions
Closes #6648.
* docs: improve
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
---------
Co-authored-by: Andrew Lamb <alamb@influxdata.com>