* chore(iox/compactor): add test cases tracking max_l0_created_at
* chore(iox/compactor): add invariant check to simulator for misordered max_l0_created_at
* chore(iox/compactor): test updates for max_l0_created_at
This commit updates the newly added tests to comply with the new invariant check.
Several test cases are deleted because the changes to comply with the invariant make the permutations pointless.
* chore(iox/simulator): Add test cases hitting new invariant check for illegal max_l0_created_at
* chore: update previous tests to comply with new invariant
* chore: address comments
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Making this clone more explicit, higher up. Deleting the old files from
the catalog might only need the IDs, and the compaction levels for the
metrics, but that can be later.
And define the conditions for that decision on a type responsible for
the data that goes into that condition.
Also start moving away from methods that only work on one variant of
FilesToSplitOrCompact, and add a state for when FilesToSplitOrCompact
doesn't actually contain any files.
I can't ever remember whether it's "compact or split" or "split or
compact", so now I think it's always "split or compact".
Also remove "FilesTo" from the enum variants because "FilesTo" is in the
overall enum name.
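A minimal sketch of the shape these notes describe, assuming three variants and a stand-in `ParquetFile` type. The variant names, the `files` and `is_empty` helpers, and the struct fields are illustrative assumptions, not the actual definitions in the compactor:

```rust
/// Hypothetical stand-in for a catalog file; only an ID is modeled here.
#[derive(Debug, Clone, PartialEq)]
pub struct ParquetFile {
    pub id: i64,
}

/// Sketch of the enum after the rename: variants no longer repeat "FilesTo",
/// and `None` models the state where the value contains no files.
#[derive(Debug, Clone, PartialEq)]
pub enum FilesToSplitOrCompact {
    /// Nothing to split or compact.
    None,
    /// Files to split ("split" comes first, matching the enum name).
    Split(Vec<ParquetFile>),
    /// Files to compact.
    Compact(Vec<ParquetFile>),
}

impl FilesToSplitOrCompact {
    /// Works on every variant instead of only one of them.
    pub fn files(&self) -> &[ParquetFile] {
        match self {
            Self::None => &[],
            Self::Split(files) | Self::Compact(files) => files.as_slice(),
        }
    }

    /// True when there are no files, whichever variant this is.
    pub fn is_empty(&self) -> bool {
        self.files().is_empty()
    }
}
```

With a helper like `files`, callers no longer need to know which variant they hold before asking for its contents.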
The filter in this spot could potentially do whatever, but the important
part of what should go in this field is that it will be called on the
files after they're classified by the file classifier.
* test: make max_l0_created_at make more sense in the core tests
* chore: fix typo
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* test: set max_l0_created_at to reasonable values for the tests and also verify it using both test layout and catalog function
* fix: typo
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
If a partition takes longer than `partition_timeout` to compact, but it
did make _some_ progress, let the compactor try that partition again at
a later time so that compaction for the partition will eventually
complete.
If a partition times out and _no_ progress has been made, then still add
it to the skipped_compactions table because it's either too big to ever
compact or is otherwise stuck.
Closes influxdata/idpe#17234.
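The timeout rule above can be sketched as a single decision function. This is an illustration only: `TimeoutAction` and `on_partition_timeout` are hypothetical names, and how progress is detected is outside the sketch:

```rust
/// Hypothetical outcome of handling a partition whose compaction timed out.
#[derive(Debug, PartialEq, Eq)]
pub enum TimeoutAction {
    /// Some progress was made: leave the partition eligible for a later attempt.
    RetryLater,
    /// No progress was made: record it in the skipped_compactions table.
    Skip,
}

/// Decide what to do after `partition_timeout` elapsed for a partition.
pub fn on_partition_timeout(made_progress: bool) -> TimeoutAction {
    if made_progress {
        TimeoutAction::RetryLater
    } else {
        TimeoutAction::Skip
    }
}
```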
* test: common compactor use cases
* test: add 3 L0 files during last compaction
* chore: clearer comments
* test: add intermediate test results per review request
* chore: comment only change to trigger circle CI
* chore: Apply suggestions from code review
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
* test: add test with less ingested data and fix output after main merge
* chore: run format after pulling suggestions
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
The assert_counter! and assert_histogram! macros use items in the metric
crate, but the macros can be called from other crates/modules that may
not have those items in scope.
* fix: Remove the max_compact_size knob and hardcode a multiple
Rather than panic if the user hasn't set this knob in a particular way,
set the max_compact_size to the minimum value we need by multiplying
max_desired_file_size_bytes by MIN_COMPACT_SIZE_MULTIPLE.
Fixes influxdata/idpe#17259.
* refactor: Move computation of max_compact_size_bytes into compactor config
* test: change test setups to reflect the purposes of the tests
---------
Co-authored-by: NGA-TRAN <nga-tran@live.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
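A minimal sketch of deriving `max_compact_size_bytes` from the config instead of exposing a separate knob, as the max_compact_size commit above describes. The struct name is hypothetical, and the value of `MIN_COMPACT_SIZE_MULTIPLE` is assumed for illustration because the commit message does not state it:

```rust
/// Assumed value for illustration only; the real constant may differ.
pub const MIN_COMPACT_SIZE_MULTIPLE: u64 = 3;

/// Hypothetical, trimmed-down compactor config holding only the relevant field.
pub struct CompactorConfigSketch {
    pub max_desired_file_size_bytes: u64,
}

impl CompactorConfigSketch {
    /// Computed from the desired file size, so it cannot be set inconsistently.
    pub fn max_compact_size_bytes(&self) -> u64 {
        self.max_desired_file_size_bytes * MIN_COMPACT_SIZE_MULTIPLE
    }
}
```

Because the value is computed, there is no user-supplied setting left to validate, which removes the panic path the commit mentions.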
* test: add a panic to the limit_files_to_compact function to ensure it is used in the right way
* test: provide correct output to the tests
* chore: remove no-longer valid comments
* feat: have limit_files_to_compact also return files_to_further_split if the minimum set to compact is too large to compact
* refactor: rename files_to_split to start_level_files_to_split
* refactor: rename identify_files_to_split to identify_start_level_files_to_split before adding new split function
* feat: split 2 files of minimum set of compacting files if they are over max compact size
* test: since we may now split files in different levels, remove the misleading "at level" from the simulation tests
* chore: clearer comments
* test: add tests for tiny time ranges
* chore: address review comments
* test: add a panic to the limit_files_to_compact function to ensure it is used in the right way
* test: provide correct output to the tests
* chore: Apply suggestions from code review
Co-authored-by: Joe-Blount <73478756+Joe-Blount@users.noreply.github.com>
---------
Co-authored-by: Joe-Blount <73478756+Joe-Blount@users.noreply.github.com>
* test: very large input compacting files
* chore: fix comments
* chore: add description for each test
* chore: commit to trigger CI tests to run again
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Nothing was pushing into the VecDeque; can just sort and iterate over
the collection as a Vec.
In `identify_files_to_split`, the collection should be empty after the
`for` loop, so it doesn't need to be pushed into the result.
Seeing a crate prefix in some places in a scope where it doesn't appear
in other places makes me think the prefix is disambiguating with a
different type; but these are all the same type.
* feat: split start-level files that overlap with many files
* test: split files and their split times
* test: split test for L1 and L2 files
* feat: full implementation that supports large-size overlapped files
* chore: modify comments to reflect the changes
* fix: typo
* chore: update test output
* docs: clearer comments
* chore: remove empty test files. Will add them in a separate PR
* chore: Apply suggestions from code review
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
* chore: address review comments
* chore: Apply suggestions from code review
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
* refactor: add a knob to turn large-size overlaps on and off
* fix: typo
* chore: update test output after merging main
* fix: split_times should not include the max_time of the file
* fix: fix an overlap bug while limiting number of files to compact
* test: unit tests for different overlap cases of limit files to compact
* chore: increase time range of the tests to let the split files work correctly
* fix: skip compacting files of tiny ranges
* test: add tests for time range 1
* chore: address review comments
* chore: remove enable_large_size_overlap_files knob
* fix: fix a bug that sorted L1 files by their min_time instead of l0_max_created_at
* refactor: use the same order_files function after merging main into branch
* chore: typos and clearer comments
* chore: remove obsolete comments
* chore: add asserts per review suggestion
---------
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
* feat(compactor2): Verify invariants for compactor2 always
* fix: update tests
* fix: update actual time range and test output
---------
Co-authored-by: NGA-TRAN <nga-tran@live.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* fix: L1 files must be sorted by their min_time if they need to split before compacting
* chore: clearer comments
* chore: Apply suggestions from code review
Co-authored-by: Joe-Blount <73478756+Joe-Blount@users.noreply.github.com>
* chore: run fmt after applying review suggestions
---------
Co-authored-by: Joe-Blount <73478756+Joe-Blount@users.noreply.github.com>
* test(compactor): Add test for large amounts of data with a single timestamp
* fix: Update compactor2/tests/layouts/single_timestamp.rs
Co-authored-by: Joe-Blount <73478756+Joe-Blount@users.noreply.github.com>
---------
Co-authored-by: Joe-Blount <73478756+Joe-Blount@users.noreply.github.com>
* fix(compactor2): fix off by one error in time ranges of simulator
* chore: update a test that was added recently and that this PR fixes
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Co-authored-by: NGA-TRAN <nga-tran@live.com>
* refactor: rename files and function to remove target level
* chore: update a comment
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>