Carol (Nichols || Goulding)
24e6248706
fix: Update a compactor snapshot
2023-04-24 10:08:00 -04:00
kodiakhq[bot]
45be11bd2c
Merge branch 'main' into cn/cold-vs-hot-cli
2023-04-19 16:33:54 +00:00
Carol (Nichols || Goulding)
3604f141dc
test: Add a unit test for logging that uses cold compaction
2023-04-19 12:33:00 -04:00
Carol (Nichols || Goulding)
31043811d9
feat: Log cold compaction selection
2023-04-14 17:50:51 -04:00
Carol (Nichols || Goulding)
bb02e1ce1b
feat: Don't actually compact anything if you're running in cold compaction mode
2023-04-14 17:33:05 -04:00
Carol (Nichols || Goulding)
9350b64314
feat: Add a CompactionType in compactor2::config as well as clap blocks
...
It's a little weird to have such similar types and have to convert them,
but doing this prevents too many crates from having to depend on/know
about each other.
2023-04-14 17:33:05 -04:00
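The conversion between the two similar types mentioned in this commit could look roughly like the sketch below. The module names `cli` and `config` and the `Hot`/`Cold` variants are assumptions for illustration, not the actual layout of the clap blocks or `compactor2::config`.

```rust
// Hypothetical stand-in for the CLI-facing crate (the "clap blocks").
mod cli {
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum CompactionType {
        Hot,
        Cold,
    }
}

// Hypothetical stand-in for the compactor's own config module.
mod config {
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum CompactionType {
        Hot,
        Cold,
    }

    // The conversion lives next to the compactor's type, so the CLI type
    // does not need to know anything about the compactor.
    impl From<super::cli::CompactionType> for CompactionType {
        fn from(value: super::cli::CompactionType) -> Self {
            match value {
                super::cli::CompactionType::Hot => Self::Hot,
                super::cli::CompactionType::Cold => Self::Cold,
            }
        }
    }
}
```

With this shape, only the code that wires CLI arguments into the compactor has to name both types; the explicit `match` also makes the compiler flag any variant added to one enum but not the other.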
Carol (Nichols || Goulding)
b4bad29357
docs: Wrap comments at 100 cols
2023-04-14 17:33:05 -04:00
Carol (Nichols || Goulding)
565a9c454d
refactor: Extract a function for creating the PartitionsSourceConfig
...
And then add some unit tests for that function. It's getting a smidge
complicated.
2023-04-14 17:33:05 -04:00
Carol (Nichols || Goulding)
76d155fe89
feat: Configuration for hot vs cold thresholds
...
This creates a separate option for the number of minutes *without* a
write that a partition must have before being considered for cold
compaction.
This is a new CLI flag so that it can have a different default from hot
compaction's compaction_partition_minute_threshold.
I didn't add "hot" to compaction_partition_minute_threshold's name so
that k8s-idpe doesn't have to change to continue running hot compaction
as it is today.
Then use the relevant threshold earlier, when creating the
PartitionsSourceConfig, to make it clearer which threshold is used
where.
Right now, this will silently ignore any CLI flag specified that isn't
relevant to the current compaction mode. We might want to change that to
warn or error to save debugging time in the future.
2023-04-14 17:33:05 -04:00
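A minimal sketch of the threshold selection this commit describes, assuming a single function picks the threshold from the compaction mode. `ThresholdFlags`, `threshold_for`, and the `cold_minutes` field are hypothetical names; only `compaction_partition_minute_threshold` appears in the commit message.

```rust
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CompactionType {
    Hot,
    Cold,
}

/// Hypothetical holder for the two CLI values.
#[derive(Debug, Clone, Copy)]
pub struct ThresholdFlags {
    /// Stands in for `compaction_partition_minute_threshold` (hot compaction).
    pub hot_minutes: u64,
    /// Stands in for the new cold flag; the field name is an assumption.
    pub cold_minutes: u64,
}

/// Pick the threshold that applies to the current mode. The flag that is
/// not relevant to the mode is silently ignored, as the commit notes.
pub fn threshold_for(compaction_type: CompactionType, flags: ThresholdFlags) -> Duration {
    let minutes = match compaction_type {
        CompactionType::Hot => flags.hot_minutes,
        CompactionType::Cold => flags.cold_minutes,
    };
    Duration::from_secs(minutes * 60)
}
```

Resolving the threshold once, up front, matches the stated goal of making it clear which threshold is used where. Turning the silent ignore into a warning or error would mean carrying the flags as `Option` values so an explicitly set but irrelevant flag can be detected.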
Carol (Nichols || Goulding)
5e6dbec909
fix: Remove tombstones as they aren't functional currently
2023-04-14 13:36:08 -04:00
Joe-Blount
7dd221aee0
chore: add logging around compaction job semaphore (#7523)
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-04-12 15:47:02 +00:00
Carol (Nichols || Goulding)
3199d65c2f
feat: Add the ability to specify a max threshold duration on CatalogToCompactPartitionsSource
2023-04-12 11:08:51 -04:00
Carol (Nichols || Goulding)
a244e5b078
test: Add some tests for CatalogToCompactPartitionsSource's existing behavior
2023-04-12 11:07:43 -04:00
Joe-Blount
f05be907cb
chore: increasing concurrency a little more (#7510)
...
* chore: increasing concurrency a little more
This raises the threshold for single threading compactions to 100 column partitions. With the non-linear scaling, 70 column partitions would take 49% of the concurrency limit (allowing only 2 of such sized partitions to compact concurrently). Anything over 70 can only compact with something smaller than itself.
I'm gradually walking these up, partly to avoid causing OOMs in prod, and partly because I want to get a feel for how reactive the average concurrency is to these changes.
* chore: fix comment typo
2023-04-11 22:04:44 +00:00
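The figures in this commit (70 columns taking 49% of the limit against a 100-column threshold) are consistent with squaring the column ratio. The sketch below assumes that quadratic form; the function names are hypothetical and the real scaling code may differ.

```rust
/// Fraction of the concurrency limit consumed by one partition.
/// Assumption: share = (columns / threshold)^2, capped at 1.0.
pub fn concurrency_share(columns: u32, single_thread_columns: u32) -> f64 {
    let ratio = (f64::from(columns) / f64::from(single_thread_columns)).min(1.0);
    ratio * ratio
}

/// How many partitions of this size fit under the limit at once.
pub fn max_concurrent(columns: u32, single_thread_columns: u32) -> u32 {
    let share = concurrency_share(columns, single_thread_columns);
    if share <= 0.0 {
        // A zero-column partition consumes nothing in this model.
        return u32::MAX;
    }
    (1.0 / share).floor() as u32
}
```

Under this assumption, `max_concurrent(70, 100)` is 2, while `max_concurrent(71, 100)` drops to 1 because two such partitions would need just over 100% of the limit. That matches the commit's remark that anything over 70 columns can only run alongside something smaller than itself.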
Joe-Blount
980589504d
chore: make compactor concurrency scale non-linearly (#7509)
...
* chore: make compactor concurrency scale non-linearly
* chore: rust formatter making the test cases harder to read
2023-04-11 20:10:20 +00:00
Andrew Lamb
8b25a3a64c
chore: Remove unused dependencies, found by cargo-machete (#7491)
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-04-11 08:32:32 +00:00
Joe-Blount
ef62f439b8
feat: scale compactor concurrency based on table column count (#7492)
...
* feat: scale compactor concurrency based on table column count
* chore: address review comments
2023-04-10 21:44:29 +00:00
Phil Bracikowski
a99da831e1
Merge branch 'main' into jrb_27_adjust_split_time_selection
2023-04-06 12:07:07 -07:00
Marco Neumann
5f43f2a719
refactor: remove old query planning code (#7449)
...
Closes #7406.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-04-06 16:05:08 +00:00
Joe-Blount
dcdf253a7a
chore: insta updates for split count adjustment
2023-04-06 09:37:05 -05:00
Joe-Blount
abde5aa543
chore: adjust how many splits applied in backlogged cases
2023-04-06 09:36:37 -05:00
Joe-Blount
e4b4f79c6b
feat: expand 'vertical splitting' to improve compaction efficiency (#7450)
...
* feat: expand vertical splitting and coordinate compactions
* chore: insta updates for prior commit
* chore: pr review nits
2023-04-05 21:02:16 +00:00
Joe-Blount
e27a8e815d
chore: add tracking of bytes written in simulator (#7445)
...
* chore: add tracking of bytes written in simulator; display in final output header
* chore: insta output churn corresponding to tracking bytes written
* chore: address comment
2023-04-04 20:58:50 +00:00
Joe-Blount
80a91142b5
Merge branch 'main' into jrb_24_backlogged_test_case
2023-04-04 11:21:54 -05:00
dependabot[bot]
66982f988b
chore(deps): Bump object_store from 0.5.5 to 0.5.6 (#7433)
...
Bumps [object_store](https://github.com/apache/arrow-rs) from 0.5.5 to 0.5.6.
- [Release notes](https://github.com/apache/arrow-rs/releases)
- [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG-old.md)
- [Commits](https://github.com/apache/arrow-rs/commits)
---
updated-dependencies:
- dependency-name: object_store
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Dom <dom@itsallbroken.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-04-04 08:43:34 +00:00
Joe-Blount
cd708183db
chore: update large backfill case to suppress output of runs
2023-04-03 15:02:36 -05:00
Joe-Blount
48407e78da
chore: add option to simulator to optionally suppress the output of compaction runs
2023-04-03 15:02:04 -05:00
Joe-Blount
711ccc153e
fix: address panic with single L0 overlapping multiple L1s
2023-04-03 14:09:55 -05:00
Carol (Nichols || Goulding)
90d07412ff
refactor: Extract functions for the different purposes of partition filters
2023-03-31 12:53:41 -04:00
Carol (Nichols || Goulding)
9a27736c65
docs: Fix some typos
2023-03-31 12:44:12 -04:00
Carol (Nichols || Goulding)
a32d536262
refactor: Extract a function to make the post-classification filters
2023-03-31 12:36:26 -04:00
Carol (Nichols || Goulding)
86dbd5c529
refactor: Extract function for creating the file classifier
2023-03-31 12:36:26 -04:00
Carol (Nichols || Goulding)
63d45532fb
refactor: Extract function for making the parquet files sink
2023-03-31 12:36:26 -04:00
Carol (Nichols || Goulding)
eef943ceec
refactor: Extract function for making the scratchpad gen
2023-03-31 12:36:26 -04:00
Carol (Nichols || Goulding)
fe0d3c17fd
refactor: Extract function for creating the df plan exec
2023-03-31 12:36:26 -04:00
Carol (Nichols || Goulding)
07c2c768e9
refactor: Extract a function for creating the df planner
2023-03-31 12:36:25 -04:00
Carol (Nichols || Goulding)
7bbf0fcd79
refactor: Import all components from super, not crate
2023-03-31 12:36:25 -04:00
Carol (Nichols || Goulding)
7d2d9dd6b7
refactor: Extract a function for creating the IR planner
2023-03-31 12:36:25 -04:00
Carol (Nichols || Goulding)
d7fe50b7ed
refactor: Move logging and metrics of the commit component into where it's created
2023-03-31 12:36:25 -04:00
Carol (Nichols || Goulding)
821ad7f38c
refactor: Move logging and metrics into where the rest of the partition done sink is created
2023-03-31 12:36:25 -04:00
Carol (Nichols || Goulding)
682ed14b9e
refactor: Extract function for creating the round info source
2023-03-31 12:36:25 -04:00
Carol (Nichols || Goulding)
3ce062fd2e
refactor: Extract function for creating partition files source
2023-03-31 12:36:25 -04:00
Carol (Nichols || Goulding)
338ca030ab
refactor: Extract function for creating the partition info source
2023-03-31 12:36:25 -04:00
Carol (Nichols || Goulding)
b5f233f037
refactor: Move all partition filter creation into the function for that purpose
2023-03-31 12:36:24 -04:00
Carol (Nichols || Goulding)
b9727d2e17
refactor: Extract a function for creating partitions source, commit, and done sink
2023-03-31 12:36:24 -04:00
Carol (Nichols || Goulding)
b7b15dff26
refactor: Extract function for making the partition stream
...
Trying to make the inputs and outputs more clear.
2023-03-31 12:36:24 -04:00
Carol (Nichols || Goulding)
c51ec1cc9a
docs: Clean up typos and line wrapping
...
Found while reading.
2023-03-31 12:36:24 -04:00
Carol (Nichols || Goulding)
e4d5c777d9
feat: Make catalog method not specific to compacting and take optional end time
2023-03-31 12:36:24 -04:00
Carol (Nichols || Goulding)
5afb9ccb73
fix: Remove TODO comment that is now done
2023-03-31 12:36:24 -04:00
kodiakhq[bot]
a1389e5962
Merge branch 'main' into cn/redo
2023-03-30 20:27:25 +00:00