Dom Dwyer
cd4087e00d
style: add lints forbidding todo!() and dbg!()
Some crates had these lints, some did not - let's be consistent and have the
compiler spot dbg!() and todo!() macro calls - they should never be in
prod code!
2022-09-29 13:10:07 +02:00
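For illustration, this kind of check can be enforced with Clippy's `todo` and `dbg_macro` lints at each crate root (a sketch of the approach; the exact lint list used per crate is not shown in this log):

```rust
// At the top of each crate's lib.rs / main.rs: fail the build under
// `cargo clippy` whenever a todo!() or dbg!() call survives into the code.
#![deny(clippy::todo, clippy::dbg_macro)]
```

With this in place, a stray `dbg!(value)` left over from debugging becomes a hard error in CI rather than something a reviewer has to catch.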
Andrew Lamb
66dbb9541f
chore: Update datafusion and `arrow`/`parquet`/`arrow-flight` to 23.0.0, `thrift` to 0.16.0 ( #5694 )
* chore: Update datafusion and `arrow`/`parquet`/`arrow-flight` to 23.0.0
* chore: Update thrift / remove parquet_format
* fix: Update APIs
* chore: Update lock + Run cargo hakari tasks
* fix: use patched version of arrow-rs to work around https://github.com/apache/arrow-rs/issues/2779
* chore: Run cargo hakari tasks
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-27 12:50:54 +00:00
Nga Tran
75ff805ee2
feat: create separate columns for num_files and memory budget instead of folding them into the reason text column, so we can filter on them easily ( #5742 )
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-26 20:14:04 +00:00
Nga Tran
b11da1d98b
fix: a bug that failed to capture the file limit when there are many L0 files and few or no overlapping L1 files ( #5736 )
2022-09-23 21:03:29 +00:00
Nga Tran
c4542d6b21
chore: be more verbose about the memory budget inserted into the catalog table skipped_compaction ( #5735 )
2022-09-23 18:40:09 +00:00
Nga Tran
bb7df22aa1
chore: always use a fixed number of rows (8192) per batch to estimate memory ( #5733 )
2022-09-23 15:51:25 +00:00
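A rough sketch of the idea behind budgeting a fixed 8192 rows per batch (names and the per-row cost here are illustrative, not the actual compactor code): every file is charged for whole batches, so even small files get a non-zero, predictable estimate.

```rust
/// Rows budgeted per record batch, per this commit.
const ROWS_PER_BATCH: u64 = 8192;

/// Number of whole batches needed to hold `row_count` rows (ceiling division).
fn batch_count(row_count: u64) -> u64 {
    (row_count + ROWS_PER_BATCH - 1) / ROWS_PER_BATCH
}

/// Estimated bytes for a file: whole batches times an assumed per-row cost.
/// `bytes_per_row` is a stand-in for a per-column-type estimate.
fn estimate_bytes(row_count: u64, bytes_per_row: u64) -> u64 {
    batch_count(row_count) * ROWS_PER_BATCH * bytes_per_row
}

fn main() {
    assert_eq!(batch_count(1), 1); // a tiny file still costs one full batch
    assert_eq!(batch_count(8192), 1);
    assert_eq!(batch_count(8193), 2);
    assert_eq!(estimate_bytes(10_000, 16), 2 * 8192 * 16);
}
```

Rounding up to whole batches deliberately over-estimates small files, which errs on the safe side when fitting candidates into a memory budget.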
Nga Tran
da697815ff
chore: add more info about the memory budget at the time the file limit is exceeded into skipped_compaction, so we can see if we should increase the file limit ( #5731 )
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-23 13:34:38 +00:00
Nga Tran
61075d57e2
chore: turn full cold compaction on ( #5728 )
2022-09-22 17:07:35 +00:00
Nga Tran
aaec5104d6
chore: turn cold partition compaction step 1 on to work with our new … ( #5726 )
* chore: turn cold partition compaction step 1 on to work with our new memory budget, which accounts for the num_files limit
* chore: run fmt
2022-09-22 14:59:27 +00:00
Nga Tran
e3deb23bcc
feat: add minimum row_count per file in estimating compacting memory… ( #5715 )
* feat: add a minimum row_count per file when estimating the compacting memory budget, and limit the number of files per compaction
* chore: cleanup
* chore: Apply suggestions from code review
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
* test: add test per review comments
* chore: Apply suggestions from code review
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* test: add one more test where the file limit is larger than the total number of input files
* fix: make the L1 files in tests not overlapped
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-22 14:37:39 +00:00
Carol (Nichols || Goulding)
aa822a40cf
refactor: Move config in with the relevant assertions
Now that only one hot test is using a CompactorConfig, move it into that
test to avoid spooky action at a distance.
2022-09-21 11:57:57 -04:00
Carol (Nichols || Goulding)
f0bf3bd21c
test: Clarify descriptions for the remaining assertion
The assertion remaining in this test is now important because there are
multiple shards, and it shows which partition is chosen per shard.
2022-09-21 11:57:57 -04:00
Carol (Nichols || Goulding)
7c7b058276
refactor: Extract unit test for case 5
2022-09-21 11:57:57 -04:00
Carol (Nichols || Goulding)
f5bd81ff3c
refactor: Extract unit test for case 4
2022-09-21 11:57:57 -04:00
Carol (Nichols || Goulding)
765feaa4d8
refactor: Extract a unit test for case 3
2022-09-21 11:57:57 -04:00
Carol (Nichols || Goulding)
a7a480c1ba
refactor: Extract a unit test for case 2
2022-09-21 11:57:56 -04:00
Carol (Nichols || Goulding)
d95f252a8e
refactor: Extract a unit test for case 1
Also add coverage for when there are no *partitions* in addition to the
test for when there are no *parquet files*.
2022-09-21 11:57:56 -04:00
Carol (Nichols || Goulding)
9372290ec9
refactor: Use iox_test helpers to simplify test setup
2022-09-21 11:57:56 -04:00
Carol (Nichols || Goulding)
f22627a97f
test: Move an integration test of hot compact_one_partition to lib
2022-09-21 11:57:56 -04:00
Carol (Nichols || Goulding)
a7bb0398e6
test: Move an integration test of compact_candidates_with_memory_budget to the same file
2022-09-21 11:57:56 -04:00
Carol (Nichols || Goulding)
316ebfa8c1
test: Call the smaller inner hot_partitions_for_shard when only one shard is involved
2022-09-21 11:57:56 -04:00
Carol (Nichols || Goulding)
fcf9a9d589
refactor: Move fetching of config from compactor inside hot_partitions_to_compact
But still pass them to hot_partitions_for_shard.
Make the argument order the same as for
recent_highest_throughput_partitions (I had already mixed the order up),
and make the names consistent throughout.
This makes the closure passed to get_candidates_with_retry simpler.
2022-09-21 11:57:56 -04:00
Carol (Nichols || Goulding)
48b7876174
refactor: Extract a function for computing query nanoseconds ago
2022-09-21 11:57:56 -04:00
Carol (Nichols || Goulding)
7dcaf5bd3d
refactor: Extract a function for getting hot partitions for one shard
2022-09-21 11:57:56 -04:00
Carol (Nichols || Goulding)
b557c30fd3
refactor: Move hot compaction candidates to the hot module
2022-09-21 11:57:55 -04:00
Carol (Nichols || Goulding)
fa11031a36
refactor: Extract a shared function to retry fetching of compaction candidates
2022-09-21 11:57:55 -04:00
Nga Tran
1d306061b9
chore: disable cold compaction again since its step 1 is the culprit ( #5700 )
2022-09-20 20:34:28 +00:00
Nga Tran
34bc02b59b
chore: turn cold compaction on, but only compact L0s and their overlapping L1s ( #5698 )
2022-09-20 18:44:36 +00:00
Nga Tran
578ce1854d
chore: temporarily turn off cold compaction to investigate an oom ( #5696 )
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-20 14:17:22 +00:00
Carol (Nichols || Goulding)
414b0f02ca
fix: Use time helper methods in more places
2022-09-19 13:24:08 -04:00
Carol (Nichols || Goulding)
c0c0349bc5
fix: Use typed Time values rather than ns
2022-09-19 12:59:20 -04:00
Carol (Nichols || Goulding)
0e23360da1
refactor: Add helper methods for computing times to TimeProvider
2022-09-19 11:34:43 -04:00
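The kind of helper this group of commits adds might look like the following (a hypothetical sketch: `TimeProvider` is IOx's trait, but this minimal trait, the `minutes_ago` name, and the nanosecond representation are illustrative, not the real definitions, which use typed Time values):

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Minimal stand-in for IOx's `TimeProvider`; the real trait differs.
trait TimeProvider {
    fn now_nanos(&self) -> i64;

    /// "N minutes before now" as a single call, replacing ad-hoc
    /// nanosecond arithmetic scattered across call sites.
    fn minutes_ago(&self, minutes: u64) -> i64 {
        self.now_nanos() - (minutes as i64) * 60 * 1_000_000_000
    }
}

struct SystemProvider;

impl TimeProvider for SystemProvider {
    fn now_nanos(&self) -> i64 {
        SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .expect("system clock before epoch")
            .as_nanos() as i64
    }
}

fn main() {
    // A point one hour in the past is strictly before "now".
    assert!(SystemProvider.minutes_ago(60) < SystemProvider.now_nanos());
}
```

Centralizing the arithmetic on the provider is what lets the later commits swap raw `ns` values for typed times in one place.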
kodiakhq[bot]
eed31bec4e
Merge branch 'main' into cn/share-code-with-full-compaction
2022-09-16 21:15:44 +00:00
Carol (Nichols || Goulding)
20f5f205bc
fix: ChunkOrder should be either max_seq or 0, not min_time
2022-09-16 16:57:31 -04:00
Carol (Nichols || Goulding)
d85e959820
fix: Sort l1 files by min_time rather than max_sequence_number
2022-09-16 16:15:18 -04:00
Carol (Nichols || Goulding)
50ddd588b1
test: Add a case of L1+L2 files being compacted into L2
2022-09-16 16:15:18 -04:00
Carol (Nichols || Goulding)
a8d817c91a
test: Explain expected value
2022-09-16 16:15:18 -04:00
Carol (Nichols || Goulding)
1ab250dfac
fix: Sort chunks taking into account which level compaction is targeting
2022-09-16 16:15:18 -04:00
Carol (Nichols || Goulding)
ca4c5d65e7
docs: Clarify comments on sort order of input/output of filtering
2022-09-16 16:15:17 -04:00
Nga Tran
346ef1c811
chore: reduce number of histogram buckets ( #5661 )
2022-09-16 19:44:22 +00:00
Carol (Nichols || Goulding)
cde0a94fd5
fix: Re-enable full compaction to level 2
This will work the same way that compacting level 0 -> level 1 does
except that the resulting files won't be split into potentially multiple
files. It will be limited by the memory budget bytes, which should limit
the groups more than the max_file_size_bytes would.
2022-09-15 14:53:12 -04:00
Carol (Nichols || Goulding)
e05657e8a4
feat: Make filter_parquet_files more general with regards to compaction level
2022-09-15 14:53:08 -04:00
Carol (Nichols || Goulding)
9b99af08e4
fix: Level 1 files need to be sorted by max sequence number for full compaction
2022-09-15 14:53:07 -04:00
Carol (Nichols || Goulding)
dc64e494bd
docs: Update comment to what we'd like this code to do
2022-09-15 14:53:07 -04:00
Carol (Nichols || Goulding)
f5497a3a3d
refactor: Extract a conversion for convenience in tests
2022-09-15 12:48:36 -04:00
Carol (Nichols || Goulding)
dcab9d0ffc
refactor: Combine relevant data with the FilterResult state
This encodes the result directly and has the FilterResult hold only the
data relevant to its state, so there is no longer any need to create or
check for empty vectors or a 0 budget_bytes. It also creates a new type
after checking the filter result state and handling the budget, since
actual compaction doesn't need to care about that.
This could still use more refactoring to become a clearer pipeline of
different states, but I think this is a good start.
2022-09-15 11:13:18 -04:00
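The pattern described in this commit, in miniature (hypothetical variant and field names; the real FilterResult carries IOx-specific data): each enum variant holds only the data for its own state, so there is no "empty vec plus zero budget" sentinel to misinterpret.

```rust
// Hypothetical parquet-file stand-in for this sketch.
#[derive(Debug, PartialEq)]
struct FileId(u64);

/// Each variant carries only the data relevant to that state.
#[derive(Debug, PartialEq)]
enum FilterResult {
    NothingToCompact,
    OverBudget { budget_bytes: u64 },
    Proceed { files: Vec<FileId>, budget_bytes: u64 },
}

fn describe(r: &FilterResult) -> String {
    match r {
        FilterResult::NothingToCompact => "skip".to_string(),
        FilterResult::OverBudget { budget_bytes } => {
            format!("skip: needs {budget_bytes} bytes")
        }
        FilterResult::Proceed { files, budget_bytes } => {
            format!("compact {} files within {budget_bytes} bytes", files.len())
        }
    }
}

fn main() {
    let r = FilterResult::Proceed {
        files: vec![FileId(1), FileId(2)],
        budget_bytes: 1024,
    };
    assert_eq!(describe(&r), "compact 2 files within 1024 bytes");
}
```

Because the match is exhaustive, every consumer is forced to handle each state explicitly, which is what makes the downstream pipeline clearer.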
Carol (Nichols || Goulding)
e57387b8e4
refactor: Extract an inner function so partition isn't needed in tests
2022-09-15 11:10:14 -04:00
Carol (Nichols || Goulding)
a284cebb51
refactor: Store estimated bytes on the CompactorParquetFile
2022-09-15 11:10:14 -04:00
Carol (Nichols || Goulding)
70094aead0
refactor: Make estimating bytes a responsibility of the Partition
Table columns for a partition don't change, so rather than carrying
around table columns for the partition and parquet files to look up
repeatedly, have the `PartitionCompactionCandidateWithInfo` keep track
of its column types and be able to estimate bytes given a number of rows
from a parquet file.
2022-09-15 11:10:14 -04:00
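A minimal sketch of the shape of this refactor (the type names and per-type byte sizes are illustrative, not the real IOx definitions): the partition owns its column types once, and can then price any of its files given only a row count.

```rust
// Illustrative column-type model; the real IOx types differ.
#[derive(Clone, Copy)]
enum ColumnType {
    Timestamp, // i64 nanoseconds
    F64,
    Tag, // dictionary-encoded string; assumed average width
}

impl ColumnType {
    fn estimated_bytes_per_row(self) -> u64 {
        match self {
            ColumnType::Timestamp | ColumnType::F64 => 8,
            ColumnType::Tag => 16, // assumed width for this sketch
        }
    }
}

/// Stand-in for `PartitionCompactionCandidateWithInfo`: the partition
/// carries its own column types, so callers no longer pass them around.
struct PartitionInfo {
    column_types: Vec<ColumnType>,
}

impl PartitionInfo {
    /// Estimate the in-memory size of `row_count` rows of this partition.
    fn estimate_bytes(&self, row_count: u64) -> u64 {
        let per_row: u64 = self
            .column_types
            .iter()
            .map(|c| c.estimated_bytes_per_row())
            .sum();
        per_row * row_count
    }
}

fn main() {
    let p = PartitionInfo {
        column_types: vec![ColumnType::Timestamp, ColumnType::F64, ColumnType::Tag],
    };
    // (8 + 8 + 16) bytes per row * 1000 rows
    assert_eq!(p.estimate_bytes(1_000), 32_000);
}
```

Since a partition's columns don't change, computing the per-row cost from the partition itself avoids the repeated column lookups the commit body describes.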
Nga Tran
7c4c918636
chore: add partition id to panic message ( #5641 )
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-15 02:21:13 +00:00