274 Commits (f21cb4362479b25fd00e04c9effef96c57dade87)

Author SHA1 Message Date
Nga Tran f21cb43624
feat: add a few more buckets for the histograms (#5621)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-13 13:52:23 +00:00
Carol (Nichols || Goulding) e7a3f15ecf
test: Remove outdated description 2022-09-12 13:13:30 -04:00
Carol (Nichols || Goulding) 8981cbbd84
test: Reduce time from 18 to 9 hours 2022-09-12 13:13:29 -04:00
Carol (Nichols || Goulding) 2ceb779c28
test: Correct a comment that I missed in the 24 hr -> 8 hr switch 2022-09-12 13:13:29 -04:00
Carol (Nichols || Goulding) baec40a313
test: Correct and expand assertions and descriptions 2022-09-12 13:13:29 -04:00
Carol (Nichols || Goulding) 2aef7c7936
feat: Temporarily disable cold full compaction 2022-09-12 13:13:29 -04:00
Carol (Nichols || Goulding) 743b67f0e9
fix: Re-enable full cold compaction, in serial for now 2022-09-12 13:13:29 -04:00
Carol (Nichols || Goulding) 6e1b06c435
fix: Work with Arc of PartitionCompactionCandidateWithInfo 2022-09-12 13:13:29 -04:00
Carol (Nichols || Goulding) dfd7255c46
fix: Remove now-unused cold_input_file_count_threshold 2022-09-12 13:13:28 -04:00
Carol (Nichols || Goulding) 3a368c02c2
fix: Remove now-unused cold_input_size_threshold_bytes 2022-09-12 13:13:28 -04:00
Carol (Nichols || Goulding) eefc71ac90
fix: Remove now-unused max_cold_concurrent_size_bytes 2022-09-12 13:13:28 -04:00
Carol (Nichols || Goulding) 2a22d79c94
feat: Make cold compaction like hot compaction except for candidate selection
Temporarily disable full compaction from level 1 to 2.

Re-use the memory budget estimation and parallelization for cold
compaction. Rather than choosing cold compaction candidates and then in
parallel compacting each partition from level 0 to 1 and then 1 to 2,
this commit switches to compacting in parallel (by memory budget) all
candidates from level 0 to 1. The next commit will re-enable full
compaction of all partitions in parallel (by memory budget).
2022-09-12 13:13:28 -04:00
Carol (Nichols || Goulding) 76228c9fd6
refactor: Move compact_in_parallel and compact_one_partition to lib and make more general
Cold compaction is going to use these too.
2022-09-12 13:13:28 -04:00
Carol (Nichols || Goulding) 7a3dffb750
refactor: Create wrapper fns that don't take size overrides
So that we don't have to pass an empty hashmap in as many places in real
code, because the size overrides are only for tests
2022-09-12 13:13:28 -04:00
Carol (Nichols || Goulding) 608290b83d
fix: Make some hot compaction code more general/parameterized 2022-09-12 13:13:28 -04:00
Carol (Nichols || Goulding) 2a5ef3058c
refactor: Move compact_candidates_with_memory_budget to share with cold 2022-09-12 13:13:28 -04:00
Carol (Nichols || Goulding) 955e7ea824
fix: Remove unused Error struct 2022-09-12 13:13:27 -04:00
Carol (Nichols || Goulding) ee3e1b851d
fix: Clean up some long lines, comments 2022-09-12 13:13:27 -04:00
Carol (Nichols || Goulding) 77f3490246
refactor: Extract cold compaction code into a module like hot 2022-09-12 13:13:27 -04:00
Carol (Nichols || Goulding) c12b3fbb03
refactor: Move to a module named hot to reduce naming duplication
My fingers are tired of typing 🤣
2022-09-12 13:13:27 -04:00
Carol (Nichols || Goulding) e3f9984878
docs: Clean up some comments while reading through 2022-09-12 13:13:27 -04:00
Carol (Nichols || Goulding) f2f99727ba
feat: Add metrics for files going into cold compaction 2022-09-12 13:13:27 -04:00
Carol (Nichols || Goulding) ad2db51ac2
refactor: Extract a function to share logic for compacting to L1 or L2 2022-09-12 13:13:27 -04:00
Carol (Nichols || Goulding) 6436afc3d9
fix: Remove cold max bytes CLI option; use existing max bytes CLI option
As discussed in https://github.com/influxdata/influxdb_iox/issues/5330#issuecomment-1218170063
2022-09-12 13:13:27 -04:00
Carol (Nichols || Goulding) 723aedfbca
test: Add more cases for cold compaction 2022-09-12 13:13:26 -04:00
Carol (Nichols || Goulding) 7cd78a3020
fix: Extract and test logic that groups files for cold compaction 2022-09-12 13:13:26 -04:00
Carol (Nichols || Goulding) da201ba87f
fix: Select by num of both l0 and l1 files for cold compaction
Now that we're going to compact level 1 files into level 2 files as
well.
2022-09-12 13:13:26 -04:00
Carol (Nichols || Goulding) 6bba3fafaa
fix: If full compaction group has only 1 file, upgrade level
As opposed to running full compaction.

Makes the catalog function general and take the level as a parameter
rather than only upgrade to level 1.
2022-09-12 13:13:26 -04:00
Carol (Nichols || Goulding) 10ba3fef47
feat: Compact cold partitions completely
Fixes #5330.
2022-09-12 13:13:26 -04:00
Carol (Nichols || Goulding) 327446f0cd
fix: Change default cold hours threshold from 24 hours to 8
As requested in https://github.com/influxdata/influxdb_iox/issues/5330#issuecomment-1212468682
2022-09-12 13:13:26 -04:00
Carol (Nichols || Goulding) a64a705b60
refactor: Extract a fn for the first step of cold compaction
Which is currently the only step, compacting any remaining level 0 files
into level 1. Make a TODO function for performing full compaction of all
level 1 files next.
2022-09-12 13:13:26 -04:00
Carol (Nichols || Goulding) 7249ef4793
fix: Don't record cold compaction metrics if compaction fails 2022-09-12 13:13:25 -04:00
Marco Neumann 8933f47ec1
refactor: make `QueryChunk::partition_id` non-optional (#5614)
In our data model, a chunk always belongs to a partition[^1], so let's
not make this attribute optional. The optional value only leads to
-- mostly surprising -- conditional behavior, ranging from "do not equalize
the partition sort key" (querier) to "always consider the chunk overlapping"
(iox_query when dealing with ingester chunks).

[^1]: This is even true when the chunk belongs to a parquet file that is not
      yet added to the catalog, contrary to what a comment in the ingester
      stated. The catalog and data model used by the querier are two totally
      different things.
2022-09-12 13:52:51 +00:00
Carol (Nichols || Goulding) 13de7ac954
feat: Record reasons for skipping compaction of a partition in the database
Closes #5458.
2022-09-09 16:40:48 -04:00
Nga Tran f03e370ecc
refactor: allocate more accurate length for a hashmap (#5592)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-09 15:37:29 +00:00
Joe-Blount 333cfa4f3c
chore: address comments - use TimestampMinMax passed by reference 2022-09-07 16:36:39 -05:00
Joe-Blount 97ebad5adb
chore: rustfmt changes 2022-09-07 13:22:36 -05:00
Joe-Blount 4188230694
fix: avoid splitting compaction output for time ranges with no chunks 2022-09-07 13:01:14 -05:00
Carol (Nichols || Goulding) b5ca99a3d5
refactor: Make CompactorConfig fields pub
I'm spending way too long with the wrong number of arguments to
CompactorConfig::new and not a lot of help from the compiler. If these
struct fields are pub, they can be set directly and destructured, etc,
which the compiler gives way more help on. This also reduces duplication
and boilerplate that has to be updated when the config fields change.
2022-09-07 13:28:19 -04:00
Carol (Nichols || Goulding) 54eea79773
refactor: Make filtering the parquet files into a closure argument too
So that the cold compaction can use different filtering but still use
the memory budget function.

Not sure I'm happy with this yet, but it's a start.
2022-09-07 13:26:42 -04:00
Carol (Nichols || Goulding) 3e76a155f7
refactor: Make memory budget compaction group function more general
In preparation for using it for cold compaction too.
2022-09-07 13:26:42 -04:00
Carol (Nichols || Goulding) 1f69d11d46
refactor: Move hot compaction function into hot compaction module 2022-09-07 13:26:40 -04:00
Carol (Nichols || Goulding) 85fb0acea6
refactor: Extract read_parquet_file test helper function to iox_tests::utils 2022-09-07 13:21:28 -04:00
Marco Neumann 064f0e9b29
refactor: use DataFusion to read parquet files (#5531)
Remove our own hand-rolled logic and let DataFusion read the parquet
files.

As a bonus, this now supports predicate pushdown to the deserialization
step, so we can use parquets as an in-mem buffer.

Note that this currently uses some "nested" DataFusion hack due to the
way the `QueryChunk` interface works. Midterm I'll change the interface
so that the `ParquetExec` nodes are directly visible to DataFusion
instead of some opaque `SendableRecordBatchStream`.
2022-09-05 09:25:04 +00:00
Marco Neumann f45cbfb88d
refactor: fine-grained file size mocking (#5541)
* refactor: do not override parquet file size in querier

This is going to be an issue when we actually rely on the size for
reading, see #5531.

* refactor: use selected file size mocking in compactor

Do not blindly override parquet file sizes for all subsystems.

This is going to be an issue when we actually rely on the size for
reading, see #5531.

* refactor: remove ability to override file sizes in catalog

Blindly overriding data for all subsystems is dangerous, because some
parts of our stack actually rely on the actual file size. See #5531.

* docs: explain `size_overrides`
2022-09-05 08:50:04 +00:00
Nga Tran dde65fa7ef
fix: remove timestamp functions from SQLs to be able to use index for improving performance (#5547) 2022-09-02 19:43:52 +00:00
kodiakhq[bot] b9959fa2d8
Merge branch 'main' into cn/even-more-compactor-tests 2022-09-01 21:02:04 +00:00
Nga Tran c8cbc5299b
feat: make compactors to select candidates based on the last n minutes (#5535)
* feat: make compactors to select candidates based on the last n minutes to reduce workload for postgres catalog query

* refactor: remove 1-minute case per review comment
2022-09-01 20:07:26 +00:00
Carol (Nichols || Goulding) 16d631a247
test: Add test for current behavior of skipping a table without columns 2022-08-31 16:26:02 -04:00
Carol (Nichols || Goulding) 1120b49821
refactor: Extract the mock compactor function into a type 2022-08-31 16:17:43 -04:00