influxdb

Commit Graph

Author	SHA1	Message	Date
Carol (Nichols \|\| Goulding)	9b99af08e4	fix: Level 1 files need to be sorted by max sequence number for full compaction	2022-09-15 14:53:07 -04:00
Carol (Nichols \|\| Goulding)	dc64e494bd	docs: Update comment to what we'd like this code to do	2022-09-15 14:53:07 -04:00
Carol (Nichols \|\| Goulding)	f5497a3a3d	refactor: Extract a conversion for convenience in tests	2022-09-15 12:48:36 -04:00
Carol (Nichols \|\| Goulding)	dcab9d0ffc	refactor: Combine relevant data with the FilterResult state This encodes the result directly and has the FilterResult hold only the relevant data to the state. So no longer any need to create or check for empty vectors or 0 budget_bytes. Also creates a new type after checking the filter result state and handling the budget, as actual compaction doesn't need to care about that. This could still use more refactoring to become a clearer pipeline of different states, but I think this is a good start.	2022-09-15 11:13:18 -04:00
Carol (Nichols \|\| Goulding)	e57387b8e4	refactor: Extract an inner function so partition isn't needed in tests	2022-09-15 11:10:14 -04:00
Carol (Nichols \|\| Goulding)	a284cebb51	refactor: Store estimated bytes on the CompactorParquetFile	2022-09-15 11:10:14 -04:00
Carol (Nichols \|\| Goulding)	70094aead0	refactor: Make estimating bytes a responsibility of the Partition Table columns for a partition don't change, so rather than carrying around table columns for the partition and parquet files to look up repeatedly, have the `PartitionCompactionCandidateWithInfo` keep track of its column types and be able to estimate bytes given a number of rows from a parquet file.	2022-09-15 11:10:14 -04:00
Nga Tran	7c4c918636	chore: add parttion id into panic message (#5641 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-09-15 02:21:13 +00:00
kodiakhq[bot]	08e2523295	Merge branch 'main' into cn/always-get-extra-info	2022-09-14 17:01:59 +00:00
Nga Tran	44e12aa512	feat: add needed budget and memory budget into the message for us to diagnose and increase our memory budget as needed (#5640 )	2022-09-14 16:06:19 +00:00
Carol (Nichols \|\| Goulding)	e16306d21c	refactor: Move fetching of extra partition info into the method because it's always needed	2022-09-14 11:14:17 -04:00
kodiakhq[bot]	85641efa6f	Merge branch 'main' into cn/infallible-estimated-bytes	2022-09-14 01:00:10 +00:00
Nga Tran	f21cb43624	feat: add a few more buckets for the histograms (#5621 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-09-13 13:52:23 +00:00
Andrew Lamb	f86d3e31da	chore: Update datafusion + object_store (#5619 ) * chore: Update datafusion pin * chore: update object_store to 0.5.0 * chore: Run cargo hakari tasks Co-authored-by: CircleCI[bot] <circleci@influxdata.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-09-13 12:34:54 +00:00
Carol (Nichols \|\| Goulding)	d971980fd3	fix: Box a source error to please clippy	2022-09-12 17:38:40 -04:00
Carol (Nichols \|\| Goulding)	c3937308f4	fix: Make estimate_arrow_bytes_for_file infallible	2022-09-12 16:50:25 -04:00
Andrew Lamb	1fd31ee3bf	chore: Update datafusion / `arrow` / `arrow-flight` / `parquet` to version 22.0.0 (#5591 ) * chore: Update datafusion / `arrow` / `arrow-flight` / `parquet` to version 22.0.0 * fix: enable dynamic comparison flag * chore: derive Eq for clippy * chore: update explain plans * chore: Update sizes for ReadBuffer encoding * chore: update more tests Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-09-12 17:45:03 +00:00
Carol (Nichols \|\| Goulding)	e7a3f15ecf	test: Remove outdated description	2022-09-12 13:13:30 -04:00
Carol (Nichols \|\| Goulding)	8981cbbd84	test: Reduce time from 18 to 9 hours	2022-09-12 13:13:29 -04:00
Carol (Nichols \|\| Goulding)	2ceb779c28	test: Correct a comment that I missed in the 24 hr -> 8 hr switch	2022-09-12 13:13:29 -04:00
Carol (Nichols \|\| Goulding)	baec40a313	test: Correct and expand assertions and descriptions	2022-09-12 13:13:29 -04:00
Carol (Nichols \|\| Goulding)	2aef7c7936	feat: Temporarily disable cold full compaction	2022-09-12 13:13:29 -04:00
Carol (Nichols \|\| Goulding)	743b67f0e9	fix: Re-enable full cold compaction, in serial for now	2022-09-12 13:13:29 -04:00
Carol (Nichols \|\| Goulding)	6e1b06c435	fix: Work with Arc of PartitionCompactionCandidateWithInfo	2022-09-12 13:13:29 -04:00
Carol (Nichols \|\| Goulding)	dfd7255c46	fix: Remove now-unused cold_input_file_count_threshold	2022-09-12 13:13:28 -04:00
Carol (Nichols \|\| Goulding)	3a368c02c2	fix: Remove now-unused cold_input_size_threshold_bytes	2022-09-12 13:13:28 -04:00
Carol (Nichols \|\| Goulding)	eefc71ac90	fix: Remove now unused max_cold_concurrent_size_bytes	2022-09-12 13:13:28 -04:00
Carol (Nichols \|\| Goulding)	2a22d79c94	feat: Make cold compaction like hot compaction except for candidate selection Temporarily disable full compaction from level 1 to 2. Re-use the memory budget estimation and parallelization for cold compaction. Rather than choosing cold compaction candidates and then in parallel compacting each partition from level 0 to 1 and then 1 to 2, this commit switches to compacting in parallel (by memory budget) all candidates form level 0 to 1. The next commit will re-enable full compaction of all partitions in parallel (by memory budget).	2022-09-12 13:13:28 -04:00
Carol (Nichols \|\| Goulding)	76228c9fd6	refactor: Move compact_in_parallel and compact_one_partition to lib and make more general Cold compaction is going to use these too.	2022-09-12 13:13:28 -04:00
Carol (Nichols \|\| Goulding)	7a3dffb750	refactor: Create wrapper fns that don't take size overrides So that we don't have to pass an empty hashmap in as many places in real code, because the size overrides are only for tests	2022-09-12 13:13:28 -04:00
Carol (Nichols \|\| Goulding)	608290b83d	fix: Make some hot compaction code more general/parameterized	2022-09-12 13:13:28 -04:00
Carol (Nichols \|\| Goulding)	2a5ef3058c	refactor: Move compact_candidates_with_memory_budget to share with cold	2022-09-12 13:13:28 -04:00
Carol (Nichols \|\| Goulding)	955e7ea824	fix: Remove unused Error struct	2022-09-12 13:13:27 -04:00
Carol (Nichols \|\| Goulding)	ee3e1b851d	fix: Clean up some long lines, comments	2022-09-12 13:13:27 -04:00
Carol (Nichols \|\| Goulding)	77f3490246	refactor: Extract cold compaction code into a module like hot	2022-09-12 13:13:27 -04:00
Carol (Nichols \|\| Goulding)	c12b3fbb03	refactor: Move to a module named hot to reduce naming duplication My fingers are tired of typing 🤣	2022-09-12 13:13:27 -04:00
Carol (Nichols \|\| Goulding)	e3f9984878	docs: Clean up some comments while reading through	2022-09-12 13:13:27 -04:00
Carol (Nichols \|\| Goulding)	f2f99727ba	feat: Add metrics for files going into cold compaction	2022-09-12 13:13:27 -04:00
Carol (Nichols \|\| Goulding)	ad2db51ac2	refactor: Extract a function to share logic for compacting to L1 or L2	2022-09-12 13:13:27 -04:00
Carol (Nichols \|\| Goulding)	6436afc3d9	fix: Remove cold max bytes CLI option; use existing max bytes CLI option As discussed in https://github.com/influxdata/influxdb_iox/issues/5330#issuecomment-1218170063	2022-09-12 13:13:27 -04:00
Carol (Nichols \|\| Goulding)	723aedfbca	test: Add more cases for cold compaction	2022-09-12 13:13:26 -04:00
Carol (Nichols \|\| Goulding)	7cd78a3020	fix: Extract and test logic that groups files for cold compaction	2022-09-12 13:13:26 -04:00
Carol (Nichols \|\| Goulding)	da201ba87f	fix: Select by num of both l0 and l1 files for cold compaction Now that we're going to compact level 1 files in to level 2 files as well.	2022-09-12 13:13:26 -04:00
Carol (Nichols \|\| Goulding)	6bba3fafaa	fix: If full compaction group has only 1 file, upgrade level As opposed to running full compaction. Makes the catalog function general and take the level as a parameter rather than only upgrade to level 1.	2022-09-12 13:13:26 -04:00
Carol (Nichols \|\| Goulding)	10ba3fef47	feat: Compact cold partitions completely Fixes #5330.	2022-09-12 13:13:26 -04:00
Carol (Nichols \|\| Goulding)	327446f0cd	fix: Change default cold hours threshold from 24 hours to 8 As requested in https://github.com/influxdata/influxdb_iox/issues/5330#issuecomment-1212468682	2022-09-12 13:13:26 -04:00
Carol (Nichols \|\| Goulding)	a64a705b60	refactor: Extract a fn for the first step of cold compaction Which is currently the only step, compacting any remaining level 0 files into level 1. Make a TODO function for performing full compaction of all level 1 files next.	2022-09-12 13:13:26 -04:00
Carol (Nichols \|\| Goulding)	7249ef4793	fix: Don't record cold compaction metrics if compaction fails	2022-09-12 13:13:25 -04:00
Marco Neumann	8933f47ec1	refactor: make `QueryChunk::partition_id` non-optional (#5614 ) In our data model, a chunk always belongs to a partition[^1], so let's not make this attribute optional. The optional value only leads to -- mostly surprising -- conditional behavior, ranging from "do not equalize the partition sort key" (querier) to "always consider the chunk overlapping" (iox_query when dealing with ingester chunks). [^1]: This is even true when the chunk belongs to a parquet file that is not yet added to the catalog, contrary to what a comment in the ingester stated. The catalog and data model used by the querier are two totally different things.	2022-09-12 13:52:51 +00:00
Carol (Nichols \|\| Goulding)	13de7ac954	feat: Record reasons for skipping compaction of a partition in the database Closes #5458.	2022-09-09 16:40:48 -04:00
Nga Tran	f03e370ecc	refactor: allocate more accurate length for a hashmap (#5592 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-09-09 15:37:29 +00:00
dependabot[bot]	786ce75e26	chore(deps): Bump tokio-util from 0.7.3 to 0.7.4 (#5596 ) Bumps [tokio-util](https://github.com/tokio-rs/tokio) from 0.7.3 to 0.7.4. - [Release notes](https://github.com/tokio-rs/tokio/releases) - [Commits](https://github.com/tokio-rs/tokio/compare/tokio-util-0.7.3...tokio-util-0.7.4) --- updated-dependencies: - dependency-name: tokio-util dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-09-09 07:40:16 +00:00
Joe-Blount	333cfa4f3c	chore: address comments - use TimestampMinMax passed by reference	2022-09-07 16:36:39 -05:00
Joe-Blount	97ebad5adb	chore: rustfmt changes	2022-09-07 13:22:36 -05:00
Joe-Blount	4188230694	fix: avoid splitting compaction output for time ranges with no chunks	2022-09-07 13:01:14 -05:00
Carol (Nichols \|\| Goulding)	b5ca99a3d5	refactor: Make CompactorConfig fields pub I'm spending way too long with the wrong number of arguments to CompactorConfig::new and not a lot of help from the compiler. If these struct fields are pub, they can be set directly and destructured, etc, which the compiler gives way more help on. This also reduces duplication and boilerplate that has to be updated when the config fields change.	2022-09-07 13:28:19 -04:00
Carol (Nichols \|\| Goulding)	54eea79773	refactor: Make filtering the parquet files into a closure argument too So that the cold compaction can use different filtering but still use the memory budget function. Not sure I'm happy with this yet, but it's a start.	2022-09-07 13:26:42 -04:00
Carol (Nichols \|\| Goulding)	3e76a155f7	refactor: Make memory budget compaction group function more general In preparation for using it for cold compaction too.	2022-09-07 13:26:42 -04:00
Carol (Nichols \|\| Goulding)	1f69d11d46	refactor: Move hot compaction function into hot compaction module	2022-09-07 13:26:40 -04:00
Carol (Nichols \|\| Goulding)	85fb0acea6	refactor: Extract read_parquet_file test helper function to iox_tests::utils	2022-09-07 13:21:28 -04:00
Marco Neumann	adeacf416c	ci: fix (#5569 ) * ci: use same feature set in `build_dev` and `build_release` * ci: also enable unstable tokio for `build_dev` * chore: update tokio to 1.21 (to fix console-subscriber 0.1.8 * fix: "must use"	2022-09-06 14:13:28 +00:00
Marco Neumann	064f0e9b29	refactor: use DataFusion to read parquet files (#5531 ) Remove our own hand-rolled logic and let DataFusion read the parquet files. As a bonus, this now supports predicate pushdown to the deserialization step, so we can use parquets as in in-mem buffer. Note that this currently uses some "nested" DataFusion hack due to the way the `QueryChunk` interface works. Midterm I'll change the interface so that the `ParquetExec` nodes are directly visible to DataFusion instead of some opaque `SendableRecordBatchStream`.	2022-09-05 09:25:04 +00:00
Marco Neumann	f45cbfb88d	refactor: fine-grained file size mocking (#5541 ) * refactor: do not override parquet file size in querier This is going to be an issue when we actually rely on the size for reading, see #5531. * refactor: use selected file size mocking in compactor Do not blindly override parquet file sizes for all subsystems. This is going to be an issue when we actually rely on the size for reading, see #5531. * refactor: remove ability to override file sizes in catalog Blindly overriding data for all subsystems is dangerous, because some parts of our stack actually rely on the actual file size. See #5531. * docs: explain `size_overrides`	2022-09-05 08:50:04 +00:00
Nga Tran	dde65fa7ef	fix: remove timestamp functions from SQLs to be able to use index for improving performance (#5547 )	2022-09-02 19:43:52 +00:00
kodiakhq[bot]	b9959fa2d8	Merge branch 'main' into cn/even-more-compactor-tests	2022-09-01 21:02:04 +00:00
Nga Tran	c8cbc5299b	feat: make compactors to select candidates based on the last n minutes (#5535 ) * feat: make compactors to select candidates based on the last n minutes to reduce workload for postgres catalog query * refactor: remove 1-minute case per review comment	2022-09-01 20:07:26 +00:00
Carol (Nichols \|\| Goulding)	16d631a247	test: Add test for current behavior of skipping a table without columns	2022-08-31 16:26:02 -04:00
Carol (Nichols \|\| Goulding)	1120b49821	refactor: Extract the mock compactor function into a type	2022-08-31 16:17:43 -04:00
Carol (Nichols \|\| Goulding)	b893251efc	test: Add a test that compacting no candidates compacts nothing	2022-08-31 15:30:25 -04:00
Carol (Nichols \|\| Goulding)	b0e871196c	test: Use more iox test utils in this compactor test	2022-08-31 14:37:59 -04:00
Nga Tran	a32d5180b3	fix: loop forever in compact_hot_partition_candidates (#5518 ) * fix: loop forever in compact_hot_partition_candidates * chore: cleanup * fix: avoid using continues that will cause bugs in corner cases * fix: Pass compaction fn as a closure instead to allow collection of groups in test * fix: Add Send bound as suggested by clippy * fix: fix the test to return data of round 3 instead of round 2 Co-authored-by: Carol (Nichols \|\| Goulding) <carol.nichols@gmail.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-08-31 17:46:59 +00:00
Andrew Lamb	6669d85fb4	chore: Update datafusion + arrow/parquet to `21.0.0` (#5519 ) * chore: Update arrow/arrow-flight/parquet to 21.0.0 * chore: Update datafusion pin * chore: Fix arrow update script * chore: Update Cargo.lock * chore: Update for new API	2022-08-31 13:30:47 +00:00
Nga Tran	cb10a7c6d8	feat: More accurate memory estimate for compaction (#5471 ) * feat: initial implementation of memory estimation for a compaction * feat: estimate size of files and have the right actions for the needed budget * feat: run candidates in parallel * fix: have the right name for the column field of the output struct * feat: add metrics for estimated budgets * chore: cleanup * chore: Apply suggestions from code review Co-authored-by: Carol (Nichols \|\| Goulding) <193874+carols10cents@users.noreply.github.com> * fix: fix syntax after applying review's suggestions * refactor: Convert a Vec to VecDeque to go well with pop and push * chore: remove max_concurrent_size_bytes and input_size_threshold_bytes * chore: remove input_file_count_threshold * test: tests for estimate_arrow_bytes_for_file Co-authored-by: Carol (Nichols \|\| Goulding) <193874+carols10cents@users.noreply.github.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-08-30 13:44:44 +00:00
Dom Dwyer	2fc0ddbea1	fix: compactor tolerates empty output Changes the compactor code to tolerate a SplitExec yielding an empty partition (with no rows). This raises a WARN as the situation in which this is acceptable is very rare, and is more likely indicative of an opportunity to improve the SplitExec usage (i.e. pruning out unnecessary split points).	2022-08-30 14:52:31 +02:00
Carol (Nichols \|\| Goulding)	58f0b63cdc	refactor: Rename KafkaTopic to Topic or TopicMetadata or topic name as appropriate	2022-08-29 14:27:02 -04:00
Carol (Nichols \|\| Goulding)	74c9529062	fix: Rename KafkaPartition to ShardIndex	2022-08-29 14:07:18 -04:00
Carol (Nichols \|\| Goulding)	c9567cad7d	fix: Rename some more sequencer to shard	2022-08-29 14:06:45 -04:00
Carol (Nichols \|\| Goulding)	6443858870	fix: Rename compactor option from sequencer to shard	2022-08-29 14:06:45 -04:00
Carol (Nichols \|\| Goulding)	fe9c474620	fix: rustfmt	2022-08-29 14:06:45 -04:00
Carol (Nichols \|\| Goulding)	f6c93f7e67	fix: Remove moot comment	2022-08-29 14:06:44 -04:00
Carol (Nichols \|\| Goulding)	698f1a47ff	refactor: Rename test structures from sequencer to shard where appropriate	2022-08-29 14:06:44 -04:00
Jake Goulding	4abf21c724	refactor: Rename Sequencer (and its entourage) to Shard	2022-08-29 14:06:43 -04:00
Nga Tran	3220c6f88b	feat: add file_count_threshold for comapcting cold partitions (#5456 ) * feat: file file_count_threshold for comapcting cold partitions to make it consistent with the hot case and help set up to avoid oom easier * chore: remove unecessary commments	2022-08-23 20:12:21 +00:00
kodiakhq[bot]	2b3ca54168	Merge branch 'main' into cn/upgrade-l0-metrics	2022-08-17 16:01:42 +00:00
Andrew Lamb	7f0ae53d6f	chore: Update to (almost) released object_store 0.4.0 (#5419 ) * chore: update object_store * chore: update hakari config * chore: Run cargo hakari tasks Co-authored-by: CircleCI[bot] <circleci@influxdata.com>	2022-08-17 13:44:48 +00:00
Carol (Nichols \|\| Goulding)	ef716a5b90	fix: Remove compaction level attribute from the compaction_input_file_bytes metric	2022-08-15 10:50:04 -04:00
Carol (Nichols \|\| Goulding)	a9ed32df89	fix: Remove compaction_counter as it's now redundant with the compaction_input_file_bytes histogram	2022-08-15 10:23:29 -04:00
Carol (Nichols \|\| Goulding)	af95ce7ca6	feat: Add a histogram tracking sizes of files used as inputs to compaction Fixes #5348.	2022-08-15 10:13:54 -04:00
Carol (Nichols \|\| Goulding)	cd6c809fe0	fix: Change metric tracking sizes of files selected for compaction to a histogram Connects to #5348.	2022-08-15 10:13:54 -04:00
Carol (Nichols \|\| Goulding)	b982bdaf2f	fix: Derive Eq when we derive PartialEq and members can derive Eq Allow this in generated code that we don't control, though. Recommended by clippy now. https://rust-lang.github.io/rust-clippy/master/index.html#derive_partial_eq_without_eq	2022-08-11 15:04:06 -04:00
Marco Neumann	90fec1365f	feat: intern schemas during query planning (#5215 ) * feat: intern schemas during query planning Helps with #5202. * refactor: `SchemaMerger::build` shall return an `Arc` * feat: `SchemaMerger::with_interner` * refactor: hash-based schema interning	2022-08-11 12:28:51 +00:00
Jake Goulding	68e64af4d1	refactor: extract compactor loop body to call it separately	2022-08-10 11:28:51 -04:00
Jake Goulding	49c5281454	refactor: Supersede old CompactorHandlerImpl constructor	2022-08-10 11:28:51 -04:00
Jake Goulding	cc061b6ce9	refactor: add CompactorHandlerImpl::new_with_compactor This will allow us to refactor the code a level up to create a `Compactor` directly.	2022-08-10 11:28:51 -04:00
Andrew Lamb	c0fc91c627	chore: Warn if a parquet file has no sort key (#5368 )	2022-08-10 11:56:50 +00:00
Andrew Lamb	16ddc5efc6	chore: Update datafusion / arrow/parquet/arrow-flight and prost/tonic ecosystem (#5360 ) * chore: Update datafusion and arrow * chore: Update Cargo.lock * chore: update to Decimal128 * chore: Update tonic/prost/pbjson/etc * chore: Run cargo hakari tasks * fix: doctest in generated types Co-authored-by: CircleCI[bot] <circleci@influxdata.com>	2022-08-09 17:30:44 +00:00
Nga Tran	b71c1a09ea	feat: only sleep when there are neither hot nor cold partitions to compact (#5329 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-08-05 16:36:36 +00:00
Carol (Nichols \|\| Goulding)	facc967320	fix: Specify hot or cold in more log messages	2022-08-04 16:55:48 -04:00
Carol (Nichols \|\| Goulding)	c9d66c30b1	fix: Make this field name consistent With the other fields on this struct and with the corresponding field on the clap block struct.	2022-08-04 16:55:48 -04:00
Carol (Nichols \|\| Goulding)	da0b031c44	feat: Add parameters to limit total memory usage of cold partition compaction	2022-08-04 16:55:48 -04:00
Carol (Nichols \|\| Goulding)	9d8f94d0d7	fix: Remove an unneeded sleep The cold case won't make a hot busy loop (hah), we'll just go back to working on the hot partitions if there's no cold partitions to do.	2022-08-04 16:55:48 -04:00
Carol (Nichols \|\| Goulding)	e1c45e836a	test: Remove copypastaed assertions that duplicate a different test	2022-08-04 16:55:48 -04:00
Carol (Nichols \|\| Goulding)	cb6442018e	test: Add more test cases varying number of partitions per sequencer	2022-08-04 16:55:48 -04:00
Carol (Nichols \|\| Goulding)	d55f45a5c2	feat: Run compaction of hot partitions a configurable number of times more than cold	2022-08-04 16:55:48 -04:00
Carol (Nichols \|\| Goulding)	827e82cfb8	feat: Upgrade one level 0, non-overlapping file without compacting Fixes #1078.	2022-08-04 16:55:47 -04:00
Carol (Nichols \|\| Goulding)	c1d016a00a	feat: Upgrade cold level 0 files when they have no overlaps	2022-08-04 16:55:47 -04:00
Carol (Nichols \|\| Goulding)	9052eabe50	feat: Separate out hot/cold partition compaction and filtering Cold partition compaction will (in the next commit) upgrade a level 0 file without any overlaps rather than running compaction. Cold partition filtering gathers all level 0 files in the (already deemed cold) partition with all overlapping level 1 files, and does not limit the set of files being compacted by their number or size.	2022-08-04 16:55:47 -04:00
Carol (Nichols \|\| Goulding)	fc62c82722	feat: Select cold partitions	2022-08-04 16:55:47 -04:00
Carol (Nichols \|\| Goulding)	6e9c752230	refactor: Extract current compaction into a fn for 'hot' partitions	2022-08-04 16:55:47 -04:00
Marco Neumann	eea8270e83	fix: `compute_split_time` with small step sizes (#5309 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-08-04 13:40:30 +00:00
Marco Neumann	039950b4fd	feat: ensure clean compactor executor shutdown	2022-08-03 18:07:00 +02:00
Marco Neumann	fd74f2639b	fix: do not attempt to poll future lists in compactor It seems that the buffering / parallelization code cannot deal with empty lists and just freezes forever (which blocks shutdown but will also freeze the compactor forever).	2022-08-03 18:04:05 +02:00
Nga Tran	4812db9887	feat: fewer buckets but larger ranges for compaction duration histogram (#5259 ) * chore: reduce log info * feat: fewer buckets but larger ranges for compaction duration histogram * chore: Apply suggestions from code review Co-authored-by: Marko Mikulicic <mkm@influxdata.com> * chore: run fmt after appying reviewer's suggestions Co-authored-by: Marko Mikulicic <mkm@influxdata.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-08-02 14:19:30 +00:00
dependabot[bot]	fbd39844d8	chore(deps): Bump async-trait from 0.1.56 to 0.1.57 (#5247 ) Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.56 to 0.1.57. - [Release notes](https://github.com/dtolnay/async-trait/releases) - [Commits](https://github.com/dtolnay/async-trait/compare/0.1.56...0.1.57) --- updated-dependencies: - dependency-name: async-trait dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-08-01 08:30:33 +00:00
Andrew Lamb	7cc8486e5a	fix: remove left over `deb!` macro (#5224 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-30 15:33:02 +00:00
Marco Neumann	0e9695f202	feat: add a few helpful compactor debug logs (#5235 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-28 17:38:33 +00:00
Andrew Lamb	9215a534d0	chore: Update datafusion and `arrow`/`parquet`/`arrow-flight` to `19.0.0` (#5229 ) * chore: Update datafusion and `arrow`/`parquet`/`arrow-flight` to `19.0.0` * chore: Run cargo hakari tasks * fix: Update for API changes * fix: clippy Co-authored-by: CircleCI[bot] <circleci@influxdata.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-28 08:10:47 +00:00
Nga Tran	fcce00bf09	feat: run many compact partitions in parallel (#5230 ) * feat: run many compact partitions in parallel * refactor: Use rust futures fu to run compactor jobs in parallel * chore: Apply suggestions from code review Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>	2022-07-27 20:55:45 +00:00
Andrew Lamb	7eebe061a6	fix: reduce log verbosity for `found compaction candidates` message (#5225 ) * fix: reduce log verbosity * refactor: sleep for a sec if no work, print debug Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-27 19:35:31 +00:00
Marco Neumann	9a9a1a4777	feat: limit per-table chunk data for every query (#5223 ) * feat: `QueryChunk::as_any` * feat: allo `ChunkPruner::prune_chunks` to fail * feat: limit per-table chunk data for every query Closes #5211. * fix: address review comments Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>	2022-07-27 13:20:05 +00:00
Andrew Lamb	bbcf4ec64e	fix: Run compactor streams in parallel to avoid deadlock (#5212 ) * fix: run compaction streams at once * fix: make it compile * fix: improve wording * fix: use task to write parquet files in parallel	2022-07-26 12:17:38 +00:00
Carol (Nichols \|\| Goulding)	f4d0f13689	feat: split large compactions (#5195 ) * feat: Split large compactions into multiple compacted files Connects to #5121 * refactor: Extract update catalog function and error type * refactor: Share physical plan to object store streaming And only differ in the logical plan building based on split times in different compaction cases. * fix: Test for a split time equal to the max time and don't split then	2022-07-22 20:35:31 +00:00
Nga Tran	69640c0ba5	feat: Different branch to hook up new compaction algorithm (#5194 ) * chore: cherry pick the first 3 commits of branch cn/connect-new-compaction * fix: modify the test to work correctly with compactor running Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-22 19:29:47 +00:00
Carol (Nichols \|\| Goulding)	94343b1f27	fix: compute_split_time returns one value when min_time = max_time (#5192 ) * test: Document the behavior of compute_split_time when min time = max time * fix: compute_split_time returns one value when min_time = max_time Co-authored-by: NGA-TRAN <nga-tran@live.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-22 17:29:50 +00:00
Nga Tran	bbe07fcc79	feat: metrics for selection partition candidates for compaction (#5190 ) * feat: metrics for selection partition candidates for compaction * chore: Apply suggestions from code review Co-authored-by: Carol (Nichols \|\| Goulding) <193874+carols10cents@users.noreply.github.com> * refactor: remove unused metric labels Co-authored-by: Carol (Nichols \|\| Goulding) <193874+carols10cents@users.noreply.github.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-22 15:25:53 +00:00
Carol (Nichols \|\| Goulding)	d9ca28e83a	fix: Compact one level 0 file to avoid getting stuck on it If compaction is called on one level 0 file, the work of compaction doesn't really need to be done, but for simplicity's sake for now, run it through the compaction query/write process rather than special casing it. This will keep us from getting stuck in the unlikely event that a partition with one level 0 file gets selected as a top candidate for compaction.	2022-07-22 09:17:57 -04:00
Carol (Nichols \|\| Goulding)	aa030ba132	test: Additional coverage for the compaction operation	2022-07-21 16:01:26 -04:00
Carol (Nichols \|\| Goulding)	86c50b8033	fix: Use 0 for level 1 chunk order and max seq num for level 0 chunk order	2022-07-21 15:49:52 -04:00
Carol (Nichols \|\| Goulding)	f847365b1a	fix: Use actual partition values rather than placeholders	2022-07-21 15:10:24 -04:00
Carol (Nichols \|\| Goulding)	d46ec31aa1	feat: Compact filtered parquet files Connects to #5121.	2022-07-21 13:37:36 -04:00
Nga Tran	50186ef5ee	feat: add sort key and partition key into PartitionCompactionCandidateWithInfo (#5175 )	2022-07-21 16:57:17 +00:00
Nga Tran	69cb3f2b19	refactor: remove min_sequence_number from Compactor and Querier, add `count_by_overlaps_with_level_0` and `count_by_overlaps_with_level_1` to catalog (#5151 ) * refactor: remove min_sequnce_number * fix: typos * fix: remove min_sequencer_number from new files from merging main * fix: add back throwing error if the compactor compacts files persisted by the ingester after the ingester sends max seq_num back to querier * test: add test_compactor_collision back but modify the input to make it work woth new changes Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-21 13:51:54 +00:00
Marco Neumann	0561423475	refactor: enforce proper `IOxSessionContext` (#5158 ) - remove `IOxSessionContext::default()` because untracked contexts should only be created by tests - remove `Option<IOxSessionContext>` because it is a typed workaround for `IOxSessionContext::default` Tests should use `IOxSessionContext::testing` and all _normal_ users should create proper contexts. I suspect this will help tracing or at least prevent silent regressions. See #5129. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-20 16:25:43 +00:00
dependabot[bot]	278a7f91af	chore(deps): Bump bytes from 1.1.0 to 1.2.0 (#5156 ) Bumps [bytes](https://github.com/tokio-rs/bytes) from 1.1.0 to 1.2.0. - [Release notes](https://github.com/tokio-rs/bytes/releases) - [Changelog](https://github.com/tokio-rs/bytes/blob/master/CHANGELOG.md) - [Commits](https://github.com/tokio-rs/bytes/compare/v1.1.0...v1.2.0) --- updated-dependencies: - dependency-name: bytes dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-20 10:00:08 +00:00
Carol (Nichols \|\| Goulding)	1eb4640931	fix: Stop when exactly equal to the max file size limit too	2022-07-18 15:41:17 -04:00
Carol (Nichols \|\| Goulding)	154cd28928	fix: Clarify filter_parquet_files doc comment	2022-07-18 15:41:17 -04:00
Carol (Nichols \|\| Goulding)	0a545bf325	fix: Clarify how metrics are recorded in the docs for the metric fields	2022-07-18 15:41:17 -04:00
Carol (Nichols \|\| Goulding)	07e10852a8	feat: Add an input file count threshold to the compactor settings	2022-07-18 15:41:17 -04:00
Carol (Nichols \|\| Goulding)	128833e7d9	fix: Change placeholder new_param to input_size_threshold_bytes	2022-07-18 15:16:43 -04:00
Carol (Nichols \|\| Goulding)	d62b1ed7ee	feat: Select a subset of parquet files for a partition to compact Fixes #5120.	2022-07-18 15:14:22 -04:00
Carol (Nichols \|\| Goulding)	4416f1ce37	fix: Remove max number of level 0 files configuration option	2022-07-18 15:08:16 -04:00
Nga Tran	c8f4000f04	feat: Select compaction candidates (#5131 ) * feat: initial implementation for selecting compaction candidates * feat: 2 catalog functions to choose the most thorughput partitions to compact and the selecting candidate function itself * test: tests for the new 2 queries * feat: more tests and metrics for chooing compaction candidates * chore: Apply self suggestions from self review * chore: cleanup * chore: fix doc comment * chore: Apply suggestions from code review Co-authored-by: Carol (Nichols \|\| Goulding) <193874+carols10cents@users.noreply.github.com> * refactor: address review comments * fix: get the right time provider for the tests * refactor: remove the left over compaction_ * fix: typos * fix: make the param name and env name consistent * refactor: make relevant iSomething to uSomething * fix: typo Co-authored-by: Carol (Nichols \|\| Goulding) <193874+carols10cents@users.noreply.github.com>	2022-07-18 18:05:13 +00:00
Andrew Lamb	e2d871b00b	chore: Update datafusion and arrow/parquet/arrow-flight to `18.0.0` (#5079 ) * chore: Update datafusion to 10.0.0, arrow/parquet/arrow-flight to 18 * chore: Run cargo hakari tasks * fix: update cargo pin Co-authored-by: CircleCI[bot] <circleci@influxdata.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-18 15:01:03 +00:00
Jake Goulding	635f535e0e	refactor: replace level_2 with level_1	2022-07-16 21:49:45 -04:00
kodiakhq[bot]	18ffe581b5	Merge branch 'main' into dependabot/cargo/tokio-1.20.0	2022-07-14 14:18:51 +00:00
dependabot[bot]	9b67de2f43	chore(deps): Bump tokio from 1.19.2 to 1.20.0 Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.19.2 to 1.20.0. - [Release notes](https://github.com/tokio-rs/tokio/releases) - [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.19.2...tokio-1.20.0) --- updated-dependencies: - dependency-name: tokio dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com>	2022-07-14 01:21:43 +00:00
Carol (Nichols \|\| Goulding)	de74415cbe	feat: Gather parquet files for a partition compaction operation Fixes #5118. Given a partition ID, look up the non-deleted Parquet files for that partition. Separate them into level 0 and level 1, and sort the level 0 files by max sequence number. This is not called anywhere yet.	2022-07-13 16:53:21 -04:00
Carol (Nichols \|\| Goulding)	d19c468b9d	fix: Remove unused level 1 compaction; move level 2 to level 1 Fixes #5119.	2022-07-13 15:05:09 -04:00
Carol (Nichols \|\| Goulding)	61c023139b	refactor: Switch compaction levels to an enum with values rather than separate consts Bonuses: - Type checking - Validation - Less casting - Exhaustiveness checking - Less use of the numerical value	2022-07-13 11:30:36 -04:00
Carol (Nichols \|\| Goulding)	34fcf6a584	fix: Line wrap to 100 columns	2022-07-13 11:29:13 -04:00

425 Commits (c2f479d3709a6a54642fda31779918c312eb6d8a)