influxdb

Author	SHA1	Message	Date
Joe-Blount	c05739ff20	chore(compactor): move CompactRange up to RoundInfo (#8736) * chore(compactor): move CompactRange up to RoundInfo * chore: insta updates from compactor CompactRange refactor * chore: lint cleanup * chore: addressing some of the comments * chore: remove duplicated done check * chore: variable renaming	2023-09-19 16:53:36 +00:00
Joe-Blount	80f8b55baa	fix(compactor): retry OOM error at reduced concurrency (#8763) * fix(compactor): retry OOM error at reduced concurrency * chore: address comment	2023-09-18 20:01:08 +00:00
Carol (Nichols || Goulding)	a42a00b6f2	refactor: Consistently order max tables first, then max columns I can't handle it.	2023-09-15 16:23:31 -04:00
Carol (Nichols || Goulding)	c32a04388c	feat: Wrap max tables and max columns per table values in newtypes	2023-09-15 13:09:36 -04:00
Joe-Blount	ce34d4ffa3	fix: handle oversized files in compactor (#8700)	2023-09-12 00:18:56 +00:00
Joe-Blount	90fc0370ae	chore: adjust compactor catalog query rate limiter for small clusters (#8699)	2023-09-08 18:53:08 +00:00
Andrew Lamb	45c6bfea9c	chore: Update datafusion, arrow/flight/parquet to `46.0.0`, object_store to `0.7.0` (#8577) * chore: Update DataFusion pin * chore: Update for new API * fix: Update for API * fix: update compactor test * fix: Update to patched version of arrow 46.0.0 * fix: map `DataFusionError::Configuration` to an internal error * fix: do not use deprecated API --------- Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-09-08 12:49:57 +00:00
dependabot[bot]	7f20b0faa0	chore(deps): Bump bytes from 1.4.0 to 1.5.0 (#8692) Bumps [bytes](https://github.com/tokio-rs/bytes) from 1.4.0 to 1.5.0. - [Release notes](https://github.com/tokio-rs/bytes/releases) - [Changelog](https://github.com/tokio-rs/bytes/blob/master/CHANGELOG.md) - [Commits](https://github.com/tokio-rs/bytes/compare/v1.4.0...v1.5.0) --- updated-dependencies: - dependency-name: bytes dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-09-08 12:17:12 +00:00
Michael Angerman	d87c2ae821	chore: change loglevel to info on No compaction job found (#8684)	2023-09-06 14:49:47 -07:00
Joe-Blount	98960a353c	fix(compactor): prevent sort order mismatches from creating overlapping chains (#8675) * fix(compactor): prevent sort order mismatches from creating overlapping regions * chore: test additions for incorrectly created regions * fix(compactor): more sort order mismatch fixes * chore: insta updates * chore: insta updates after merge	2023-09-06 14:53:09 +00:00
Dom	b5a9a6c141	Merge branch 'main' into dom/gossip	2023-09-06 11:24:56 +01:00
Nga Tran	2a71fcbc76	feat: reland compactor consumes sort_key_ids (#8674)	2023-09-05 18:45:49 +00:00
NGA-TRAN	399c0e257d	chore: prepare a revert PR just in case	2023-09-05 13:26:18 -04:00
Nga Tran	fb453ede1e	chore: reland 'teach compactor to use sortkey_ids' after catalog migration is fixed (#8575)	2023-09-05 17:05:13 +00:00
Dom Dwyer	78d40ba59a	feat(compactor): gossip compaction complete events Add post-compaction calls to send a "compaction complete" gossip event containing the set of upgraded, deleted, and newly created parquet files.	2023-09-05 14:01:09 +02:00
Dom Dwyer	5aee376766	feat(compactor): initialise gossip subsystem Optionally initialise the gossip subsystem in the compactor. This will cause the compactor to perform PEX and join the cluster, but as it registers no topic interests, it will not receive any application-level payloads. No messages are currently sent (in fact, gossip shuts down immediately).	2023-09-05 13:58:19 +02:00
Dom Dwyer	871f9b6807	refactor(compactor): sort Cargo.toml dependencies Alphabetically sort the dependencies to avoid diff noise.	2023-09-05 13:58:02 +02:00
Joe-Blount	7a6de3d422	feat: use recurring L0 end time as hint for split times (#8635) * chore: add test case for L0 added after vertical splitting * feat: use recurring L0 end time as hint for split times * chore: insta test updates * chore: add split time verification to simulator	2023-09-01 15:34:26 +00:00
wiedld	a4567a80e6	chore: more cleanup of compactor panics (#8620)	2023-08-30 14:41:42 -07:00
Joe-Blount	68a9019768	chore: remove compactor concurrency scaling (#8606) * chore: remove compactor concurrency scaling * chore: comment update to retrigger tests --------- Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-08-29 23:42:53 +00:00
wiedld	5011f7fe22	chore: add partition_id into panic message (#8605)	2023-08-29 16:37:21 -07:00
Joe-Blount	0996a95630	chore: enable more ManySmallFiles compactions (#8603) * chore: enable more ManySmallFiles compactions * chore: insta churn	2023-08-29 20:42:03 +00:00
wiedld	72c48d34f8	chore: add partition id to compactor panics (#8602)	2023-08-29 12:40:11 -07:00
Joe-Blount	38147ec25e	chore: compact available regions before splitting everything (#8586)	2023-08-29 13:29:11 +00:00
Carol (Nichols || Goulding)	12b8095c46	feat: Upgrade to Rust 1.72.0 (#8589) * feat: Upgrade to Rust 1.72.0 * fix: Allow a warning about an error we're intentionally creating This is a test for an error. This lint warns that this code will cause an error. Thanks lint, that's what we wanted! * chore: rustfmt 1.72 * fix: Remove unnecessary hashes in raw string literals Thanks Clippy! https://rust-lang.github.io/rust-clippy/master/index.html#/needless_raw_string_hashes Note that there are a number of false negatives with this lint; see https://github.com/rust-lang/rust-clippy/issues/11420 * fix: Remove unnecessary explicit iteration Looks like clippy::explicit_iter_loop was improved. https://rust-lang.github.io/rust-clippy/master/index.html#/explicit_iter_loop * fix: Allow clippy::manual_try_fold in a few places Some of these might not be possible to rewrite with try_fold, or at least not trivially. I don't feel confident enough to change these, in any case. I think the lint is good to have on for future code though, so that new code can be written with try_fold. * fix: Remove useless creation of vectors when an array will do Mostly in tests. Also fix some long lines. Thanks Clippy! https://rust-lang.github.io/rust-clippy/master/index.html#/useless_vec * fix: Allow a single range in a vec init, which is actually what we want Looks like Clippy's trying to catch a common mistake here, but for realz we actually want `Vec<Range<usize>>` not `Vec<usize>` https://rust-lang.github.io/rust-clippy/master/index.html#/single_range_in_vec_init * fix: Remove a useless conversion This looks like removing explicit iteration, but it's actually caught by useless_conversion. https://rust-lang.github.io/rust-clippy/master/index.html#/useless_conversion * fix: Remove redundant pattern matching Thanks Clippy! https://rust-lang.github.io/rust-clippy/master/index.html#/redundant_pat * fix: Allow an unwrap on a literal None in a test This matches with the other tests better, and also when I tried to remove the `unwrap_or_default` it changed the JSON sent from something with an empty value to `null`, so I think the `or_default` part is actually changing from one `None` to another `None`. https://rust-lang.github.io/rust-clippy/master/index.html#/unnecessary_literal_unwrap	2023-08-29 05:57:38 +00:00
Joe-Blount	02c338ba70	fix: don't add empty compact branches (#8587)	2023-08-28 19:02:19 +00:00
Joe-Blount	1df5948c97	feat: Add Compaction Regions (#8559) * feat: add CompactRanges RoundInfo type * chore: insta test updates for adding CompactRange * feat: simplify/improve ManySmallFiles logic, now that its problem set is simpler * chore: insta test updates for ManySmallFiles improvement * chore: upgrade files more aggressively * chore: insta updates from more aggressive file upgrades * chore: addressing review comments	2023-08-28 12:59:12 +00:00
Joe-Blount	46179bf164	chore: limit compactor region count computations	2023-08-25 14:41:24 -05:00
Nga Tran	2eb74ddb87	chore: revert teaching compactor to use sort_key_ids (#8574)	2023-08-25 13:21:12 +00:00
Nga Tran	246918feb6	feat: teach compactor to use sort_key_ids instead of sort_key (#8560) * feat: teach compactor to use sort_key_ids instead of sort_key * test: update the test output after chatting with Joe and know the reason of the changes	2023-08-24 16:16:12 +00:00
Joe-Blount	36e66158a2	chore: add log message before compacting partition (#8569)	2023-08-24 14:13:23 +00:00
Joe-Blount	53915f0653	feat: move vertical splitting & detect non-linear data (#8506) * chore: test changes and additions in preparation for functional changes * feat: move vertical splitting to RoundInfo calculation, align splits to L1 files * chore: insta test churn * feat: detect non-linear data distribution in vertical splitting * chore: add tests for non-linear data distribution * chore: insta churn * chore: cleanup & comment additions * chore: some variable renaming	2023-08-21 18:22:25 +00:00
Joe-Blount	1cc0926a7f	feat: track why bytes are written in compactor simulator (#8493) * feat: add tracking of why bytes are written in simulator * chore: enable breakdown of why bytes are written in a few larger tests * chore: enable writes breakdown in another test --------- Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-08-16 13:39:37 +00:00
Joe-Blount	6d4729db1d	chore: Insta test updates	2023-08-14 15:41:37 -05:00
Joe-Blount	3003e44e78	feat: allow compactor to split 2ns files	2023-08-14 15:40:58 -05:00
Joe-Blount	964b2f6b97	fix: compactor simulator math error creates 0 byte files (#8478) * fix: math error in simulator results in 0 byte files during simulations * chore: insta churn from simulator file size fix	2023-08-14 20:00:19 +00:00
dependabot[bot]	4c63338354	chore(deps): Bump async-trait from 0.1.72 to 0.1.73 (#8481) Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.72 to 0.1.73. - [Release notes](https://github.com/dtolnay/async-trait/releases) - [Commits](https://github.com/dtolnay/async-trait/compare/0.1.72...0.1.73) --- updated-dependencies: - dependency-name: async-trait dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-08-14 06:25:33 +00:00
Joe-Blount	3213396caf	feat: force early L1 compaction, avoid invariant violations (#8444) * feat: force early L1 compaction, avoid invariant violations * chore: variable renaming * chore: make SplitBasedFileClassifier honor file selection for SimulatedLeadingEdge	2023-08-08 20:10:36 +00:00
Joe-Blount	b7c0bcb61f	chore(compactor): move divide & branches to RoundInfo calculation (#8410)	2023-08-03 22:02:06 +00:00
wiedld	81b5d80a91	feat(idpe-17935): move filtering of skipped partitions to the scheduler (#8358) * catalog.get_in_skipped_compaction() should handle for multiple partitions * add the ability to perform transformation on sets of partitions (rather than filtering one by one). Start with the transformation to remove skipped partitions, in the scheduler. * move the env var and cli flag setting, for when to ignore skipped partitions, to the scheduler config.	2023-08-03 11:43:09 -07:00
wiedld	09ee60e35f	refactor(idpe-17789): rename mod to compaction_job_done_sink	2023-08-02 11:21:22 -07:00
wiedld	68ab2c97c1	feat(idpe 17789): provide job to CompactionJobDoneSink (formerly known as PartitionDoneSink) (#8368) * rename PartitionDoneSink to CompactionJobSink. and change signature in trait * update all trait implementations, including local variables and comments * rename partition_done_sink in the components and driver, to be compaction_job_done_sink	2023-08-02 11:19:50 -07:00
Carol (Nichols || Goulding)	71b32d4dd6	fix: Persist Parquet files with the TransitionPartitionId	2023-08-02 10:17:23 -04:00
Carol (Nichols || Goulding)	5db8ed677f	fix: Update compactor's use of QueryChunk to return TransitionPartitionId Also rename PartitionInfo's transition_partition_id to be partition_id so that it's consistent with the QueryChunk method. We might want to rename the partition_id field to catalog_partition_id, but for now I think the types will make compactor usage clear enough. This gets compactor to compile and pass its tests.	2023-08-02 10:17:22 -04:00
Carol (Nichols || Goulding)	e4b9455344	feat: Have QueryChunk return a reference from partition_id()	2023-08-02 10:17:22 -04:00
wiedld	cc70a2c38b	Merge branch 'main' into idpe-17789/provide-job-on-commit	2023-07-31 08:20:45 -07:00
Joe-Blount	44e266d000	fix: compaction looping fixes (#8363) * fix: selectively merge L1 to L2 when L0s still exist * fix: avoid grouping files that undo previous splits * chore: add test case for new fixes * chore: insta test churn * chore: lint cleanup	2023-07-31 13:15:49 +00:00
Carol (Nichols || Goulding)	4a9e76b8b7	feat: Make parquet_file.partition_id optional in the catalog (#8339) * feat: Make parquet_file.partition_id optional in the catalog This will acquire a short lock on the table in postgres, per: <https://stackoverflow.com/questions/52760971/will-making-column-nullable-lock-the-table-for-reads> This allows us to persist data for new partitions and associate the Parquet file catalog records with the partition records using only the partition hash ID, rather than both that are used now. * fix: Support transition partition ID in the catalog service * fix: Use transition partition ID in import/export This commit also removes support for the `--partition-id` flag of the `influxdb_iox remote store get-table` command, which Andrew approved. The `--partition-id` filter was getting the results of the catalog gRPC service's query for Parquet files of a table and then keeping only the files whose partition IDs matched. The gRPC query is no longer returning the partition ID from the Parquet file table, and really, this command should instead be using `GetParquetFilesByPartitionId` to only request what's needed rather than filtering. * feat: Support looking up Parquet files by either kind of Partition id Regardless of which is actually stored on the Parquet file record. That is, say there's a Partition in the catalog with: Partition { id: 3, hash_id: abcdefg, } and a Parquet file that has: ParquetFile { partition_hash_id: abcdefg, } calling `list_by_partition_not_to_delete(PartitionId(3))` should still return this Parquet file because it is associated with the partition that has ID 3. This is important for the compactor, which is currently only dealing in PartitionIds, and I'd like to keep it that way for now to avoid having to change Even More in this PR. * fix: Use and set new partition ID fields everywhere they want to be --------- Co-authored-by: Dom <dom@itsallbroken.com>	2023-07-31 12:40:56 +00:00
wiedld	1ce8e50f1a	feat(idpe-17789): provide job from compactor --> scheduler, on commit	2023-07-28 15:58:50 -07:00
wiedld	9a7ff9ecfc	chore(idpe-17789): update code comments to reflect both jobs and partitions	2023-07-27 15:39:18 -07:00

622 Commits (29462d0fe570990f7eb08c3af545f1ba6d2d3800)