influxdb

Commit Graph

Author	SHA1	Message	Date
Carol (Nichols || Goulding)	3a368c02c2	fix: Remove now-unused cold_input_size_threshold_bytes	2022-09-12 13:13:28 -04:00
Carol (Nichols || Goulding)	eefc71ac90	fix: Remove now unused max_cold_concurrent_size_bytes	2022-09-12 13:13:28 -04:00
Carol (Nichols || Goulding)	2a22d79c94	feat: Make cold compaction like hot compaction except for candidate selection Temporarily disable full compaction from level 1 to 2. Re-use the memory budget estimation and parallelization for cold compaction. Rather than choosing cold compaction candidates and then in parallel compacting each partition from level 0 to 1 and then 1 to 2, this commit switches to compacting in parallel (by memory budget) all candidates form level 0 to 1. The next commit will re-enable full compaction of all partitions in parallel (by memory budget).	2022-09-12 13:13:28 -04:00
Carol (Nichols || Goulding)	76228c9fd6	refactor: Move compact_in_parallel and compact_one_partition to lib and make more general Cold compaction is going to use these too.	2022-09-12 13:13:28 -04:00
Carol (Nichols || Goulding)	7a3dffb750	refactor: Create wrapper fns that don't take size overrides So that we don't have to pass an empty hashmap in as many places in real code, because the size overrides are only for tests	2022-09-12 13:13:28 -04:00
Carol (Nichols || Goulding)	608290b83d	fix: Make some hot compaction code more general/parameterized	2022-09-12 13:13:28 -04:00
Carol (Nichols || Goulding)	2a5ef3058c	refactor: Move compact_candidates_with_memory_budget to share with cold	2022-09-12 13:13:28 -04:00
Carol (Nichols || Goulding)	955e7ea824	fix: Remove unused Error struct	2022-09-12 13:13:27 -04:00
Carol (Nichols || Goulding)	ee3e1b851d	fix: Clean up some long lines, comments	2022-09-12 13:13:27 -04:00
Carol (Nichols || Goulding)	77f3490246	refactor: Extract cold compaction code into a module like hot	2022-09-12 13:13:27 -04:00
Carol (Nichols || Goulding)	c12b3fbb03	refactor: Move to a module named hot to reduce naming duplication My fingers are tired of typing 🤣	2022-09-12 13:13:27 -04:00
Carol (Nichols || Goulding)	e3f9984878	docs: Clean up some comments while reading through	2022-09-12 13:13:27 -04:00
Carol (Nichols || Goulding)	f2f99727ba	feat: Add metrics for files going into cold compaction	2022-09-12 13:13:27 -04:00
Carol (Nichols || Goulding)	ad2db51ac2	refactor: Extract a function to share logic for compacting to L1 or L2	2022-09-12 13:13:27 -04:00
Carol (Nichols || Goulding)	6436afc3d9	fix: Remove cold max bytes CLI option; use existing max bytes CLI option As discussed in https://github.com/influxdata/influxdb_iox/issues/5330#issuecomment-1218170063	2022-09-12 13:13:27 -04:00
Carol (Nichols || Goulding)	723aedfbca	test: Add more cases for cold compaction	2022-09-12 13:13:26 -04:00
Carol (Nichols || Goulding)	7cd78a3020	fix: Extract and test logic that groups files for cold compaction	2022-09-12 13:13:26 -04:00
Carol (Nichols || Goulding)	da201ba87f	fix: Select by num of both l0 and l1 files for cold compaction Now that we're going to compact level 1 files in to level 2 files as well.	2022-09-12 13:13:26 -04:00
Carol (Nichols || Goulding)	6bba3fafaa	fix: If full compaction group has only 1 file, upgrade level As opposed to running full compaction. Makes the catalog function general and take the level as a parameter rather than only upgrade to level 1.	2022-09-12 13:13:26 -04:00
Carol (Nichols || Goulding)	10ba3fef47	feat: Compact cold partitions completely Fixes #5330.	2022-09-12 13:13:26 -04:00
Carol (Nichols || Goulding)	327446f0cd	fix: Change default cold hours threshold from 24 hours to 8 As requested in https://github.com/influxdata/influxdb_iox/issues/5330#issuecomment-1212468682	2022-09-12 13:13:26 -04:00
Carol (Nichols || Goulding)	a64a705b60	refactor: Extract a fn for the first step of cold compaction Which is currently the only step, compacting any remaining level 0 files into level 1. Make a TODO function for performing full compaction of all level 1 files next.	2022-09-12 13:13:26 -04:00
Carol (Nichols || Goulding)	7249ef4793	fix: Don't record cold compaction metrics if compaction fails	2022-09-12 13:13:25 -04:00
Marco Neumann	8933f47ec1	refactor: make `QueryChunk::partition_id` non-optional (#5614 ) In our data model, a chunk always belongs to a partition[^1], so let's not make this attribute optional. The optional value only leads to -- mostly surprising -- conditional behavior, ranging from "do not equalize the partition sort key" (querier) to "always consider the chunk overlapping" (iox_query when dealing with ingester chunks). [^1]: This is even true when the chunk belongs to a parquet file that is not yet added to the catalog, contrary to what a comment in the ingester stated. The catalog and data model used by the querier are two totally different things.	2022-09-12 13:52:51 +00:00
Carol (Nichols || Goulding)	13de7ac954	feat: Record reasons for skipping compaction of a partition in the database Closes #5458.	2022-09-09 16:40:48 -04:00
Nga Tran	f03e370ecc	refactor: allocate more accurate length for a hashmap (#5592 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-09-09 15:37:29 +00:00
dependabot[bot]	786ce75e26	chore(deps): Bump tokio-util from 0.7.3 to 0.7.4 (#5596 ) Bumps [tokio-util](https://github.com/tokio-rs/tokio) from 0.7.3 to 0.7.4. - [Release notes](https://github.com/tokio-rs/tokio/releases) - [Commits](https://github.com/tokio-rs/tokio/compare/tokio-util-0.7.3...tokio-util-0.7.4) --- updated-dependencies: - dependency-name: tokio-util dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-09-09 07:40:16 +00:00
Joe-Blount	333cfa4f3c	chore: address comments - use TimestampMinMax passed by reference	2022-09-07 16:36:39 -05:00
Joe-Blount	97ebad5adb	chore: rustfmt changes	2022-09-07 13:22:36 -05:00
Joe-Blount	4188230694	fix: avoid splitting compaction output for time ranges with no chunks	2022-09-07 13:01:14 -05:00
Carol (Nichols || Goulding)	b5ca99a3d5	refactor: Make CompactorConfig fields pub I'm spending way too long with the wrong number of arguments to CompactorConfig::new and not a lot of help from the compiler. If these struct fields are pub, they can be set directly and destructured, etc, which the compiler gives way more help on. This also reduces duplication and boilerplate that has to be updated when the config fields change.	2022-09-07 13:28:19 -04:00
Carol (Nichols || Goulding)	54eea79773	refactor: Make filtering the parquet files into a closure argument too So that the cold compaction can use different filtering but still use the memory budget function. Not sure I'm happy with this yet, but it's a start.	2022-09-07 13:26:42 -04:00
Carol (Nichols || Goulding)	3e76a155f7	refactor: Make memory budget compaction group function more general In preparation for using it for cold compaction too.	2022-09-07 13:26:42 -04:00
Carol (Nichols || Goulding)	1f69d11d46	refactor: Move hot compaction function into hot compaction module	2022-09-07 13:26:40 -04:00
Carol (Nichols || Goulding)	85fb0acea6	refactor: Extract read_parquet_file test helper function to iox_tests::utils	2022-09-07 13:21:28 -04:00
Marco Neumann	adeacf416c	ci: fix (#5569 ) * ci: use same feature set in `build_dev` and `build_release` * ci: also enable unstable tokio for `build_dev` * chore: update tokio to 1.21 (to fix console-subscriber 0.1.8 * fix: "must use"	2022-09-06 14:13:28 +00:00
Marco Neumann	064f0e9b29	refactor: use DataFusion to read parquet files (#5531 ) Remove our own hand-rolled logic and let DataFusion read the parquet files. As a bonus, this now supports predicate pushdown to the deserialization step, so we can use parquets as in in-mem buffer. Note that this currently uses some "nested" DataFusion hack due to the way the `QueryChunk` interface works. Midterm I'll change the interface so that the `ParquetExec` nodes are directly visible to DataFusion instead of some opaque `SendableRecordBatchStream`.	2022-09-05 09:25:04 +00:00
Marco Neumann	f45cbfb88d	refactor: fine-grained file size mocking (#5541 ) * refactor: do not override parquet file size in querier This is going to be an issue when we actually rely on the size for reading, see #5531. * refactor: use selected file size mocking in compactor Do not blindly override parquet file sizes for all subsystems. This is going to be an issue when we actually rely on the size for reading, see #5531. * refactor: remove ability to override file sizes in catalog Blindly overriding data for all subsystems is dangerous, because some parts of our stack actually rely on the actual file size. See #5531. * docs: explain `size_overrides`	2022-09-05 08:50:04 +00:00
Nga Tran	dde65fa7ef	fix: remove timestamp functions from SQLs to be able to use index for improving performance (#5547 )	2022-09-02 19:43:52 +00:00
kodiakhq[bot]	b9959fa2d8	Merge branch 'main' into cn/even-more-compactor-tests	2022-09-01 21:02:04 +00:00
Nga Tran	c8cbc5299b	feat: make compactors to select candidates based on the last n minutes (#5535 ) * feat: make compactors to select candidates based on the last n minutes to reduce workload for postgres catalog query * refactor: remove 1-minute case per review comment	2022-09-01 20:07:26 +00:00
Carol (Nichols || Goulding)	16d631a247	test: Add test for current behavior of skipping a table without columns	2022-08-31 16:26:02 -04:00
Carol (Nichols || Goulding)	1120b49821	refactor: Extract the mock compactor function into a type	2022-08-31 16:17:43 -04:00
Carol (Nichols || Goulding)	b893251efc	test: Add a test that compacting no candidates compacts nothing	2022-08-31 15:30:25 -04:00
Carol (Nichols || Goulding)	b0e871196c	test: Use more iox test utils in this compactor test	2022-08-31 14:37:59 -04:00
Nga Tran	a32d5180b3	fix: loop forever in compact_hot_partition_candidates (#5518 ) * fix: loop forever in compact_hot_partition_candidates * chore: cleanup * fix: avoid using continues that will cause bugs in corner cases * fix: Pass compaction fn as a closure instead to allow collection of groups in test * fix: Add Send bound as suggested by clippy * fix: fix the test to return data of round 3 instead of round 2 Co-authored-by: Carol (Nichols || Goulding) <carol.nichols@gmail.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-08-31 17:46:59 +00:00
Andrew Lamb	6669d85fb4	chore: Update datafusion + arrow/parquet to `21.0.0` (#5519 ) * chore: Update arrow/arrow-flight/parquet to 21.0.0 * chore: Update datafusion pin * chore: Fix arrow update script * chore: Update Cargo.lock * chore: Update for new API	2022-08-31 13:30:47 +00:00
Nga Tran	cb10a7c6d8	feat: More accurate memory estimate for compaction (#5471 ) * feat: initial implementation of memory estimation for a compaction * feat: estimate size of files and have the right actions for the needed budget * feat: run candidates in parallel * fix: have the right name for the column field of the output struct * feat: add metrics for estimated budgets * chore: cleanup * chore: Apply suggestions from code review Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com> * fix: fix syntax after applying review's suggestions * refactor: Convert a Vec to VecDeque to go well with pop and push * chore: remove max_concurrent_size_bytes and input_size_threshold_bytes * chore: remove input_file_count_threshold * test: tests for estimate_arrow_bytes_for_file Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-08-30 13:44:44 +00:00
Dom Dwyer	2fc0ddbea1	fix: compactor tolerates empty output Changes the compactor code to tolerate a SplitExec yielding an empty partition (with no rows). This raises a WARN as the situation in which this is acceptable is very rare, and is more likely indicative of an opportunity to improve the SplitExec usage (i.e. pruning out unnecessary split points).	2022-08-30 14:52:31 +02:00
Carol (Nichols || Goulding)	58f0b63cdc	refactor: Rename KafkaTopic to Topic or TopicMetadata or topic name as appropriate	2022-08-29 14:27:02 -04:00
Carol (Nichols || Goulding)	74c9529062	fix: Rename KafkaPartition to ShardIndex	2022-08-29 14:07:18 -04:00
Carol (Nichols || Goulding)	c9567cad7d	fix: Rename some more sequencer to shard	2022-08-29 14:06:45 -04:00
Carol (Nichols || Goulding)	6443858870	fix: Rename compactor option from sequencer to shard	2022-08-29 14:06:45 -04:00
Carol (Nichols || Goulding)	fe9c474620	fix: rustfmt	2022-08-29 14:06:45 -04:00
Carol (Nichols || Goulding)	f6c93f7e67	fix: Remove moot comment	2022-08-29 14:06:44 -04:00
Carol (Nichols || Goulding)	698f1a47ff	refactor: Rename test structures from sequencer to shard where appropriate	2022-08-29 14:06:44 -04:00
Jake Goulding	4abf21c724	refactor: Rename Sequencer (and its entourage) to Shard	2022-08-29 14:06:43 -04:00
Nga Tran	3220c6f88b	feat: add file_count_threshold for comapcting cold partitions (#5456 ) * feat: file file_count_threshold for comapcting cold partitions to make it consistent with the hot case and help set up to avoid oom easier * chore: remove unecessary commments	2022-08-23 20:12:21 +00:00
kodiakhq[bot]	2b3ca54168	Merge branch 'main' into cn/upgrade-l0-metrics	2022-08-17 16:01:42 +00:00
Andrew Lamb	7f0ae53d6f	chore: Update to (almost) released object_store 0.4.0 (#5419 ) * chore: update object_store * chore: update hakari config * chore: Run cargo hakari tasks Co-authored-by: CircleCI[bot] <circleci@influxdata.com>	2022-08-17 13:44:48 +00:00
Carol (Nichols || Goulding)	ef716a5b90	fix: Remove compaction level attribute from the compaction_input_file_bytes metric	2022-08-15 10:50:04 -04:00
Carol (Nichols || Goulding)	a9ed32df89	fix: Remove compaction_counter as it's now redundant with the compaction_input_file_bytes histogram	2022-08-15 10:23:29 -04:00
Carol (Nichols || Goulding)	af95ce7ca6	feat: Add a histogram tracking sizes of files used as inputs to compaction Fixes #5348.	2022-08-15 10:13:54 -04:00
Carol (Nichols || Goulding)	cd6c809fe0	fix: Change metric tracking sizes of files selected for compaction to a histogram Connects to #5348.	2022-08-15 10:13:54 -04:00
Carol (Nichols || Goulding)	b982bdaf2f	fix: Derive Eq when we derive PartialEq and members can derive Eq Allow this in generated code that we don't control, though. Recommended by clippy now. https://rust-lang.github.io/rust-clippy/master/index.html#derive_partial_eq_without_eq	2022-08-11 15:04:06 -04:00
Marco Neumann	90fec1365f	feat: intern schemas during query planning (#5215 ) * feat: intern schemas during query planning Helps with #5202. * refactor: `SchemaMerger::build` shall return an `Arc` * feat: `SchemaMerger::with_interner` * refactor: hash-based schema interning	2022-08-11 12:28:51 +00:00
Jake Goulding	68e64af4d1	refactor: extract compactor loop body to call it separately	2022-08-10 11:28:51 -04:00
Jake Goulding	49c5281454	refactor: Supersede old CompactorHandlerImpl constructor	2022-08-10 11:28:51 -04:00
Jake Goulding	cc061b6ce9	refactor: add CompactorHandlerImpl::new_with_compactor This will allow us to refactor the code a level up to create a `Compactor` directly.	2022-08-10 11:28:51 -04:00
Andrew Lamb	c0fc91c627	chore: Warn if a parquet file has no sort key (#5368 )	2022-08-10 11:56:50 +00:00
Andrew Lamb	16ddc5efc6	chore: Update datafusion / arrow/parquet/arrow-flight and prost/tonic ecosystem (#5360 ) * chore: Update datafusion and arrow * chore: Update Cargo.lock * chore: update to Decimal128 * chore: Update tonic/prost/pbjson/etc * chore: Run cargo hakari tasks * fix: doctest in generated types Co-authored-by: CircleCI[bot] <circleci@influxdata.com>	2022-08-09 17:30:44 +00:00
Nga Tran	b71c1a09ea	feat: only sleep when there are neither hot nor cold partitions to compact (#5329 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-08-05 16:36:36 +00:00
Carol (Nichols || Goulding)	facc967320	fix: Specify hot or cold in more log messages	2022-08-04 16:55:48 -04:00
Carol (Nichols || Goulding)	c9d66c30b1	fix: Make this field name consistent With the other fields on this struct and with the corresponding field on the clap block struct.	2022-08-04 16:55:48 -04:00
Carol (Nichols || Goulding)	da0b031c44	feat: Add parameters to limit total memory usage of cold partition compaction	2022-08-04 16:55:48 -04:00
Carol (Nichols || Goulding)	9d8f94d0d7	fix: Remove an unneeded sleep The cold case won't make a hot busy loop (hah), we'll just go back to working on the hot partitions if there's no cold partitions to do.	2022-08-04 16:55:48 -04:00
Carol (Nichols || Goulding)	e1c45e836a	test: Remove copypastaed assertions that duplicate a different test	2022-08-04 16:55:48 -04:00
Carol (Nichols || Goulding)	cb6442018e	test: Add more test cases varying number of partitions per sequencer	2022-08-04 16:55:48 -04:00
Carol (Nichols || Goulding)	d55f45a5c2	feat: Run compaction of hot partitions a configurable number of times more than cold	2022-08-04 16:55:48 -04:00
Carol (Nichols || Goulding)	827e82cfb8	feat: Upgrade one level 0, non-overlapping file without compacting Fixes #1078.	2022-08-04 16:55:47 -04:00
Carol (Nichols || Goulding)	c1d016a00a	feat: Upgrade cold level 0 files when they have no overlaps	2022-08-04 16:55:47 -04:00
Carol (Nichols || Goulding)	9052eabe50	feat: Separate out hot/cold partition compaction and filtering Cold partition compaction will (in the next commit) upgrade a level 0 file without any overlaps rather than running compaction. Cold partition filtering gathers all level 0 files in the (already deemed cold) partition with all overlapping level 1 files, and does not limit the set of files being compacted by their number or size.	2022-08-04 16:55:47 -04:00
Carol (Nichols || Goulding)	fc62c82722	feat: Select cold partitions	2022-08-04 16:55:47 -04:00
Carol (Nichols || Goulding)	6e9c752230	refactor: Extract current compaction into a fn for 'hot' partitions	2022-08-04 16:55:47 -04:00
Marco Neumann	eea8270e83	fix: `compute_split_time` with small step sizes (#5309 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-08-04 13:40:30 +00:00
Marco Neumann	039950b4fd	feat: ensure clean compactor executor shutdown	2022-08-03 18:07:00 +02:00
Marco Neumann	fd74f2639b	fix: do not attempt to poll future lists in compactor It seems that the buffering / parallelization code cannot deal with empty lists and just freezes forever (which blocks shutdown but will also freeze the compactor forever).	2022-08-03 18:04:05 +02:00
Nga Tran	4812db9887	feat: fewer buckets but larger ranges for compaction duration histogram (#5259 ) * chore: reduce log info * feat: fewer buckets but larger ranges for compaction duration histogram * chore: Apply suggestions from code review Co-authored-by: Marko Mikulicic <mkm@influxdata.com> * chore: run fmt after appying reviewer's suggestions Co-authored-by: Marko Mikulicic <mkm@influxdata.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-08-02 14:19:30 +00:00
dependabot[bot]	fbd39844d8	chore(deps): Bump async-trait from 0.1.56 to 0.1.57 (#5247 ) Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.56 to 0.1.57. - [Release notes](https://github.com/dtolnay/async-trait/releases) - [Commits](https://github.com/dtolnay/async-trait/compare/0.1.56...0.1.57) --- updated-dependencies: - dependency-name: async-trait dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-08-01 08:30:33 +00:00
Andrew Lamb	7cc8486e5a	fix: remove left over `deb!` macro (#5224 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-30 15:33:02 +00:00
Marco Neumann	0e9695f202	feat: add a few helpful compactor debug logs (#5235 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-28 17:38:33 +00:00
Andrew Lamb	9215a534d0	chore: Update datafusion and `arrow`/`parquet`/`arrow-flight` to `19.0.0` (#5229 ) * chore: Update datafusion and `arrow`/`parquet`/`arrow-flight` to `19.0.0` * chore: Run cargo hakari tasks * fix: Update for API changes * fix: clippy Co-authored-by: CircleCI[bot] <circleci@influxdata.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-28 08:10:47 +00:00
Nga Tran	fcce00bf09	feat: run many compact partitions in parallel (#5230 ) * feat: run many compact partitions in parallel * refactor: Use rust futures fu to run compactor jobs in parallel * chore: Apply suggestions from code review Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>	2022-07-27 20:55:45 +00:00
Andrew Lamb	7eebe061a6	fix: reduce log verbosity for `found compaction candidates` message (#5225 ) * fix: reduce log verbosity * refactor: sleep for a sec if no work, print debug Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-27 19:35:31 +00:00
Marco Neumann	9a9a1a4777	feat: limit per-table chunk data for every query (#5223 ) * feat: `QueryChunk::as_any` * feat: allo `ChunkPruner::prune_chunks` to fail * feat: limit per-table chunk data for every query Closes #5211. * fix: address review comments Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>	2022-07-27 13:20:05 +00:00
Andrew Lamb	bbcf4ec64e	fix: Run compactor streams in parallel to avoid deadlock (#5212 ) * fix: run compaction streams at once * fix: make it compile * fix: improve wording * fix: use task to write parquet files in parallel	2022-07-26 12:17:38 +00:00
Carol (Nichols || Goulding)	f4d0f13689	feat: split large compactions (#5195 ) * feat: Split large compactions into multiple compacted files Connects to #5121 * refactor: Extract update catalog function and error type * refactor: Share physical plan to object store streaming And only differ in the logical plan building based on split times in different compaction cases. * fix: Test for a split time equal to the max time and don't split then	2022-07-22 20:35:31 +00:00
Nga Tran	69640c0ba5	feat: Different branch to hook up new compaction algorithm (#5194 ) * chore: cherry pick the first 3 commits of branch cn/connect-new-compaction * fix: modify the test to work correctly with compactor running Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-22 19:29:47 +00:00
Carol (Nichols || Goulding)	94343b1f27	fix: compute_split_time returns one value when min_time = max_time (#5192 ) * test: Document the behavior of compute_split_time when min time = max time * fix: compute_split_time returns one value when min_time = max_time Co-authored-by: NGA-TRAN <nga-tran@live.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-22 17:29:50 +00:00
Nga Tran	bbe07fcc79	feat: metrics for selection partition candidates for compaction (#5190 ) * feat: metrics for selection partition candidates for compaction * chore: Apply suggestions from code review Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com> * refactor: remove unused metric labels Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-22 15:25:53 +00:00
Carol (Nichols || Goulding)	d9ca28e83a	fix: Compact one level 0 file to avoid getting stuck on it If compaction is called on one level 0 file, the work of compaction doesn't really need to be done, but for simplicity's sake for now, run it through the compaction query/write process rather than special casing it. This will keep us from getting stuck in the unlikely event that a partition with one level 0 file gets selected as a top candidate for compaction.	2022-07-22 09:17:57 -04:00
Carol (Nichols || Goulding)	aa030ba132	test: Additional coverage for the compaction operation	2022-07-21 16:01:26 -04:00
Carol (Nichols || Goulding)	86c50b8033	fix: Use 0 for level 1 chunk order and max seq num for level 0 chunk order	2022-07-21 15:49:52 -04:00
Carol (Nichols || Goulding)	f847365b1a	fix: Use actual partition values rather than placeholders	2022-07-21 15:10:24 -04:00
Carol (Nichols || Goulding)	d46ec31aa1	feat: Compact filtered parquet files Connects to #5121.	2022-07-21 13:37:36 -04:00
Nga Tran	50186ef5ee	feat: add sort key and partition key into PartitionCompactionCandidateWithInfo (#5175 )	2022-07-21 16:57:17 +00:00
Nga Tran	69cb3f2b19	refactor: remove min_sequence_number from Compactor and Querier, add `count_by_overlaps_with_level_0` and `count_by_overlaps_with_level_1` to catalog (#5151 ) * refactor: remove min_sequnce_number * fix: typos * fix: remove min_sequencer_number from new files from merging main * fix: add back throwing error if the compactor compacts files persisted by the ingester after the ingester sends max seq_num back to querier * test: add test_compactor_collision back but modify the input to make it work woth new changes Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-21 13:51:54 +00:00
Marco Neumann	0561423475	refactor: enforce proper `IOxSessionContext` (#5158 ) - remove `IOxSessionContext::default()` because untracked contexts should only be created by tests - remove `Option<IOxSessionContext>` because it is a typed workaround for `IOxSessionContext::default` Tests should use `IOxSessionContext::testing` and all _normal_ users should create proper contexts. I suspect this will help tracing or at least prevent silent regressions. See #5129. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-20 16:25:43 +00:00
dependabot[bot]	278a7f91af	chore(deps): Bump bytes from 1.1.0 to 1.2.0 (#5156 ) Bumps [bytes](https://github.com/tokio-rs/bytes) from 1.1.0 to 1.2.0. - [Release notes](https://github.com/tokio-rs/bytes/releases) - [Changelog](https://github.com/tokio-rs/bytes/blob/master/CHANGELOG.md) - [Commits](https://github.com/tokio-rs/bytes/compare/v1.1.0...v1.2.0) --- updated-dependencies: - dependency-name: bytes dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-20 10:00:08 +00:00
Carol (Nichols || Goulding)	1eb4640931	fix: Stop when exactly equal to the max file size limit too	2022-07-18 15:41:17 -04:00
Carol (Nichols || Goulding)	154cd28928	fix: Clarify filter_parquet_files doc comment	2022-07-18 15:41:17 -04:00
Carol (Nichols || Goulding)	0a545bf325	fix: Clarify how metrics are recorded in the docs for the metric fields	2022-07-18 15:41:17 -04:00
Carol (Nichols || Goulding)	07e10852a8	feat: Add an input file count threshold to the compactor settings	2022-07-18 15:41:17 -04:00
Carol (Nichols || Goulding)	128833e7d9	fix: Change placeholder new_param to input_size_threshold_bytes	2022-07-18 15:16:43 -04:00
Carol (Nichols || Goulding)	d62b1ed7ee	feat: Select a subset of parquet files for a partition to compact Fixes #5120.	2022-07-18 15:14:22 -04:00
Carol (Nichols || Goulding)	4416f1ce37	fix: Remove max number of level 0 files configuration option	2022-07-18 15:08:16 -04:00
Nga Tran	c8f4000f04	feat: Select compaction candidates (#5131 ) * feat: initial implementation for selecting compaction candidates * feat: 2 catalog functions to choose the most thorughput partitions to compact and the selecting candidate function itself * test: tests for the new 2 queries * feat: more tests and metrics for chooing compaction candidates * chore: Apply self suggestions from self review * chore: cleanup * chore: fix doc comment * chore: Apply suggestions from code review Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com> * refactor: address review comments * fix: get the right time provider for the tests * refactor: remove the left over compaction_ * fix: typos * fix: make the param name and env name consistent * refactor: make relevant iSomething to uSomething * fix: typo Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>	2022-07-18 18:05:13 +00:00
Andrew Lamb	e2d871b00b	chore: Update datafusion and arrow/parquet/arrow-flight to `18.0.0` (#5079 ) * chore: Update datafusion to 10.0.0, arrow/parquet/arrow-flight to 18 * chore: Run cargo hakari tasks * fix: update cargo pin Co-authored-by: CircleCI[bot] <circleci@influxdata.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-18 15:01:03 +00:00
Jake Goulding	635f535e0e	refactor: replace level_2 with level_1	2022-07-16 21:49:45 -04:00
kodiakhq[bot]	18ffe581b5	Merge branch 'main' into dependabot/cargo/tokio-1.20.0	2022-07-14 14:18:51 +00:00
dependabot[bot]	9b67de2f43	chore(deps): Bump tokio from 1.19.2 to 1.20.0 Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.19.2 to 1.20.0. - [Release notes](https://github.com/tokio-rs/tokio/releases) - [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.19.2...tokio-1.20.0) --- updated-dependencies: - dependency-name: tokio dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com>	2022-07-14 01:21:43 +00:00
Carol (Nichols || Goulding)	de74415cbe	feat: Gather parquet files for a partition compaction operation Fixes #5118. Given a partition ID, look up the non-deleted Parquet files for that partition. Separate them into level 0 and level 1, and sort the level 0 files by max sequence number. This is not called anywhere yet.	2022-07-13 16:53:21 -04:00
Carol (Nichols || Goulding)	d19c468b9d	fix: Remove unused level 1 compaction; move level 2 to level 1 Fixes #5119.	2022-07-13 15:05:09 -04:00
Carol (Nichols || Goulding)	61c023139b	refactor: Switch compaction levels to an enum with values rather than separate consts Bonuses: - Type checking - Validation - Less casting - Exhaustiveness checking - Less use of the numerical value	2022-07-13 11:30:36 -04:00
Carol (Nichols || Goulding)	34fcf6a584	fix: Line wrap to 100 columns	2022-07-13 11:29:13 -04:00
Nga Tran	5c5c964dfe	feat: config params for Compactor (#5108 ) * feat: config params for Compactor * refactor: address review comments Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-13 13:50:07 +00:00
Nga Tran	bce8924b4c	refactor: use max_sequence_number to sort chunks for deduplication (#5101 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-12 16:23:53 +00:00
Carol (Nichols || Goulding)	80b6c5c82f	fix: Correct typo in constant name so searching for COMPACTION_LEVEL returns all (#5077 )	2022-07-08 16:31:52 +00:00
Carol (Nichols || Goulding)	2ba97dd9df	fix: Remove out of date comment	2022-07-08 10:31:13 -04:00
Carol (Nichols || Goulding)	909c4b18d4	fix: Log more info when compacting files	2022-07-08 10:30:15 -04:00
Carol (Nichols || Goulding)	a45767e705	fix: Restore compute_split_time to compactor utils	2022-07-08 10:14:41 -04:00
Carol (Nichols || Goulding)	75065abfb6	fix: Compact all data for a partition to one file	2022-07-08 09:07:43 -04:00
Carol (Nichols || Goulding)	959f0d3e02	fix: Clean up comments as I read through	2022-07-08 09:07:43 -04:00
Andrew Lamb	c46e1c6347	chore: Update datafusion + arrow/parquet/arrow-flight to `17.0.0` (#5021 ) * fix: correct nullability declaration of system tables * chore: Update datafusion and arrow/parquet/arrow-flight * chore: Run cargo hakari tasks * fix: Update tests * fix: Update tests * fix: predicate pruning * fix: add some tests * fix: query_functions * fix: fix read_buffer test * fix: fix clippy Co-authored-by: CircleCI[bot] <circleci@influxdata.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-07 19:22:15 +00:00
Nga Tran	a48e6ae733	docs: add consensus for the desired final output of the compactor (#5069 ) * docs: add consensus for the desired final output of the compactor * chore: Apply suggestions from code review Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * docs: add initial readme to the compactor Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>	2022-07-07 19:11:16 +00:00
Marco Neumann	aacdeaca52	refactor: prep work for #5032 (#5060 ) * refactor: remove parquet chunk ID to `ChunkMeta` * refactor: return `Arc` from `QueryChunk::summary` This is similar to how we handle other chunk data like schemas. This allows a chunk to change/refine its "believe" over its own payload while it is passed around in the query stack. Helps w/ #5032.	2022-07-07 13:21:48 +00:00
Nga Tran	425b8a63cf	fix: avoid combing groups that overlap with other groups even if they are small (#5052 ) * fix: avoid combing groups that overlap with other groups even if they are small * chore: Apply suggestions from code review Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-06 14:03:15 +00:00
Nga Tran	d8b74f6af8	refactor: convert a panic into an error and throw a warning if we choose non-actionable compacting candidates (#5041 ) * refactor: convert a panic into an error and throw a warning if we choose non-actionable candidates * chore: Apply suggestions from code review Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * chore: run fmt Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>	2022-07-05 18:53:52 +00:00
Nga Tran	1de022136c	feat: add max desired file size config param (#5025 ) * feat: add max desired file size config param * fix: comment typos * chore: Apply suggestions from code review Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * chore: Apply suggestions from code review Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>	2022-07-05 15:32:45 +00:00
Marco Neumann	16bd3e67c0	refactor: unify `apply_predicate_to_metadata` (#5030 ) Instead of using some hand-rolled timestamp-based logic (or just "unknown") all over the place, just use logic introduced in #5017. This requires slightly improved table summaries within the querier that at least has min/max for the timestamp column. For that, the former `IngesterChunk`-specific `calculate_summary` method was extended to `create_basic_summary` to include that data and is now also used by `QuerierParquetChunk`. Note: `QuerierRBChunk` already has detailled metrics that are provided by the read buffer implementation. Should we ever need even better pruning for `QuerierParquetChunk` (or `IngesterChunk`) then we _only_ need add extra data to the table summaries. Closes #4976. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-05 12:51:59 +00:00
Nga Tran	153c262d63	fix: do not panic on chunks with same range of sequence numbers but are not time-overlapped (#5018 ) * fix: do not panic on chunks with same range of sequence numbers but are not time-overlapped * chore: remove unused comment * chore: fix typo	2022-07-01 15:58:09 +00:00
Marco Neumann	87a8579742	refactor: `ChunkOrder::new` cannot fail (#5004 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-30 22:26:20 +00:00
Marco Neumann	be53716e4d	refactor: use IDs for `parquet_file.column_set` (#4965 ) * feat: `ColumnRepo::list_by_table_id` * refactor: use IDs for `parquet_file.column_set` Closes #4959. * refactor: introduce `TableSchema::column_id_map`	2022-06-30 15:08:41 +00:00
Raphael Taylor-Davies	835e1c91c7	chore: update object_store to 0.3.0 (#4707 ) * chore: update object_store to 0.3.0 * chore: review feedback Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-29 21:44:03 +00:00
Nga Tran	0cca975167	fix: Split overlapped files based on the order of sequence numbers and only group non-overlapped contigous small files (#4968 ) * fix: Split overlapped files based on the order of sequence numbers and only group non-overlapped contigous small files * test: add one more test for group contiguous files: * refactor: address review comments Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-29 20:09:51 +00:00
Nga Tran	cfcc4b8426	refactor: change level 1 to level 2 preparing for next design changes (#4954 ) * refactor: change level 1 to level 2 preparing for next design changes * fix: make level-2 consistent everywhere * chore: remove unused comments * refactor: change all the name level_1 to level_2 to completely replace 1 with 2 to amke everything consistent * chore: add correspinding constants for the comapction levels in the comments Co-authored-by: Dom <dom@itsallbroken.com>	2022-06-29 14:08:58 +00:00
Marco Neumann	215f297162	refactor: parquet file metadata from catalog (#4949 ) * refactor: remove `ParquetFileWithMetadata` * refactor: remove `ParquetFileRepo::parquet_metadata` * refactor: parquet file metadata from catalog Closes #4124.	2022-06-27 15:38:39 +00:00
Marco Neumann	1a74f84494	refactor: remove `ParquetFileWithMetadata` usage outside the catalog (#4948 ) * refactor: remove `DecodedParquetFile` from `iox_tests` * refactor: remove `DecodedParquetFile` from querier Also pull out all the chunk schema and sort key handling into a function so that RB chunks and parquet chunks mostly use the same code path. * refactor: remove `DecodedParquetFile` * refactor: remove `ParquetFileWithMetadata` usage * fix: test data consistency	2022-06-27 15:19:29 +00:00
Marco Neumann	3b78bf1c48	refactor: remove binary parquet file MD from compactor (#4938 ) * refactor: simplify sort key calculation * refactor: use schema from catalog instead from file * refactor: do not request parquet file MD in compactor * test: ensure that `QueryableParquetChunk` works correctly	2022-06-27 15:11:15 +00:00
Marco Neumann	b9cbb3dfca	refactor: do not use in-parquet IOx metadata in compactor () (#4935 ) refactor: avoid feeding sort key from struct into same struct * feat: allow namespace schema query by ID * refactor: do not use binary parquet file MD in compactor tests * refactor: do not use in-parquet IOx metadata * refactor: reduce number of catalog queries	2022-06-27 08:06:11 +00:00
Nga Tran	3c0fb6e8ef	fix: avoid using min_time, which can be negative, for ChunkId. Using object store id which is uuid instead (#4942 ) * fix: avoid using min_time, which can be negative, for ChunkId. Using object store id which is uuid instead * chore: Apply suggestions from code review Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * chore: run fmt Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-23 19:00:13 +00:00
Nga Tran	35dacf388b	feat: Compact now can split compacted results into multiple non-overlapped files based on config max file size (#4918 ) * feat: split times of compacting results based on the max file size * feat: cosider max file size while computing split time * test: tests for comput_split_time * feat: first step to teach the function split_the_steam to know how to split data into n streams using n-1 input PhysicalExprs * feat: make StreamSplitNode support a list of expression * docs: explain how StreamSplitNode works * feat: Teach compute_split_time to split a time range into many contiguous ranges and split compacted result into multiple non-overlapped files based on the config comapction_max_size_bytes * chore: cleanup * chore: clean up doc * chore: address review comments Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-23 18:54:03 +00:00
Marco Neumann	bd6c4659af	refactor: slim down parquet chunk (remove Metadata) (#4934 ) * feat: conversion from `ParquetFile` to `ParquetFilePath` * refactor: slim down parquet chunk - ensure it works without binary parquet metadata - timestamp range is no longer optional (ensured by the NG type system) - remove table summary: this is only needed for SOME API users. The compactor can perfectly work without statistics since has the timestamp range which is sufficient for the current overlap check (we don't use any other primary key stats at the moment). The querier currently does NOT use parquet chunks (was replaced by read buffer) but if it will again in some future it will likely need to find a way to fetch and cache the statistics. - the schema is now provided by the API user since it can be reconstructed using the NG catalog only (and "wrong" column orders are tolerated as of #4921) Ref #4124 Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-23 10:55:16 +00:00
Marco Neumann	9591bed696	refactor: make querier internals private (#4922 ) Queries internals are not meant to be used by other crates. Only a handful selected interfaces should be used by IOxD and the query tests. The compactor only used a very small subset just to read parquet files back into memory. It shall rather use the official `parquet_file` interface instead. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-22 13:00:08 +00:00
Marco Neumann	c3912e34e9	refactor: store per-file column set in catalog (#4908 ) * refactor: store per-file column set in catalog Together with the table-wide schema and the partition-wide sort key, this should be everything we need to read a parquet file directly into memory without peeking any file-level metadata. The querier will use this to directly load parquet files into the read buffer. WARNING: This requires a catalog wipe! Ref #4124. * refactor: use proper `ColumnSet` type	2022-06-21 10:26:12 +00:00
Nga Tran	72c8cfa6ed	fix: make ChunkOrder i64 data type to accept min sequence number 0 and match with data type of sequence number (#4888 ) * fix: make ChunkOrder u64 data type to accept min sequence number 0 * fix: make ChunkOrder i64 to match with sequence number type Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-17 13:45:17 +00:00
Marco Neumann	0fbff981ec	chore(deps): Bump sqlx to 0.6.0 and uuid to 1 (#4894 ) Closes #4889. Closes #4890. Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-06-17 10:28:28 +00:00
Nga Tran	3ca74744bf	chore: debug info about sequence number while it gets converted into ChunkOrder (#4884 )	2022-06-16 18:40:55 +00:00
Nga Tran	d57b0eb1fa	chore: more info for i64-to-u128 panic message (#4881 ) * chore: more info for i64-to-u128 panic message * chore: Apply suggestions from code review Co-authored-by: Dom <dom@itsallbroken.com> * chore: fix fmt Co-authored-by: Dom <dom@itsallbroken.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-16 15:49:43 +00:00
Andrew Lamb	005610b172	refactor: remove some `&` use in iox_catalog (#4862 ) * refactor: remove some `&` use in iox_catalog * fix: Update data_types/src/lib.rs	2022-06-15 11:31:49 +00:00
Andrew Lamb	e91d00b10c	chore: Update datafusion + `arrow`/`parquet`/`arrow-flight` to `16.0.0 (#4851 ) * chore: TEMP Update DataFusion to pre-release * chore: update arrow et al to 16.0.0 * chore: Run cargo hakari tasks * fix: update reader read_dictionary API * chore: Update to real Datafusion release * fix: Update parquet API * fix: update test Co-authored-by: CircleCI[bot] <circleci@influxdata.com>	2022-06-14 16:31:40 +00:00
Dom Dwyer	b41ea1d718	refactor: PartitionKey type This commit changes the code base to use a new reference-counted PartitionKey type wrapper, instead of passing a bare String around. This allows the compiler to type check & verify usage of the partition key, instead of passing a bare string around. By reference counting the underlying string, we reduce memory usage for some use cases.	2022-06-14 14:47:56 +01:00
Nga Tran	99f1f0a10c	chore: Revert "feat: compact all overlapped files no matter how large they are (#4779 )" (#4831 ) This reverts commit `3e89daa0d4`.	2022-06-10 15:52:00 +00:00
Carol (Nichols \|\| Goulding)	1c7cbaf5ae	refactor: Use DurationHistogram in more places	2022-06-09 14:20:51 -04:00
Andrew Lamb	f34282be2c	fix: Do not run DataFusion optimizer pass twice (#4809 ) * fix: Do not run DataFusion optimizer pass twice * docs: improve docstring and logging	2022-06-08 21:01:22 +00:00
Nga Tran	b60e1be0cf	chore: remove irrelaevant comments (#4791 )	2022-06-07 00:43:56 +00:00
Nga Tran	3e89daa0d4	feat: compact all overlapped files no matter how large they are (#4779 ) * feat: add an option to compact all overlapped files no matter how large they are * chore: Apply suggestions from code review * feat: always compact oerlapped files no matter how large they are * chore: cleaup	2022-06-06 23:39:09 +00:00
dependabot[bot]	04c685b3b7	chore(deps): Bump tokio-util from 0.7.2 to 0.7.3 (#4784 ) Bumps [tokio-util](https://github.com/tokio-rs/tokio) from 0.7.2 to 0.7.3. - [Release notes](https://github.com/tokio-rs/tokio/releases) - [Commits](https://github.com/tokio-rs/tokio/compare/tokio-util-0.7.2...tokio-util-0.7.3) --- updated-dependencies: - dependency-name: tokio-util dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-06-06 14:46:27 +00:00
dependabot[bot]	e03bf94420	chore(deps): Bump tokio from 1.18.2 to 1.19.1 (#4783 ) Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.18.2 to 1.19.1. - [Release notes](https://github.com/tokio-rs/tokio/releases) - [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.18.2...tokio-1.19.1) --- updated-dependencies: - dependency-name: tokio dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-06 14:15:12 +00:00
Carol (Nichols \|\| Goulding)	aa510ae4e6	fix: Remove test uses of parquet chunks and document as unused The querier is now using read buffer chunks only, but we're leaving the parquet chunk code around for the moment.	2022-06-03 09:16:04 -04:00
Andrew Lamb	3592aa52d8	chore: Update datafusion + `arrow`/`parquet`/`arrow-flight` to `15.0.0` (#4743 ) * chore: Update datafusion + `arrow`/`parquet`/`arrow-flight` to `15.0.0` * chore: Update APIs * chore: Run cargo hakari tasks * feat: normalize parquet file metadata * chore: update size tests * chore: add docs on metadata stripping * chore: TEMP UPDATE TO DF BRANCH * chore: Update for new API * fix: Update to latest DF * fix: cargo hakari Co-authored-by: CircleCI[bot] <circleci@influxdata.com> Co-authored-by: Raphael Taylor-Davies <r.taylordavies@googlemail.com>	2022-06-03 10:32:26 +00:00
dependabot[bot]	9a21292db8	chore(deps): Bump async-trait from 0.1.53 to 0.1.56 (#4774 ) Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.53 to 0.1.56. - [Release notes](https://github.com/dtolnay/async-trait/releases) - [Commits](https://github.com/dtolnay/async-trait/compare/0.1.53...0.1.56) --- updated-dependencies: - dependency-name: async-trait dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-06-03 09:10:40 +00:00
Ryan Russell	d279deddad	docs(various): Improve Readability (#4768 ) Signed-off-by: Ryan Russell <git@ryanrussell.org> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-02 18:01:06 +00:00
Nga Tran	79895b995c	chore: add debug info to see how many concurrent partitions being compacted in each cycle (#4772 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-02 15:19:08 +00:00
Dom Dwyer	9ae58c89b6	refactor: constructor for ParquetFileWithTombstone Use a constructor to initialise a ParquetFileWithTombstone struct, rather than making the fields pub. This allows IDEs to "go to" places where this is constructed when browsing the code, but also keeps the type closed for modification of internals (SOLID).	2022-06-01 15:58:06 +01:00
Nga Tran	79220720be	chore: increase size of a compactor job and level of concurrency (#4746 ) * fix: let us not compact no-data * fix: split time must be greater min_time, too * fix: resolve merge conflict * chore: increase size of a compactor job and level of concurrency Co-authored-by: Dom <dom@itsallbroken.com>	2022-05-31 19:57:06 +00:00
Nga Tran	dfd35c05a1	fix: let us not compact no-data (#4744 ) * fix: let us not compact no-data * fix: split time must be greater min_time, too * fix: resolve merge conflict Co-authored-by: Dom <dom@itsallbroken.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-05-31 17:02:14 +00:00
Dom Dwyer	70864b9f48	refactor: always use correct chunk sort key Don't use the same sort key for all files - sort keys may grow over time, and the information is already at hand.	2022-05-30 17:41:41 +01:00
Dom Dwyer	6aa2a6958a	refactor: assert consistent parquet file metadata Assert consistent metadata when evaluating candidate parquet files for compaction. Asserts all files have the same: * Sequencer ID * Namespace ID * Table ID * Partition ID * Sort key	2022-05-30 17:41:41 +01:00
Dom Dwyer	0f16d6cabb	refactor: consistent SortKey source Changes the compaction logic to always reference the same SortKey instance, rather than repeatedly querying for it. The Partition metadata is always read from the catalog as part of compact_partition(), where it previously threw away all metadata except the sort key, which was passed into compact(). Then compact() would always re-query the catalog to look up just the sort key again, and mix up the two instances during use - one passed into the fn, one freshly queried within the fn. Now the Partition metadata is resolved in compact_partition() as it was previously, but the entire Partition reference is passed to compact(), and this is consistently used do access the sort key. This also removes a catalog query per compaction call.	2022-05-30 17:41:41 +01:00
kodiakhq[bot]	842ef8e308	Merge branch 'main' into cn/fetch-from-parquet-file	2022-05-27 17:08:28 +00:00
Andrew Lamb	dde3c3922c	refactor: use consistent spelling of serialize (#4717 )	2022-05-27 14:42:59 +00:00
Nga Tran	ea81152fac	refactor: add partition ID into debug info and panic earlier to identify the bug easier (#4716 ) * chore: point tests to the new ticket * chore: cleanup * refactor: add partition ID into debug info and panic earlier to identify the bug easier	2022-05-27 12:20:36 +00:00
Carol (Nichols \|\| Goulding)	5fd3ffc17f	refactor: Rename ParquetChunkAdapter to only ChunkAdapter It might be creating chunks of different kinds other than ParquetChunks.	2022-05-26 16:52:14 -04:00
Carol (Nichols \|\| Goulding)	df10452e2e	refactor: Rename methods from new_querier_chunk to new_querier_parquet_chunk	2022-05-25 17:19:10 -04:00
Nga Tran	6cc767efcc	feat: teach compactor to compact smaller number of files (#4671 ) * refactor: split compact_partition into two functions to handle concurrency better * feat: limit number of files to compact * test: add test for limit num files * chore: fix cipply * feat: split group if over max size * fix: split the overlapped group to limit size or file num * chore: reduce config values * test: add tests and clearer comments for the split_overlapped_groups and test_limit_size_and_num_files * chore: more comments * chore: cleanup	2022-05-25 19:54:34 +00:00
Andrew Lamb	935743b525	refactor: Implement `new_querier_chunk` and `new_querier_chunk_from_file_with_metadata` (#4685 )	2022-05-24 21:58:27 +00:00
Dom Dwyer	c885b845dc	refactor: concurrent StreamSplitExec execution Changes the compactor to consume both StreamSplitExec output partitions concurrently. Practically speaking this means both Parquet files will be generated concurrently, and uploaded to object store concurrently.	2022-05-24 14:10:46 +01:00
Dom Dwyer	8f05250c96	feat: steaming compaction This commit changes the Compactor::compact() method to stream the RecordBatch instances directly to the parquet serialiser, before being uploaded directly to object storage.	2022-05-24 14:09:10 +01:00
Dom Dwyer	2e6c49be83	refactor: remove IoxMetadata min & max timestamp Removes the min/max timestamp fields from the IoxMetadata proto structure embedded within a Parquet file's metadata. These values are redundant as they already exist within the Parquet column statistics, and precluded streaming serialisation as these removed min/max values were needed before serialising the file.	2022-05-23 16:27:08 +01:00
Dom Dwyer	a142a9eb57	refactor: remove row_count from IoxMetadata Remove the redundant row_count from the IoxMetadata structure that is serialised into the Parquet file. The reasoning is twofold: * The Parquet file's native metadata already contains a row count * Needing to know the number of rows up-front precludes streaming	2022-05-23 16:18:35 +01:00
Dom	f0d0f1ba0c	Merge branch 'main' into dom/codec-object-store	2022-05-23 15:39:54 +01:00
Dom Dwyer	7df7c4844c	refactor: remove redundant ParquetChunk errors Eliminates unused / refactors away unnecessary errors for the parquet::chunk module.	2022-05-20 15:17:40 +01:00
Dom Dwyer	b9a745d42d	feat: RecordBatch stream to Parquet file upload Implements an upload() method on the ParquetStorage type, consuming a stream of RecordBatch, serialising the Parquet file, and uploading the result to object storage. Returns the IOx-specific file metadata. Currently while the upload() method accepts a stream of RecordBatch, the actual resulting Parquet file is buffered in memory before uploading to object store, due to lack of streaming upload functionality in the ObjectStore abstraction - this isn't the end of the world, as the files tend to be relatively small with our current usage. This impl should be easily modified to be fully streaming once streaming object store puts are implemented: https://github.com/influxdata/object_store_rs/issues/9	2022-05-20 15:17:40 +01:00
Carol (Nichols \|\| Goulding)	5fcf18cc02	fix: Add missing assert call around contains tests `contains` is now must_use. Thanks Rust!	2022-05-19 14:39:51 -04:00
Dom Dwyer	baa86d846f	refactor: use ParquetStore instead of ObjectStore Changes the code paths that interact with Parquet files in the object store to reference the ParquetStorage directly (DRY refactor). This change takes us from a dependency graph in which the Parquet Consumer depends on both ParquetStorage and ObjectStore (with ObjectStore backed by File System, s3, etc.) to: Parquet Consumer → ParquetStorage → ObjectStore → File System / s3 (etc). With the ParquetStorage being solely responsible for managing interactions with the object store when dealing with Parquet files.	2022-05-19 13:52:51 +01:00
Dom Dwyer	d3548653d5	refactor: rename Storage -> ParquetStorage Renames the Storage type so the context is clear in usage (i.e. fn args), rather than having to rely on knowing the fully-qualified import path to know what the type stores.	2022-05-19 13:51:07 +01:00
Dom Dwyer	e20b02b914	refactor: tidy ParquetChunk constructor Removes two unused constructors for a ParquetChunk, and moves the bare fn constructor that is actually used to be an associated method (a conventional constructor).	2022-05-19 13:51:07 +01:00
Marco Neumann	770293a973	feat: add LRU cache metrics (#4632 ) * refactor: require `Resource`s to be convertible to `u64` * refactor: require `Resource`s to have a unit name * refactor: make LRU cache IDs static * feat: add LRU cache metrics * docs: improve type names in LRU doctest * docs: epxlain `MeasuredT` Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * docs: explain `test_metrics` Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>	2022-05-19 08:05:17 +00:00
Marco Neumann	52346642a0	ci: fix cargo deny (#4629 ) * ci: fix cargo deny * chore: downgrade `socket2`, version 0.4.5 was yanked * chore: rename `query` to `iox_query` `query` is already taken on crates.io and yanked and I am getting tired of working around that.	2022-05-18 09:38:35 +00:00
Andrew Lamb	3a33e806c7	chore: Update datafusion + `arrow`/`parquet`/`arrow-flight` to `14.0.0` (#4619 ) * chore: Update datafusion deps * chore: update arrow/parquet/arrow flight deps * chore: Run cargo hakari tasks * chore: Update location of utils * chore: Update some more APIs Co-authored-by: CircleCI[bot] <circleci@influxdata.com>	2022-05-17 14:13:03 +00:00
Marco Neumann	779f0e9cdf	feat: querier RAM pool (#4593 ) * feat: `SortKey::size` * feat: `FunctionEstimator` * feat: querier RAM pool Let's put all the caches into a single RAM pool, so we can at least somewhat control RAM usage. Note that this does NOT limit the peak memory during query execution though, but should at least stop unlimited cache growth. A follow-up PR will add metrics. * refactor: improve some size calculations Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-05-17 13:11:20 +00:00
dependabot[bot]	259d2486c1	chore(deps): Bump tokio-util from 0.7.1 to 0.7.2 (#4605 ) Bumps [tokio-util](https://github.com/tokio-rs/tokio) from 0.7.1 to 0.7.2. - [Release notes](https://github.com/tokio-rs/tokio/releases) - [Commits](https://github.com/tokio-rs/tokio/compare/tokio-util-0.7.1...tokio-util-0.7.2) --- updated-dependencies: - dependency-name: tokio-util dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-05-16 11:42:31 +00:00
Nga Tran	9530e73925	chore: move noisy debug to trace and fix some comments (#4598 ) * chore: move noisy debug to trace and fix some comments * chore: Apply suggestions from code review Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * chore: fix format Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-05-13 19:18:15 +00:00
Raphael Taylor-Davies	f2bb0fdf77	feat: update to crates.io object_store version (#4595 ) * feat: update to crates.io object_store version * chore: Run cargo hakari tasks * fix: tests * chore: remove object store integration test plumbing Co-authored-by: CircleCI[bot] <circleci@influxdata.com>	2022-05-13 16:26:07 +00:00
kodiakhq[bot]	0f8f294319	Merge branch 'main' into cn/remove-chunk-addr	2022-05-13 13:54:44 +00:00
Carol (Nichols \|\| Goulding)	55313d290a	fix: Update or remove comments that mention NG or OG Connects to #4450.	2022-05-12 16:09:08 -04:00
Carol (Nichols \|\| Goulding)	07c7c75067	fix: Remove ng_chunk method Connects to #4450.	2022-05-12 16:09:08 -04:00
Carol (Nichols \|\| Goulding)	b581a42fde	fix: Rename new_id_for_ng to new_id Connects to #4450.	2022-05-12 16:09:07 -04:00
Carol (Nichols \|\| Goulding)	faba90d992	fix: Remove ChunkAddr	2022-05-12 15:50:41 -04:00
Nga Tran	f9e3495e47	feat: add more metrics for compactor (#4575 ) * feat: add more metrics for compactor * chore: clearer comment	2022-05-12 13:20:43 +00:00
Raphael Taylor-Davies	8b379c83cc	refactor: simplify object_store path handling (#4534 ) * refactor: simplify object_store path handling * fix: aws integration tests * chore: lint * fix: update gcs tests * refactor: move errors into submodules * chore: lint * chore: review feedback * refactor: replace provider with Display * fix: failing tests Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-05-09 18:43:22 +00:00
Carol (Nichols \|\| Goulding)	d458e390ad	fix: Move allow dead code to be more specific in compactor; remove actually dead code	2022-05-06 16:58:02 -04:00
Jake Goulding	e07bcd40c2	refactor: Remove unused dependencies These were found by iterating over all of the dependencies of each Cargo.toml, then grepping that crate for the dependency's name. If it didn't show up, I attempted to remove it. I left a few dependencies that this process flagged: * generated_types - `pbjson`,`serde`. Apparently used by the generated code. * grpc-router-test-gen - `prost`. Apparently used by the generated code. * influxdb_iox - `heappy`. Doesn't appear used, but is behind enough feature flags that I don't care to reason about and it's already optional. - `tikv_jemalloc_sys`. Appears to be setting a feature flag of an indirect dependency. * iox_gitops_adapter - `k8s_openapi`. Appears to be setting a feature flag of an indirect dependency.	2022-05-06 15:57:58 -04:00
Carol (Nichols \|\| Goulding)	068096e7e1	fix: Rename data_types2 to data_types	2022-05-06 14:45:39 -04:00
Carol (Nichols \|\| Goulding)	0541c6e40f	fix: Remove data_types crate where it's no longer used	2022-05-06 14:45:39 -04:00
Carol (Nichols \|\| Goulding)	2ef44f2024	fix: Move timestamp types to data_types2	2022-05-06 14:45:38 -04:00
Carol (Nichols \|\| Goulding)	eb31b347b0	refactor: Move tombstones_to_delete_predicates to the predicate crate	2022-05-06 14:45:37 -04:00
Carol (Nichols \|\| Goulding)	ea46830954	fix: Remove iox_object_store crate; move ParquetFilePath to parquet_file	2022-05-06 14:45:36 -04:00
Andrew Lamb	02893e598c	chore: Update datafusion and upgrade arrow/parquet/arrow-flight to 13 (#4516 ) * chore: Tool for automating arrow version update * chore: Update datafusion and arrow/parquet/arrow-flight * fix: update for changes in Arrow API Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-05-05 00:21:02 +00:00
Andrew Lamb	48d2fe1396	fix: shutdown compactor more quickly on cancel (#4495 ) * fix: shutdown compactor more quickly on cancel * fix: fixup docs	2022-05-02 17:22:58 +00:00
Andrew Lamb	dd3147c2ec	fix: allow `--grpc-bind` and `--api-bind` args in all-in-one mode (#4494 ) * fix: allow `--grpc-bind` and `--api-bind` args in all-in-one mode * fix: shutdown compactor more quickly on cancel Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-05-01 20:20:16 +00:00
dependabot[bot]	420c306caa	chore(deps): Bump tokio from 1.17.0 to 1.18.0 (#4453 ) Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.17.0 to 1.18.0. - [Release notes](https://github.com/tokio-rs/tokio/releases) - [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.17.0...tokio-1.18.0) --- updated-dependencies: - dependency-name: tokio dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-04-28 08:21:17 +00:00
Nga Tran	fa2c1febf4	feat: use stored partition sort key to deduplicate data (#4360 ) * feat: use stored sort key to deduplicate data * refactor: verify if one is a super sort key of the other * test: unit tests for scan and deduplication plans * fix: typo * refactor: refactor and add comments * feat: cache partition sort key to read during planning as needed * test: tests for query plans with different overlap groups * chore: cleanup * chore: resolve merge conflicts Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-04-26 20:36:32 +00:00
二手掉包工程师	4b47d723b1	refactor: Rename time to iox_time (#4416 ) Signed-off-by: hi-rustin <rustin.liu@gmail.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-04-26 00:19:59 +00:00
Nga Tran	0a440bb638	refactor: grouping overlaps now uses the same overlap function in both compactor and deduplication (#4420 ) * refactor: grouping overlaps is now use the same overlap function in both compactor and deduplication * chore: commit missing file * chore: address review comments Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-04-25 20:32:51 +00:00
Nga Tran	d963110842	feat: group chunk overlaps based on time range only (#4389 ) * feat: overlap for NG querier * chore: cleanup * refactor: address review comments * fix: typo Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-04-25 13:32:07 +00:00
Carol (Nichols \|\| Goulding)	c7a1c496cf	fix: incorrect overlapped grouping (#4082 ) * test: Failing test for finding overlapped groups * test: Failing test for query overlap too :( * fix: Group parquet files overlapped by time correctly Inspired by https://towardsdatascience.com/overlapping-time-period-problem-b7f1719347db Not sure what the real name for this algorithm is * refactor: Group items without an intermediate hashmap needed * chore: cleanup Co-authored-by: NGA-TRAN <nga-tran@live.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-04-21 18:51:30 +00:00
Andrew Lamb	73bed810da	chore: Update arrow, arrow-flight, parquet, tonic, prost, etc (#4357 ) * chore: Update datafusion * chore: Update arrow/arrow-flight/parquet to 12 * chore: update datafusion correctly * chore: Update prost, tonic, and dependents * fix: Fixup some api changes * fix: Update test output in db * fix: Update test output in parquet_file * fix: remove old pbjson types * fix: Add "--experimental_allow_proto3_optional" flag * chore: Run cargo hakari tasks * fix: compile error * chore: Update heappy Co-authored-by: CircleCI[bot] <circleci@influxdata.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-04-20 11:12:17 +00:00
Nga Tran	2a601c3099	fix: Revert "chore: Revert "fx: Revert "fix: Revert "feat: Use the sort key stored in the catalog during compaction" (#4299 )" (#4303 )" (#4327 )" (#4328 ) * fix: Revert "chore: Revert "fx: Revert "fix: Revert "feat: Use the sort key stored in the catalog during compaction" (#4299)" (#4303)" (#4327)" This reverts commit `7e5d719027`. * chore: resolve merge conflict Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-04-18 15:27:39 +00:00
Nga Tran	8e2d158a37	test: deadlock test and add more debug log (#4319 ) * test: use Paul deadlock reproducer and add more debug log * test: remove compare many output rows * test: verify the test putput * chore: cleanup Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-04-14 18:06:22 +00:00
Nga Tran	7e5d719027	chore: Revert "fix: Revert "fix: Revert "feat: Use the sort key stored in the catalog during compaction" (#4299 )" (#4303 )" (#4327 ) This reverts commit `fe8d9948d5`.	2022-04-14 17:11:55 +00:00
Carol (Nichols \|\| Goulding)	fe8d9948d5	fix: Revert "fix: Revert "feat: Use the sort key stored in the catalog during compaction" (#4299 )" (#4303 ) This reverts commit `7ddbf7c025`. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-04-14 15:42:28 +00:00
Nga Tran	3070d78e8c	chore: add more compactor debug info (#4310 ) * chore: add more compactor debug info * chore: Apply suggestions from code review Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * chore: fix format Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-04-13 19:22:19 +00:00
Carol (Nichols \|\| Goulding)	65b1a83419	fix: Use parquet_metadata less in compactor Connects to #4124. The parquet metadata is still needed to create the ParquetChunk, but iox_metadata isn't needed in QueryableParquetChunk.	2022-04-13 10:43:20 -04:00
Carol (Nichols \|\| Goulding)	94dcde4996	fix: Do fewer queries for metadata By adding another _with_metadata catalog function. Also introduce a new type rather than passing around tuples everywhere.	2022-04-13 10:43:20 -04:00
Carol (Nichols \|\| Goulding)	02fee3b84f	feat: Request parquet metadata from the catalog when needed only	2022-04-13 10:43:19 -04:00
Carol (Nichols \|\| Goulding)	7ddbf7c025	fix: Revert "feat: Use the sort key stored in the catalog during compaction" (#4299 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-04-13 14:11:10 +00:00
Carol (Nichols \|\| Goulding)	d23d7b190f	fix: Filter compacted sort key to present columns In parquet files written after compaction, use the catalog sort key but filter it to only those columns that appear in the merged schema. Panic if there are any columns in the merged schema's primary key that aren't in the catalog sort key; that shouldn't happen.	2022-04-11 14:09:46 -04:00
Carol (Nichols \|\| Goulding)	48d3d0e471	fix: Panic earlier if a partition doesn't have a catalog sort key Because we decided a panic was ok to do if the catalog doesn't have a sort key for the partition, move the panic earlier to catch it before doing other work.	2022-04-11 14:09:46 -04:00
Carol (Nichols \|\| Goulding)	b6253b8046	docs: Explain why this panic might happen	2022-04-11 14:09:45 -04:00
Carol (Nichols \|\| Goulding)	55fe3b8d50	feat: Use the sort key stored in the catalog during compaction Fixes #4249.	2022-04-11 14:09:45 -04:00
Nga Tran	f838cb78a2	fix: not to add IOxReadFilterNode for empty non-duplicated chunks (#4264 ) * fix: not to add IOxReadFilterNode for no data of non-duplicated chunks if there is already scan node for overlapped/duplicated chunks * refactor: address review comments * chore: Apply suggestions from code review Co-authored-by: Carol (Nichols \|\| Goulding) <193874+carols10cents@users.noreply.github.com> Co-authored-by: Carol (Nichols \|\| Goulding) <193874+carols10cents@users.noreply.github.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-04-08 21:03:22 +00:00
Dom Dwyer	6131381b8d	refactor: extra debug in compactor Continues pushing more debug through the compaction processing loop.	2022-04-08 11:20:19 +01:00
Dom Dwyer	3706ac042d	refactor: add debug in compaction path Adds debug!() and friends through the compaction path.	2022-04-07 17:13:45 +01:00
Paul Dix	a6f18e86fe	chore: add compactor logs (#4239 )	2022-04-05 21:26:59 +00:00
dependabot[bot]	bea49e7611	chore(deps): Bump arrow from 11.0.0 to 11.1.0 (#4234 ) Bumps [arrow](https://github.com/apache/arrow-rs) from 11.0.0 to 11.1.0. - [Release notes](https://github.com/apache/arrow-rs/releases) - [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG.md) - [Commits](https://github.com/apache/arrow-rs/compare/11.0.0...11.1.0) --- updated-dependencies: - dependency-name: arrow dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-04-05 16:54:28 +00:00
Paul Dix	0892ccf7fb	fix: compactor use join_all (#4211 ) I forgot to address this in #4139. Have the compactor use join and make sure the error gets logged.	2022-04-02 14:23:33 -04:00
Paul Dix	3aa3ebe0e8	chore: add compactor logging (#4207 )	2022-04-01 18:51:01 -04:00
Nga Tran	77ad4a7dad	feat: replace a compactor constant with an CLI config param (#4204 )	2022-04-01 17:50:43 +00:00
Nga Tran	a6eb83d47d	feat: compact small contiguous files of the same partition even if they do not overlap (#4197 ) * feat: compact small contiguous files of the same partition even if they do not overlap * test: more tests * chore: Apply suggestions from code review Co-authored-by: Carol (Nichols \|\| Goulding) <193874+carols10cents@users.noreply.github.com> * refactor: address review comments Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com> Co-authored-by: Carol (Nichols \|\| Goulding) <193874+carols10cents@users.noreply.github.com>	2022-04-01 15:26:43 +00:00
Nga Tran	9c50a4c9fb	test: replace find_and_compact with compact_partition in tests (#4185 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-03-31 13:51:22 +00:00
Nga Tran	ddc2c8304f	fix: have the compaction level set correctly (#4184 ) * fix: have the compaction level set correctly, especially for compacted file from the compactor * fix: typo	2022-03-30 21:23:40 +00:00
Paul Dix	04d961e70d	feat: wire up compactor scheduler and config (#4139 ) Add configuration options for compactor for the max size of level 0 files and split percentage. Add metrics for compaction to track the number of candidates, compactions, and durations. Add functions to separate identifying partitions to compact from running compaction. Make compaction run in smaller chunks, specifically per partition. Update compaction to automatically promote level 0 files that are non-overlapping without waiting some period of time. Closes #4120 Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-03-30 17:45:24 +00:00
Marco Neumann	20bbb88dc5	refactor: remove table name from `TableSummary` (#4170 ) This allows us to remove the table name from the low-level chunk representations (like `ParquetFile`, RUB, ...) since table names are already tracked by the higher-level data structures (e.g. catalog, catalog chunk) that manage the low-level chunk representations. This is similar to #4167. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-03-30 13:24:00 +00:00
Marco Neumann	036626a576	refactor: remove partition key from `ParquetChunk` (#4167 ) The parquet chunk is always wrapped into some higher-level data structure (e.g. a catalog chunk, a partition, ...) that knows exactly "where" the chunk is located. There is no need for the parquet chunk to back-reference container-level attributes. In the contrary: double-bookkeeping makes the code more complex and costs additional memory. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-03-30 09:24:56 +00:00
Nga Tran	bfd5568acf	fix: make sure the QueryableParquetChunks are always sorted correctly (#4163 ) * fix: make sure the chunks are always sorted correctly * fix: output * chore: Apply suggestions from code review Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * refactor: make new function for new chunk id Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-03-29 21:36:45 +00:00
Carol (Nichols \|\| Goulding)	db5cd70c77	fix: logical merge conflict, unused import	2022-03-29 08:29:23 -04:00
Carol (Nichols \|\| Goulding)	4a51c9eda6	feat: Add a garbage collector to be called in a background loop Fixes #3954.	2022-03-29 08:15:26 -04:00
Carol (Nichols \|\| Goulding)	f3f792fd08	feat: Add namespace_id to the parquet_files table; object store paths need it	2022-03-29 08:15:26 -04:00
Carol (Nichols \|\| Goulding)	a373c90415	refactor: Extract the list_all function to object store I'm about to use this in a third file, so time to extract this. Make it clear that this is appropriate for tests only.	2022-03-29 08:15:24 -04:00
dependabot[bot]	17af5fcbd1	chore(deps): Bump tokio-util from 0.7.0 to 0.7.1 (#4154 ) * chore(deps): Bump tokio-util from 0.7.0 to 0.7.1 Bumps [tokio-util](https://github.com/tokio-rs/tokio) from 0.7.0 to 0.7.1. - [Release notes](https://github.com/tokio-rs/tokio/releases) - [Commits](https://github.com/tokio-rs/tokio/compare/tokio-util-0.7.0...tokio-util-0.7.1) --- updated-dependencies: - dependency-name: tokio-util dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> * chore: Run cargo hakari tasks Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: CircleCI[bot] <circleci@influxdata.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-03-29 08:39:02 +00:00
Nga Tran	80b7e9cce1	feat: delete fully processed tombstones & integration tests for find_and_compact (#4116 ) * feat: remove fully processed tombstones * test: first few tests * fix: delete SQL * fix: test how IN (...) works in PG * fix: test how IN (?) works in PG * fix: test how IN (?) works in PG * fix: dynamically add IN (?, ?, ...) * fix: dynamically add IN (?, ?, ...) & its dynamic values * fix: add argument directly in the SQL * test: more tests for catalog read and update functions * chore: move a subfunction to make it easier to read) * test: first test for find_can_compact but disabled due to bug * test: integration tests and a bug fix for find_and_compact * chore: cleanup * refactor: address review comments * fix: put 2 delete processed tombstones and tombstones in a transaction	2022-03-28 18:35:54 +00:00
dependabot[bot]	4f9515ffba	chore(deps): Bump async-trait from 0.1.52 to 0.1.53 (#4141 ) Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.52 to 0.1.53. - [Release notes](https://github.com/dtolnay/async-trait/releases) - [Commits](https://github.com/dtolnay/async-trait/compare/0.1.52...0.1.53) --- updated-dependencies: - dependency-name: async-trait dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-03-28 08:55:24 +00:00
kodiakhq[bot]	15a9108135	Merge branch 'main' into dom/revert-revert-revert	2022-03-24 16:38:06 +00:00
Dom Dwyer	bf782de421	fix: compactor early shutdown The compactor stub code would wait on nothing when the caller waited on join()-ing the compactor handler, and this meant any caller who blocked on join() would immediately return.	2022-03-24 15:58:02 +00:00
Andrew Lamb	5c69a3f43b	chore: Update deps: datafusion, arrow/arrow-flight/parquet to 11, zstd to 0.11 (#4119 ) * chore: update datafusion * chore(deps): Bump arrow from 10.0.0 to 11.0.0 Bumps [arrow](https://github.com/apache/arrow-rs) from 10.0.0 to 11.0.0. - [Release notes](https://github.com/apache/arrow-rs/releases) - [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG.md) - [Commits](https://github.com/apache/arrow-rs/compare/10.0.0...11.0.0) --- updated-dependencies: - dependency-name: arrow dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> * chore(deps): Bump arrow-flight from 10.0.0 to 11.0.0 Bumps [arrow-flight](https://github.com/apache/arrow-rs) from 10.0.0 to 11.0.0. - [Release notes](https://github.com/apache/arrow-rs/releases) - [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG.md) - [Commits](https://github.com/apache/arrow-rs/compare/10.0.0...11.0.0) --- updated-dependencies: - dependency-name: arrow-flight dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> * chore: update parquet to 11.0.0 * fix: error on create schema, test for same * fix: upgrade zstd * chore: Run cargo hakari tasks * fix: fix logical merge conflict * fix: hakari * fix: hakari * fix: update newly introduced dep Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: CircleCI[bot] <circleci@influxdata.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-03-24 15:27:36 +00:00
Carol (Nichols \|\| Goulding)	67e13a7c34	fix: Change to_delete column on parquet_files to be a time (#4117 ) Set to_delete to the time the file was marked as deleted rather than true. Fixes #4059. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-03-23 18:47:27 +00:00
Marco Neumann	51da6dd7fa	feat: store sort key in NG metadata (#4110 ) The sort key is optional and currently only produced by `iox_tests`. Writing it within the ingester/compactor is tracked by #3968. The sort key is read by the querier (and this will be verified by the query tests and is required to merge #4103). Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-03-23 18:24:46 +00:00
Carol (Nichols \|\| Goulding)	c3a8834970	test: Add a test for add_tombstones_to_groups	2022-03-23 09:56:27 -04:00
Carol (Nichols \|\| Goulding)	080156aa27	fix: Only do one catalog query for tombstones per each group of parquet files The query will get all tombstones that could be relevant to the group; then associate subsets of the results with each parquet file.	2022-03-23 09:56:26 -04:00
Carol (Nichols \|\| Goulding)	2749c37d02	fix: Query for tombstones in a time range, not for a particular parquet file The compactor at this point is still querying for each file; this is an intermediate step	2022-03-23 09:52:00 -04:00
Carol (Nichols \|\| Goulding)	4d2e71c03e	feat: Wrap parquet files with their relevant tombstones	2022-03-23 09:52:00 -04:00
Nga Tran	c3ef56588f	feat: use creation time to check level upgradable (#4094 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-03-22 13:51:18 +00:00
Nga Tran	886f9dc8c1	feat: split compacted data into 2 compacted sets (#4088 ) * feat: split compacted data into 2 compacted sets * chore: clean up * refactor: address review comments Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-03-22 13:28:32 +00:00
Andrew Lamb	b83b000590	chore: Update datafusion (#4071 ) * chore: update to datafusion 5936edc2a94d5fb20702a41eab2b80695961b9dc * chore: Update apis to match datafusion changes	2022-03-22 13:17:41 +00:00
Carol (Nichols \|\| Goulding)	201ced1d66	test: Mark a parquet file deleted in the update catalog operation	2022-03-21 10:16:58 -04:00
Carol (Nichols \|\| Goulding)	dbca54d917	refactor: Move add parquet file and tombstones within update catalog This should never be done on its own so doesn't really need to be its own method. We also don't do anything with the returned data, so no need to allocate those vectors.	2022-03-21 10:16:58 -04:00
Carol (Nichols \|\| Goulding)	2fea10dfd7	feat: Mark old compacted parquet files to be deleted in transaction Connects to #3952	2022-03-21 10:16:58 -04:00
Carol (Nichols \|\| Goulding)	5b294968a5	feat: Add processed tombstone records with compacted parquet file In a transaction when the parquet file is added to the catalog. Connects to #3952.	2022-03-21 10:16:57 -04:00
Carol (Nichols \|\| Goulding)	b983b24fcf	fix: Adding processed tombstones to catalog only needs tombstone ID	2022-03-21 10:16:57 -04:00
Carol (Nichols \|\| Goulding)	8fd3d85634	refactor: Move add_parquet_file_with_tombstones from ingester to compactor	2022-03-21 10:16:57 -04:00
Carol (Nichols \|\| Goulding)	933dc69ecf	feat: For each compacted data set, persist new parquet file to object store (#4058 ) * feat: Rearrange skeleton functions for split/persist/catalog update * feat: Persist compacted files to object storage Fixes #3951. * docs: Add comment about batches' schemas	2022-03-21 14:16:03 +00:00
Marco Neumann	d1df95df87	refactor: dyn-dispatch chunks in query subsystem - this is what DataFusion is doing as well; it's also fast enough because the number of chunks in a query is not THAT massive (it's not like we are doing row-level dyn dispatching) - it simplifies abstracting over different databases - it allows us to drop our enum-based dispatching that we have for `DbChunk` and that we would also need for the querier (e.g. depending on if a chunk is backed by a parquet file or ingester data) - it likely speeds up compile times because the `query` is no longer contains massive amounts of generic code For #3934.	2022-03-21 12:47:54 +01:00
Marco Neumann	169fa2fb2f	refactor: make `QueryChunk` object-safe This makes it way easier to dyn-type database implementations. The only real change is that we make `QueryChunk::Error` opaque. Nobody is going to inspect that anyways, it's just printed to the user. This is a follow-up of #4053. Ref #3934.	2022-03-18 11:40:31 +01:00
Carol (Nichols \|\| Goulding)	cd9c483864	feat: Group files by whether they overlap in time (#4048 ) Fixes #3949. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-03-17 13:05:18 +00:00
Dom Dwyer	65273721b6	feat(compactor): enable object store metrics	2022-03-15 16:32:52 +00:00
Dom Dwyer	5585dd3c21	refactor: switch to using DynObjectStore Changes all consumers of the object store to use the dynamically dispatched DynObjectStore type, instead of using a hardcoded concrete implementation type.	2022-03-15 16:32:52 +00:00
Dom Dwyer	1d5066c421	refactor: rename ObjectStore -> ObjectStoreImpl Frees up the name for so we can use `dyn ObjectStore` throughout the code instead of `ObjectStoreApi`.	2022-03-15 16:29:43 +00:00
Carol (Nichols \|\| Goulding)	1dacf567d9	feat: Add a function to the catalog to fetch level 1 parquet files Fixes #3946.	2022-03-11 15:40:34 -05:00
Carol (Nichols \|\| Goulding)	f184b7023c	feat: Update specified parquet file records to compaction level 1 Fixes #3950.	2022-03-11 15:34:40 -05:00
Carol (Nichols \|\| Goulding)	fabd262442	feat: Add a function to the catalog to fetch level 0 parquet files Connects to #3946.	2022-03-11 15:34:05 -05:00
Nga Tran	5a29d070ea	feat: Implement the compact function for NG Compactor (#4001 ) * feat: initial implementation of compact a given list of overlapped parquet files * feat: Add QueryableParquetChunk and some refactoring * feat: build queryable parquet chunks for parquet files with tombstones * feat: second half the implementation for Compactor's compact. Tests will be next * fix: comments for trait funnctions fof QueryChunkMeta * test: add tests for compactor's compact function * fix: typos * refactor: address Jake's review comments * refactor: address Andrew's comments and add one more test for files in different order in the vector Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-03-11 20:25:19 +00:00
Andrew Lamb	b24ae7d23b	refactor: extract out compactor creation from config (#4018 ) * refactor: extract out compactor creation from config * fix: fmt Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-03-11 14:46:34 +00:00
Carol (Nichols \|\| Goulding)	944f628e29	fix: Remove data_types as a dependency of ng compactor (#3993 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-03-09 17:03:02 +00:00
Nga Tran	09fba1d2c0	feat: NG Compactor - main function for finding and compacting parquet files (#3973 ) * feat: main function for finding and compacting parquet files * chore: Apply suggestions from code review Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * refactor: rename file and struct Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-03-08 16:34:43 +00:00
Andrew Lamb	b870b9340b	chore: remove uneeded dependencies (#3929 ) * chore: remove unused deps in compactor * chore: remove unused deps in influxdb_ioxd * chore: remove unused deps in object_store * chore: remove unused deps in server * fix: object_store needs observability deps when compiled with aws Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-03-04 17:39:46 +00:00
Luke Bond	34e06e8689	fix: compactor server stays up; removed unused delegates (#3855 ) * fix: compactor server stays up; removed unused delegates * chore: fmt Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-02-24 16:30:44 +00:00
dependabot[bot]	ad3868ed7c	chore(deps): Bump tokio from 1.16.1 to 1.17.0 (#3814 ) * chore(deps): Bump tokio from 1.16.1 to 1.17.0 Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.16.1 to 1.17.0. - [Release notes](https://github.com/tokio-rs/tokio/releases) - [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.16.1...tokio-1.17.0) --- updated-dependencies: - dependency-name: tokio dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> * build: update workspace-hack Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Dom Dwyer <dom@itsallbroken.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-02-22 16:27:43 +00:00
Luke Bond	0f012de70c	feat: adding compactor CLI command and crate Closes: #3777	2022-02-21 12:24:09 +00:00

550 Commits (1ddc64d68db906c6490f36d4aecde7ccd5bff945)