influxdb

Commit Graph

Author	SHA1	Message	Date
Carol (Nichols \|\| Goulding)	b2df492558	feat: Limit L1 -> L2 compaction based on file size	2022-10-13 16:20:22 -04:00
Andrew Lamb	9134ccd6c3	chore: Update datafusion again (#5855 ) * chore: Update datafusion * chore: Updates for changes in datafusion * chore: more updates * fix: update doc example Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-10-13 19:18:57 +00:00
Carol (Nichols \|\| Goulding)	082d045633	fix: Update test compactor limit values	2022-10-13 14:25:10 -04:00
Carol (Nichols \|\| Goulding)	cdd01eb3fc	test: Verify L1 files chosen for compaction are limited by the memory budget	2022-10-13 14:15:39 -04:00
Carol (Nichols \|\| Goulding)	3cdf2556ec	test: Verify L1 files in a group by themselves get upgraded to L2	2022-10-13 14:15:39 -04:00
Nga Tran	fab3cd845c	feat: add memory need for output streams into our estimation (#5847 ) * feat: add memory need for output streams into our estimation * test: modify tests to have better coverage * refactor: use constants isntead of numbers * chore: address review comments Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-10-13 14:31:19 +00:00
Nga Tran	1400bf99e4	refactor: split memory estimation into bytes to store and bytes to stream (#5845 ) * refactor: split memory estimation into bytes to store and bytes to stream * chore: cleanup Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-10-12 16:59:51 +00:00
Andrew Lamb	d57c99638c	chore: Update datafusion + `arrow`, `arrow-flight`, and `parquet` to 24.0.0.0 (#5792 ) * chore: Update datafusion + `arrow`, `arrow-flight`, and `parquet` to 24.0.0.0 * fix: Update for coercion, fix explain plans for change in column name display * chore: Update datafusion lock * fix: Update for other API changes * chore: Update to latest datafusion pin * chore: Run cargo hakari tasks Co-authored-by: CircleCI[bot] <circleci@influxdata.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-10-12 16:19:14 +00:00
Nga Tran	f05ca867a5	feat: add file size into estimated memory (#5837 ) * feat: add file size into estimataed memory * chore: cleanup * chore: fmt * chore: Apply suggestions from code review Co-authored-by: Andrew Lamb <alamb@influxdata.com> * chore: run fmt after applying review suggestion * fix: fix tests towork with the change for review suggestion Co-authored-by: Andrew Lamb <alamb@influxdata.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-10-12 14:42:53 +00:00
Nga Tran	b7153862b0	refactor: due to limit in size uplaoed to S3, we need to split output file of cold compaction, too (#5834 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-10-11 17:22:19 +00:00
dependabot[bot]	933493fab3	chore(deps): Bump object_store from 0.5.0 to 0.5.1 Bumps [object_store](https://github.com/apache/arrow-rs) from 0.5.0 to 0.5.1. - [Release notes](https://github.com/apache/arrow-rs/releases) - [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG-old.md) - [Commits](https://github.com/apache/arrow-rs/compare/object_store_0.5.0...object_store_0.5.1) --- updated-dependencies: - dependency-name: object_store dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com>	2022-10-11 01:19:10 +00:00
Marco Neumann	c4c83e0840	fix: query error propagation (#5801 ) - treat OOM protection as "resource exhausted" - use `DataFusionError` in more places instead of opaque `Box<dyn Error>` - improve conversion from/into `DataFusionError` to preserve more semantics Overall, this improves our error handling. DF can now return errors like "resource exhausted" and gRPC should now automatically generate a sensible status code for it. Fixes #5799.	2022-10-06 08:54:01 +00:00
Nga Tran	2f08a64f16	feat: not split output files in the first step of cold compaction (#5781 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-09-30 16:08:03 +00:00
Dom Dwyer	cd4087e00d	style: add no todo!() or dbg!() lints Some crates had theme, some not - lets be consistent and have the compiler spot dbg!() and todo!() macro calls - they should never be in prod code!	2022-09-29 13:10:07 +02:00
Andrew Lamb	66dbb9541f	chore: Update datafusion and `arrow`/`parquet`/`arrow-flight` to 23.0.0, `thrift` to 0.16.0 (#5694 ) * chore: Update datafusion and `arrow`/`parquet`/`arrow-flight` to 23.0.0 * chore: Update thrift / remove parquet_format * fix: Update APIs * chore: Update lock + Run cargo hakari tasks * fix: use patched version of arrow-rs to work around https://github.com/apache/arrow-rs/issues/2779 * chore: Run cargo hakari tasks Co-authored-by: CircleCI[bot] <circleci@influxdata.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-09-27 12:50:54 +00:00
Nga Tran	75ff805ee2	feat: instead of adding num_files and memory budget into the reason text column, let us create differnt columns for them. We will be able to filter them easily (#5742 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-09-26 20:14:04 +00:00
Nga Tran	b11da1d98b	fix: a silly bug that did not capture file limit if a lot of L0 files and very few or non overlapped L1 (#5736 )	2022-09-23 21:03:29 +00:00
Nga Tran	c4542d6b21	chore: more verbose about the memory budget inserted in to the catalog table skipped_comapction (#5735 )	2022-09-23 18:40:09 +00:00
Nga Tran	bb7df22aa1	chore: always use a fixed number of rows (8192) per batch to estimate memory (#5733 )	2022-09-23 15:51:25 +00:00
Nga Tran	da697815ff	chore: add more info about memory budget at the time of over-file-limit into skipped_compaction for us to see if we shoudl increase the file limit (#5731 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-09-23 13:34:38 +00:00
Nga Tran	61075d57e2	chore: turn full cold compaction on (#5728 )	2022-09-22 17:07:35 +00:00
Nga Tran	aaec5104d6	chore: turn compaction cold partition step 1 on to work with our new … (#5726 ) * chore: turn compaction cold partition step 1 on to work with our new memory budget that considers the num_files limitation * chore: run fmt	2022-09-22 14:59:27 +00:00
Nga Tran	e3deb23bcc	feat: add minimum row_count per file in estimating compacting memory… (#5715 ) * feat: add minimum row_count per file in estiumating compacting memory budget and limit number files per compaction * chore: cleanup * chore: Apply suggestions from code review Co-authored-by: Carol (Nichols \|\| Goulding) <193874+carols10cents@users.noreply.github.com> * test: add test per review comments * chore: Apply suggestions from code review Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * test: add one more test that has limit num files larger than total input files * fix: make the L1 files in tests not overlapped Co-authored-by: Carol (Nichols \|\| Goulding) <193874+carols10cents@users.noreply.github.com> Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-09-22 14:37:39 +00:00
Carol (Nichols \|\| Goulding)	aa822a40cf	refactor: Move config in with the relevant assertions Now that only one hot test is using a CompactorConfig, move it into that test to avoid spooky action at a distance.	2022-09-21 11:57:57 -04:00
Carol (Nichols \|\| Goulding)	f0bf3bd21c	test: Clarify descriptions for the remaining assertion The assertion remaining in this test is now important because of having multiple shards and showing which partition per shard is chosen.	2022-09-21 11:57:57 -04:00
Carol (Nichols \|\| Goulding)	7c7b058276	refactor: Extract unit test for case 5	2022-09-21 11:57:57 -04:00
Carol (Nichols \|\| Goulding)	f5bd81ff3c	refactor: Extract unit test for case 4	2022-09-21 11:57:57 -04:00
Carol (Nichols \|\| Goulding)	765feaa4d8	refactor: Extract a unit test for case 3	2022-09-21 11:57:57 -04:00
Carol (Nichols \|\| Goulding)	a7a480c1ba	refactor: Extract a unit test for case 2	2022-09-21 11:57:56 -04:00
Carol (Nichols \|\| Goulding)	d95f252a8e	refactor: Extract a unit test for case 1 Also add coverage for when there are no partitions in addition to the test for when there are no parquet files.	2022-09-21 11:57:56 -04:00
Carol (Nichols \|\| Goulding)	9372290ec9	refactor: Use iox_test helpers to simplify test setup	2022-09-21 11:57:56 -04:00
Carol (Nichols \|\| Goulding)	f22627a97f	test: Move an integration test of hot compact_one_partition to lib	2022-09-21 11:57:56 -04:00
Carol (Nichols \|\| Goulding)	a7bb0398e6	test: Move an integration test of compact_candidates_with_memory_budget to the same file	2022-09-21 11:57:56 -04:00
Carol (Nichols \|\| Goulding)	316ebfa8c1	test: Call the smaller inner hot_partitions_for_shard when only one shard is involved	2022-09-21 11:57:56 -04:00
Carol (Nichols \|\| Goulding)	fcf9a9d589	refactor: Move fetching of config from compactor inside hot_partitions_to_compact But still pass them to hot_partitions_for_shard. And make the order of the arguments the same as for recent_highest_throughput_partitions because I've already messed the order up. And make the names the same throughout. This makes the closure passed to get_candidates_with_retry simpler.	2022-09-21 11:57:56 -04:00
Carol (Nichols \|\| Goulding)	48b7876174	refactor: Extract a function for computing query nanoseconds ago	2022-09-21 11:57:56 -04:00
Carol (Nichols \|\| Goulding)	7dcaf5bd3d	refactor: Extract a function for getting hot partitions for one shard	2022-09-21 11:57:56 -04:00
Carol (Nichols \|\| Goulding)	b557c30fd3	refactor: Move hot compaction candidates to the hot module	2022-09-21 11:57:55 -04:00
Carol (Nichols \|\| Goulding)	fa11031a36	refactor: Extract a shared function to retry fetching of compaction candidates	2022-09-21 11:57:55 -04:00
Nga Tran	1d306061b9	chore: disable cold compaction again since its step 1 is the culprit (#5700 )	2022-09-20 20:34:28 +00:00
Nga Tran	34bc02b59b	chore: turn cold comapction on but only compact L0s and thier overlapped L1s (#5698 )	2022-09-20 18:44:36 +00:00
Nga Tran	578ce1854d	chore: temporarily turn off cold compaction to investigate an oom (#5696 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-09-20 14:17:22 +00:00
Carol (Nichols \|\| Goulding)	414b0f02ca	fix: Use time helper methods in more places	2022-09-19 13:24:08 -04:00
Carol (Nichols \|\| Goulding)	c0c0349bc5	fix: Use typed Time values rather than ns	2022-09-19 12:59:20 -04:00
Carol (Nichols \|\| Goulding)	0e23360da1	refactor: Add helper methods for computing times to TimeProvider	2022-09-19 11:34:43 -04:00
kodiakhq[bot]	eed31bec4e	Merge branch 'main' into cn/share-code-with-full-compaction	2022-09-16 21:15:44 +00:00
Carol (Nichols \|\| Goulding)	20f5f205bc	fix: ChunkOrder should be either max_seq or 0, not min_time	2022-09-16 16:57:31 -04:00
Carol (Nichols \|\| Goulding)	d85e959820	fix: Sort l1 files by min_time rather than max_sequence_number	2022-09-16 16:15:18 -04:00
Carol (Nichols \|\| Goulding)	50ddd588b1	test: Add a case of L1+L2 files being compacted into L2	2022-09-16 16:15:18 -04:00
Carol (Nichols \|\| Goulding)	a8d817c91a	test: Explain expected value	2022-09-16 16:15:18 -04:00
Carol (Nichols \|\| Goulding)	1ab250dfac	fix: Sort chunks taking into account what level compaction is targetting	2022-09-16 16:15:18 -04:00
Carol (Nichols \|\| Goulding)	ca4c5d65e7	docs: Clarify comments on sort order of input/output of filtering	2022-09-16 16:15:17 -04:00
Nga Tran	346ef1c811	chore: reduce number of histogram buckets (#5661 )	2022-09-16 19:44:22 +00:00
Carol (Nichols \|\| Goulding)	cde0a94fd5	fix: Re-enable full compaction to level 2 This will work the same way that compacting level 0 -> level 1 does except that the resulting files won't be split into potentially multiple files. It will be limited by the memory budget bytes, which should limit the groups more than the max_file_size_bytes would.	2022-09-15 14:53:12 -04:00
Carol (Nichols \|\| Goulding)	e05657e8a4	feat: Make filter_parquet_files more general with regards to compaction level	2022-09-15 14:53:08 -04:00
Carol (Nichols \|\| Goulding)	9b99af08e4	fix: Level 1 files need to be sorted by max sequence number for full compaction	2022-09-15 14:53:07 -04:00
Carol (Nichols \|\| Goulding)	dc64e494bd	docs: Update comment to what we'd like this code to do	2022-09-15 14:53:07 -04:00
Carol (Nichols \|\| Goulding)	f5497a3a3d	refactor: Extract a conversion for convenience in tests	2022-09-15 12:48:36 -04:00
Carol (Nichols \|\| Goulding)	dcab9d0ffc	refactor: Combine relevant data with the FilterResult state This encodes the result directly and has the FilterResult hold only the relevant data to the state. So no longer any need to create or check for empty vectors or 0 budget_bytes. Also creates a new type after checking the filter result state and handling the budget, as actual compaction doesn't need to care about that. This could still use more refactoring to become a clearer pipeline of different states, but I think this is a good start.	2022-09-15 11:13:18 -04:00
Carol (Nichols \|\| Goulding)	e57387b8e4	refactor: Extract an inner function so partition isn't needed in tests	2022-09-15 11:10:14 -04:00
Carol (Nichols \|\| Goulding)	a284cebb51	refactor: Store estimated bytes on the CompactorParquetFile	2022-09-15 11:10:14 -04:00
Carol (Nichols \|\| Goulding)	70094aead0	refactor: Make estimating bytes a responsibility of the Partition Table columns for a partition don't change, so rather than carrying around table columns for the partition and parquet files to look up repeatedly, have the `PartitionCompactionCandidateWithInfo` keep track of its column types and be able to estimate bytes given a number of rows from a parquet file.	2022-09-15 11:10:14 -04:00
Nga Tran	7c4c918636	chore: add parttion id into panic message (#5641 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-09-15 02:21:13 +00:00
kodiakhq[bot]	08e2523295	Merge branch 'main' into cn/always-get-extra-info	2022-09-14 17:01:59 +00:00
Nga Tran	44e12aa512	feat: add needed budget and memory budget into the message for us to diagnose and increase our memory budget as needed (#5640 )	2022-09-14 16:06:19 +00:00
Carol (Nichols \|\| Goulding)	e16306d21c	refactor: Move fetching of extra partition info into the method because it's always needed	2022-09-14 11:14:17 -04:00
kodiakhq[bot]	85641efa6f	Merge branch 'main' into cn/infallible-estimated-bytes	2022-09-14 01:00:10 +00:00
Nga Tran	f21cb43624	feat: add a few more buckets for the histograms (#5621 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-09-13 13:52:23 +00:00
Andrew Lamb	f86d3e31da	chore: Update datafusion + object_store (#5619 ) * chore: Update datafusion pin * chore: update object_store to 0.5.0 * chore: Run cargo hakari tasks Co-authored-by: CircleCI[bot] <circleci@influxdata.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-09-13 12:34:54 +00:00
Carol (Nichols \|\| Goulding)	d971980fd3	fix: Box a source error to please clippy	2022-09-12 17:38:40 -04:00
Carol (Nichols \|\| Goulding)	c3937308f4	fix: Make estimate_arrow_bytes_for_file infallible	2022-09-12 16:50:25 -04:00
Andrew Lamb	1fd31ee3bf	chore: Update datafusion / `arrow` / `arrow-flight` / `parquet` to version 22.0.0 (#5591 ) * chore: Update datafusion / `arrow` / `arrow-flight` / `parquet` to version 22.0.0 * fix: enable dynamic comparison flag * chore: derive Eq for clippy * chore: update explain plans * chore: Update sizes for ReadBuffer encoding * chore: update more tests Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-09-12 17:45:03 +00:00
Carol (Nichols \|\| Goulding)	e7a3f15ecf	test: Remove outdated description	2022-09-12 13:13:30 -04:00
Carol (Nichols \|\| Goulding)	8981cbbd84	test: Reduce time from 18 to 9 hours	2022-09-12 13:13:29 -04:00
Carol (Nichols \|\| Goulding)	2ceb779c28	test: Correct a comment that I missed in the 24 hr -> 8 hr switch	2022-09-12 13:13:29 -04:00
Carol (Nichols \|\| Goulding)	baec40a313	test: Correct and expand assertions and descriptions	2022-09-12 13:13:29 -04:00
Carol (Nichols \|\| Goulding)	2aef7c7936	feat: Temporarily disable cold full compaction	2022-09-12 13:13:29 -04:00
Carol (Nichols \|\| Goulding)	743b67f0e9	fix: Re-enable full cold compaction, in serial for now	2022-09-12 13:13:29 -04:00
Carol (Nichols \|\| Goulding)	6e1b06c435	fix: Work with Arc of PartitionCompactionCandidateWithInfo	2022-09-12 13:13:29 -04:00
Carol (Nichols \|\| Goulding)	dfd7255c46	fix: Remove now-unused cold_input_file_count_threshold	2022-09-12 13:13:28 -04:00
Carol (Nichols \|\| Goulding)	3a368c02c2	fix: Remove now-unused cold_input_size_threshold_bytes	2022-09-12 13:13:28 -04:00
Carol (Nichols \|\| Goulding)	eefc71ac90	fix: Remove now unused max_cold_concurrent_size_bytes	2022-09-12 13:13:28 -04:00
Carol (Nichols \|\| Goulding)	2a22d79c94	feat: Make cold compaction like hot compaction except for candidate selection Temporarily disable full compaction from level 1 to 2. Re-use the memory budget estimation and parallelization for cold compaction. Rather than choosing cold compaction candidates and then in parallel compacting each partition from level 0 to 1 and then 1 to 2, this commit switches to compacting in parallel (by memory budget) all candidates form level 0 to 1. The next commit will re-enable full compaction of all partitions in parallel (by memory budget).	2022-09-12 13:13:28 -04:00
Carol (Nichols \|\| Goulding)	76228c9fd6	refactor: Move compact_in_parallel and compact_one_partition to lib and make more general Cold compaction is going to use these too.	2022-09-12 13:13:28 -04:00
Carol (Nichols \|\| Goulding)	7a3dffb750	refactor: Create wrapper fns that don't take size overrides So that we don't have to pass an empty hashmap in as many places in real code, because the size overrides are only for tests	2022-09-12 13:13:28 -04:00
Carol (Nichols \|\| Goulding)	608290b83d	fix: Make some hot compaction code more general/parameterized	2022-09-12 13:13:28 -04:00
Carol (Nichols \|\| Goulding)	2a5ef3058c	refactor: Move compact_candidates_with_memory_budget to share with cold	2022-09-12 13:13:28 -04:00
Carol (Nichols \|\| Goulding)	955e7ea824	fix: Remove unused Error struct	2022-09-12 13:13:27 -04:00
Carol (Nichols \|\| Goulding)	ee3e1b851d	fix: Clean up some long lines, comments	2022-09-12 13:13:27 -04:00
Carol (Nichols \|\| Goulding)	77f3490246	refactor: Extract cold compaction code into a module like hot	2022-09-12 13:13:27 -04:00
Carol (Nichols \|\| Goulding)	c12b3fbb03	refactor: Move to a module named hot to reduce naming duplication My fingers are tired of typing 🤣	2022-09-12 13:13:27 -04:00
Carol (Nichols \|\| Goulding)	e3f9984878	docs: Clean up some comments while reading through	2022-09-12 13:13:27 -04:00
Carol (Nichols \|\| Goulding)	f2f99727ba	feat: Add metrics for files going into cold compaction	2022-09-12 13:13:27 -04:00
Carol (Nichols \|\| Goulding)	ad2db51ac2	refactor: Extract a function to share logic for compacting to L1 or L2	2022-09-12 13:13:27 -04:00
Carol (Nichols \|\| Goulding)	6436afc3d9	fix: Remove cold max bytes CLI option; use existing max bytes CLI option As discussed in https://github.com/influxdata/influxdb_iox/issues/5330#issuecomment-1218170063	2022-09-12 13:13:27 -04:00
Carol (Nichols \|\| Goulding)	723aedfbca	test: Add more cases for cold compaction	2022-09-12 13:13:26 -04:00
Carol (Nichols \|\| Goulding)	7cd78a3020	fix: Extract and test logic that groups files for cold compaction	2022-09-12 13:13:26 -04:00
Carol (Nichols \|\| Goulding)	da201ba87f	fix: Select by num of both l0 and l1 files for cold compaction Now that we're going to compact level 1 files in to level 2 files as well.	2022-09-12 13:13:26 -04:00
Carol (Nichols \|\| Goulding)	6bba3fafaa	fix: If full compaction group has only 1 file, upgrade level As opposed to running full compaction. Makes the catalog function general and take the level as a parameter rather than only upgrade to level 1.	2022-09-12 13:13:26 -04:00
Carol (Nichols \|\| Goulding)	10ba3fef47	feat: Compact cold partitions completely Fixes #5330.	2022-09-12 13:13:26 -04:00
Carol (Nichols \|\| Goulding)	327446f0cd	fix: Change default cold hours threshold from 24 hours to 8 As requested in https://github.com/influxdata/influxdb_iox/issues/5330#issuecomment-1212468682	2022-09-12 13:13:26 -04:00
Carol (Nichols \|\| Goulding)	a64a705b60	refactor: Extract a fn for the first step of cold compaction Which is currently the only step, compacting any remaining level 0 files into level 1. Make a TODO function for performing full compaction of all level 1 files next.	2022-09-12 13:13:26 -04:00
Carol (Nichols \|\| Goulding)	7249ef4793	fix: Don't record cold compaction metrics if compaction fails	2022-09-12 13:13:25 -04:00
Marco Neumann	8933f47ec1	refactor: make `QueryChunk::partition_id` non-optional (#5614 ) In our data model, a chunk always belongs to a partition[^1], so let's not make this attribute optional. The optional value only leads to -- mostly surprising -- conditional behavior, ranging from "do not equalize the partition sort key" (querier) to "always consider the chunk overlapping" (iox_query when dealing with ingester chunks). [^1]: This is even true when the chunk belongs to a parquet file that is not yet added to the catalog, contrary to what a comment in the ingester stated. The catalog and data model used by the querier are two totally different things.	2022-09-12 13:52:51 +00:00
Carol (Nichols \|\| Goulding)	13de7ac954	feat: Record reasons for skipping compaction of a partition in the database Closes #5458.	2022-09-09 16:40:48 -04:00
Nga Tran	f03e370ecc	refactor: allocate more accurate length for a hashmap (#5592 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-09-09 15:37:29 +00:00
dependabot[bot]	786ce75e26	chore(deps): Bump tokio-util from 0.7.3 to 0.7.4 (#5596 ) Bumps [tokio-util](https://github.com/tokio-rs/tokio) from 0.7.3 to 0.7.4. - [Release notes](https://github.com/tokio-rs/tokio/releases) - [Commits](https://github.com/tokio-rs/tokio/compare/tokio-util-0.7.3...tokio-util-0.7.4) --- updated-dependencies: - dependency-name: tokio-util dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-09-09 07:40:16 +00:00
Joe-Blount	333cfa4f3c	chore: address comments - use TimestampMinMax passed by reference	2022-09-07 16:36:39 -05:00
Joe-Blount	97ebad5adb	chore: rustfmt changes	2022-09-07 13:22:36 -05:00
Joe-Blount	4188230694	fix: avoid splitting compaction output for time ranges with no chunks	2022-09-07 13:01:14 -05:00
Carol (Nichols \|\| Goulding)	b5ca99a3d5	refactor: Make CompactorConfig fields pub I'm spending way too long with the wrong number of arguments to CompactorConfig::new and not a lot of help from the compiler. If these struct fields are pub, they can be set directly and destructured, etc, which the compiler gives way more help on. This also reduces duplication and boilerplate that has to be updated when the config fields change.	2022-09-07 13:28:19 -04:00
Carol (Nichols \|\| Goulding)	54eea79773	refactor: Make filtering the parquet files into a closure argument too So that the cold compaction can use different filtering but still use the memory budget function. Not sure I'm happy with this yet, but it's a start.	2022-09-07 13:26:42 -04:00
Carol (Nichols \|\| Goulding)	3e76a155f7	refactor: Make memory budget compaction group function more general In preparation for using it for cold compaction too.	2022-09-07 13:26:42 -04:00
Carol (Nichols \|\| Goulding)	1f69d11d46	refactor: Move hot compaction function into hot compaction module	2022-09-07 13:26:40 -04:00
Carol (Nichols \|\| Goulding)	85fb0acea6	refactor: Extract read_parquet_file test helper function to iox_tests::utils	2022-09-07 13:21:28 -04:00
Marco Neumann	adeacf416c	ci: fix (#5569 ) * ci: use same feature set in `build_dev` and `build_release` * ci: also enable unstable tokio for `build_dev` * chore: update tokio to 1.21 (to fix console-subscriber 0.1.8 * fix: "must use"	2022-09-06 14:13:28 +00:00
Marco Neumann	064f0e9b29	refactor: use DataFusion to read parquet files (#5531 ) Remove our own hand-rolled logic and let DataFusion read the parquet files. As a bonus, this now supports predicate pushdown to the deserialization step, so we can use parquets as in in-mem buffer. Note that this currently uses some "nested" DataFusion hack due to the way the `QueryChunk` interface works. Midterm I'll change the interface so that the `ParquetExec` nodes are directly visible to DataFusion instead of some opaque `SendableRecordBatchStream`.	2022-09-05 09:25:04 +00:00
Marco Neumann	f45cbfb88d	refactor: fine-grained file size mocking (#5541 ) * refactor: do not override parquet file size in querier This is going to be an issue when we actually rely on the size for reading, see #5531. * refactor: use selected file size mocking in compactor Do not blindly override parquet file sizes for all subsystems. This is going to be an issue when we actually rely on the size for reading, see #5531. * refactor: remove ability to override file sizes in catalog Blindly overriding data for all subsystems is dangerous, because some parts of our stack actually rely on the actual file size. See #5531. * docs: explain `size_overrides`	2022-09-05 08:50:04 +00:00
Nga Tran	dde65fa7ef	fix: remove timestamp functions from SQLs to be able to use index for improving performance (#5547 )	2022-09-02 19:43:52 +00:00
kodiakhq[bot]	b9959fa2d8	Merge branch 'main' into cn/even-more-compactor-tests	2022-09-01 21:02:04 +00:00
Nga Tran	c8cbc5299b	feat: make compactors to select candidates based on the last n minutes (#5535 ) * feat: make compactors to select candidates based on the last n minutes to reduce workload for postgres catalog query * refactor: remove 1-minute case per review comment	2022-09-01 20:07:26 +00:00
Carol (Nichols \|\| Goulding)	16d631a247	test: Add test for current behavior of skipping a table without columns	2022-08-31 16:26:02 -04:00
Carol (Nichols \|\| Goulding)	1120b49821	refactor: Extract the mock compactor function into a type	2022-08-31 16:17:43 -04:00
Carol (Nichols \|\| Goulding)	b893251efc	test: Add a test that compacting no candidates compacts nothing	2022-08-31 15:30:25 -04:00
Carol (Nichols \|\| Goulding)	b0e871196c	test: Use more iox test utils in this compactor test	2022-08-31 14:37:59 -04:00
Nga Tran	a32d5180b3	fix: loop forever in compact_hot_partition_candidates (#5518 ) * fix: loop forever in compact_hot_partition_candidates * chore: cleanup * fix: avoid using continues that will cause bugs in corner cases * fix: Pass compaction fn as a closure instead to allow collection of groups in test * fix: Add Send bound as suggested by clippy * fix: fix the test to return data of round 3 instead of round 2 Co-authored-by: Carol (Nichols \|\| Goulding) <carol.nichols@gmail.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-08-31 17:46:59 +00:00
Andrew Lamb	6669d85fb4	chore: Update datafusion + arrow/parquet to `21.0.0` (#5519 ) * chore: Update arrow/arrow-flight/parquet to 21.0.0 * chore: Update datafusion pin * chore: Fix arrow update script * chore: Update Cargo.lock * chore: Update for new API	2022-08-31 13:30:47 +00:00
Nga Tran	cb10a7c6d8	feat: More accurate memory estimate for compaction (#5471 ) * feat: initial implementation of memory estimation for a compaction * feat: estimate size of files and have the right actions for the needed budget * feat: run candidates in parallel * fix: have the right name for the column field of the output struct * feat: add metrics for estimated budgets * chore: cleanup * chore: Apply suggestions from code review Co-authored-by: Carol (Nichols \|\| Goulding) <193874+carols10cents@users.noreply.github.com> * fix: fix syntax after applying review's suggestions * refactor: Convert a Vec to VecDeque to go well with pop and push * chore: remove max_concurrent_size_bytes and input_size_threshold_bytes * chore: remove input_file_count_threshold * test: tests for estimate_arrow_bytes_for_file Co-authored-by: Carol (Nichols \|\| Goulding) <193874+carols10cents@users.noreply.github.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-08-30 13:44:44 +00:00
Dom Dwyer	2fc0ddbea1	fix: compactor tolerates empty output Changes the compactor code to tolerate a SplitExec yielding an empty partition (with no rows). This raises a WARN as the situation in which this is acceptable is very rare, and is more likely indicative of an opportunity to improve the SplitExec usage (i.e. pruning out unnecessary split points).	2022-08-30 14:52:31 +02:00
Carol (Nichols \|\| Goulding)	58f0b63cdc	refactor: Rename KafkaTopic to Topic or TopicMetadata or topic name as appropriate	2022-08-29 14:27:02 -04:00
Carol (Nichols \|\| Goulding)	74c9529062	fix: Rename KafkaPartition to ShardIndex	2022-08-29 14:07:18 -04:00
Carol (Nichols \|\| Goulding)	c9567cad7d	fix: Rename some more sequencer to shard	2022-08-29 14:06:45 -04:00
Carol (Nichols \|\| Goulding)	6443858870	fix: Rename compactor option from sequencer to shard	2022-08-29 14:06:45 -04:00
Carol (Nichols \|\| Goulding)	fe9c474620	fix: rustfmt	2022-08-29 14:06:45 -04:00
Carol (Nichols \|\| Goulding)	f6c93f7e67	fix: Remove moot comment	2022-08-29 14:06:44 -04:00
Carol (Nichols \|\| Goulding)	698f1a47ff	refactor: Rename test structures from sequencer to shard where appropriate	2022-08-29 14:06:44 -04:00
Jake Goulding	4abf21c724	refactor: Rename Sequencer (and its entourage) to Shard	2022-08-29 14:06:43 -04:00
Nga Tran	3220c6f88b	feat: add file_count_threshold for comapcting cold partitions (#5456 ) * feat: file file_count_threshold for comapcting cold partitions to make it consistent with the hot case and help set up to avoid oom easier * chore: remove unecessary commments	2022-08-23 20:12:21 +00:00
kodiakhq[bot]	2b3ca54168	Merge branch 'main' into cn/upgrade-l0-metrics	2022-08-17 16:01:42 +00:00
Andrew Lamb	7f0ae53d6f	chore: Update to (almost) released object_store 0.4.0 (#5419 ) * chore: update object_store * chore: update hakari config * chore: Run cargo hakari tasks Co-authored-by: CircleCI[bot] <circleci@influxdata.com>	2022-08-17 13:44:48 +00:00
Carol (Nichols \|\| Goulding)	ef716a5b90	fix: Remove compaction level attribute from the compaction_input_file_bytes metric	2022-08-15 10:50:04 -04:00
Carol (Nichols \|\| Goulding)	a9ed32df89	fix: Remove compaction_counter as it's now redundant with the compaction_input_file_bytes histogram	2022-08-15 10:23:29 -04:00
Carol (Nichols \|\| Goulding)	af95ce7ca6	feat: Add a histogram tracking sizes of files used as inputs to compaction Fixes #5348.	2022-08-15 10:13:54 -04:00
Carol (Nichols \|\| Goulding)	cd6c809fe0	fix: Change metric tracking sizes of files selected for compaction to a histogram Connects to #5348.	2022-08-15 10:13:54 -04:00
Carol (Nichols \|\| Goulding)	b982bdaf2f	fix: Derive Eq when we derive PartialEq and members can derive Eq Allow this in generated code that we don't control, though. Recommended by clippy now. https://rust-lang.github.io/rust-clippy/master/index.html#derive_partial_eq_without_eq	2022-08-11 15:04:06 -04:00
Marco Neumann	90fec1365f	feat: intern schemas during query planning (#5215 ) * feat: intern schemas during query planning Helps with #5202. * refactor: `SchemaMerger::build` shall return an `Arc` * feat: `SchemaMerger::with_interner` * refactor: hash-based schema interning	2022-08-11 12:28:51 +00:00
Jake Goulding	68e64af4d1	refactor: extract compactor loop body to call it separately	2022-08-10 11:28:51 -04:00
Jake Goulding	49c5281454	refactor: Supersede old CompactorHandlerImpl constructor	2022-08-10 11:28:51 -04:00
Jake Goulding	cc061b6ce9	refactor: add CompactorHandlerImpl::new_with_compactor This will allow us to refactor the code a level up to create a `Compactor` directly.	2022-08-10 11:28:51 -04:00
Andrew Lamb	c0fc91c627	chore: Warn if a parquet file has no sort key (#5368 )	2022-08-10 11:56:50 +00:00

480 Commits (fad34c375ef2cc9abda28713b2cf8d0675dc0d2d)