Commit Graph

265 Commits (d24fb0eae7159a986a92c4e8be04e99321f4fc3f)

Author SHA1 Message Date
Marco Neumann adeacf416c
ci: fix (#5569)
* ci: use same feature set in `build_dev` and `build_release`

* ci: also enable unstable tokio for `build_dev`

* chore: update tokio to 1.21 (to fix console-subscriber 0.1.8

* fix: "must use"
2022-09-06 14:13:28 +00:00
Marco Neumann 064f0e9b29
refactor: use DataFusion to read parquet files (#5531)
Remove our own hand-rolled logic and let DataFusion read the parquet
files.

As a bonus, this now supports predicate pushdown to the deserialization
step, so we can use parquets as in in-mem buffer.

Note that this currently uses some "nested" DataFusion hack due to the
way the `QueryChunk` interface works. Midterm I'll change the interface
so that the `ParquetExec` nodes are directly visible to DataFusion
instead of some opaque `SendableRecordBatchStream`.
2022-09-05 09:25:04 +00:00
Marco Neumann f45cbfb88d
refactor: fine-grained file size mocking (#5541)
* refactor: do not override parquet file size in querier

This is going to be an issue when we actually rely on the size for
reading, see #5531.

* refactor: use selected file size mocking in compactor

Do not blindly override parquet file sizes for all subsystems.

This is going to be an issue when we actually rely on the size for
reading, see #5531.

* refactor: remove ability to override file sizes in catalog

Blindly overriding data for all subsystems is dangerous, because some
parts of our stack actually rely on the actual file size. See #5531.

* docs: explain `size_overrides`
2022-09-05 08:50:04 +00:00
Nga Tran dde65fa7ef
fix: remove timestamp functions from SQLs to be able to use index for improving performance (#5547) 2022-09-02 19:43:52 +00:00
kodiakhq[bot] b9959fa2d8
Merge branch 'main' into cn/even-more-compactor-tests 2022-09-01 21:02:04 +00:00
Nga Tran c8cbc5299b
feat: make compactors to select candidates based on the last n minutes (#5535)
* feat: make compactors to select candidates based on the last n minutes to reduce workload for postgres catalog query

* refactor: remove 1-minute case per review comment
2022-09-01 20:07:26 +00:00
Carol (Nichols || Goulding) 16d631a247
test: Add test for current behavior of skipping a table without columns 2022-08-31 16:26:02 -04:00
Carol (Nichols || Goulding) 1120b49821
refactor: Extract the mock compactor function into a type 2022-08-31 16:17:43 -04:00
Carol (Nichols || Goulding) b893251efc
test: Add a test that compacting no candidates compacts nothing 2022-08-31 15:30:25 -04:00
Carol (Nichols || Goulding) b0e871196c
test: Use more iox test utils in this compactor test 2022-08-31 14:37:59 -04:00
Nga Tran a32d5180b3
fix: loop forever in compact_hot_partition_candidates (#5518)
* fix: loop forever in compact_hot_partition_candidates

* chore: cleanup

* fix: avoid using continues that will cause bugs in corner cases

* fix: Pass compaction fn as a closure instead to allow collection of groups in test

* fix: Add Send bound as suggested by clippy

* fix: fix the test to return data of round 3 instead of round 2

Co-authored-by: Carol (Nichols || Goulding) <carol.nichols@gmail.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-31 17:46:59 +00:00
Andrew Lamb 6669d85fb4
chore: Update datafusion + arrow/parquet to `21.0.0` (#5519)
* chore: Update arrow/arrow-flight/parquet to 21.0.0

* chore: Update datafusion pin

* chore: Fix arrow update script

* chore: Update Cargo.lock

* chore: Update for new API
2022-08-31 13:30:47 +00:00
Nga Tran cb10a7c6d8
feat: More accurate memory estimate for compaction (#5471)
* feat: initial implementation of memory estimation for a compaction

* feat: estimate size of files and have the right actions for the needed budget

* feat: run candidates in parallel

* fix: have the right name for the column field of the output struct

* feat: add metrics for estimated budgets

* chore: cleanup

* chore: Apply suggestions from code review

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

* fix: fix syntax after applying review's suggestions

* refactor: Convert a Vec to VecDeque to go well with pop and push

* chore: remove max_concurrent_size_bytes and input_size_threshold_bytes

* chore: remove input_file_count_threshold

* test: tests for estimate_arrow_bytes_for_file

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-30 13:44:44 +00:00
Dom Dwyer 2fc0ddbea1 fix: compactor tolerates empty output
Changes the compactor code to tolerate a SplitExec yielding an empty
partition (with no rows).

This raises a WARN as the situation in which this is acceptable is very
rare, and is more likely indicative of an opportunity to improve the
SplitExec usage (i.e. pruning out unnecessary split points).
2022-08-30 14:52:31 +02:00
Carol (Nichols || Goulding) 58f0b63cdc
refactor: Rename KafkaTopic to Topic or TopicMetadata or topic name as appropriate 2022-08-29 14:27:02 -04:00
Carol (Nichols || Goulding) 74c9529062
fix: Rename KafkaPartition to ShardIndex 2022-08-29 14:07:18 -04:00
Carol (Nichols || Goulding) c9567cad7d
fix: Rename some more sequencer to shard 2022-08-29 14:06:45 -04:00
Carol (Nichols || Goulding) 6443858870
fix: Rename compactor option from sequencer to shard 2022-08-29 14:06:45 -04:00
Carol (Nichols || Goulding) fe9c474620
fix: rustfmt 2022-08-29 14:06:45 -04:00
Carol (Nichols || Goulding) f6c93f7e67
fix: Remove moot comment 2022-08-29 14:06:44 -04:00
Carol (Nichols || Goulding) 698f1a47ff
refactor: Rename test structures from sequencer to shard where appropriate 2022-08-29 14:06:44 -04:00
Jake Goulding 4abf21c724
refactor: Rename Sequencer (and its entourage) to Shard 2022-08-29 14:06:43 -04:00
Nga Tran 3220c6f88b
feat: add file_count_threshold for comapcting cold partitions (#5456)
* feat: file file_count_threshold for comapcting cold partitions to make it consistent with the hot case and help set up to avoid oom easier

* chore: remove unecessary commments
2022-08-23 20:12:21 +00:00
kodiakhq[bot] 2b3ca54168
Merge branch 'main' into cn/upgrade-l0-metrics 2022-08-17 16:01:42 +00:00
Andrew Lamb 7f0ae53d6f
chore: Update to (almost) released object_store 0.4.0 (#5419)
* chore: update object_store

* chore: update hakari config

* chore: Run cargo hakari tasks

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2022-08-17 13:44:48 +00:00
Carol (Nichols || Goulding) ef716a5b90
fix: Remove compaction level attribute from the compaction_input_file_bytes metric 2022-08-15 10:50:04 -04:00
Carol (Nichols || Goulding) a9ed32df89
fix: Remove compaction_counter as it's now redundant with the compaction_input_file_bytes histogram 2022-08-15 10:23:29 -04:00
Carol (Nichols || Goulding) af95ce7ca6
feat: Add a histogram tracking sizes of files used as inputs to compaction
Fixes #5348.
2022-08-15 10:13:54 -04:00
Carol (Nichols || Goulding) cd6c809fe0
fix: Change metric tracking sizes of files selected for compaction to a histogram
Connects to #5348.
2022-08-15 10:13:54 -04:00
Carol (Nichols || Goulding) b982bdaf2f
fix: Derive Eq when we derive PartialEq and members can derive Eq
Allow this in generated code that we don't control, though.

Recommended by clippy now. https://rust-lang.github.io/rust-clippy/master/index.html#derive_partial_eq_without_eq
2022-08-11 15:04:06 -04:00
Marco Neumann 90fec1365f
feat: intern schemas during query planning (#5215)
* feat: intern schemas during query planning

Helps with #5202.

* refactor: `SchemaMerger::build` shall return an `Arc`

* feat: `SchemaMerger::with_interner`

* refactor: hash-based schema interning
2022-08-11 12:28:51 +00:00
Jake Goulding 68e64af4d1
refactor: extract compactor loop body to call it separately 2022-08-10 11:28:51 -04:00
Jake Goulding 49c5281454
refactor: Supersede old CompactorHandlerImpl constructor 2022-08-10 11:28:51 -04:00
Jake Goulding cc061b6ce9
refactor: add CompactorHandlerImpl::new_with_compactor
This will allow us to refactor the code a level up to create a
`Compactor` directly.
2022-08-10 11:28:51 -04:00
Andrew Lamb c0fc91c627
chore: Warn if a parquet file has no sort key (#5368) 2022-08-10 11:56:50 +00:00
Andrew Lamb 16ddc5efc6
chore: Update datafusion / arrow/parquet/arrow-flight and prost/tonic ecosystem (#5360)
* chore: Update datafusion and arrow

* chore: Update Cargo.lock

* chore: update to Decimal128

* chore: Update tonic/prost/pbjson/etc

* chore: Run cargo hakari tasks

* fix: doctest in generated types

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2022-08-09 17:30:44 +00:00
Nga Tran b71c1a09ea
feat: only sleep when there are neither hot nor cold partitions to compact (#5329)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-05 16:36:36 +00:00
Carol (Nichols || Goulding) facc967320
fix: Specify hot or cold in more log messages 2022-08-04 16:55:48 -04:00
Carol (Nichols || Goulding) c9d66c30b1
fix: Make this field name consistent
With the other fields on this struct and with the corresponding field on
the clap block struct.
2022-08-04 16:55:48 -04:00
Carol (Nichols || Goulding) da0b031c44
feat: Add parameters to limit total memory usage of cold partition compaction 2022-08-04 16:55:48 -04:00
Carol (Nichols || Goulding) 9d8f94d0d7
fix: Remove an unneeded sleep
The cold case won't make a hot busy loop (hah), we'll just go back to
working on the hot partitions if there's no cold partitions to do.
2022-08-04 16:55:48 -04:00
Carol (Nichols || Goulding) e1c45e836a
test: Remove copypastaed assertions that duplicate a different test 2022-08-04 16:55:48 -04:00
Carol (Nichols || Goulding) cb6442018e
test: Add more test cases varying number of partitions per sequencer 2022-08-04 16:55:48 -04:00
Carol (Nichols || Goulding) d55f45a5c2
feat: Run compaction of hot partitions a configurable number of times more than cold 2022-08-04 16:55:48 -04:00
Carol (Nichols || Goulding) 827e82cfb8
feat: Upgrade one level 0, non-overlapping file without compacting
Fixes #1078.
2022-08-04 16:55:47 -04:00
Carol (Nichols || Goulding) c1d016a00a
feat: Upgrade cold level 0 files when they have no overlaps 2022-08-04 16:55:47 -04:00
Carol (Nichols || Goulding) 9052eabe50
feat: Separate out hot/cold partition compaction and filtering
Cold partition compaction will (in the next commit) upgrade a level 0
file without any overlaps rather than running compaction.

Cold partition filtering gathers all level 0 files in the (already
deemed cold) partition with all overlapping level 1 files, and does not
limit the set of files being compacted by their number or size.
2022-08-04 16:55:47 -04:00
Carol (Nichols || Goulding) fc62c82722
feat: Select cold partitions 2022-08-04 16:55:47 -04:00
Carol (Nichols || Goulding) 6e9c752230
refactor: Extract current compaction into a fn for 'hot' partitions 2022-08-04 16:55:47 -04:00
Marco Neumann eea8270e83
fix: `compute_split_time` with small step sizes (#5309)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-04 13:40:30 +00:00