We don't need a validated IOx schema in this method. This will simplify
some work on #6098.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
If a partition takes longer than `partition_timeout` to compact, but it
did make _some_ progress, let the compactor try that partition again at
a later time so that compaction for the partition will eventually
complete.
If a partition times out and _no_ progress has been made, then still add
it to the skipped_compactions table because it's either too big to ever
compact or is otherwise stuck.
Closes influxdata/idpe#17234.
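The retry-vs-skip decision described above can be sketched as follows (names and the progress measure are illustrative, not the actual compactor API):

```rust
// Hypothetical sketch: after a partition exceeds `partition_timeout`,
// decide whether to retry it later or record it in skipped_compactions.
#[derive(Debug, PartialEq)]
enum TimeoutOutcome {
    /// Some progress was made; retrying will eventually complete compaction.
    RetryLater,
    /// No progress: the partition is too big to ever compact or is stuck,
    /// so add it to the skipped_compactions table.
    SkipForever,
}

fn classify_timed_out_partition(files_compacted: usize) -> TimeoutOutcome {
    if files_compacted > 0 {
        TimeoutOutcome::RetryLater
    } else {
        TimeoutOutcome::SkipForever
    }
}
```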
* feat: implement gap fill with previous value
* test: update fill prev test to include null value
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
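The "fill previous" semantics can be illustrated with a toy sketch (this is not the actual gap-fill implementation): each missing slot takes the most recent non-null value seen so far, and leading gaps stay unfilled.

```rust
// Toy sketch of gap fill with previous value: `None` slots are filled with
// the last `Some` value observed; gaps before any value remain `None`.
fn fill_prev(values: &[Option<i64>]) -> Vec<Option<i64>> {
    let mut last: Option<i64> = None;
    values
        .iter()
        .map(|v| {
            if v.is_some() {
                last = *v;
            }
            last
        })
        .collect()
}
```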
When the ingester handles a query for a table/namespace that has
been persisted and the data is not in the buffer, it logs at error
level that the table/namespace could not be found. This is a valid
state and an expected error (the data can be queried from object
storage), so we can make this less noisy.
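The change amounts to choosing a quieter log level for an expected lookup miss; a minimal sketch (the enum and predicate are illustrative, not the ingester's actual logging code):

```rust
#[derive(Debug, PartialEq)]
enum Level {
    Error,
    Debug,
}

// Hypothetical sketch: a "table/namespace not found" miss for data that has
// already been persisted is an expected state, so log it quietly; a miss
// for data that should still be buffered remains an error.
fn lookup_miss_level(data_fully_persisted: bool) -> Level {
    if data_fully_persisted {
        Level::Debug
    } else {
        Level::Error
    }
}
```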
Adds three metrics to expose the internal state of the WAL file
reference tracker.
These metrics are mainly useful to identify why WAL files are not being
deleted, if any.
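One way to picture the three metrics is as counters over the tracker's lifecycle events; this sketch uses plain atomics rather than the metric crate's actual API, and the metric names are illustrative:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Illustrative sketch of WAL reference-tracker state metrics.
#[derive(Default)]
struct WalRefMetrics {
    /// Number of rotated WAL files currently tracked (not yet deletable).
    tracked_files: AtomicU64,
    /// Total unpersisted operations across all tracked files.
    unpersisted_ops: AtomicU64,
    /// Number of WAL files deleted so far.
    deleted_files: AtomicU64,
}

impl WalRefMetrics {
    fn file_rotated(&self, ops: u64) {
        self.tracked_files.fetch_add(1, Ordering::Relaxed);
        self.unpersisted_ops.fetch_add(ops, Ordering::Relaxed);
    }
    fn ops_persisted(&self, ops: u64) {
        self.unpersisted_ops.fetch_sub(ops, Ordering::Relaxed);
    }
    fn file_deleted(&self) {
        self.tracked_files.fetch_sub(1, Ordering::Relaxed);
        self.deleted_files.fetch_add(1, Ordering::Relaxed);
    }
}
```

If `tracked_files` grows while `deleted_files` stays flat, the `unpersisted_ops` value shows whether persistence is lagging, which is exactly the "why are WAL files not being deleted" question.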
* test: common compactor use cases
* test: add 3 L0 files during last compaction
* chore: clearer comments
* test: add intermediate test results per review request
* chore: comment only change to trigger circle CI
* chore: Apply suggestions from code review
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
* test: add test with less ingested data and fix output after main merge
* chore: run format after pulling suggestions
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
The assert_counter! and assert_histogram! macros use items from the metric
crate, but the macros can be called from other crates/modules that may
not have those items in scope.
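The usual fix is to reference items through `$crate::` paths inside the macro body, so call sites need no extra imports. A minimal sketch, with an illustrative stand-in item rather than the real assert_counter!/assert_histogram! internals:

```rust
// Illustrative stand-in for an item the macro depends on.
pub fn counter_value() -> u64 {
    42
}

macro_rules! assert_counter_sketch {
    ($expected:expr) => {
        // `$crate::` resolves relative to the defining crate, so this works
        // even when `counter_value` is not in scope at the call site.
        assert_eq!($crate::counter_value(), $expected)
    };
}
```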
* fix: Remove the max_compact_size knob and hardcode a multiple
Rather than panic if the user hasn't set this knob in a particular way,
set the max_compact_size to the minimum value we need by multiplying
max_desired_file_size_bytes by MIN_COMPACT_SIZE_MULTIPLE.
Fixes influxdata/idpe#17259.
* refactor: Move computation of max_compact_size_bytes into compactor config
* test: change test setups to reflect the purposes of the tests
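The derived value described above is a single multiplication; a sketch, with an illustrative constant value (the actual MIN_COMPACT_SIZE_MULTIPLE may differ):

```rust
// Illustrative multiple; the real constant lives in the compactor config.
const MIN_COMPACT_SIZE_MULTIPLE: u64 = 3;

// Instead of a user-facing knob that can be mis-set (and previously caused
// a panic), derive max_compact_size from max_desired_file_size_bytes.
fn max_compact_size_bytes(max_desired_file_size_bytes: u64) -> u64 {
    max_desired_file_size_bytes * MIN_COMPACT_SIZE_MULTIPLE
}
```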
---------
Co-authored-by: NGA-TRAN <nga-tran@live.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* test: add a panic to the limit_files_to_compact function so it is used in the right way
* test: provide correct output to the tests
* chore: remove no-longer valid comments
* feat: have limit_files_to_compact also return files_to_further_split if the minimum set to compact is too large
* refactor: rename files_to_split to start_level_files_to_split
* refactor: rename identify_files_to_split to identify_start_level_files_to_split before adding new split function
* feat: split 2 files of minimum set of compacting files if they are over max compact size
* test: since we may now split files in different levels, remove the misleading at level from the simulation tests
* chore: clearer comments
* test: add tests for tiny time ranges
* chore: address review comments
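The shape of the limit_files_to_compact behaviour above can be sketched as follows; the signature, the byte budget, and the "minimum set is two files" rule are illustrative simplifications of the real compactor logic:

```rust
// Hypothetical sketch: pick files to compact up to `max_compact_size`
// bytes. If even the minimum set (here: the first two files) does not fit,
// return it as files to split further instead of compacting.
fn limit_files_to_compact(
    file_sizes: &[u64],
    max_compact_size: u64,
) -> (Vec<u64>, Vec<u64>) {
    let mut files_to_compact = Vec::new();
    let mut budget = 0;
    for &size in file_sizes {
        if budget + size <= max_compact_size {
            budget += size;
            files_to_compact.push(size);
        } else {
            break;
        }
    }
    if files_to_compact.len() < 2 {
        // The minimum compactable set is over the max compact size:
        // hand its files back to be split into smaller pieces first.
        let files_to_further_split = file_sizes.iter().take(2).copied().collect();
        return (Vec::new(), files_to_further_split);
    }
    (files_to_compact, Vec::new())
}
```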
Reference count rotated WAL segment files, tracking the number of
unpersisted operations in each and deleting a file once all the data
within it has been persisted to object storage.
This commit contains the implementation of the reference counting logic,
and is currently unused. A follow-up PR will wire this into the various
places needed to feed it the necessary information.
Part of https://github.com/influxdata/influxdb_iox/issues/6566.
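The core of the reference-counting idea can be sketched like this; the types and method names are illustrative, not the actual implementation:

```rust
use std::collections::HashMap;

// Hypothetical sketch: per-segment counts of unpersisted operations.
#[derive(Default)]
struct SegmentRefs {
    /// Segment file id -> number of operations not yet persisted.
    unpersisted: HashMap<u64, usize>,
}

impl SegmentRefs {
    /// A WAL segment file was rotated with `ops` unpersisted operations.
    fn segment_rotated(&mut self, id: u64, ops: usize) {
        self.unpersisted.insert(id, ops);
    }

    /// `ops` operations from segment `id` were persisted to object storage.
    /// Returns true once the segment has no unpersisted data left and its
    /// file can be deleted.
    fn ops_persisted(&mut self, id: u64, ops: usize) -> bool {
        let count = self.unpersisted.get_mut(&id).expect("unknown segment");
        *count -= ops;
        if *count == 0 {
            self.unpersisted.remove(&id);
            true
        } else {
            false
        }
    }
}
```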
Allow an owned SequenceNumberSet to be destructured from an owned
CompletedPersist notification.
Additionally allow an owned SequenceNumberSet to be obtained from a
(potentially) shared Arc<CompletedPersist> in a memory efficient / no
cloning way. A CompletedPersist notification is typically wrapped in an
Arc, and at time of writing, typically referenced only once.
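The clone-avoiding pattern here is the standard `Arc::try_unwrap`: when the `Arc` is the sole reference, the inner data can be moved out without copying. A sketch with illustrative stand-in types (the real `CompletedPersist`/`SequenceNumberSet` differ):

```rust
use std::collections::BTreeSet;
use std::sync::Arc;

// Illustrative stand-in for CompletedPersist; the sequence numbers play
// the role of SequenceNumberSet.
struct CompletedPersist {
    sequence_numbers: BTreeSet<u64>,
}

// If this is the only reference, take ownership of the inner set with no
// clone; otherwise the data is still shared and must be cloned.
fn into_sequence_numbers(arc: Arc<CompletedPersist>) -> BTreeSet<u64> {
    match Arc::try_unwrap(arc) {
        Ok(owned) => owned.sequence_numbers,
        Err(shared) => shared.sequence_numbers.clone(),
    }
}
```

Since a `CompletedPersist` is typically referenced only once by the time this runs, the no-clone path is the common case.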
* feat: projection pushdown phys. optimizer
This is by far the largest pass (at least test-wise), because projections
are added last in the naive plan and you have to push them through
everything else. The actual code, however, isn't that complicated, mostly
because we can reuse some DataFusion functionality and the different
variants for the different "child nodes" are very similar.
For #6098.
* feat: projection pushdown for `RecordBatchesExec`
* test: `test_ignore_when_partial_impure_projection_rename`
* test: more dedup projection tests
* test: integration
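The pushdown idea can be shown on a toy plan IR (these are not DataFusion's actual types): a projection over a multi-child node is pushed into each child, and at a scan it restricts the columns read.

```rust
// Toy plan IR for sketching projection pushdown.
#[derive(Debug, PartialEq)]
enum Plan {
    Scan { columns: Vec<String> },
    Union(Vec<Plan>),
    Projection { input: Box<Plan>, columns: Vec<String> },
}

fn push_down_projection(plan: Plan) -> Plan {
    match plan {
        Plan::Projection { input, columns } => match *input {
            // Push the projection through the union into every child.
            Plan::Union(children) => Plan::Union(
                children
                    .into_iter()
                    .map(|child| {
                        push_down_projection(Plan::Projection {
                            input: Box::new(child),
                            columns: columns.clone(),
                        })
                    })
                    .collect(),
            ),
            // At a scan, read only the projected columns.
            Plan::Scan { .. } => Plan::Scan { columns },
            // Otherwise keep the projection in place.
            other => Plan::Projection {
                input: Box::new(other),
                columns,
            },
        },
        other => other,
    }
}
```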