This is helpful for trying out changes to our defaults, but also for testing in general.
Required for https://github.com/influxdata/idpe/issues/17474.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* chore: Update datafusion and arrow/parquet to 37, tonic to 0.9.1
* refactor: Update for FieldRef and other API changes
* fix: Update field size calculation
* fix: Use `NullBuffer` directly
* fix: remove outdated comment
* chore: Update test for tonic
* chore: Run cargo hakari tasks
* chore: cargo update
---------
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* refactor: move logic for knowing how much to buffer into GapFiller
* chore: clippy
* chore: add some clarifying comments
* refactor: clean up relationships between gap filling types
* refactor: remove use of RefCell from BufferedInput
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
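As an aside, the `RefCell` removal follows the usual pattern of trading interior mutability for exclusive borrows; a toy sketch with a hypothetical stand-in type, not the actual `BufferedInput`:
```rust
use std::mem;

// Hypothetical stand-in for BufferedInput, not the real type.
struct BufferedInput {
    rows: Vec<i64>,
}

impl BufferedInput {
    // Before: `rows: RefCell<Vec<i64>>` plus `self.rows.borrow_mut()`
    // behind `&self`. After: a plain exclusive borrow, checked at
    // compile time instead of at runtime.
    fn take_rows(&mut self) -> Vec<i64> {
        mem::take(&mut self.rows)
    }
}

fn main() {
    let mut input = BufferedInput { rows: vec![1, 2, 3] };
    assert_eq!(input.take_rows(), vec![1, 2, 3]);
    assert!(input.rows.is_empty());
}
```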
1. Add loads of tests for `ChunkTableProvider::scan` (i.e. the naive phys.
plan before running any phys. optimizers).
2. Fix the interaction of "no de-dup" and predicate pushdown. This might
be used by the ingester at some point, and I would like to have this
correct before someone silently introduces a bug by pushing field
predicates into the ingester (see the sketch below).
This is mostly prep work for #7406, so I know that test coverage is
sufficient.
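To illustrate the hazard class with a toy model (not the IOx code): de-duplication keeps the last write per key, so filtering on a field value *before* de-duplicating can resurrect an overwritten row.
```rust
use std::collections::BTreeMap;

// Keep the last value per key, i.e. the newest write wins.
fn dedup(rows: &[(u8, i64)]) -> Vec<(u8, i64)> {
    let mut last: BTreeMap<u8, i64> = BTreeMap::new();
    for &(k, v) in rows {
        last.insert(k, v);
    }
    last.into_iter().collect()
}

fn main() {
    // Key 1 was written twice; the final value is 10.
    let rows = [(1u8, 5i64), (1, 10)];
    let pred = |&(_, v): &(u8, i64)| v == 5;

    // Correct: de-duplicate first, then filter => empty (final value is 10).
    let correct: Vec<_> = dedup(&rows).into_iter().filter(pred).collect();
    assert!(correct.is_empty());

    // Unsound pushdown: filter first, then de-duplicate => (1, 5) resurrected.
    let filtered: Vec<_> = rows.iter().copied().filter(pred).collect();
    assert_eq!(dedup(&filtered), vec![(1, 5)]);
}
```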
* feat: "parquet sortness" optimizer pass
Trade wider fan-out for not having to fully sort parquet files.
For #6098.
* test: rename
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
---------
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
With #6098 our `TableProvider` will declare `supports_filter_pushdown`
as "exact" since we handle the predicate pushdown ourselves. This has
two effects:
1. The phys. plan no longer contains an additional `FilterExec` node
even if we already do all the correct filtering. This will improve
performance.
2. The logical plan no longer contains a `Filter` node but instead the
predicate is part of the `TableScan`. This simplifies the logical
plan.
For (2) we need to adjust the gap fill logical optimizer to find the
time range again. Otherwise the optimizer pass will fail (which is
currently somewhat swallowed by DataFusion even though it is logged) and
the physical plan will contain our placeholder UDFs that are not
executable.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
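A toy model of the planner contract (hypothetical types, not the DataFusion API): only an "exact" answer from the provider lets the planner drop its own `Filter`/`FilterExec` nodes.
```rust
/// Simplified stand-in for DataFusion's pushdown contract.
/// "Exact": the provider fully applies the predicate itself, so the
/// planner may drop its Filter node. "Inexact": the provider only
/// pre-filters, so the planner must keep a FilterExec on top.
#[derive(Debug, PartialEq)]
enum FilterPushDown {
    Exact,
    Inexact,
    Unsupported,
}

fn plan_needs_filter_node(support: FilterPushDown) -> bool {
    // Only an Exact pushdown allows removing the Filter node from
    // both the logical and the physical plan.
    support != FilterPushDown::Exact
}

fn main() {
    assert!(!plan_needs_filter_node(FilterPushDown::Exact));
    assert!(plan_needs_filter_node(FilterPushDown::Inexact));
    assert!(plan_needs_filter_node(FilterPushDown::Unsupported));
}
```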
We should re-sort properly when performing projection pushdown. Extended
the test utils to actually catch this by checking the plan schemas.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: update gap fill planner rule to use LOCF
* chore: cargo fmt
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
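For reference, a toy illustration of LOCF (last observation carried forward) semantics, as a hypothetical helper rather than the IOx implementation: each gap takes the previous non-null value, and leading gaps stay null.
```rust
// Fill gaps (None) with the last observed value, if any.
fn locf<T: Clone>(values: &[Option<T>]) -> Vec<Option<T>> {
    let mut last: Option<T> = None;
    values
        .iter()
        .map(|v| {
            if v.is_some() {
                last = v.clone();
            }
            last.clone()
        })
        .collect()
}

fn main() {
    let input = vec![None, Some(1), None, None, Some(4), None];
    assert_eq!(
        locf(&input),
        vec![None, Some(1), Some(1), Some(1), Some(4), Some(4)]
    );
}
```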
* fix: projection pushdown should project `ParquetExec` ordering
Bug found while working on the final steps for #6098.
* fix: Update expected output
* test: make test even harder
---------
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
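A toy illustration of the bug class fixed here, with hypothetical types rather than the actual `ParquetExec` code: when columns are projected, a declared output ordering must be remapped to the new column indices, and (in this simplified version) dropped entirely if any ordering column is projected away.
```rust
// Remap a sort ordering (as input column indices) through a projection
// (output position -> input column index). None means the ordering is lost.
fn project_ordering(ordering: &[usize], projection: &[usize]) -> Option<Vec<usize>> {
    ordering
        .iter()
        .map(|col| projection.iter().position(|p| p == col))
        .collect()
}

fn main() {
    // File is sorted by input columns (0, 2); we project columns [2, 0, 1].
    // The projected output is therefore sorted by output columns (1, 0).
    assert_eq!(project_ordering(&[0, 2], &[2, 0, 1]), Some(vec![1, 0]));
    // If the sort columns are projected away, the ordering no longer holds.
    assert_eq!(project_ordering(&[0, 2], &[1]), None);
}
```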
Try to combine chunks even when not all union arms/inputs are
combinable. This will later help to transform
```yaml
---
union:
- parquet:
files: [f1]
- parquet:
files: [f2]
- dedup:
parquet:
files: [f3]
```
into
```yaml
---
union:
- parquet:
files: [f1, f2]
- dedup:
parquet:
files: [f3]
```
Helps #6098.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
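A toy sketch of the idea, with hypothetical types rather than the IOx physical plan: merge the combinable arms of a union (here plain parquet scans) into one arm while leaving uncombinable arms (here de-duplicated scans) untouched.
```rust
#[derive(Debug)]
enum Arm {
    Parquet { files: Vec<&'static str> },
    Dedup { files: Vec<&'static str> },
}

fn combine_union_arms(arms: Vec<Arm>) -> Vec<Arm> {
    let mut combined: Vec<&'static str> = vec![];
    let mut rest = vec![];
    for arm in arms {
        match arm {
            // Plain parquet arms can be merged into a single arm.
            Arm::Parquet { files } => combined.extend(files),
            // Anything else must keep its own arm.
            other => rest.push(other),
        }
    }
    let mut out = vec![];
    if !combined.is_empty() {
        out.push(Arm::Parquet { files: combined });
    }
    out.extend(rest);
    out
}

fn main() {
    let arms = vec![
        Arm::Parquet { files: vec!["f1"] },
        Arm::Parquet { files: vec!["f2"] },
        Arm::Dedup { files: vec!["f3"] },
    ];
    // => [Parquet { files: ["f1", "f2"] }, Dedup { files: ["f3"] }]
    println!("{:?}", combine_union_arms(arms));
}
```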
* chore: Update DataFusion
* refactor: Update predicate crate for new transform API
* refactor: Update iox_query crate for new APIs
* refactor: Update influxql for new API
* chore: Run cargo hakari tasks
---------
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
`extract_chunks` never runs after predicate pushdown. However, IF this
should ever happen, we would potentially forget the predicates attached
to `ParquetExec`. So let's make sure we refuse chunk extraction in this
case (see the sketch below). This is similar to the existing behavior:
we don't support chunk extraction after filter pushdown, i.e. if there
is a filter around a `RecordBatchesExec`.
For #6098.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
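A toy sketch of the guard, with hypothetical types rather than the actual `extract_chunks`: refuse to extract chunks from a scan node that still carries a predicate, since extraction would silently drop it.
```rust
struct ParquetScan {
    files: Vec<&'static str>,
    predicate: Option<&'static str>,
}

fn extract_chunks(scan: &ParquetScan) -> Option<&[&'static str]> {
    if scan.predicate.is_some() {
        // Bail out rather than forget the pushed-down predicate.
        return None;
    }
    Some(&scan.files)
}

fn main() {
    let plain = ParquetScan { files: vec!["f1"], predicate: None };
    let filtered = ParquetScan { files: vec!["f1"], predicate: Some("x > 1") };
    assert!(extract_chunks(&plain).is_some());
    assert!(extract_chunks(&filtered).is_none());
}
```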
This is helpful so that optimizer passes can forget the sort key, esp.
when they run after `DedupNullColumns` and `DedupSortOrder`.
For #6098.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Similar to #7217, there is no need to convert the arrow schema to an IOx
schema. This also makes it easier to handle the chunk order column in #6098.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
We don't need a validated IOx schema in this method. This will simplify
some work on #6098.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: implement gap fill with previous value
* test: update fill prev test to include null value
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: projection pushdown phys. optimizer
This is by far the largest pass (at least test-wise), because projections
are added last in the naive plan and you have to push them through
everything else. The actual code, however, isn't that complicated, mostly
because we can reuse some DataFusion functionality and the different
variants for the different "child nodes" are very similar (see the
sketch after this list).
For #6098.
* feat: projection pushdown for `RecordBatchesExec`
* test: `test_ignore_when_partial_impure_projection_rename`
* test: more dedup projection tests
* test: integration
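A toy model of the core idea, with hypothetical plan types rather than the IOx optimizer: a projection sitting on top of a scan is pushed into the scan so only the needed columns are read.
```rust
#[derive(Debug)]
enum Plan {
    Scan { columns: Vec<String> },
    Projection { input: Box<Plan>, indices: Vec<usize> },
}

fn push_down_projection(plan: Plan) -> Plan {
    match plan {
        // Projection over a scan: let the scan read only what is needed.
        Plan::Projection { input, indices } => match *input {
            Plan::Scan { columns } => Plan::Scan {
                columns: indices.iter().map(|&i| columns[i].clone()).collect(),
            },
            other => Plan::Projection { input: Box::new(other), indices },
        },
        other => other,
    }
}

fn main() {
    let plan = Plan::Projection {
        input: Box::new(Plan::Scan {
            columns: vec!["time".into(), "tag".into(), "field".into()],
        }),
        indices: vec![0, 2],
    };
    // => Scan { columns: ["time", "field"] }
    println!("{:?}", push_down_projection(plan));
}
```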