330 Commits (4638b89d9334a0f28398bd46175e3f72741fc26a)

Author SHA1 Message Date
Marco Neumann 4638b89d93
refactor: migrate retention to proper predicates (#8092)
Do not (ab)use per-chunk delete predicates for the retention policy.
Instead use a per-table predicate.

This makes the code way cleaner, since the scoping is correct (i.e.
delete predicates are a table-wide attribute, not a chunk-based one) and
it is consistent with the time predicates that the user provides (e.g. via
`WHERE time > x`).

It also allows us to remove delete predicates (in their current,
non-scalable form) from the query path. A potential future version would
likely not use per chunk predicates (and "is processed" markers) but use
the timestamp / chunk order to determine to which data the predicate
should be applied.

Note that the lowering of the retention policy changed slightly from

```text
(time > (now() - retention)) AND (time < MAX)
```

to

```text
time > (now() - retention)
```

since the `MAX` cut was just an artifact of the lowering and was unnecessary.
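
A minimal sketch of the simplified lowering, assuming timestamps and the retention period are plain nanosecond integers. The function names (`retention_cutoff`, `is_retained`, `retained_rows`) are hypothetical and only illustrate the predicate; they are not the actual IOx API.

```rust
/// Compute the cutoff used by `time > (now() - retention)`.
/// Timestamps and durations are nanoseconds (an assumption of this sketch).
fn retention_cutoff(now_ns: i64, retention_ns: i64) -> i64 {
    // Saturate instead of overflowing for extreme inputs.
    now_ns.saturating_sub(retention_ns)
}

/// Table-wide predicate: a row is kept only if it is strictly newer than the cutoff.
/// There is no upper bound, mirroring the removal of the `time < MAX` term.
fn is_retained(time_ns: i64, now_ns: i64, retention_ns: i64) -> bool {
    time_ns > retention_cutoff(now_ns, retention_ns)
}

/// Apply the predicate to a set of row timestamps.
fn retained_rows(times_ns: &[i64], now_ns: i64, retention_ns: i64) -> Vec<i64> {
    times_ns
        .iter()
        .copied()
        .filter(|&t| is_retained(t, now_ns, retention_ns))
        .collect()
}
```

With `now = 1_000` and `retention = 300`, the cutoff is `700`, so a row at `700` is dropped while a row at `701` is kept.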

Closes #7409.
Closes #7410.
2023-06-29 08:36:37 +00:00
dependabot[bot] b15c6062a9
chore(deps): Bump tokio from 1.28.2 to 1.29.0 (#8100)
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.28.2 to 1.29.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.28.2...tokio-1.29.0)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-06-28 13:18:08 +00:00
dependabot[bot] 990044dcb2
chore(deps): Bump indexmap from 1.9.3 to 2.0.0 (#8073)
* chore(deps): Bump indexmap from 1.9.3 to 2.0.0

Bumps [indexmap](https://github.com/bluss/indexmap) from 1.9.3 to 2.0.0.
- [Changelog](https://github.com/bluss/indexmap/blob/master/RELEASES.md)
- [Commits](https://github.com/bluss/indexmap/compare/1.9.3...2.0.0)

---
updated-dependencies:
- dependency-name: indexmap
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore: Run cargo hakari tasks

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-26 08:52:51 +00:00
Marco Neumann 7322f238fb
docs: query processing (#8033)
* docs: query processing

Closes https://github.com/influxdata/idpe/issues/17770 .

* docs: apply recommendations

Co-authored-by: Stuart Carnie <stuart.carnie@gmail.com>
Co-authored-by: Andrew Lamb <alamb@influxdata.com>

* docs: improve description of the flight protocol

* docs: link `LogicalPlan`

* docs: link `ExecutionPlan`

* docs: improve wording

* docs: improve query planning docs

---------

Co-authored-by: Stuart Carnie <stuart.carnie@gmail.com>
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
2023-06-23 09:13:14 +00:00
dependabot[bot] 74a48a8f63
chore(deps): Bump itertools from 0.10.5 to 0.11.0 (#8060)
* chore(deps): Bump itertools from 0.10.5 to 0.11.0

Bumps [itertools](https://github.com/rust-itertools/itertools) from 0.10.5 to 0.11.0.
- [Changelog](https://github.com/rust-itertools/itertools/blob/master/CHANGELOG.md)
- [Commits](https://github.com/rust-itertools/itertools/compare/v0.10.5...v0.11.0)

---
updated-dependencies:
- dependency-name: itertools
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore: Run cargo hakari tasks

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-23 08:11:56 +00:00
Andrew Lamb fb0674fc01
Revert "chore: Update datafusion + arrow/arrow-flight/parquet to version `42.0.0` (#8036)" (#8049)
This reverts commit 70ffedadc7.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-22 11:03:25 +00:00
Andrew Lamb 70ffedadc7
chore: Update datafusion + arrow/arrow-flight/parquet to version `42.0.0` (#8036)
* chore: Update datafusion + arrow/arrow-flight/parquet to version `42.0.0`

* chore: Update for new APIs

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-21 16:11:36 +00:00
Stuart Carnie e10b8c93c8
chore: Update DataFusion and other dependencies (#8014)
* chore: Update DataFusion pin

* chore: Update API changes

* chore: Don't use deprecated API

* chore: Run cargo hakari tasks

* chore: Update tests due to changes in logical plan nodes from DF update

* chore: Fix broken links in docs

* chore: Adjust changes to expected output

---------

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
2023-06-16 10:39:36 +00:00
Andrew Lamb 5889c96501
chore: Update `datafusion` and other dependencies (#7981)
* chore: Update DataFusion pin

* chore: Update other dependencies

* chore: Update hakari

* fix: Update for API changes

* fix: Update explain plan

* fix: Update influxql plans

* fix: rustdoc links

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-16 09:48:55 +00:00
Marco Neumann 8ef1b64f6a
fix: remove de-dup even if we have many partitions (#8004)
See optimizer pipeline here:

5d0bb68c5b/iox_query/src/physical_optimizer/mod.rs (L33-L35)

After generating the naive initial plan w/ many partitions, we must
consider more than 100 partitions to split the key space.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-15 15:25:25 +00:00
Andrew Lamb 17c0d837b3
chore: Update DataFusion, arrow, object_store pins (#7942)
* chore: Update DataFusion, arrow, object_store pins

* chore: Update for hakari

* chore: Update for new APIs

* fix: update test

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-07 17:08:31 +00:00
Andrew Lamb f571aeb445
chore: Update DataFusion pin (#7916)
* chore: Update DataFusion pin

* chore: Update cargo

* fix: update for API changes

* fix: Update plans

* chore: Update for new api

* fix: Update plans

* chore: Update for API changes more

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-05 18:38:59 +00:00
Stuart Carnie da682d8c53
chore: clippy 🧠
2023-06-04 07:23:11 +10:00
Marco Neumann fa5011197c
refactor: migrate `iox_query` to use DataFusion statistics (#7908)
This is the major part of #7470. Additional clean ups (e.g. to remove
the actual types from `data_types`) will follow.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-02 09:18:59 +00:00
Marco Neumann 72ff001d33
feat: aggregator for DataFusion statistics (#7904)
* feat: aggregator for DataFusion statistics

Required to implement #7470, esp. to implement the statistics folding
done within `RecordBatchesExec`.

* docs: improve
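
The idea of folding statistics from several inputs into one can be sketched as follows. `ChunkStats`, `merge`, and `aggregate` are hypothetical names for this illustration; the struct is a simplified stand-in and not DataFusion's `Statistics` type, and the rule that any unknown input makes the output unknown is an assumption of the sketch.

```rust
/// Simplified, hypothetical per-input statistics (not DataFusion's `Statistics`).
#[derive(Debug, Clone, Copy, PartialEq)]
struct ChunkStats {
    num_rows: Option<usize>,
    min_time: Option<i64>,
    max_time: Option<i64>,
}

/// Fold two statistics into one. A field that is unknown (`None`) on either
/// side is unknown in the result.
fn merge(a: ChunkStats, b: ChunkStats) -> ChunkStats {
    ChunkStats {
        num_rows: a.num_rows.zip(b.num_rows).map(|(x, y)| x + y),
        min_time: a.min_time.zip(b.min_time).map(|(x, y)| x.min(y)),
        max_time: a.max_time.zip(b.max_time).map(|(x, y)| x.max(y)),
    }
}

/// Aggregate statistics over many inputs; returns `None` when there are none.
fn aggregate(chunks: &[ChunkStats]) -> Option<ChunkStats> {
    chunks.iter().copied().reduce(merge)
}
```

Row counts add up, while minimum and maximum values widen to cover all inputs.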
2023-06-01 16:11:30 +00:00
Andrew Lamb a48f681e56
feat(parquet): reduce and limit buffering when writing parquet files (#7880)
* feat: limit buffering when writing parquet files ("combined solution")

* chore: Run cargo hakari tasks

---------

Co-authored-by: Raphael Taylor-Davies <r.taylordavies@googlemail.com>
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-05-31 13:27:32 +00:00
Andrew Lamb 1ff76b7bf2
chore: use workspace dependencies for `object_store`
2023-05-26 07:03:42 -04:00
Marco Neumann bc18c6dc5f
refactor: re-land #7815. (#7852)
* refactor: consolidate pruning code

Let's have a single chunk pruning implementation in our code, not two.

Also removes a bit of cruft from `QueryChunk` since it is technically no
longer responsible for pruning (this part has been pushed into the
querier for early pruning and bits for the `iox_query_influxrpc` for
some RPC shenanigans).

* test: regression test for incident

* fix: chunk pruning

* docs: add some test notes
2023-05-24 09:46:49 +00:00
Dom Dwyer 928a4d163e
build: remove unused dependencies from crates
This commit fixes loads of crates (47!) that had unused dependencies, or
mis-configured dependencies (test deps as normal deps).

I added the "unused_crate_dependencies" to all crates to help prevent
this mess from growing again!

    https://doc.rust-lang.org/beta/nightly-rustc/rustc_lint_defs/builtin/static.UNUSED_CRATE_DEPENDENCIES.html

This has the minor downside of false-positives when specifying
dev-dependencies for test/bench binaries - these are files in /test or
/benches (not normal tests). This commit includes a workaround,
importing them in lib.rs (gated by a feature flag). I think the
trade-off of better dependency management is worth it!
2023-05-23 14:55:43 +02:00
Marco Neumann 6c0f50a473
revert: refactor: consolidate pruning code (#7815) (#7847)
This reverts commit db9fe92981.

Likely causing an incident, see https://app.incident.io/incidents/267 .
2023-05-23 08:01:53 +00:00
Marco Neumann db9fe92981
refactor: consolidate pruning code (#7815)
Let's have a single chunk pruning implementation in our code, not two.

Also removes a bit of cruft from `QueryChunk` since it is technically no
longer responsible for pruning (this part has been pushed into the
querier for early pruning and bits for the `iox_query_influxrpc` for
some RPC shenanigans).
2023-05-22 08:42:20 +00:00
Andrew Lamb 6344fe8c3f
chore: Add rationale for `clippy::future_not_send` (#7822)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-05-18 16:58:56 +00:00
Marco Neumann 7e64264eef
refactor: remove `RedudantSort` optimizer pass (#7809)
* test: add dedup test for multiple partitions and ranges

* refactor: remove `RedudantSort` optimizer pass

Similar to #7807 this is now covered by DataFusion, as demonstrated by
the fact that all query tests (incl. explain tests) still pass.

The good thing is: passes that are no longer required don't require any
upstreaming, so this also closes #7411.
2023-05-17 09:30:04 +00:00
Marco Neumann 931b4488bd
refactor: remove `SortPushdown` optimizer pass (#7807)
DataFusion is now smart enough to do that using the builtin passes. No
`EXPLAIN` tests regressed.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-05-17 06:26:31 +00:00
Nga Tran ca12f1c03d
fix: correctly recurse in `ParquetSortness` (#7778)
* test: reproducer for idpe_17556

* fix: `ParquetSortness` and partial opt

1. correctly handle cases where `ParquetSortness` would optimize one
   child branch but not the other
2. handle cases where `ParquetSortness` recursion should stop a bit
   more clearly (using `TreeNodeRewriter`)
3. rename query tests to be a bit clearer
4. add test case with many (but not too many) duplicate files and an
   ingester (basically a prod use case where the compactor is slightly
   behind)

---------

Co-authored-by: Marco Neumann <marco@crepererum.net>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-05-17 06:09:23 +00:00
Marco Neumann d3ff945117
refactor: remove output sorting from scan provider (#7798)
This is somewhat a left-over from the old phys. plan construction where
we tried to fold in the sorts at the right place. Now the optimizer
takes care of that, so we can just express this as a standard logical
node (the same as SQL and InfluxQL). This makes the plan construction a
bit cleaner since the actual scan provider only performs the minimal
work that is required by DataFusion and the users (SQL, InfluxQL, reorg)
request what they actually need.

The tests in `iox_query::frontend::reorg::tests` still pass and prove
that the actual physical plans are identical w/ this approach.

Closes #7785.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-05-17 05:59:04 +00:00
Chunchun Ye 2bb6445668
chore: update DataFusion and arrow / arrow-flight / parquet to `39.0.0` (#7793)
* chore: update DataFusion and arrow/parquet/arrow-flight to 39.0.0

* chore: update DataFusion and arrow/parquet/arrow-flight to 39.0.0 in workspace-hack/Cargo.toml

* chore: Run cargo hakari tasks

* chore: fix CI test and lint

* chore: update csv schema

* refactor: remove type-annotate for `Arc`

---------

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-05-16 13:42:26 +00:00
Andrew Lamb 7735e7c95b
chore: Update DataFusion again (#7777)
* chore: Update datafusion again

* chore: Run cargo hakari tasks

---------

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-05-15 12:38:45 +00:00
Andrew Lamb 2860d87fe1
chore: Update DataFusion (#7756)
* chore: Update DataFusion pin

* chore: Update explain plans

* chore: Run cargo hakari tasks

---------

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2023-05-05 18:58:18 +00:00
Christopher M. Wolff 55b35367ac
test: add test for gap fill query missing time bounds (#7747)
* test: add test for gap fill query missing time bounds

* chore: update unit test

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-05-04 21:01:45 +00:00
Christopher M. Wolff 05688799c4
fix: handle aliases in gapfill aggregate columns (#7725)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-05-03 15:20:14 +00:00
Andrew Lamb 2b1f8b56e2
chore: Update DataFusion (#7719)
* chore: Update DataFusion

* chore: update for API change

* chore: update some tests

* fix: Update plans in optimizer

* chore: Update plans

* chore: Update error messages

* chore: Run cargo hakari tasks

---------

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2023-05-02 17:55:04 +00:00
Andrew Lamb 530ee94558
fix: use correct sort key in projection_pushdown (#7718)
* fix: use correct sort key in projection_pushdown

* fix: tabs in docs

* refactor: Use Serde to format test results
2023-05-02 16:50:04 +00:00
Christopher M. Wolff 493b26831d
fix: make influx RPC interface break up series into multiple frames (#7691)
* fix: make influx RPC interface break up series into multiple frames

* refactor: code review feedback

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-05-01 20:18:05 +00:00
Marco Neumann 0556fdae53
refactor: remove `QueryChunk::partition_sort_key` (#7680)
As of #7250 / #7449 the partition sort key is no longer required for
query planning. Instead we use a combination of
`QueryChunk::partition_id` and `QueryChunk::sort_key` which is more
robust and easier to reason about.

Removing it simplifies the querier code a lot since we no longer need to
have a sort key for the ingester chunks and also don't need to "sync"
the sort key between chunks for consistency.
2023-04-27 10:54:41 +00:00
dependabot[bot] bdf7f316d7
chore(deps): Bump tokio from 1.27.0 to 1.28.0 (#7667)
* chore(deps): Bump tokio from 1.27.0 to 1.28.0

Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.27.0 to 1.28.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.27.0...tokio-1.28.0)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore: Run cargo hakari tasks

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: Dom <dom@itsallbroken.com>
2023-04-26 12:53:26 +00:00
Christopher M. Wolff 7a6862ee3a
refactor: let date_bin_gapfill allow omitted origin (#7595)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-04-19 14:56:52 +00:00
Marco Neumann d7dc305972
feat: allow overwriting DataFusion's default config (#7586)
This is helpful for trying out changes to our defaults, and also for testing.

Required for https://github.com/influxdata/idpe/issues/17474 .

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-04-18 11:28:45 +00:00
Andrew Lamb f46d06d56f
chore: Update DataFusion + arrow ecosystem to 37 (#7544)
* chore: Update datafusion and arrow/parquet to 37, tonic to 0.9.1

* refactor: Update for FieldRef and other API changes

* fix: Update field size calculation

* fix: Use `NullBuffer` directly

* fix: remove outdated comment

* chore: Update test for tonic

* chore: Run cargo hakari tasks

* chore: cargo update

---------

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-04-14 12:43:01 +00:00
Andrew Lamb 134ff2ef83
chore: update DataFusion pin (right before arrow 37 update) (#7540)
* chore: update DataFusion pin

* refactor: Update for deprecated API

* chore: Run cargo hakari tasks

---------

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-04-13 17:25:24 +00:00
Andrew Lamb 3ebd07358b
chore: Update DataFusion pin, upgrade `date_bin` and `InfluxQL` to use `Interval(MonthDayNano)` (#7516)
* chore: Update datafusion

* chore: Update for change in PhysicalSortExpr

* refactor: Update date_bin_gapfill to take IntervalMonthDayNano, fix FlightSQL

* chore: Run cargo hakari tasks

---------

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-04-13 10:43:32 +00:00
Christopher M. Wolff cbd747db44
feat: update gap fill planner rule to use `interpolate` (#7494)
* feat: add INTERPOLATE fn and update planner gap-fill planner rule

* test: add an end-to-end test for interpolate()
2023-04-12 21:51:44 +00:00
Christopher M. Wolff 0937615dba
fix: make interpolate() fill null values in input (#7490)
* fix: make interpolate() fill null values in input

* chore: cargo doc
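
As a rough illustration of what filling null input values means, the sketch below linearly interpolates `None` entries between their nearest non-null neighbours. It assumes evenly spaced rows and a standalone function named `interpolate`; the real gap-filling operator works on Arrow arrays inside a query plan and is not reproduced here.

```rust
/// Fill `None` entries by linear interpolation between the nearest non-null
/// neighbours. Leading and trailing nulls have only one neighbour and stay `None`.
fn interpolate(values: &[Option<f64>]) -> Vec<Option<f64>> {
    let mut out = values.to_vec();
    let mut prev: Option<usize> = None; // index of the last non-null value seen
    for i in 0..values.len() {
        if let Some(right) = values[i] {
            if let Some(p) = prev {
                let left = values[p].unwrap(); // non-null by construction of `prev`
                let span = (i - p) as f64;
                // Fill every null strictly between the two known points.
                for j in (p + 1)..i {
                    let frac = (j - p) as f64 / span;
                    out[j] = Some(left + (right - left) * frac);
                }
            }
            prev = Some(i);
        }
    }
    out
}
```

Nulls that sit between two known values are replaced; nulls before the first or after the last known value are left untouched in this sketch.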
2023-04-12 21:41:11 +00:00
Christopher M. Wolff 3e60369eff
refactor: input buffering for gap filling interpolate null-as-missing (#7478)
* refactor: move logic for knowing how much to buffer into GapFiller

* chore: clippy

* chore: add some clarifying comments

* refactor: clean up relationships between gap filling types

* refactor: remove use of RefCell from BufferedInput

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-04-12 21:08:51 +00:00
Andrew Lamb 8c42fedf33
chore: Remove dead code (#7475)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-04-11 10:44:49 +00:00
Andrew Lamb 1a80b8073c
fix: Improve span names for query access (#7476)
* fix: Improve span names for query access

* fix: update test

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-04-11 10:34:09 +00:00
Marco Neumann 5f43f2a719
refactor: remove old query planning code (#7449)
Closes #7406.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-04-06 16:05:08 +00:00
Marco Neumann 30b1878171
test: `ChunkTableProvider::scan` + fix "not dedup" (#7448)
1. Add loads of tests for `ChunkTableProvider::scan` (= the naive phys.
   plan before running any phys. optimizers)
2. Fix interaction of "no de-dup" and predicate pushdown. This might
   be used by the ingester at some point and I would like to have this
   correct before someone silently introduces a bug by pushing field
   predicates into the ingester.

This is mostly prep-work for #7406 so I know that test coverage is
sufficient.
2023-04-06 08:39:53 +00:00
Andrew Lamb e8b7d69b0f
chore: Update datafusion again (#7442)
* chore: Update datafusion

* chore: Fix up plans for datafusion API change

* chore: Run cargo hakari tasks

---------

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-04-05 18:21:53 +00:00
Andrew Lamb 94d390f31e
test: Add additional tests for reorg plans (#7444)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-04-05 11:15:23 +00:00