Commit Graph

384 Commits (b555ddf18b19cb57ddf2f71596cd3409354caf2e)

Author SHA1 Message Date
Michael Gattozzi ff567cd33f
chore(deps): Update arrow and datafusion to 49.0.0 (#24605)
* chore(deps): Update arrow and datafusion to 49.0.0

This commit copies in our dependency code from influxdb_iox in order for
us to be able to upgrade from a forked version of 46.0.0 to 49.0.0 of
both arrow and datafusion. Most of the important changes were around how
we consumed the crates in influxdb3(_server/_write). Those diffs are
particularly worth looking at as the rest was a straight copy and we
don't touch those crates in our development currently for influxdb3
edge.

* fix: regenerate workspace hack crate

* fix: Protobuf issues with incompatibility labels

* fix: Broken CI yaml

* fix: buf version

* fix: Only check IOx repo

* fix: Remove protobuf lint

* fix: Comment out call to protobuf-lint
2024-01-31 19:18:51 -05:00
Michael Gattozzi 8ee13bca48
fix: Failing CI on main (#24562)
* fix: build, upgrade rustc, and deps

This commit upgrades Rust to 1.75.0, the latest release. We also
upgraded our dependencies to stay up to date and to clear out any
uneeded deps from the lockfile. In order to make sure everything works
this also fixes the build by upgrading the workspace-hack crate using
cargo hikari and removing the `workspace.lint` that was in
influxdb3_write that didn't need to be there, probably from a merge
issue.

With this we can build influxdb3 as our default on main, but this alone
is not enough to fix CI and will be addressed in future commits.

* fix: warnings for influxdb3 build

This commit fixes the warnings emitted by `cargo build` when compiling
influxdb3. Mainly it adds needed lifetimes and removes uneccesary
imports and functions calls.

* fix: all of the clippy lints

This for the most part just applies suggested fixes by clippy with a few
exceptions:

- Generated type crates had additional allows added since we can't
  control what code gets made
- Things that couldn't be automatically fixed were done so manually in
  particular adding a Send bound for traits that created a Future that
  should be Send

We also had to fix a build issue by adding a feature for tokio-compat
due to the upgrade of deps. The workspace crate was updated accordingly.

* fix: failing test due to rust panic message change

Inbetween rustc 1.72 and rustc 1.75 the way that error messages were
displayed when panicing changed. One of our tests depended on the output
of that behavior and this commit updates the error message to the new
form so that tests will pass.

* fix: broken cargo doc link

* fix: cargo formatting run

* fix: add workspace-hack to influxdb3 crates

This was the last change needed to make sure that the workspace-hack
crate CI lint would pass.

* fix: remove tests that can not run anymore

We removed iox code from this code base and as a result some tests
cannot be run anymore and so this commit removes them from the code base
so that we can get a green build.
2024-01-09 15:11:35 -05:00
Marco Neumann 7b4dbb570d
refactor: clean up query log impl (#8775)
- take span ctx directly instead of the execution context (see point
  below)
- use the original trace ID (i.e. the one that we get via HTTP header),
  NOT some internal span/trace because the latter is only available for
  sampled requests, while the former one is generally more available (we
  also do that for the stdout logs btw.)
- minor code clean ups

This is prep work for #8774.
2023-09-20 09:20:19 +00:00
kodiakhq[bot] 809e0f4a42
Merge branch 'main' into crepererum/issue8350b 2023-09-20 08:21:04 +00:00
Andrew Lamb 65d0ea2055
chore: Update DataFusion (#8765)
* chore: Update DataFusion pin again

* chore: update for different type

* fix: statistics

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-09-19 22:26:53 +00:00
Marco Neumann 74b1a5e368 refactor: allow streaming record batches into query
For #8350, we won't have all the record batches from the ingester during
planning but we'll stream them during the execution. Technically the
DF plan is already based on streams, it's just `QueryChunkData` that
required a materialized `Vec<RecordBatch>`. This change moves the stream
creation up so a chunk can decide to either use `QueryChunkData::in_mem`
(which conveniently creates the stream) or it can provide its own
stream.
2023-09-19 13:53:37 +02:00
Marco Neumann ca791386eb
refactor: clean up chunk pruning metrics/observers (#8766)
There where like 3 layers (metrics, observer, pruner) that all only had
a single implementation. IIRC this is a leftover from older code where
`iox_query` was more involved in query pruning. With #8705 however the
chunk pruning is pushed even closer to the source (i.e. the querier
code) and it is just more practical to perform the metric management
directly in the querier code (this was the case already, it was just
somewhat hidden by the interfaces). This also allows us to add metrics
for #8705 more easily.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-09-19 10:53:14 +00:00
Andrew Lamb 58d892fcdf
chore: Update DataFusion pin (#8749)
* chore: Update DataFusion pin and `chrono`

* chore: Update for deprecation

* chore: Update plans

* fix: fix update logic in percentile

* chore: update to avoid deprecated from_exprs api

* fix: Update arrow pin, fix plan errors

* test: for describe

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-09-18 18:11:23 +00:00
dependabot[bot] 1760fe7736
chore(deps): Bump chrono from 0.4.30 to 0.4.31 (#8752)
* chore(deps): Bump chrono from 0.4.30 to 0.4.31

Bumps [chrono](https://github.com/chronotope/chrono) from 0.4.30 to 0.4.31.
- [Release notes](https://github.com/chronotope/chrono/releases)
- [Changelog](https://github.com/chronotope/chrono/blob/main/CHANGELOG.md)
- [Commits](https://github.com/chronotope/chrono/compare/v0.4.30...v0.4.31)

---
updated-dependencies:
- dependency-name: chrono
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* fix: chrono ts -> nanos can fail, fix deprecation warning

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Marco Neumann <marco@crepererum.net>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-09-18 12:57:48 +00:00
Martin Hilton 421b78e48b
feat(iox_query): support timezone in gap-filling (#8745)
When gap-filling make the output time array have the same timezone
as the imput time array.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-09-15 14:55:16 +00:00
Marco Neumann b5c0c9c167
feat: allow fallback to generic TS column range for chunk stats (#8724)
This will be useful for #8705.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-09-14 08:37:50 +00:00
Andrew Lamb ed2da2a831
Revert "chore: Update DataFusion pin (#8698)" (#8714)
This reverts commit 74c0851fc2.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-09-11 17:19:04 +00:00
Andrew Lamb 74c0851fc2
chore: Update DataFusion pin (#8698)
* chore: Update DataFusion pin

* chore: Update for new API

* fix: fix test

* fix: only check error messages

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-09-11 13:54:24 +00:00
Andrew Lamb 45c6bfea9c
chore: Update datafusion, arrow/flight/parquet to `46.0.0` , object_store to `0.7.0` (#8577)
* chore: Update DataFusion pin

* chore: Update for new API

* fix: Update for API

* fix: update compactor test

* fix: Update to patched version of arrow 46.0.0

* fix: map  `DataFusionError::Configuration` to an internal error

* fix: do not use deprecated API

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-09-08 12:49:57 +00:00
Martin Hilton 6056571e74
fix(influxql): FILL(linear) for selectors (#8396)
* fix(influxql): FILL(linear) for selectors

Ensure that selector functions such as FIRST, LAST, MIN and MAX can
use LINEAR filling in the same way as influxdb 1.8.

* chore: review suggestions

Apply suggestions from the review. This adds more tests and support
for interpolation in SQL.

* fix: lint

* fix: lint

* chore: buffered input for struct arrays

Ensure that for linear interpolation the buffered input of a struct
field ensures that buffering only stops when there is a non-null
struct containing a non-null value.

* fix: integration test

* fix(iox_query): make clippy happy

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-09-06 09:44:28 +00:00
Marco Neumann d0d355ba4d
refactor: unpack record batches later during query (#8663)
For #8350 we want to be able to stream record batches from the ingester
instead of waiting to buffer them fully before the query starts. Hence
we can no longer inspect the batches in the "display" implementation of
the plan.

This change mostly contains the display change, not the actual streaming
part. I'll do that in a follow-up.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-09-06 08:08:54 +00:00
Carol (Nichols || Goulding) 12b8095c46
feat: Upgrade to Rust 1.72.0 (#8589)
* feat: Upgrade to Rust 1.72.0

* fix: Allow a warning about an error we're intentionally creating

This is a test for an error. This lint warns that this code will cause
an error. Thanks lint, that's what we wanted!

* chore: rustfmt 1.72

* fix: Remove unnecessary hashes in raw string literals

Thanks Clippy!
https://rust-lang.github.io/rust-clippy/master/index.html#/needless_raw_string_hashes

Note that there are a number of false negatives with this lint; see
https://github.com/rust-lang/rust-clippy/issues/11420

* fix: Remove unnecessary explicit iteration

Looks like clippy::explicit_iter_loop was improved.
https://rust-lang.github.io/rust-clippy/master/index.html#/explicit_iter_loop

* fix: Allow clippy::manual_try_fold in a few places

Some of these might not be possible to rewrite with try_fold, or at
least not trivially. I don't feel confident enough to change these, in
any case. I think the lint is good to have on for future code though, so
that new code can be written with try_fold.

* fix: Remove useless creation of vectors when an array will do

Mostly in tests. Also fix some long lines.

Thanks Clippy!
https://rust-lang.github.io/rust-clippy/master/index.html#/useless_vec

* fix: Allow a single range in a vec init, which is actually what we want

Looks like Clippy's trying to catch a common mistake here, but for realz
we actually want `Vec<Range<usize>>` not `Vec<usize>`

https://rust-lang.github.io/rust-clippy/master/index.html#/single_range_in_vec_init

* fix: Remove a useless conversion

This looks like removing explicit iteration, but it's actually caught by
useless_conversion.

https://rust-lang.github.io/rust-clippy/master/index.html#/useless_conversion

* fix: Remove redundant pattern matching

Thanks Clippy!
https://rust-lang.github.io/rust-clippy/master/index.html#/redundant_pat

* fix: Allow an unwrap on a literal None in a test

This matches with the other tests better, and also when I tried to
remove the `unwrap_or_default` it changed the JSON sent from something
with an empty value to `null`, so I think the `or_default` part is
actually changing from one `None` to another `None`.

https://rust-lang.github.io/rust-clippy/master/index.html#/unnecessary_literal_unwrap
2023-08-29 05:57:38 +00:00
Andrew Lamb e4505912a1
chore: Update DataFusion pin (#8544)
* chore: Update DataFusion pin

* refactor: Use upstream check

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-08-24 18:31:33 +00:00
Andrew Lamb 967aef0e9d
chore: Update datafusion (#8515)
* chore: Update datafusion

* fix: update for API

* fix: Verify unsupported statements, with tests

* fix: update tests

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-08-21 17:49:21 +00:00
Andrew Lamb 0a0ef66a05
chore: Make it clear there is only a single DataFusion memory pool (#8501)
* chore: Make it clear there is only a single DataFusion memory pool

* fix: assert there is a single bridge

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-08-17 12:35:31 +00:00
Marco Neumann 535ff5f0c8
refactor: extract InfluxRPC-specific code to `iox_query_influxrpc` (part 1) (#8508)
* refactor: replace test usage of `Predicate`

* refactor: remove dead code

* refactor: decouple recorg planning from InfluxRPC planning

* refactor: move InfluxRPC-specific scan plan construction

* refactor: move InfluxRPC-specific "missing columns" handling to `iox_query_influxrpc`

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-08-17 11:59:04 +00:00
Marco Neumann 3612b1c482
refactor: use DF `Expr` instead of `Predicate` for chunk pruning (#8500)
`Predicate` is InfluxRPC specific and contains way more than just
filter expression.

Ref #8097.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-08-17 08:18:45 +00:00
dependabot[bot] 7094189004
chore(deps): Bump tokio from 1.31.0 to 1.32.0 (#8507)
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.31.0 to 1.32.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.31.0...tokio-1.32.0)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-08-17 08:06:29 +00:00
Marco Neumann 39a08fab69
feat: expose DataFusion mem pool metrics (#8492)
Closes #8466.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-08-15 15:19:11 +00:00
Marco Neumann ad4068bbea
refactor: decouple `QueryNamespace` from synchronous schema interface (`QueryNamespaceMeta`) (#8472)
* refactor: remove unused impl

* refactor: inline `ExecutionContextProvider` into `QueryNamespace`

* refactor: use global `DEFAULT_SCHEMA`

* refactor: decouple `QueryNamespace` from `QueryNamespaceMeta`

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-08-14 08:09:08 +00:00
dependabot[bot] 34b8585931
chore(deps): Bump tokio from 1.30.0 to 1.31.0 (#8482)
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.30.0 to 1.31.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.30.0...tokio-1.31.0)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-08-14 06:32:34 +00:00
Marco Neumann 9358ec74db
refactor: remove `Predicate` usage from `QueryNamespace` (#8468)
For #8097.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-08-10 16:32:55 +00:00
Andrew Lamb 232eee059f
chore: Update DataFusion (#8460)
* chore: Update DataFusion

* chore: update for API changes

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-08-10 14:54:52 +00:00
Marco Neumann 71e6b66476
refactor: replace `Predicate` w/ `&[Expr]` in querier internals (#8465)
* refactor: replace `Predicate` w/ `&[Expr]` in querier internals

First step towards #8097. This replaces most internal usages of
`Predicate` with the more appropriate `&[Expr]` within the querier code.

This is also triggered by #8443 because the new ingester protocol shall
not use `Predicate` anymore.

Note that the querier still uses `Predicate` for a few interfaces. These
will be fixed later:

- the current ingester RPC version
- chunk pruning
- `QuerierNamespace::chunks`

* fix: docs
2023-08-10 13:00:43 +00:00
dependabot[bot] 3675043585
chore(deps): Bump tokio from 1.29.1 to 1.30.0 (#8464)
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.29.1 to 1.30.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.29.1...tokio-1.30.0)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-08-10 07:50:18 +00:00
Christopher M. Wolff 3d972561d5
feat: push sort through union (#8423)
* feat: push sort through union

* test: add a few more tests based on review feedback

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-08-08 15:00:34 +00:00
Nga Tran a79cddf942
chore: remove dead code compute_sort_key_for_chunks (#8399) 2023-08-02 20:45:38 +00:00
Carol (Nichols || Goulding) 92ae8e4084
refactor: Extract a convenience constructor for Deterministic transition ids 2023-08-02 10:17:23 -04:00
Carol (Nichols || Goulding) fd147f871b
fix: Have QueryChunk return a TransitionPartitionId
Thus using the PartitionHashId if one is available. This does not
compile yet because of all the uses of QueryChunk, but iox_query
compiles and passes its tests.
2023-08-02 10:17:22 -04:00
Carol (Nichols || Goulding) e4b9455344
feat: Have QueryChunk return a reference from partition_id() 2023-08-02 10:17:22 -04:00
Andrew Lamb de79619e71
chore: Update datafusion (#8355)
* chore: Update datafusion pin

* fix: Update for change in API

* chore: Update plan

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-31 15:41:00 +00:00
Joe-Blount 1bed99567c
chore: add DF metrics to compaction spans (#8270)
* chore: add DF metrics to compaction spans

* chore: update string for test verification

* chore: update comment

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-20 15:00:22 +00:00
Marco Neumann 0173c50ba1
fix: use correct error code when querier is shutting down (#8282)
When a long running query is in process and the querier is shutting
down, it might happen that the executor (= thread pool and tokio
executor responsible for the CPU-bound DataFusion execution) is shut
down while the query is running. From a "systems interaction" PoV I
think this is totally fine and I would like to avoid some weird
ref-counting. Or in other words: if the system is shutting down, shut it
down.

However the error was treated as "internal" which is not useful. The
client should rather be informed that its server was gone and that it is
OK (and desired) to retry. So as per
<https://grpc.github.io/grpc/core/md_doc_statuscodes.html> I think this
should signal "unavailable".

This change wires the error code in such a way that the gRPC service
layer can properly inspect it and then changes the error mapping.

Ref https://github.com/influxdata/idpe/issues/17917 .

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-20 12:08:22 +00:00
Christopher M. Wolff 33e41fc5cb
fix: improve error for malformed gap fill query (#8252)
* fix: improve error for malformed gap fill query

* fix: code review feedback
2023-07-17 21:20:34 +00:00
Christopher M. Wolff b916a89159
fix: recurse through SubqueryAlias when finding gap fill time range (#8249) 2023-07-17 19:39:30 +00:00
Carol (Nichols || Goulding) a9b788b58f
feat: Collate chunks based on their partition hash id if they have it 2023-07-17 10:34:01 -04:00
Carol (Nichols || Goulding) 313baca8b6
fix: Use sort_by rather than sort_by_key to use references
These places are sorting by `PartitionId` currently, which implements
`Copy`, but are about to be changed to be sorted on `PartitionHashId`,
which does not implement `Copy`.
2023-07-17 09:56:55 -04:00
Carol (Nichols || Goulding) 10a0f8e3bf
fix: Remove ::default() when constructing unit structs
As recommended by https://rust-lang.github.io/rust-clippy/master/index.html#default_constructed_unit_structs
2023-07-14 10:50:55 -04:00
Dom Dwyer 7f7d1f2ee7
fix(ingester): projection without time column
The ingester can project arbitrary columns at query time, and has no
special requirement that the "time" column be part of that projection.

Because the timestamp summary generation explicitly requires the time
column to exist, it panics when there's no "time" column in the
projection - this is a bit of a modelling mismatch more than anything.
2023-07-13 14:22:48 +02:00
Andrew Lamb b24f9c81ba
chore: Update DataFusion pin, updates for API changed (#8199) 2023-07-11 13:36:38 +00:00
Andrew Lamb 3ce11d8d66
chore: Update DataFusion (#8190)
* chore: Update DataFusion

* chore: Run cargo hakari tasks

* fix: Update for API changes

* fix: use display format

* chore: Update explain plan output

* fix: update plans

---------

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-10 09:54:50 +00:00
Marco Neumann 0bcf85d48c
refactor: de-dup code 2023-07-03 17:24:59 +02:00
Carol (Nichols || Goulding) 8ebf390d9c
feat: Try to prune ingester partitions by partition key
This is hacktastic.
2023-07-03 17:24:58 +02:00
Carol (Nichols || Goulding) b76fdab1a4
refactor: Move querier::df_stats to iox_query::chunk_statistics so it can be shared with ingester 2023-07-03 17:24:55 +02:00
Marco Neumann ce6a2fb613
refactor: remove `QueryChunk::column_values` (#8111)
Similar to #8109.

This was once implemented by the RUB but as it stands right now, no
chunk implements this anymore.

If we ever want to bring this back, we should use the output of
`QueryChunk::data` instead (i.e. use a data-based implementation instead
of a per-chunk one).

Closes #8096.
2023-07-03 09:03:21 +00:00