Commit Graph

499 Commits (651b7a1ce64df19c730e1fd721a1dcf4ecf618cf)

Author SHA1 Message Date
Nga Tran 52d70b060a
test: retention test for querier in the query_tests (#6220) 2022-11-23 17:04:14 +00:00
Andrew Lamb 9fb1de0428
chore: Update datafusion (2 of N) right before arrow 27 upgrade (#6207)
* chore: Update datafusion (2 of N) right before arrow 27 upgrade

* fix: Update tests for better unsigned pushdown

* chore: Run cargo hakari tasks

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2022-11-23 11:04:14 +00:00
Andrew Lamb 1a1ea74cb7
chore: Upgrade datafusion again (#6160)
* Revert "Revert "chore: Update datafusion again (#6108)""

This reverts commit 766b3bbeb440618cfe332f6ee7d4f8a8217acc48.

* fix: Respect the partition sort key

* chore: update plans

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-22 19:28:26 +00:00
Nga Tran dd1755b23a
feat: querier filters data outside retention period (#6209) 2022-11-22 15:41:00 +00:00
dependabot[bot] a9db7581cd
chore(deps): Bump tokio from 1.21.2 to 1.22.0 (#6183)
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.21.2 to 1.22.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.21.2...tokio-1.22.0)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-21 10:21:24 +00:00
Andrew Lamb 4630bbb956
feat: push down all predicates (#6042)
* feat: push down all predicates

* fix: fmt

* fix: fmt

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-18 16:22:01 +00:00
Carol (Nichols || Goulding) 02c3083192
fix: Remove table names from Dml operations 2022-11-18 10:40:38 -05:00
Nga Tran 49a9565240
feat: gRPC that creates namespace (#6103)
* feat: create namespace API call in router

Co-authored-by: Nga Tran <nga-tran@live.com>

* chore: treat retention as ns except in CLI

* fix: overflow in nanosecond calc

* fix: retention test after changing it from hours to ns

* chore: comment clarification in cli; better response type for error in ns API

* fix: correct some rebase mistakes

* chore: merge namespace create & create_with_retention; renamed ns create test helper fn & const

* fix: ns autocreation test was wrong after rebase

* fix: mem catalog has default 1hr retention, accidentally removed in rebase

* chore: remove mem catalogs default 1hr retention; make it settable in sets & router

Co-authored-by: Luke Bond <luke.n.bond@gmail.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-18 13:02:12 +00:00
Marco Neumann 71ffc92559
fix: only push safe select expression through de-dup (#6156)
* fix: only push safe select expression through de-dup

Fixes #6066.

* docs: improve

Co-authored-by: Andrew Lamb <alamb@influxdata.com>

* fix: rebase

* test: ensure we do not split ORs

Co-authored-by: Andrew Lamb <alamb@influxdata.com>
2022-11-18 09:56:11 +00:00
Andrew Lamb 67712b595c
Revert "chore: Update datafusion again (#6108)" (#6159)
This reverts commit fbe9f27f10.
2022-11-16 21:14:55 +00:00
Andrew Lamb fbe9f27f10
chore: Update datafusion again (#6108)
* chore: Update datafusion pin + api code

* chore: Run cargo hakari tasks

* refactor: combine_sort_key is more idiomatic and add rationale comments

* refactor: satisfy borrow checker and updated comments

* fix: Add test case for combine_sort_key

* fix: Apply suggestions from code review

Co-authored-by: Marco Neumann <marco@crepererum.net>

* fix: Add back test for deeply nested expression

* fix: Update output ordering

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: Marco Neumann <marco@crepererum.net>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-16 14:41:52 +00:00
Andrew Lamb 20f1ae1c8f
test: tests in the reorg planner and query tests for merging parquet files (#6137)
* test: tests in the reorg planner and query tests for merging parquet files

* fix: use 20 files

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-15 20:29:44 +00:00
Carol (Nichols || Goulding) 3943faf998
fix: Remove namespace from DmlWrite and DmlDelete constructors 2022-11-14 16:46:04 -05:00
Dom Dwyer 9e97866b48
refactor: internalise PartitionProvider
Removes the need to leak the PartitionProvider outside of the ingester
crate.

This will allow the PartitionProvider to utilise a
DeferredLoad<TableName> without having to make the DeferredLoad and
TableName pub.
2022-11-14 10:50:05 +01:00
Carol (Nichols || Goulding) 0657ad9600
fix: Rename QueryDatabase to QueryNamespace 2022-11-11 16:14:12 -05:00
Carol (Nichols || Goulding) fa46951524
fix: Remove needless deref done by auto deref, thanks Clippy! 2022-11-09 10:54:18 -05:00
Marco Neumann 1a5fc3d772
test: use `EXPLAIN ANALYZE` for SQL metric tests (#6084)
* test: use `EXPLAIN ANALYZE` for SQL metric tests

Needs a bit more infra (due to normalization), but this seems to be
worth it so we can easily hook up more metrics in the future.

* docs: explain regexes
2022-11-09 09:00:27 +00:00
Marco Neumann 903f7bafa7
refactor: expose `ParquetExec` directly to DataFusion phys. plan (#6072)
* refactor: expose `ParquetExec` directly to DataFusion phys. plan

Closes #5897.

* fix: update tracing tests

* refactor: use `EmptyExec`

* refactor: use `target_partitions`

* refactor: improve UUID normalization in query tests

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
2022-11-08 12:19:28 +00:00
Andrew Lamb 034d9b371d
chore: Update datafusion and arrow/arrow-flight/parquet to `26.0.0` (#6061)
* chore: Update datafusion and arrow/arrow-flight/parquet to `26.0.0`

* fix: Update query_functions

* fix: update for TimestampNanosecondArray API changes

* fix: update for TimestampNanosecondArray API changes

* chore: Update flatbuffers and remove rustsec warning

* chore: Update text

* fix: update more test

* fix: Lock ahash to exactly 0.8.0

* fix: Update datafusion pin

* chore: Run cargo hakari tasks

Co-authored-by: Carol (Nichols || Goulding) <carol.nichols@gmail.com>
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-07 11:01:58 +00:00
Marco Neumann f511db380c
refactor: remove table name from chunks (#6063)
It should always be clear from the context to which table a chunk
belongs.

I think having a table name bound to a chunk goes back to a time when
chunks had multiple tables.

Helps with #6049.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-07 10:42:57 +00:00
Carol (Nichols || Goulding) 09e9b69b85
Merge remote-tracking branch 'origin/main' into dom/dml-delete-namespace-id 2022-11-04 14:56:10 -04:00
Andrew Lamb 8c8e607dca
chore: Update datafusion pin (#6054)
* chore: Update datafusion pin

* chore: Run cargo hakari tasks

* chore: Update expected error

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2022-11-03 19:41:31 +00:00
Dom Dwyer 6fa48731aa
feat: NamespaceId in DmlDelete
Changes the DmlDelete to contain the NamespaceId for which it should be
applied, propagating this value over the wire.

Like the existing IDs within the DmlWrite, these values are marked
unsafe to use, to avoid consumers utilising them accidentally
during deployment. Unlike DmlWrite, the DmlDelete is completely unused,
so this is less of an issue.
2022-11-03 13:57:40 +01:00
Andrew Lamb 4fb2843d05
refactor: Rename `schema::selection::Selection` to `schema::projection::Projection` (#6037)
* chore: Rename `schema::selection::Selection` to `schema::projection::Projection`

* fix: docs

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-02 18:15:04 +00:00
Andrew Lamb 58838e214e
feat: enable parquet predicate pushdown in IOx (#5930) 2022-11-02 18:00:47 +00:00
Dom Dwyer ddd6ab0ba4
refactor(write_buffer): pass IDs in wire format
This commit is part of a two-part change in order to add the table &
namespace IDs to the write buffer wire format. This commit forms the
first half; changing the producer to send the IDs.

In this commit the new ID values are never read on the consumer side,
ensuring there is no consumer dependency on them. This ensures they
remain operational during a rollout, where the consumer may be updated
to the latest code dependent on the IDs before the producer is updated
to send them. This also ensures we have a window of time where the
consumers can be rolled back after being updated, and still handle
replaying messages in Kafka.
2022-11-02 13:28:56 +01:00
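The two-part rollout described in commit ddd6ab0ba4 above can be illustrated with a minimal sketch. The `WireWrite` struct, its fields, and the `route` function are hypothetical names chosen for illustration and are not taken from the IOx write buffer code. The sketch only shows the idea of the first half: the producer may attach IDs, while the consumer never reads them, so old and new messages are handled identically.

```rust
/// Hypothetical wire message; the real write buffer format is not shown here.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct WireWrite {
    pub namespace: String,
    pub table: String,
    pub payload: Vec<u8>,
    /// Assumed new fields: set by updated producers, absent in older messages.
    pub namespace_id: Option<i64>,
    pub table_id: Option<i64>,
}

/// Phase-one consumer: routes by name only and never reads the ID fields,
/// so it behaves the same whether or not the producer has been updated.
pub fn route(msg: &WireWrite) -> (String, String, usize) {
    (msg.namespace.clone(), msg.table.clone(), msg.payload.len())
}
```

Because `route` ignores `namespace_id` and `table_id`, a consumer built this way can be deployed before or after the producer, and can replay messages written in either format.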
dependabot[bot] b1572c50a6
chore(deps): Bump once_cell from 1.15.0 to 1.16.0 (#6009)
Bumps [once_cell](https://github.com/matklad/once_cell) from 1.15.0 to 1.16.0.
- [Release notes](https://github.com/matklad/once_cell/releases)
- [Changelog](https://github.com/matklad/once_cell/blob/master/CHANGELOG.md)
- [Commits](https://github.com/matklad/once_cell/compare/v1.15.0...v1.16.0)

---
updated-dependencies:
- dependency-name: once_cell
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-31 16:23:40 +00:00
Dom Dwyer 72a358e52f
refactor(dml): PartitionKey required for writes
Changes the DmlWrite type to require a PartitionKey be specified,
instead of accepting an Option.

This requirement was already in place - the write buffer upheld an
invariant that all writes contained a partition key value (was not
"None") or it panicked at runtime when attempting to enqueue the write.

It is now possible to encode this invariant in the type system, which is
what this change does.
2022-10-28 10:57:30 +02:00
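The change described in commit 72a358e52f above, moving an invariant from a runtime panic into the type system, might look roughly like the following sketch. `WriteBefore`, `WriteAfter`, and this simplified `PartitionKey` are hypothetical stand-ins for illustration only, not the actual `DmlWrite` definition.

```rust
/// Hypothetical stand-in for the partition key type.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PartitionKey(String);

impl From<&str> for PartitionKey {
    fn from(s: &str) -> Self {
        Self(s.to_string())
    }
}

/// Before: the key is optional, so the invariant is only checked at runtime.
pub struct WriteBefore {
    partition_key: Option<PartitionKey>,
}

impl WriteBefore {
    pub fn new(partition_key: Option<PartitionKey>) -> Self {
        Self { partition_key }
    }

    /// Panics if the key is missing, mirroring the runtime check.
    pub fn enqueue(&self) -> &PartitionKey {
        self.partition_key
            .as_ref()
            .expect("write must have a partition key")
    }
}

/// After: the key is required, so a write without one cannot be constructed.
pub struct WriteAfter {
    partition_key: PartitionKey,
}

impl WriteAfter {
    pub fn new(partition_key: PartitionKey) -> Self {
        Self { partition_key }
    }

    /// Infallible: the compiler guarantees the key exists.
    pub fn partition_key(&self) -> &PartitionKey {
        &self.partition_key
    }
}
```

With the second shape, the "missing key" case is rejected at compile time, so no caller needs to handle or test the panic path.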
Carol (Nichols || Goulding) 3145e2c05b
feat: Use workspace dep inheritance for the arrow crate 2022-10-26 10:34:29 -04:00
Carol (Nichols || Goulding) 44936f661a
feat: Use workspace dep inheritance for datafusion instead of shim crate 2022-10-26 10:33:56 -04:00
Andrew Lamb 474620f4a7
chore: Update datafusion and other dependencies (#5976)
* chore: Update datafusion and other dependencies

* chore: Update expected plan

* chore: Run cargo hakari tasks

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-26 14:14:13 +00:00
Marco Neumann 9b48437711
refactor: make influx column type mandatory (#5978)
We basically assume everywhere that a column falls into one of the three
known categories (time, tag, field), so let's encode this in our type
system instead of defining "unknown" as "undefined behavior, may or may
not crash".

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-26 11:20:29 +00:00
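As a rough illustration of the idea in commit 9b48437711 above, a mandatory column type can be modelled as a plain enum rather than an `Option`. The names `ColumnKind`, `Column`, and `describe` are hypothetical and are not the actual IOx schema types.

```rust
/// Hypothetical enum for the three known column categories.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ColumnKind {
    Time,
    Tag,
    Field,
}

/// A column always carries a kind; there is no "unknown" state to handle.
#[derive(Debug, Clone)]
pub struct Column {
    pub name: String,
    pub kind: ColumnKind,
}

/// The match is exhaustive, so adding a new kind forces every caller
/// to be updated instead of silently falling into undefined behaviour.
pub fn describe(column: &Column) -> String {
    let role = match column.kind {
        ColumnKind::Time => "time",
        ColumnKind::Tag => "tag",
        ColumnKind::Field => "field",
    };
    format!("{} ({})", column.name, role)
}
```

For example, calling `describe` on a column named `host` with kind `ColumnKind::Tag` yields `host (tag)`.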
Carol (Nichols || Goulding) 2e83e04eab
feat: Use workspace package metadata to reduce differences and repetition 2022-10-24 13:04:09 -04:00
Marco Neumann 3e4db81bc6
refactor: make `SchemaBuilder::field` fallible
It would be nice if the IOx data type would not be optional and this is
a prep clean-up to achieve that.
2022-10-24 18:12:42 +02:00
Marco Neumann 1d440ddb2d
refactor: `IOxReadFilterNode` can always accumulate statistics (#5954)
* refactor: `IOxReadFilterNode` can always accumulate statistics

`IOxReadFilterNode` used to not emit statistics if one chunk has
duplicates or delete predicates. This is wrong (or at least overly
conservative), because the node itself (or the chunks themselves) do NOT
perform dedup or delete predicate filtering. Instead this is done
by parent nodes (`DeduplicateExec` and `FilterExec`) and it's their
job to propagate statistics correctly.

Helps w/ #5897.

* test: explain setup

Co-authored-by: Andrew Lamb <alamb@influxdata.com>
2022-10-24 13:34:22 +00:00
Marco Neumann e0062f2d40
refactor: do NOT use fake DF context for parquet reading (#5942)
Use the proper top-level DataFusion context and register the object
store there.

Note that we still hide the `ParquetExec` behind an opaque record batch
stream. Fixing that is next on my list.

Helps with #5897.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-24 08:20:26 +00:00
kodiakhq[bot] 9b67db3c06
Merge branch 'main' into cn/ingester-tracing 2022-10-21 13:13:13 +00:00
Andrew Lamb 7781ed0455
chore: Update datafusion (#5928)
* chore: Upgrade datafusion

* chore: Update for new API

* chore: Update expected output

* fix: Update comment

* fix: compilation

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-20 14:37:49 +00:00
Carol (Nichols || Goulding) 444eaec319
fix: Propagate Span correctly in MockIngester 2022-10-20 09:18:09 -04:00
Carol (Nichols || Goulding) 59e1c1d5b9
feat: Pass trace id through Flight requests from querier to ingester
Fixes #5723.
2022-10-20 08:55:30 -04:00
Carol (Nichols || Goulding) 5c15c93fc2
fix: Clean up query test dependencies
Alphabetize dependencies; remove dev deps that are already regular deps
2022-10-20 08:55:29 -04:00
Andrew Lamb 82d6fc3bda
feat: support queries via influxrpc with periods in field names (#5919)
* feat: support queries via influxrpc with periods in field names

* fix: update comments

* fix: more tests

* fix: more tests
2022-10-19 20:09:55 +00:00
Marco Neumann e1b50227f8
refactor: avoid some clones while caching ns schema (#5896)
Found while reviewing the code.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-19 06:28:15 +00:00
Andrew Lamb d706f8221d
chore: Update datafusion and arrow / parquet / arrow-flight 25.0.0 (#5900)
* chore: Update datafusion and  `arrow` / `parquet` / `arrow-flight` 25.0.0

* chore: Update for structure changes

* chore: Update for new projection pushdown

* chore: Run cargo hakari tasks

* fix: fmt

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-18 20:58:47 +00:00
Marco Neumann 819dbe9e0c
refactor: remove querier chunk load settings (#5888)
We no longer use dual-state ReadBuffer/Parquet chunks.
2022-10-18 10:22:46 +00:00
Andrew Lamb 6f931411f3
feat: read from parquet and only parquet (#5879)
* feat: query only from parquet

* Revert "feat: query only from parquet"

This reverts commit 5ce3c3449c0b9c90154c8c6ece4a40a9c083b7ba.

* Revert "revert: disable read buffer usage in querier (#5579) (#5603)"

This reverts commit df5ef875b4.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-18 10:09:48 +00:00
Luke Bond 475c8a0704
fix: only emit ttbr metric for applied ops (#5854)
* fix: only emit ttbr metric for applied ops

* fix: move DmlApplyAction to s/w accessible

* chore: test for skipped ingest; comments and log improvements

* fix: fixed ingester test re skipping write

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-14 12:06:49 +00:00
Andrew Lamb 9134ccd6c3
chore: Update datafusion again (#5855)
* chore: Update datafusion

* chore: Updates for changes in datafusion

* chore: more updates

* fix: update doc example

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-13 19:18:57 +00:00
Andrew Lamb d57c99638c
chore: Update datafusion + `arrow`, `arrow-flight`, and `parquet` to 24.0.0.0 (#5792)
* chore: Update datafusion + `arrow`, `arrow-flight`, and `parquet` to 24.0.0.0

* fix: Update for coercion, fix explain plans for change in column name display

* chore: Update datafusion lock

* fix: Update for other API changes

* chore: Update to latest datafusion pin

* chore: Run cargo hakari tasks

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-12 16:19:14 +00:00
Dom Dwyer b294bb98aa
refactor: move query types to query_handler
Moves types that are only used for handling queries to the query_handler
module.
2022-10-11 17:58:55 +02:00