Commit Graph

549 Commits (506cdebf384259c166740d910abcafe864d4ffa0)

Author SHA1 Message Date
Carol (Nichols || Goulding) 9043966443
docs: Fix some typos in comments as I noticed them 2022-03-31 16:34:47 -04:00
Andrew Lamb 22b24bdab3
chore: Update datafusion again (#4148)
* chore: update datafusoon

* refactor: Update for DataFusion API changes

* chore: TEMP TEMP change df to local copy

* chore: Update to datafusion again

* fix: Update Cargo.lock

* fix: logical conflict
2022-03-30 16:51:48 +00:00
Marco Neumann 20bbb88dc5
refactor: remove table name from `TableSummary` (#4170)
This allows us to remove the table name from the low-level chunk
representations (like `ParquetFile`, RUB, ...) since table names are
already tracked by the higher-level data structures (e.g. catalog,
catalog chunk) that manage the low-level chunk representations.

This is similar to #4167.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-30 13:24:00 +00:00
Marco Neumann 2b76c31157
refactor: make statistics null counts optional (#4160)
Min/max values and distinct counts are already optional, so let's make
the null counts optional as well. This will be helpful for NG to deal w/
partial statistics (e.g. we only populate stats for the time column).

Note that the total count is still mandatory, but we normally have the
chunk/file-level row count at hand.
2022-03-29 17:47:57 +00:00
dependabot[bot] 17af5fcbd1
chore(deps): Bump tokio-util from 0.7.0 to 0.7.1 (#4154)
* chore(deps): Bump tokio-util from 0.7.0 to 0.7.1

Bumps [tokio-util](https://github.com/tokio-rs/tokio) from 0.7.0 to 0.7.1.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-util-0.7.0...tokio-util-0.7.1)

---
updated-dependencies:
- dependency-name: tokio-util
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore: Run cargo hakari tasks

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-29 08:39:02 +00:00
Andrew Lamb 5c69a3f43b
chore: Update deps: datafusion, arrow/arrow-flight/parquet to 11, zstd to 0.11 (#4119)
* chore: update datafusion

* chore(deps): Bump arrow from 10.0.0 to 11.0.0

Bumps [arrow](https://github.com/apache/arrow-rs) from 10.0.0 to 11.0.0.
- [Release notes](https://github.com/apache/arrow-rs/releases)
- [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG.md)
- [Commits](https://github.com/apache/arrow-rs/compare/10.0.0...11.0.0)

---
updated-dependencies:
- dependency-name: arrow
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore(deps): Bump arrow-flight from 10.0.0 to 11.0.0

Bumps [arrow-flight](https://github.com/apache/arrow-rs) from 10.0.0 to 11.0.0.
- [Release notes](https://github.com/apache/arrow-rs/releases)
- [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG.md)
- [Commits](https://github.com/apache/arrow-rs/compare/10.0.0...11.0.0)

---
updated-dependencies:
- dependency-name: arrow-flight
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore: update parquet to 11.0.0

* fix: error on create schema, test for same

* fix: upgrade zstd

* chore: Run cargo hakari tasks

* fix: fix logical merge conflict

* fix: hakari

* fix: hakari

* fix: update newly introduced dep

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-24 15:27:36 +00:00
Andrew Lamb b83b000590
chore: Update datafusion (#4071)
* chore: update to datafusion 5936edc2a94d5fb20702a41eab2b80695961b9dc

* chore: Update apis to match datafusion changes
2022-03-22 13:17:41 +00:00
Marco Neumann c9908b260c
refactor: dyn-dispatch database in query subsystem (#4083)
* refactor: dyn-dispatch database in query subsystem

This is similar to #4080 but concerns the database itself.

For #3934.

* docs: improve wording

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-22 09:15:52 +00:00
Marco Neumann d1df95df87 refactor: dyn-dispatch chunks in query subsystem
- this is what DataFusion is doing as well; it's also fast enough
  because the number of chunks in a query is not THAT massive (it's not
  like we are doing row-level dyn dispatching)
- it simplifies abstracting over different databases
- it allows us to drop our enum-based dispatching that we have for
  `DbChunk` and that we would also need for the querier (e.g. depending
  on if a chunk is backed by a parquet file or ingester data)
- it likely speeds up compile times because the `query` is no longer
  contains massive amounts of generic code

For #3934.
2022-03-21 12:47:54 +01:00
Marco Neumann ca152e7934 refactor: avoid generics in `QueryDatabase`
A step to make this trait object-safe.

Ref #3934.
2022-03-21 10:45:05 +01:00
Marco Neumann 0071b85c22 refactor: make `ExecutionContextProvider` object-safe
Ref #3934.
2022-03-21 10:40:53 +01:00
Marco Neumann 169fa2fb2f refactor: make `QueryChunk` object-safe
This makes it way easier to dyn-type database implementations. The only
real change is that we make `QueryChunk::Error` opaque. Nobody is going
to inspect that anyways, it's just printed to the user.

This is a follow-up of #4053.

Ref #3934.
2022-03-18 11:40:31 +01:00
Marco Neumann 0850a93f20
refactor: make `QueryDatabase::chunks` async (#4047)
For OG we can determine the chunks w/o any IO, for NG however this might
require a few catalog queries.

This is likely not the last change of this sort, i.e. the whole schema
handling is currently sync as well.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-17 12:55:25 +00:00
Nga Tran 5a29d070ea
feat: Implement the compact function for NG Compactor (#4001)
* feat: initial implementation of compact a given list of overlapped parquet files

* feat: Add QueryableParquetChunk and some refactoring

* feat:  build queryable parquet chunks for parquet files with tombstones

* feat: second half the implementation for Compactor's compact. Tests will be next

* fix: comments for trait funnctions fof QueryChunkMeta

* test: add tests for compactor's compact function

* fix: typos

* refactor: address Jake's review comments

* refactor: address Andrew's comments and add one more test for files in different order in the vector

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-11 20:25:19 +00:00
Andrew Lamb 2c3d30ca32
chore: Update datafusion, arrow, flight and parquet (#4000)
* chore: Update datafusion, arrow, flight and parquet

* fix: api change

* fix: fmt

* fix: update test metadata size

* fix: Update sizes in parquet test

* fix: more metadata size update
2022-03-10 12:24:47 +00:00
Marco Neumann 77f6153f72
refactor: remove `QueryDatabase::chunk_summaries` (#3977)
- This is not used by the query engine at all.
- The query engine should not care about ALL chunks but only about the
  chunks it gets via `QueryDatabase::chunks` (which includes a table
  name and a predicate).
- All other users of that API are NOT really query-related.
2022-03-08 11:34:26 +00:00
Marco Neumann 5cc1c697fc
refactor: remove `QueryDatabase::partition_addr` (#3976)
- This was not actually used by the query engine.
- The query engine doesn't have a concept of a "partition", it only
  cares about chunks.
- Unbound access to all partitions in the database is quite expensive
  (esp. on NG).
2022-03-08 11:17:31 +00:00
Raphael Taylor-Davies 80fb75d90b
feat: add a flag to enable per-partition tracing (#3928)
* feat: add a flag to enable per-partition tracing

* chore: rename constant

* feat: use BooleanFlag and cache result
2022-03-07 13:49:23 +00:00
Raphael Taylor-Davies 7b28fb4366
feat: improve trace naming (#3931)
* feat: improve trace naming

* test: test span description

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-07 11:49:19 +00:00
Andrew Lamb 9d8bceccbf
test: Add test to verify deduplicating is working (#3937)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-04 20:05:17 +00:00
Andrew Lamb e09f39d6a0
chore: Update datafusion (#3943)
* chore: Update datafusion

* refactor: update for new datafusion

* chore: Run cargo hakari tasks

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2022-03-04 19:37:46 +00:00
Raphael Taylor-Davies e304613546
feat: include trace ID in query log (#3912) (#3923)
* feat: include trace ID in query log (#3912)

* chore: fmt

* chore: lint
2022-03-03 17:50:49 +00:00
Edd Robinson de7c46c9bb feat: add read_window_aggregate tracing 2022-03-03 14:30:27 +00:00
Edd Robinson ea32bc366a feat: add read_group tracing 2022-03-03 14:27:01 +00:00
Edd Robinson 32baaa1ee7 feat: add tracing to field_columns 2022-03-03 14:27:01 +00:00
Edd Robinson 787a848bf5 feat: add tracing for tag_values 2022-03-03 14:27:01 +00:00
Edd Robinson 6a6fbf73ae feat: add tracing support tag_keys 2022-03-03 14:27:01 +00:00
Edd Robinson 998e205c2c feat: trace table_names 2022-03-03 14:27:01 +00:00
Edd Robinson 301ae886ce feat: add tracing down to the chunk level (#3804)
* refactor: wire exectution context to Deduplicator

* feat: example trace to chunk read_filter

* refactor: make execution context required

* refactor: expose metadata API

* refactor: more span context for chunk read_filter

* refactor: fix build

* refactor: push context into result stream

* refactor: make executor optional
2022-03-03 14:27:00 +00:00
Edd Robinson f2126c76d3
refactor: address PR feedback (#3906)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-03 11:04:05 +00:00
Edd Robinson 3d047073b9
feat: add tracing down to the chunk level (#3804)
* refactor: wire exectution context to Deduplicator

* feat: example trace to chunk read_filter

* refactor: make execution context required

* refactor: expose metadata API

* refactor: more span context for chunk read_filter

* refactor: fix build

* refactor: push context into result stream

* refactor: make executor optional
2022-03-02 19:08:22 +00:00
Andrew Lamb 286d5f7b2b
feat: add `success` column to system.queries (#3891)
* feat: add `success` column to system.queries

* refactor: Remove lifetime from QueryCompletedToken and thread through flight

* test: update test to make incomplete query clearer

* refactor: use better patter to set complete

* fix: logical merge conflict
2022-03-02 15:05:06 +00:00
Marco Neumann ace4af1b66
feat: `DedicatedExecutor` async `join` and job `detach`. (#3835)
* feat: detach dedicated exec jobs

* feat: async `DedicatedExecutor::join`

Now `DedicatedExecutor` follows the system we use for other server
components:

- `shutdown`:  a quick sync call that signals the shutdown but doesn't
  drop
- `join`: async awaits until the executor has finished shutdown
- `drop`: warn but still try to shut down

* test: irmpove `detach_receiver` test

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
2022-03-01 15:25:31 +00:00
Marco Neumann 33851be3a5
chore: upgrade Rust to 1.59 (#3875)
Mostly a few new clippy crates around `flat_map`, `and_then`, and
"underscore locks" (!!!):
https://rust-lang.github.io/rust-clippy/master/index.html#let_underscore_lock

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-28 15:14:19 +00:00
Raphael Taylor-Davies 2a842fbb1a
feat: correctly sort data and store in catalog metadata (#3864)
* feat: respect sort order in ChunkTableProvider (#3214)

feat: persist sort order in catalog (#3845)

refactor: owned SortKey (#3845)

* fix: size tests

* refactor: immutable SortKey

* test: test sort order restart (#3845)

* chore: explicit None for sort key

* chore: test cleanup

* fix: handling of sort keys containing fields

* chore: remove unused selected_sort_key

* chore: more docs

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-25 17:56:27 +00:00
Nga Tran 8edc462c37
fix: while executing deduplication, do not return empty record batches as a result of deduplication (#3861)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-25 15:00:13 +00:00
Raphael Taylor-Davies a32b952104
fix: chunk overlap missing columns (#3844)
* fix: chunk overlap missing columns

* chore: more tests

Co-authored-by: Edd Robinson <me@edd.io>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-25 09:46:03 +00:00
dependabot[bot] 3b7d31c88a
chore(deps): Bump arrow from 9.0.2 to 9.1.0 (#3826)
Bumps [arrow](https://github.com/apache/arrow-rs) from 9.0.2 to 9.1.0.
- [Release notes](https://github.com/apache/arrow-rs/releases)
- [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG.md)
- [Commits](https://github.com/apache/arrow-rs/compare/9.0.2...9.1.0)

---
updated-dependencies:
- dependency-name: arrow
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-02-23 09:25:46 +00:00
Marco Neumann 459974d1b8
refactor: move `DedicatedExecutor` from `query` to `executor` (#3819)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-23 09:10:43 +00:00
dependabot[bot] ad3868ed7c
chore(deps): Bump tokio from 1.16.1 to 1.17.0 (#3814)
* chore(deps): Bump tokio from 1.16.1 to 1.17.0

Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.16.1 to 1.17.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.16.1...tokio-1.17.0)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* build: update workspace-hack

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Dom Dwyer <dom@itsallbroken.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-22 16:27:43 +00:00
Andrew Lamb 9588b43a90
fix: Make errors in rewriting return `Error` rather than a `panic` (#3767)
* test: add test for predicate errors

* fix: Return errors properly rather than panic

* fix: handle errors in influxrpc planner

* fix: appease clippy

* fix: tests

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-17 15:39:14 +00:00
Nga Tran 0b3f76462d
feat: build Query Plan that queries QueryableBatch with filters (#3742)
* feat: initial implementaion the Query Plan that query QueryableBatch with filters

* fix: read_filter of QueryableBatch should provide the shema of the columns/projection it needs

* chore: Apply suggestions from code review

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* chore: address review comment

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
2022-02-15 16:06:26 +00:00
Andrew Lamb a30803e692
chore: Update datafusion, update `arrow`/`parquet`/`arrow-flight` to 9.0 (#3733)
* chore: Update datafusion

* chore: Update arrow

* fix: missing updates

* chore: Update cargo.lock

* fix: update for smaller parquet size

* fix: update test for smaller parquet files

* test: ensure parquet_file tests write multiple row groups

* fix: update callsite

* fix: Update for tests

* fix: harkari

* fix: use IoxObjectStore::existing

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-15 12:10:24 +00:00
dependabot[bot] 89105ccfab
chore(deps): Bump tokio-util from 0.6.9 to 0.7.0 (#3743)
Bumps [tokio-util](https://github.com/tokio-rs/tokio) from 0.6.9 to 0.7.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/commits)

---
updated-dependencies:
- dependency-name: tokio-util
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-02-15 11:33:41 +00:00
Nga Tran d1c71ba5d8
feat: predicate pushdown for Ingester's QueryableBatch (#3728)
* feat: predicate pushdown for Ingester's QueryableBatch

* chore: comment cleanup

* chore: Apply suggestions from code review

Co-authored-by: Edd Robinson <me@edd.io>

* refactor: address review comments

Co-authored-by: Edd Robinson <me@edd.io>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-14 17:28:52 +00:00
Andrew Lamb d9f331ba2a
chore: update datafusion, stop repartitioning so aggressively (#3633)
* chore: update datafusion

* fix: Update to use new datafusion api

* chore: update expected plans

* fix: support zero output partitions

* fix: update test

* fix: Update for new DataFusion API

* fix: newly added system table

* fix: update cargo lock
2022-02-09 19:53:41 +00:00
Raphael Taylor-Davies c18ad4ac97
feat: special case max timestamp range for table_names and field_columns (#3642) 2022-02-08 16:09:36 +00:00
Raphael Taylor-Davies be662ec731
feat: lazy query log! (#3654)
* feat: lazy query log

* chore: fmt

* chore: review feedback

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-08 13:07:28 +00:00
Andrew Lamb e6ec8ef5f3
test: tests to show predicate simplification on chunks (#3649)
* test: tests to show predicate simplification on chunks

* fix: clippy

* refactor: less Box

* refactor: make typealias + add comments, hopefully to improve clarity

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-07 15:04:20 +00:00
Carol (Nichols || Goulding) 2e30483f1f
refactor: Remove predicate module from predicate crate (#3648)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-07 14:54:07 +00:00