Commit Graph

591 Commits (f23d5690546c73f6bdd29e2bc20a7b2cf32c3b5c)

Author SHA1 Message Date
Jake Goulding e07bcd40c2 refactor: Remove unused dependencies
These were found by iterating over all of the dependencies of each
Cargo.toml, then grepping that crate for the dependency's name. If it
didn't show up, I attempted to remove it.

I left a few dependencies that this process flagged:

* generated_types
  - `pbjson`,`serde`. Apparently used by the generated code.

* grpc-router-test-gen
  - `prost`. Apparently used by the generated code.

* influxdb_iox
  - `heappy`. Doesn't appear used, but is behind enough feature
    flags that I don't care to reason about and it's already optional.
  - `tikv_jemalloc_sys`. Appears to be setting a feature flag of an
    indirect dependency.

* iox_gitops_adapter
  - `k8s_openapi`. Appears to be setting a feature flag of an indirect
    dependency.
2022-05-06 15:57:58 -04:00
Carol (Nichols || Goulding) 068096e7e1
fix: Rename data_types2 to data_types 2022-05-06 14:45:39 -04:00
Carol (Nichols || Goulding) 0541c6e40f
fix: Remove data_types crate where it's no longer used 2022-05-06 14:45:39 -04:00
Carol (Nichols || Goulding) 94be7407ba
refactor: Move BooleanFlag to the only place it's used 2022-05-06 14:45:38 -04:00
Carol (Nichols || Goulding) d2671355c3
fix: Move partition metadata types to data_types2 2022-05-06 14:45:37 -04:00
Carol (Nichols || Goulding) c221960ebd
fix: Move chunk metadata types to data_types2 2022-05-06 14:45:36 -04:00
Andrew Lamb 37c7ce793c
chore: Update datafusion (again) (#4518)
* chore: Update datafusion (again)

* refactor: Update ExecutionPlan:execute to not be async
2022-05-05 15:43:41 +00:00
Andrew Lamb bc5725b1fc
fix: Move schema pivot work out of `execute` (#4520)
* refactor: Extract watch_task into datafusion_util

* refactor: do work in a separate task

* fix: cleanup

* fix: update test
2022-05-05 15:11:14 +00:00
Andrew Lamb 02893e598c
chore: Update datafusion and upgrade arrow/parquet/arrow-flight to 13 (#4516)
* chore: Tool for automating arrow version update

* chore: Update datafusion and arrow/parquet/arrow-flight

* fix: update for changes in Arrow API

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-05-05 00:21:02 +00:00
Nga Tran 4813cc8332
test: Added explain tests for querier. Found and fixed #4468 (#4469)
* test: Added explain tests for querier. Found and fixed #4468

* chore: cleanup

* chore: Apply suggestions from code review

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-04-29 14:15:30 +00:00
Nga Tran 5688bd63a3
docs: document for deduplication and sort key (#4452)
* docs: document for deduplication and sort key

* chore: Apply suggestions from code review

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: Luke Bond <luke.n.bond@gmail.com>
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

* chore: Update query/src/provider.rs

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

* chore: address review comments

* fix: fix comment

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: Luke Bond <luke.n.bond@gmail.com>
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-04-28 16:44:40 +00:00
Andrew Lamb 6ec0b401e9
fix: Support for computing series across multiple record batches (#4444)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-04-28 13:12:27 +00:00
dependabot[bot] 420c306caa
chore(deps): Bump tokio from 1.17.0 to 1.18.0 (#4453)
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.17.0 to 1.18.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.17.0...tokio-1.18.0)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-04-28 08:21:17 +00:00
Andrew Lamb 115f007317
refactor: Use DataFusion `Expr` instead of our own custom wrapper for `ValueExpr` (#4440)
* refactor: Use DataFusion `Expr` instead of custom wrapper for BinaryExprs

* fix: apply code review suggestions

* fix: more code review suggestions
2022-04-27 19:20:15 +00:00
Andrew Lamb 6d2a8256ba
feat: Add window_bounds to IOx Function Registry (2.5/3) (#4432)
* feat: Add window_bounds to IOx Function Registry

* refactor: Prepare for lib tests

* test: Add explicit tests for plumbing

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-04-27 15:08:38 +00:00
Nga Tran fa2c1febf4
feat: use stored partition sort key to deduplicate data (#4360)
* feat: use stored sort key to deduplicate data

* refactor: verify if one is a super sort key of the other

* test: unit tests for scan and deduplication plans

* fix: typo

* refactor: refactor and add comments

* feat: cache partition sort key to read during planning as needed

* test: tests for query plans with different overlap groups

* chore: cleanup

* chore: resolve merge conflicts

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-04-26 20:36:32 +00:00
Andrew Lamb 9e91af4501
refactor: Move IOx UDfs into a Function Registry (1/3) (#4428)
* refactor: Move all UDF implementations to query_function crate

* refactor: Move regex udf to query_functions

* refactor: Move functions out of query

* fix: lints, imports

* chore: Run cargo hakari tasks

* fix: clipy + benches

* fix: reduce borrowing and fix clippy

* fix: moar clippy

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-04-26 17:30:27 +00:00
Marco Neumann 2337935660
test: chunks in ingester stage (#4415)
* refactor: document and improve `MockIngesterConnection`

* refactor: split `OldOneMeasurementFourChunksWithDuplicates` for `EXPLAIN` queries

* fix: mark "IngsterPartition" chunks as unsorted

* fix: "group by" queries may require sorted comparison

* refactor: re-export a few more types from querier

* fix: ensure that test parquet files are de-duped

* test: chunks in ingester stage

* docs: explain test code
2022-04-26 07:55:19 +00:00
Nga Tran 0a440bb638
refactor: grouping overlaps now uses the same overlap function in both compactor and deduplication (#4420)
* refactor: grouping overlaps is now use the same overlap function in both compactor and deduplication

* chore: commit missing file

* chore: address review comments

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-04-25 20:32:51 +00:00
dependabot[bot] 4c94e46642
chore(deps): Bump croaring from 0.5.2 to 0.6.0 (#4408)
* chore(deps): Bump croaring from 0.5.2 to 0.6.0

Bumps [croaring](https://github.com/saulius/croaring-rs) from 0.5.2 to 0.6.0.
- [Release notes](https://github.com/saulius/croaring-rs/releases)
- [Commits](https://github.com/saulius/croaring-rs/compare/0.5.2...0.6.0)

---
updated-dependencies:
- dependency-name: croaring
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* fix: croaring 0.6.0 compat

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Marco Neumann <marco@crepererum.net>
2022-04-25 16:41:08 +00:00
Nga Tran d963110842
feat: group chunk overlaps based on time range only (#4389)
* feat: overlap for NG querier

* chore: cleanup

* refactor: address review comments

* fix: typo

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-04-25 13:32:07 +00:00
Marco Neumann 7907a2bae3
fix: column summary conversion for "unknown" TS (#4379)
* fix: column summary conversion for "unknown" TS

Both IOx and DataFusion have the same data model for min/max statistics:

`Option<Option<i64>>` (or any other inner type)

The interpretation is:

1. **`None`:** Value unknown.
2. **`Some(None)`:** Value known to be NULL.
3. **`Some(Some(x))`:** Value known and non NULL.

The bug was that during the conversion from the IOx statistics type to
the DataFusion statistics type for timestamps, case 1 was converted into
case 2.

Up until now this didn't make a difference between timestamps were
basically known all the time, but during the development of NG there are
cases where the timestamps are unknown (this might change, but the query
engine should be correct w/o assuming that).

* docs: explain test

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-04-22 07:44:55 +00:00
Andrew Lamb e67cc9dbce
chore: Update datafusion again (#4385)
* chore: Update datafusion

* fix: Update imports

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-04-21 21:05:16 +00:00
Carol (Nichols || Goulding) c7a1c496cf
fix: incorrect overlapped grouping (#4082)
* test: Failing test for finding overlapped groups

* test: Failing test for query overlap too :(

* fix: Group parquet files overlapped by time correctly

Inspired by https://towardsdatascience.com/overlapping-time-period-problem-b7f1719347db

Not sure what the real name for this algorithm is

* refactor: Group items without an intermediate hashmap needed

* chore: cleanup

Co-authored-by: NGA-TRAN <nga-tran@live.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-04-21 18:51:30 +00:00
Andrew Lamb 73bed810da
chore: Update arrow, arrow-flight, parquet, tonic, prost, etc (#4357)
* chore: Update datafusion

* chore: Update arrow/arrow-flight/parquet to 12

* chore: update datafusion correctly

* chore: Update prost, tonic, and dependents

* fix: Fixup some api changes

* fix: Update test output in db

* fix: Update test output in parquet_file

* fix: remove old pbjson types

* fix: Add "--experimental_allow_proto3_optional" flag

* chore: Run cargo hakari tasks

* fix: compile error

* chore: Update heappy

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-04-20 11:12:17 +00:00
Andrew Lamb e3d83fe757
chore: update datafusion (#4342)
* chore: update datafusion

* fix: Update imports for change in datafusion organization
2022-04-19 13:38:12 +00:00
Nga Tran 2a601c3099
fix: Revert "chore: Revert "fx: Revert "fix: Revert "feat: Use the sort key stored in the catalog during compaction" (#4299)" (#4303)" (#4327)" (#4328)
* fix: Revert "chore: Revert "fx: Revert "fix: Revert "feat: Use the sort key stored in the catalog during compaction" (#4299)" (#4303)" (#4327)"

This reverts commit 7e5d719027.

* chore: resolve merge conflict

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-04-18 15:27:39 +00:00
Nga Tran 8e2d158a37
test: deadlock test and add more debug log (#4319)
* test: use Paul deadlock reproducer and add more debug log

* test: remove compare many output rows

* test: verify the test putput

* chore: cleanup

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-04-14 18:06:22 +00:00
Nga Tran 7e5d719027
chore: Revert "fix: Revert "fix: Revert "feat: Use the sort key stored in the catalog during compaction" (#4299)" (#4303)" (#4327)
This reverts commit fe8d9948d5.
2022-04-14 17:11:55 +00:00
Carol (Nichols || Goulding) fe8d9948d5
fix: Revert "fix: Revert "feat: Use the sort key stored in the catalog during compaction" (#4299)" (#4303)
This reverts commit 7ddbf7c025.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-04-14 15:42:28 +00:00
Dom Dwyer 31fdeaaabc refactor: log split worker panics at error level
When the split background worker panics, it now causes an ERROR level
log to be emitted.
2022-04-14 15:39:35 +01:00
Dom Dwyer 00b5c1b296 fix: compaction deadlock
This commit resolves the compaction deadlock described in #4306.

The deadlock occurs during StreamSplitExec execution, where a background
worker is spawned to read input record batches and partition them into
two groups. This code pushes the resulting split record batches into two
channels - one for records that match a given predicate, and another
channel for those that do not. These channels buffer at most 2 record
batches each.

The compactor that executes this plan reads the resulting partitions
sequentially to completion. Completion is indicated by reading until the
results stream ends, which ends when the underlying channel is closed,
and therefore the split worker task must have finished and closed the
results channel for the partition to be successfully read.

While the compactor is reading from the first partition, the worker is
attempting to push record batches into the second partition and blocks
due to the channel capacity being reached. The worker never drops the
channel for the first partition, so the compactor never finishes reading
the first partition, and nothing is reading the second partition to
unblock the worker. Deadlock!
2022-04-14 15:39:35 +01:00
Nga Tran 3070d78e8c
chore: add more compactor debug info (#4310)
* chore: add more compactor debug info

* chore: Apply suggestions from code review

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* chore: fix format

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-04-13 19:22:19 +00:00
Carol (Nichols || Goulding) 7ddbf7c025
fix: Revert "feat: Use the sort key stored in the catalog during compaction" (#4299)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-04-13 14:11:10 +00:00
kodiakhq[bot] 21f748062e
Merge branch 'main' into cn/sort-in-compactor 2022-04-13 12:43:31 +00:00
Andrew Lamb e96aed6949
chore: add comments and `trace` calls to query provider regarding sort keys (#4274)
* chore: add comments and debug to query provider

* docs: Update query/src/provider.rs

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-04-12 16:36:39 +00:00
Carol (Nichols || Goulding) 55fe3b8d50
feat: Use the sort key stored in the catalog during compaction
Fixes #4249.
2022-04-11 14:09:45 -04:00
Andrew Lamb be4ebe2563
feat: Add more context to error messages (#4263)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-04-11 10:51:50 +00:00
Nga Tran f838cb78a2
fix: not to add IOxReadFilterNode for empty non-duplicated chunks (#4264)
* fix: not to add IOxReadFilterNode for no data of non-duplicated chunks if there is already scan node for overlapped/duplicated chunks

* refactor: address review comments

* chore: Apply suggestions from code review

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-04-08 21:03:22 +00:00
Andrew Lamb bbbdcc75a8
feat: `QuerierDatabase::chunks` returns `Result` (#4260)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-04-08 18:54:17 +00:00
Andrew Lamb 34e65c23fa
fix: Update for signature change (#4252)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-04-08 11:21:07 +00:00
Carol (Nichols || Goulding) b16fcc284d
feat: Add new columns to the sort key during compaction
Connects to #4196.
2022-04-06 09:31:42 -04:00
Carol (Nichols || Goulding) 9043966443
docs: Fix some typos in comments as I noticed them 2022-03-31 16:34:47 -04:00
Andrew Lamb 22b24bdab3
chore: Update datafusion again (#4148)
* chore: update datafusoon

* refactor: Update for DataFusion API changes

* chore: TEMP TEMP change df to local copy

* chore: Update to datafusion again

* fix: Update Cargo.lock

* fix: logical conflict
2022-03-30 16:51:48 +00:00
Marco Neumann 20bbb88dc5
refactor: remove table name from `TableSummary` (#4170)
This allows us to remove the table name from the low-level chunk
representations (like `ParquetFile`, RUB, ...) since table names are
already tracked by the higher-level data structures (e.g. catalog,
catalog chunk) that manage the low-level chunk representations.

This is similar to #4167.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-30 13:24:00 +00:00
Marco Neumann 2b76c31157
refactor: make statistics null counts optional (#4160)
Min/max values and distinct counts are already optional, so let's make
the null counts optional as well. This will be helpful for NG to deal w/
partial statistics (e.g. we only populate stats for the time column).

Note that the total count is still mandatory, but we normally have the
chunk/file-level row count at hand.
2022-03-29 17:47:57 +00:00
dependabot[bot] 17af5fcbd1
chore(deps): Bump tokio-util from 0.7.0 to 0.7.1 (#4154)
* chore(deps): Bump tokio-util from 0.7.0 to 0.7.1

Bumps [tokio-util](https://github.com/tokio-rs/tokio) from 0.7.0 to 0.7.1.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-util-0.7.0...tokio-util-0.7.1)

---
updated-dependencies:
- dependency-name: tokio-util
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore: Run cargo hakari tasks

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-29 08:39:02 +00:00
Andrew Lamb 5c69a3f43b
chore: Update deps: datafusion, arrow/arrow-flight/parquet to 11, zstd to 0.11 (#4119)
* chore: update datafusion

* chore(deps): Bump arrow from 10.0.0 to 11.0.0

Bumps [arrow](https://github.com/apache/arrow-rs) from 10.0.0 to 11.0.0.
- [Release notes](https://github.com/apache/arrow-rs/releases)
- [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG.md)
- [Commits](https://github.com/apache/arrow-rs/compare/10.0.0...11.0.0)

---
updated-dependencies:
- dependency-name: arrow
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore(deps): Bump arrow-flight from 10.0.0 to 11.0.0

Bumps [arrow-flight](https://github.com/apache/arrow-rs) from 10.0.0 to 11.0.0.
- [Release notes](https://github.com/apache/arrow-rs/releases)
- [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG.md)
- [Commits](https://github.com/apache/arrow-rs/compare/10.0.0...11.0.0)

---
updated-dependencies:
- dependency-name: arrow-flight
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore: update parquet to 11.0.0

* fix: error on create schema, test for same

* fix: upgrade zstd

* chore: Run cargo hakari tasks

* fix: fix logical merge conflict

* fix: hakari

* fix: hakari

* fix: update newly introduced dep

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-24 15:27:36 +00:00
Andrew Lamb b83b000590
chore: Update datafusion (#4071)
* chore: update to datafusion 5936edc2a94d5fb20702a41eab2b80695961b9dc

* chore: Update apis to match datafusion changes
2022-03-22 13:17:41 +00:00
Marco Neumann c9908b260c
refactor: dyn-dispatch database in query subsystem (#4083)
* refactor: dyn-dispatch database in query subsystem

This is similar to #4080 but concerns the database itself.

For #3934.

* docs: improve wording

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-22 09:15:52 +00:00