7407 Commits (81d41f81a1c55e0b6f842e5b1081246c4cae1002)

Author SHA1 Message Date
Andrew Lamb 5c69a3f43b
chore: Update deps: datafusion, arrow/arrow-flight/parquet to 11, zstd to 0.11 (#4119)
* chore: update datafusion

* chore(deps): Bump arrow from 10.0.0 to 11.0.0

Bumps [arrow](https://github.com/apache/arrow-rs) from 10.0.0 to 11.0.0.
- [Release notes](https://github.com/apache/arrow-rs/releases)
- [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG.md)
- [Commits](https://github.com/apache/arrow-rs/compare/10.0.0...11.0.0)

---
updated-dependencies:
- dependency-name: arrow
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore(deps): Bump arrow-flight from 10.0.0 to 11.0.0

Bumps [arrow-flight](https://github.com/apache/arrow-rs) from 10.0.0 to 11.0.0.
- [Release notes](https://github.com/apache/arrow-rs/releases)
- [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG.md)
- [Commits](https://github.com/apache/arrow-rs/compare/10.0.0...11.0.0)

---
updated-dependencies:
- dependency-name: arrow-flight
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore: update parquet to 11.0.0

* fix: error on create schema, test for same

* fix: upgrade zstd

* chore: Run cargo hakari tasks

* fix: fix logical merge conflict

* fix: hakari

* fix: hakari

* fix: update newly introduced dep

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-24 15:27:36 +00:00
Andrew Lamb 1ca9d28fee
fix: remove unnecessary test setup (#4127)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-24 15:17:17 +00:00
Dom Dwyer a4aedcf2ae fix: propagate shutdown into CompactorHandler
Prior to this commit, calling shutdown() on the CompactorServerType (the
server layer run by the iox binary) would cancel its own
CancellationToken, while the CompactorHandler (the actual compaction
workload entrypoint) would be watching its own, different token.

This commit removes the redundant CancellationToken in the
CompactorServerType, instead using the inner CompactorHandler for
cancellation notification & completion.
2022-03-24 15:16:11 +00:00
Dom Dwyer 011d3fbf25 fix: propagate shutdown into QuerierHandlerImpl
Prior to this commit, calling shutdown() on the QuerierServer (the
server layer run by the iox binary) would cancel its own
CancellationToken, while the QuerierHandlerImpl (the actual querier
workload entrypoint) would be watching its own, different token.

This commit removes the redundant CancellationToken in the
QuerierServer, instead using the inner QuerierHandlerImpl for cancellation
notification & completion.
2022-03-24 15:16:11 +00:00
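The two shutdown fixes above (a4aedcf2ae and 011d3fbf25) describe the same pattern: the server layer stops holding its own cancellation state and delegates to the inner handler. A minimal sketch of that idea follows. It is not the code from these commits: `Handler` and `ServerType` are hypothetical stand-ins, and an `AtomicBool` is assumed in place of a real `CancellationToken` so the example needs only the standard library.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for the workload entrypoint (for example a
/// compactor or querier handler). It owns the only shutdown flag.
#[derive(Debug, Default)]
pub struct Handler {
    // Assumption: an AtomicBool replaces the CancellationToken so that the
    // sketch stays std-only.
    shutdown: AtomicBool,
}

impl Handler {
    /// Signal the workload to stop.
    pub fn shutdown(&self) {
        self.shutdown.store(true, Ordering::SeqCst);
    }

    /// What the workload loop would poll to find out it should stop.
    pub fn is_shutdown(&self) -> bool {
        self.shutdown.load(Ordering::SeqCst)
    }
}

/// Hypothetical server layer wrapping the handler. It deliberately has no
/// shutdown flag of its own, so the two layers cannot disagree.
pub struct ServerType {
    handler: Arc<Handler>,
}

impl ServerType {
    pub fn new(handler: Arc<Handler>) -> Self {
        Self { handler }
    }

    /// Delegate to the inner handler instead of cancelling a separate token.
    pub fn shutdown(&self) {
        self.handler.shutdown();
    }

    pub fn is_shutdown(&self) -> bool {
        self.handler.is_shutdown()
    }
}
```

With a single source of truth, calling `shutdown()` on the server layer is observed by the workload that polls the handler, which is the behaviour the commit messages say was missing.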
Marco Neumann 8ca5c337b2
refactor: port more query tests to NG, some code clean up (#4125)
* refactor: inline function that is used once

* refactor: generalize multi-chunk creation for NG

* refactor: `TwoMeasurementsManyFieldsTwoChunks` is OG-specific

* refactor: generalize `OneMeasurementTwoChunksDifferentTagSet`

* refactor: port `OneMeasurementFourChunksWithDuplicates` to NG

* refactor: `TwoMeasurementsManyFieldsLifecycle` is OG-specific

* refactor: simplify NG chunk generation

* refactor: port `ThreeDeleteThreeChunks` to NG

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-24 15:07:09 +00:00
Andrew Lamb fb75ef7b82
test: add end to end test for all in one mode, restructure fixture (#4114)
* test: add end to end test for all in one mode, restructure fixture

* docs: fix typos and clarify schema requirements
2022-03-24 12:53:25 +00:00
kodiakhq[bot] 70a9806c29
Merge pull request #4069 from influxdata/dom/test-distribution
test: assert uniform-ish sharder distribution
2022-03-24 11:55:09 +00:00
kodiakhq[bot] 2e40b1435d
Merge branch 'main' into dom/test-distribution 2022-03-24 11:45:58 +00:00
Dom Dwyer cc284fc5dc chore: expanded test comment 2022-03-24 11:42:58 +00:00
Marco Neumann cc7f744e8e
test: two-chunk scenarios for NG (#4113)
Add the generic components to create two-chunk scenarios. Includes small
scenario fixes for things like system tables that are not identical
between OG and NG (also see #4111.)

Ref #3934.
2022-03-24 09:50:57 +00:00
Marco Neumann 283d3dad5d
refactor: generalize query test scenarios a bit (#4103)
Some query test scenarios are duplicates and are very OG specific. Let's
use generic scenarios (i.e. the ones that contain all chunk stages
instead of a specific one) where applicable.

For #3934.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-24 09:30:19 +00:00
Andrew Lamb 204dd7c8e9
refactor: Fix some random clippy lints from the future (#4118)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-24 09:21:29 +00:00
Andrew Lamb 29b89aaca7
refactor: extract influxrpc, flight and testing gRPC out of influxdb_ioxd (#4106)
* refactor: extract grpc service implementations out of influxdb_ioxd

* chore: Run cargo hakari tasks

* refactor: rename server_common to service_common

* refactor: rename server_grpc_influxrpc to service_grpc_influxrpc

* refactor: rename server_grpc_flight to service_grpc_flight

* refactor: rename server_grpc_testing to service_grpc_testing

* fix: Cargo.toml

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2022-03-23 20:14:45 +00:00
kodiakhq[bot] 3f9a58362b
Merge pull request #4111 from influxdata/crepererum/issue3934h
refactor: make query tests less OG-specific
2022-03-23 19:18:08 +00:00
kodiakhq[bot] 93485a11ec
Merge branch 'main' into crepererum/issue3934h 2022-03-23 19:10:02 +00:00
Carol (Nichols || Goulding) 67e13a7c34
fix: Change to_delete column on parquet_files to be a time (#4117)
Set to_delete to the time the file was marked as deleted rather than
true.

Fixes #4059.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-23 18:47:27 +00:00
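The `to_delete` change above (67e13a7c34) replaces a boolean flag with the time at which the file was marked. A rough sketch of what that buys is shown below. The struct, the method names, and the integer timestamp are all hypothetical; keeping the first mark time on repeated calls is an assumption of this sketch, not something the commit message states.

```rust
/// Hypothetical, simplified catalog row for a parquet file.
#[derive(Debug, Clone, PartialEq)]
pub struct ParquetFile {
    pub id: i64,
    /// `None` while the file is live; `Some(ts)` records when it was marked
    /// as deleted (assumed here to be an integer timestamp).
    pub to_delete: Option<i64>,
}

impl ParquetFile {
    /// Mark the file as deleted at `now`. Assumption: a file that is already
    /// marked keeps its original mark time.
    pub fn flag_for_delete(&mut self, now: i64) {
        if self.to_delete.is_none() {
            self.to_delete = Some(now);
        }
    }

    /// True if the file was marked strictly before `cutoff`. A plain `bool`
    /// column could not answer this question.
    pub fn marked_before(&self, cutoff: i64) -> bool {
        matches!(self.to_delete, Some(ts) if ts < cutoff)
    }
}
```

Storing the time keeps the old "is it deleted?" check available (`to_delete.is_some()`) while also allowing age-based decisions such as a retention window.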
Marco Neumann 51da6dd7fa
feat: store sort key in NG metadata (#4110)
The sort key is optional and currently only produced by `iox_tests`.
Writing it within the ingester/compactor is tracked by #3968. The sort
key is read by the querier (and this will be verified by the query tests
and is required to merge #4103).

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-23 18:24:46 +00:00
Andrew Lamb 7f2c2fde2c
fix: fix all in one mode argument handling so it can start (#4115)
* fix: fix all in one mode argument handling

* fix: clippy

* fix: fmt

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-23 18:16:22 +00:00
Marco Neumann c33ef79375
test: improve query test runner output (#4112)
- prints more previous/expected values when failing (instead of just
  emitting an `Err` which will be debug-printed)
- fixed newline handling (i.e. do not add additional newlines in
  `PrintlnWriter::write`)

Before:

```text
Running scenario 'Two chunks: NG Chunk Parquet; NG Chunk Parquet'

SQL: '"SELECT * from information_schema.tables;"'

thread 'cases::test_cases_sql_information_schema_sql' panicked at 'test failed: ScenarioMismatch { scenario_name: "Two chunks: NG Chunk Parquet; NG Chunk Parquet", previous_results: ["+---------------+--------------------+---------------------+------------+", "| table_catalog | table_schema       | table_name          | table_type |", "+---------------+--------------------+---------------------+------------+", "| public        | information_schema | columns             | VIEW       |", "| public        | information_schema | tables              | VIEW       |", "| public        | iox                | h2o                 | BASE TABLE |", "| public        | iox                | o2                  | BASE TABLE |", "| public        | system             | chunk_columns       | BASE TABLE |", "| public        | system             | chunks              | BASE TABLE |", "| public        | system             | columns             | BASE TABLE |", "| public        | system             | operations          | BASE TABLE |", "| public        | system             | persistence_windows | BASE TABLE |", "| public        | system             | queries             | BASE TABLE |", "+---------------+--------------------+---------------------+------------+"], current_results: ["+---------------+--------------------+------------+------------+", "| table_catalog | table_schema       | table_name | table_type |", "+---------------+--------------------+------------+------------+", "| public        | information_schema | columns    | VIEW       |", "| public        | information_schema | tables     | VIEW       |", "| public        | iox                | h2o        | BASE TABLE |", "| public        | iox                | o2         | BASE TABLE |", "+---------------+--------------------+------------+------------+"] }', query_tests/src/cases.rs:169:10
stack backtrace:
   0: rust_begin_unwind
             at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/std/src/panicking.rs:498:5
```

After:

```text
Running scenario 'Two chunks: NG Chunk Parquet; NG Chunk Parquet'
SQL: '"SELECT * from information_schema.tables;"'
Answers produced by scenario Two chunks: NG Chunk Parquet; NG Chunk Parquet differ from previous answer

previous:
+---------------+--------------------+---------------------+------------+
| table_catalog | table_schema       | table_name          | table_type |
+---------------+--------------------+---------------------+------------+
| public        | information_schema | columns             | VIEW       |
| public        | information_schema | tables              | VIEW       |
| public        | iox                | h2o                 | BASE TABLE |
| public        | iox                | o2                  | BASE TABLE |
| public        | system             | chunk_columns       | BASE TABLE |
| public        | system             | chunks              | BASE TABLE |
| public        | system             | columns             | BASE TABLE |
| public        | system             | operations          | BASE TABLE |
| public        | system             | persistence_windows | BASE TABLE |
| public        | system             | queries             | BASE TABLE |
+---------------+--------------------+---------------------+------------+

current:
+---------------+--------------------+------------+------------+
| table_catalog | table_schema       | table_name | table_type |
+---------------+--------------------+------------+------------+
| public        | information_schema | columns    | VIEW       |
| public        | information_schema | tables     | VIEW       |
| public        | iox                | h2o        | BASE TABLE |
| public        | iox                | o2         | BASE TABLE |
+---------------+--------------------+------------+------------+

thread 'cases::test_cases_sql_information_schema_sql' panicked at 'test failed: ScenarioMismatch { scenario_name: "Two chunks: NG Chunk Parquet; NG Chunk Parquet", previous_results: ["+---------------+--------------------+---------------------+------------+", "| table_catalog | table_schema       | table_name          | table_type |", "+---------------+--------------------+---------------------+------------+", "| public        | information_schema | columns             | VIEW       |", "| public        | information_schema | tables              | VIEW       |", "| public        | iox                | h2o                 | BASE TABLE |", "| public        | iox                | o2                  | BASE TABLE |", "| public        | system             | chunk_columns       | BASE TABLE |", "| public        | system             | chunks              | BASE TABLE |", "| public        | system             | columns             | BASE TABLE |", "| public        | system             | operations          | BASE TABLE |", "| public        | system             | persistence_windows | BASE TABLE |", "| public        | system             | queries             | BASE TABLE |", "+---------------+--------------------+---------------------+------------+"], current_results: ["+---------------+--------------------+------------+------------+", "| table_catalog | table_schema       | table_name | table_type |", "+---------------+--------------------+------------+------------+", "| public        | information_schema | columns    | VIEW       |", "| public        | information_schema | tables     | VIEW       |", "| public        | iox                | h2o        | BASE TABLE |", "| public        | iox                | o2         | BASE TABLE |", "+---------------+--------------------+------------+------------+"] }', query_tests/src/cases.rs:169:10
stack backtrace:
   0: rust_begin_unwind
             at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/std/src/panicking.rs:498:5
```
2022-03-23 18:06:09 +00:00
kodiakhq[bot] 32adb37591
Merge pull request #4049 from influxdata/cn/get-tombstones
feat: Add tombstones to parquet files for compaction
2022-03-23 14:28:20 +00:00
kodiakhq[bot] 58bfab5a8c
Merge branch 'main' into cn/get-tombstones 2022-03-23 14:18:41 +00:00
Paul Dix 4f5321d19b
feat: add compactor configuration for kafka topic and sequencers (#4107) 2022-03-23 14:11:47 +00:00
Carol (Nichols || Goulding) c3a8834970
test: Add a test for add_tombstones_to_groups 2022-03-23 09:56:27 -04:00
Carol (Nichols || Goulding) 080156aa27
fix: Only do one catalog query for tombstones per each group of parquet files
The query will get all tombstones that could be relevant to the group;
then associate subsets of the results with each parquet file.
2022-03-23 09:56:26 -04:00
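The tombstone fix above (080156aa27) fetches candidates once per group and then hands each parquet file its own subset. The sketch below illustrates that two-step shape. All types and function names are hypothetical, and matching by time-range overlap is an assumption: the commit message does not spell out the actual matching criteria.

```rust
/// Hypothetical simplified tombstone covering the inclusive time range
/// [min_time, max_time].
#[derive(Debug, Clone, PartialEq)]
pub struct Tombstone {
    pub id: i64,
    pub min_time: i64,
    pub max_time: i64,
}

/// Hypothetical simplified parquet file with an inclusive time range.
#[derive(Debug, Clone, PartialEq)]
pub struct ParquetFile {
    pub id: i64,
    pub min_time: i64,
    pub max_time: i64,
}

/// A parquet file together with the tombstones that may apply to it.
pub struct ParquetFileWithTombstones {
    pub file: ParquetFile,
    pub tombstones: Vec<Tombstone>,
}

/// Step 1: the time range spanning the whole group. A single catalog query
/// would use this range; returns `None` for an empty group.
pub fn group_time_range(files: &[ParquetFile]) -> Option<(i64, i64)> {
    let min = files.iter().map(|f| f.min_time).min()?;
    let max = files.iter().map(|f| f.max_time).max()?;
    Some((min, max))
}

/// Step 2: associate each file with the subset of the fetched tombstones
/// whose time range overlaps the file's range (assumed criterion).
pub fn attach_tombstones(
    files: Vec<ParquetFile>,
    tombstones: &[Tombstone],
) -> Vec<ParquetFileWithTombstones> {
    files
        .into_iter()
        .map(|file| {
            let relevant: Vec<Tombstone> = tombstones
                .iter()
                .filter(|t| t.min_time <= file.max_time && t.max_time >= file.min_time)
                .cloned()
                .collect();
            ParquetFileWithTombstones {
                file,
                tombstones: relevant,
            }
        })
        .collect()
}
```

The trade-off is one broader query plus in-memory filtering, instead of one narrow query per file.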
Carol (Nichols || Goulding) 2749c37d02
fix: Query for tombstones in a time range, not for a particular parquet file
The compactor at this point is still querying for each file; this is an
intermediate step
2022-03-23 09:52:00 -04:00
Carol (Nichols || Goulding) 4d2e71c03e
feat: Wrap parquet files with their relevant tombstones 2022-03-23 09:52:00 -04:00
Carol (Nichols || Goulding) 87dc2981f6
feat: Query for tombstones relevant to a parquet file
Connects to #3948.
2022-03-23 09:52:00 -04:00
Luke Bond e109fa4987
feat: schema client and CLI (#4105)
* feat: schema client and CLI

* chore: clarification in comment in schema command

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-23 13:49:24 +00:00
dependabot[bot] 8ee9b793f5
chore(deps): Bump hyper from 0.14.17 to 0.14.18 (#4109)
Bumps [hyper](https://github.com/hyperium/hyper) from 0.14.17 to 0.14.18.
- [Release notes](https://github.com/hyperium/hyper/releases)
- [Changelog](https://github.com/hyperium/hyper/blob/master/CHANGELOG.md)
- [Commits](https://github.com/hyperium/hyper/compare/v0.14.17...v0.14.18)

---
updated-dependencies:
- dependency-name: hyper
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-03-23 13:12:43 +00:00
dependabot[bot] 36071e2d12
chore(deps): Bump log from 0.4.14 to 0.4.16 (#4108)
Bumps [log](https://github.com/rust-lang/log) from 0.4.14 to 0.4.16.
- [Release notes](https://github.com/rust-lang/log/releases)
- [Changelog](https://github.com/rust-lang/log/blob/master/CHANGELOG.md)
- [Commits](https://github.com/rust-lang/log/commits)

---
updated-dependencies:
- dependency-name: log
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-03-23 13:01:30 +00:00
Marco Neumann 5ae1e2fecf refactor: make query tests less OG-specific 2022-03-23 12:04:32 +01:00
Marco Neumann 89206e013c
test: run SOME query tests for querier (#4098)
This includes some type changes to dispatch between OG and NG and allows
some tests to be run against the NG querier. This only contains parquet
files though, so the scope is somewhat limited.

For #3934.
2022-03-22 17:39:19 +00:00
Nga Tran c3ef56588f
feat: use creation time to check level upgradable (#4094)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-22 13:51:18 +00:00
Paul Dix b18b18afd9
fix: have ingester use single mutable batch for buffer (#4095)
Removed some unnecessary tests as they no longer apply with the new buffer structure. This will hopefully reduce the memory footprint of the ingesters significantly.

Closes #4072
2022-03-22 13:42:52 +00:00
Nga Tran 886f9dc8c1
feat: split compacted data into 2 compacted sets (#4088)
* feat: split compacted data into 2 compacted sets

* chore: clean up

* refactor: address review comments

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-22 13:28:32 +00:00
Andrew Lamb b83b000590
chore: Update datafusion (#4071)
* chore: update to datafusion 5936edc2a94d5fb20702a41eab2b80695961b9dc

* chore: Update apis to match datafusion changes
2022-03-22 13:17:41 +00:00
Luke Bond b098828c97
feat: schema grpc server & proto in router2 (#4081)
* feat: schema grpc server & proto in router2

* chore: comments in schema proto

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-22 11:27:20 +00:00
Marco Neumann c9908b260c
refactor: dyn-dispatch database in query subsystem (#4083)
* refactor: dyn-dispatch database in query subsystem

This is similar to #4080 but concerns the database itself.

For #3934.

* docs: improve wording

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-22 09:15:52 +00:00
Luke Bond 9ec45f5aec
Revert "fix: propagate shutdown into QuerierHandlerImpl" (#4090)
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
2022-03-21 18:53:06 +00:00
Luke Bond 24e03deb5b
Revert "fix: propagate shutdown into CompactorHandler" (#4091) 2022-03-21 18:43:24 +00:00
Luke Bond 3c2775f8f2
fix: teach semantic commits script about GH revert PRs (#4092) 2022-03-21 17:37:28 +00:00
Marco Neumann 55643945a1
refactor: `querier` w/o `db` (#4063)
* feat: `TombstoneRepo::list_by_table`

* feat: `ParquetFileRepo::list_by_table_not_to_delete`

* refactor: `querier` w/o `db`

Get the `querier` to work w/o relying on `db`. A few notes:

- Testing is kinda shallow, we really need to get `query_tests` working
  w/ `querier` (see #3934).
- We still run a sync loop for namespaces, tables and schemas. This will
  be replaced by "update namespace incl. tables and schemas on demand".
  Note however that we cannot fetch single tables and schemas on demand
  at the moment, because DataFusion doesn't implement async schema
  inspection (only `scan` / "give me all the chunks" is async). I think
  that's OK for now and we can address this later.
- There is NO cache for parquet files and tombstones at the moment. For
  correctness, they need to be fetched in a single transaction (or we
  need a kinda tricky sequence number / logical clock tracking) and I am
  not sure yet how this makes sense when we have the ingester data wired
  up and predicates pushed down to the catalog (see next point). So
  let's measure first and then decide on a caching strategy for this.
- Predicates are currently NOT pushed down to the catalog. I'll need to
  figure out how to extract time range from generic DataFusion
  expressions to make that work (it's easier for InfluxRPC queries, but
  they are not tested at the moment, see first point).

Sorry that this commit is kinda huge. I initially planned to only
migrate the chunks away from `db` and leave the tables and schemas for a
follow-up PR, but the DataFusion trait structure (chunks are bound to
their tables) makes this kinda pointless.

Closes #3974.

* docs: explain what we're doing

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* docs: mention tracking issues

* docs: explain what we're doing

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
2022-03-21 16:58:00 +00:00
dependabot[bot] a23efce408
chore(deps): Bump kube-derive from 0.69.1 to 0.70.0 (#4073)
* chore(deps): Bump kube-derive from 0.69.1 to 0.70.0

Bumps [kube-derive](https://github.com/kube-rs/kube-rs) from 0.69.1 to 0.70.0.
- [Release notes](https://github.com/kube-rs/kube-rs/releases)
- [Changelog](https://github.com/kube-rs/kube-rs/blob/master/CHANGELOG.md)
- [Commits](https://github.com/kube-rs/kube-rs/compare/0.69.1...0.70.0)

---
updated-dependencies:
- dependency-name: kube-derive
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore(deps): Bump kube-runtime from 0.69.1 to 0.70.0

Bumps [kube-runtime](https://github.com/kube-rs/kube-rs) from 0.69.1 to 0.70.0.
- [Release notes](https://github.com/kube-rs/kube-rs/releases)
- [Changelog](https://github.com/kube-rs/kube-rs/blob/master/CHANGELOG.md)
- [Commits](https://github.com/kube-rs/kube-rs/compare/0.69.1...0.70.0)

---
updated-dependencies:
- dependency-name: kube-runtime
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore: upgrade kube to version 0.70

* chore: hakari

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Marco Neumann <marco@crepererum.net>
2022-03-21 15:32:45 +00:00
kodiakhq[bot] d3d628fcf3
Merge pull request #4070 from influxdata/cn/update-catalog
feat: Update the catalog for a completed compaction set
2022-03-21 14:28:49 +00:00
Carol (Nichols || Goulding) 201ced1d66
test: Mark a parquet file deleted in the update catalog operation 2022-03-21 10:16:58 -04:00
Carol (Nichols || Goulding) dbca54d917
refactor: Move add parquet file and tombstones within update catalog
This should never be done on its own so doesn't really need to be its
own method. We also don't do anything with the returned data, so no need
to allocate those vectors.
2022-03-21 10:16:58 -04:00
Carol (Nichols || Goulding) 2fea10dfd7
feat: Mark old compacted parquet files to be deleted in transaction
Connects to #3952
2022-03-21 10:16:58 -04:00
Carol (Nichols || Goulding) 5b294968a5
feat: Add processed tombstone records with compacted parquet file
In a transaction when the parquet file is added to the catalog.

Connects to #3952.
2022-03-21 10:16:57 -04:00
Carol (Nichols || Goulding) b983b24fcf
fix: Adding processed tombstones to catalog only needs tombstone ID 2022-03-21 10:16:57 -04:00
Carol (Nichols || Goulding) 8fd3d85634
refactor: Move add_parquet_file_with_tombstones from ingester to compactor 2022-03-21 10:16:57 -04:00