7293 Commits (3f9a58362bd99d363683f09e75057f7f2c6910a4)

Author SHA1 Message Date
kodiakhq[bot] 3f9a58362b
Merge pull request #4111 from influxdata/crepererum/issue3934h
refactor: make query tests less OG-specific
2022-03-23 19:18:08 +00:00
kodiakhq[bot] 93485a11ec
Merge branch 'main' into crepererum/issue3934h 2022-03-23 19:10:02 +00:00
Carol (Nichols || Goulding) 67e13a7c34
fix: Change to_delete column on parquet_files to be a time (#4117)
Set to_delete to the time the file was marked as deleted rather than
true.

Fixes #4059.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-23 18:47:27 +00:00
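
To illustrate the `to_delete` change in 67e13a7c34 above, here is a minimal sketch of a deletion marker stored as a time rather than a boolean. The `Timestamp` and `ParquetFile` types and all method names are hypothetical stand-ins, not the actual catalog types.

```rust
/// Hypothetical timestamp type: nanoseconds since the Unix epoch.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Timestamp(pub i64);

/// Hypothetical, trimmed-down catalog record for a parquet file.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ParquetFile {
    pub id: i64,
    /// `None` while the file is live; `Some(t)` once it was marked as deleted at time `t`.
    pub to_delete: Option<Timestamp>,
}

impl ParquetFile {
    /// Record the time of deletion. An earlier marker is kept if one exists.
    pub fn mark_deleted(&mut self, now: Timestamp) {
        if self.to_delete.is_none() {
            self.to_delete = Some(now);
        }
    }

    /// A file is live as long as it carries no deletion time.
    pub fn is_live(&self) -> bool {
        self.to_delete.is_none()
    }

    /// True if the file was marked as deleted strictly before `cutoff`.
    pub fn deleted_before(&self, cutoff: Timestamp) -> bool {
        matches!(self.to_delete, Some(t) if t < cutoff)
    }
}
```

Compared with a boolean flag, the timestamp keeps the "is it deleted" answer and additionally records when the marking happened, which a later cleanup step could compare against a cutoff.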
Marco Neumann 51da6dd7fa
feat: store sort key in NG metadata (#4110)
The sort key is optional and currently only produced by `iox_tests`.
Writing it within the ingester/compactor is tracked by #3968. The sort
key is read by the querier (and this will be verified by the query tests
and is required to merge #4103).

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-23 18:24:46 +00:00
Andrew Lamb 7f2c2fde2c
fix: fix all in one mode argument handling so it can start (#4115)
* fix: fix all in one mode argument handling

* fix: clippy

* fix: fmt

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-23 18:16:22 +00:00
Marco Neumann c33ef79375
test: improve query test runner output (#4112)
- prints more previous/expected values when failing (instead of just
  emitting an `Err` which will be debug-printed)
- fixed newline handling (i.e. do not add additional newlines in
  `PrintlnWriter::write`)

Before:

```text
Running scenario 'Two chunks: NG Chunk Parquet; NG Chunk Parquet'

SQL: '"SELECT * from information_schema.tables;"'

thread 'cases::test_cases_sql_information_schema_sql' panicked at 'test failed: ScenarioMismatch { scenario_name: "Two chunks: NG Chunk Parquet; NG Chunk Parquet", previous_results: ["+---------------+--------------------+---------------------+------------+", "| table_catalog | table_schema       | table_name          | table_type |", "+---------------+--------------------+---------------------+------------+", "| public        | information_schema | columns             | VIEW       |", "| public        | information_schema | tables              | VIEW       |", "| public        | iox                | h2o                 | BASE TABLE |", "| public        | iox                | o2                  | BASE TABLE |", "| public        | system             | chunk_columns       | BASE TABLE |", "| public        | system             | chunks              | BASE TABLE |", "| public        | system             | columns             | BASE TABLE |", "| public        | system             | operations          | BASE TABLE |", "| public        | system             | persistence_windows | BASE TABLE |", "| public        | system             | queries             | BASE TABLE |", "+---------------+--------------------+---------------------+------------+"], current_results: ["+---------------+--------------------+------------+------------+", "| table_catalog | table_schema       | table_name | table_type |", "+---------------+--------------------+------------+------------+", "| public        | information_schema | columns    | VIEW       |", "| public        | information_schema | tables     | VIEW       |", "| public        | iox                | h2o        | BASE TABLE |", "| public        | iox                | o2         | BASE TABLE |", "+---------------+--------------------+------------+------------+"] }', query_tests/src/cases.rs:169:10
stack backtrace:
   0: rust_begin_unwind
             at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/std/src/panicking.rs:498:5
```

After:

```text
Running scenario 'Two chunks: NG Chunk Parquet; NG Chunk Parquet'
SQL: '"SELECT * from information_schema.tables;"'
Answers produced by scenario Two chunks: NG Chunk Parquet; NG Chunk Parquet differ from previous answer

previous:
+---------------+--------------------+---------------------+------------+
| table_catalog | table_schema       | table_name          | table_type |
+---------------+--------------------+---------------------+------------+
| public        | information_schema | columns             | VIEW       |
| public        | information_schema | tables              | VIEW       |
| public        | iox                | h2o                 | BASE TABLE |
| public        | iox                | o2                  | BASE TABLE |
| public        | system             | chunk_columns       | BASE TABLE |
| public        | system             | chunks              | BASE TABLE |
| public        | system             | columns             | BASE TABLE |
| public        | system             | operations          | BASE TABLE |
| public        | system             | persistence_windows | BASE TABLE |
| public        | system             | queries             | BASE TABLE |
+---------------+--------------------+---------------------+------------+

current:
+---------------+--------------------+------------+------------+
| table_catalog | table_schema       | table_name | table_type |
+---------------+--------------------+------------+------------+
| public        | information_schema | columns    | VIEW       |
| public        | information_schema | tables     | VIEW       |
| public        | iox                | h2o        | BASE TABLE |
| public        | iox                | o2         | BASE TABLE |
+---------------+--------------------+------------+------------+

thread 'cases::test_cases_sql_information_schema_sql' panicked at 'test failed: ScenarioMismatch { scenario_name: "Two chunks: NG Chunk Parquet; NG Chunk Parquet", previous_results: ["+---------------+--------------------+---------------------+------------+", "| table_catalog | table_schema       | table_name          | table_type |", "+---------------+--------------------+---------------------+------------+", "| public        | information_schema | columns             | VIEW       |", "| public        | information_schema | tables              | VIEW       |", "| public        | iox                | h2o                 | BASE TABLE |", "| public        | iox                | o2                  | BASE TABLE |", "| public        | system             | chunk_columns       | BASE TABLE |", "| public        | system             | chunks              | BASE TABLE |", "| public        | system             | columns             | BASE TABLE |", "| public        | system             | operations          | BASE TABLE |", "| public        | system             | persistence_windows | BASE TABLE |", "| public        | system             | queries             | BASE TABLE |", "+---------------+--------------------+---------------------+------------+"], current_results: ["+---------------+--------------------+------------+------------+", "| table_catalog | table_schema       | table_name | table_type |", "+---------------+--------------------+------------+------------+", "| public        | information_schema | columns    | VIEW       |", "| public        | information_schema | tables     | VIEW       |", "| public        | iox                | h2o        | BASE TABLE |", "| public        | iox                | o2         | BASE TABLE |", "+---------------+--------------------+------------+------------+"] }', query_tests/src/cases.rs:169:10
stack backtrace:
   0: rust_begin_unwind
             at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/std/src/panicking.rs:498:5
```
2022-03-23 18:06:09 +00:00
kodiakhq[bot] 32adb37591
Merge pull request #4049 from influxdata/cn/get-tombstones
feat: Add tombstones to parquet files for compaction
2022-03-23 14:28:20 +00:00
kodiakhq[bot] 58bfab5a8c
Merge branch 'main' into cn/get-tombstones 2022-03-23 14:18:41 +00:00
Paul Dix 4f5321d19b
feat: add compactor configuration for kafka topic and sequencers (#4107) 2022-03-23 14:11:47 +00:00
Carol (Nichols || Goulding) c3a8834970
test: Add a test for add_tombstones_to_groups 2022-03-23 09:56:27 -04:00
Carol (Nichols || Goulding) 080156aa27
fix: Only do one catalog query for tombstones per group of parquet files
The query will get all tombstones that could be relevant to the group;
then associate subsets of the results with each parquet file.
2022-03-23 09:56:26 -04:00
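
The approach described in 080156aa27 above (one query per group, then per-file subsets) can be sketched roughly as follows. All types, field names, and the time-range overlap rule are assumptions made for illustration; the real compactor may use additional criteria to decide which tombstones are relevant to a file.

```rust
/// Hypothetical tombstone covering the inclusive time range [min_time, max_time].
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Tombstone {
    pub id: i64,
    pub min_time: i64,
    pub max_time: i64,
}

/// Hypothetical parquet file record with its inclusive time range.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ParquetFile {
    pub id: i64,
    pub min_time: i64,
    pub max_time: i64,
}

/// A parquet file wrapped with the tombstones that may apply to it.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ParquetFileWithTombstones {
    pub file: ParquetFile,
    pub tombstones: Vec<Tombstone>,
}

/// Inclusive range overlap check.
fn overlaps(a_min: i64, a_max: i64, b_min: i64, b_max: i64) -> bool {
    a_min <= b_max && b_min <= a_max
}

/// Issue ONE query for the whole group (via the caller-supplied `query`
/// closure, which stands in for the catalog), then hand each file the subset
/// of results that overlaps its own time range.
pub fn attach_tombstones<F>(group: Vec<ParquetFile>, mut query: F) -> Vec<ParquetFileWithTombstones>
where
    F: FnMut(i64, i64) -> Vec<Tombstone>,
{
    // Time range spanned by the whole group; an empty group needs no query.
    let (lo, hi) = match (
        group.iter().map(|f| f.min_time).min(),
        group.iter().map(|f| f.max_time).max(),
    ) {
        (Some(lo), Some(hi)) => (lo, hi),
        _ => return Vec::new(),
    };

    let candidates = query(lo, hi);

    group
        .into_iter()
        .map(|file| {
            let tombstones = candidates
                .iter()
                .filter(|t| overlaps(t.min_time, t.max_time, file.min_time, file.max_time))
                .cloned()
                .collect();
            ParquetFileWithTombstones { file, tombstones }
        })
        .collect()
}
```

The trade-off in this sketch is that the single query may return tombstones that match no file in the group, in exchange for fewer round trips to the catalog.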
Carol (Nichols || Goulding) 2749c37d02
fix: Query for tombstones in a time range, not for a particular parquet file
The compactor at this point is still querying for each file; this is an
intermediate step
2022-03-23 09:52:00 -04:00
Carol (Nichols || Goulding) 4d2e71c03e
feat: Wrap parquet files with their relevant tombstones 2022-03-23 09:52:00 -04:00
Carol (Nichols || Goulding) 87dc2981f6
feat: Query for tombstones relevant to a parquet file
Connects to #3948.
2022-03-23 09:52:00 -04:00
Luke Bond e109fa4987
feat: schema client and CLI (#4105)
* feat: schema client and CLI

* chore: clarification in comment in schema command

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-23 13:49:24 +00:00
dependabot[bot] 8ee9b793f5
chore(deps): Bump hyper from 0.14.17 to 0.14.18 (#4109)
Bumps [hyper](https://github.com/hyperium/hyper) from 0.14.17 to 0.14.18.
- [Release notes](https://github.com/hyperium/hyper/releases)
- [Changelog](https://github.com/hyperium/hyper/blob/master/CHANGELOG.md)
- [Commits](https://github.com/hyperium/hyper/compare/v0.14.17...v0.14.18)

---
updated-dependencies:
- dependency-name: hyper
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-03-23 13:12:43 +00:00
dependabot[bot] 36071e2d12
chore(deps): Bump log from 0.4.14 to 0.4.16 (#4108)
Bumps [log](https://github.com/rust-lang/log) from 0.4.14 to 0.4.16.
- [Release notes](https://github.com/rust-lang/log/releases)
- [Changelog](https://github.com/rust-lang/log/blob/master/CHANGELOG.md)
- [Commits](https://github.com/rust-lang/log/commits)

---
updated-dependencies:
- dependency-name: log
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-03-23 13:01:30 +00:00
Marco Neumann 5ae1e2fecf refactor: make query tests less OG-specific 2022-03-23 12:04:32 +01:00
Marco Neumann 89206e013c
test: run SOME query tests for querier (#4098)
This includes some type changes to dispatch between OG and NG and allows
some tests to be run against the NG querier. This only contains parquet
files though, so the scope is somewhat limited.

For #3934.
2022-03-22 17:39:19 +00:00
Nga Tran c3ef56588f
feat: use creation time to check level upgradable (#4094)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-22 13:51:18 +00:00
Paul Dix b18b18afd9
fix: have ingester use single mutable batch for buffer (#4095)
Removed some unnecessary tests as they no longer apply with the new buffer structure. This will hopefully reduce the memory footprint of the ingesters significantly.

Closes #4072
2022-03-22 13:42:52 +00:00
Nga Tran 886f9dc8c1
feat: split compacted data into 2 compacted sets (#4088)
* feat: split compacted data into 2 compacted sets

* chore: clean up

* refactor: address review comments

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-22 13:28:32 +00:00
Andrew Lamb b83b000590
chore: Update datafusion (#4071)
* chore: update to datafusion 5936edc2a94d5fb20702a41eab2b80695961b9dc

* chore: Update apis to match datafusion changes
2022-03-22 13:17:41 +00:00
Luke Bond b098828c97
feat: schema grpc server & proto in router2 (#4081)
* feat: schema grpc server & proto in router2

* chore: comments in schema proto

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-22 11:27:20 +00:00
Marco Neumann c9908b260c
refactor: dyn-dispatch database in query subsystem (#4083)
* refactor: dyn-dispatch database in query subsystem

This is similar to #4080 but concerns the database itself.

For #3934.

* docs: improve wording

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-22 09:15:52 +00:00
Luke Bond 9ec45f5aec
Revert "fix: propagate shutdown into QuerierHandlerImpl" (#4090)
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
2022-03-21 18:53:06 +00:00
Luke Bond 24e03deb5b
Revert "fix: propagate shutdown into CompactorHandler" (#4091) 2022-03-21 18:43:24 +00:00
Luke Bond 3c2775f8f2
fix: teach semantic commits script about GH revert PRs (#4092) 2022-03-21 17:37:28 +00:00
Marco Neumann 55643945a1
refactor: `querier` w/o `db` (#4063)
* feat: `TombstoneRepo::list_by_table`

* feat: `ParquetFileRepo::list_by_table_not_to_delete`

* refactor: `querier` w/o `db`

Get the `querier` to work w/o relying on `db`. A few notes:

- Testing is kinda shallow, we really need to get `query_tests` working
  w/ `querier` (see #3934).
- We still run a sync loop for namespaces, tables and schemas. This will
  be replaced by "update namespace incl. tables and schemas on demand".
  Note however that we cannot fetch single tables and schemas on demand
  at the moment, because DataFusion doesn't implement async schema
  inspection (only `scan` / "give me all the chunks" is async). I think
  that's OK for now and we can address this later.
- There is NO cache for parquet files and tombstones at the moment. For
  correctness, they need to be fetched in a single transaction (or we
  need a kinda tricky sequence number / logical clock tracking) and I am
  not sure yet how this makes sense when we have the ingester data wired
  up and predicates pushed down to the catalog (see next point). So
  let's measure first and then decide on a caching strategy for this.
- Predicates are currently NOT pushed down to the catalog. I'll need to
  figure out how to extract time range from generic DataFusion
  expressions to make that work (it's easier for InfluxRPC queries, but
  they are not tested at the moment, see first point).

Sorry that this commit is kinda huge. I initially planned to only
migrate the chunks away from `db` and leave the tables and schemas for a
follow-up PR, but the DataFusion trait structure (chunks are bound to
their tables) makes this kinda pointless.

Closes #3974.

* docs: explain what we're doing

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* docs: mention tracking issues

* docs: explain what we're doing

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
2022-03-21 16:58:00 +00:00
dependabot[bot] a23efce408
chore(deps): Bump kube-derive from 0.69.1 to 0.70.0 (#4073)
* chore(deps): Bump kube-derive from 0.69.1 to 0.70.0

Bumps [kube-derive](https://github.com/kube-rs/kube-rs) from 0.69.1 to 0.70.0.
- [Release notes](https://github.com/kube-rs/kube-rs/releases)
- [Changelog](https://github.com/kube-rs/kube-rs/blob/master/CHANGELOG.md)
- [Commits](https://github.com/kube-rs/kube-rs/compare/0.69.1...0.70.0)

---
updated-dependencies:
- dependency-name: kube-derive
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore(deps): Bump kube-runtime from 0.69.1 to 0.70.0

Bumps [kube-runtime](https://github.com/kube-rs/kube-rs) from 0.69.1 to 0.70.0.
- [Release notes](https://github.com/kube-rs/kube-rs/releases)
- [Changelog](https://github.com/kube-rs/kube-rs/blob/master/CHANGELOG.md)
- [Commits](https://github.com/kube-rs/kube-rs/compare/0.69.1...0.70.0)

---
updated-dependencies:
- dependency-name: kube-runtime
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore: upgrade kube to version 0.70

* chore: hakari

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Marco Neumann <marco@crepererum.net>
2022-03-21 15:32:45 +00:00
kodiakhq[bot] d3d628fcf3
Merge pull request #4070 from influxdata/cn/update-catalog
feat: Update the catalog for a completed compaction set
2022-03-21 14:28:49 +00:00
Carol (Nichols || Goulding) 201ced1d66
test: Mark a parquet file deleted in the update catalog operation 2022-03-21 10:16:58 -04:00
Carol (Nichols || Goulding) dbca54d917
refactor: Move add parquet file and tombstones within update catalog
This should never be done on its own so doesn't really need to be its
own method. We also don't do anything with the returned data, so no need
to allocate those vectors.
2022-03-21 10:16:58 -04:00
Carol (Nichols || Goulding) 2fea10dfd7
feat: Mark old compacted parquet files to be deleted in transaction
Connects to #3952
2022-03-21 10:16:58 -04:00
Carol (Nichols || Goulding) 5b294968a5
feat: Add processed tombstone records with compacted parquet file
In a transaction when the parquet file is added to the catalog.

Connects to #3952.
2022-03-21 10:16:57 -04:00
Carol (Nichols || Goulding) b983b24fcf
fix: Adding processed tombstones to catalog only needs tombstone ID 2022-03-21 10:16:57 -04:00
Carol (Nichols || Goulding) 8fd3d85634
refactor: Move add_parquet_file_with_tombstones from ingester to compactor 2022-03-21 10:16:57 -04:00
Carol (Nichols || Goulding) 933dc69ecf
feat: For each compacted data set, persist new parquet file to object store (#4058)
* feat: Rearrange skeleton functions for split/persist/catalog update

* feat: Persist compacted files to object storage

Fixes #3951.

* docs: Add comment about batches' schemas
2022-03-21 14:16:03 +00:00
Marco Neumann 0779f81b6b
refactor: rework `TableCache` (#4054)
* feat: `TableRepo::get_by_namespace_and_name`

* refactor: rework `TableCache`

- dual cache that can also map table names to IDs
- deal w/ missing tables w/o panics
- set proper timeouts to missing data

For #3974.

* test: extend table cache tests
2022-03-21 13:40:06 +00:00
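
The "dual cache" idea from 0779f81b6b above can be sketched as two maps kept in sync: one from table ID to table, one from (namespace, name) to ID. This is a simplified, synchronous illustration with hypothetical types and method names; it does not model catalog lookups on a miss or the timeouts mentioned in the commit message.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical, reduced table record.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Table {
    pub id: i64,
    pub namespace_id: i64,
    pub name: String,
}

/// Cache with two indexes over the same tables.
#[derive(Debug, Default)]
pub struct TableCache {
    by_id: HashMap<i64, Arc<Table>>,
    by_name: HashMap<(i64, String), i64>,
}

impl TableCache {
    /// Insert a table and keep both indexes in sync.
    pub fn insert(&mut self, table: Table) {
        let table = Arc::new(table);
        self.by_name
            .insert((table.namespace_id, table.name.clone()), table.id);
        self.by_id.insert(table.id, table);
    }

    /// Look up a table by ID. A missing table yields `None` instead of a panic.
    pub fn get(&self, id: i64) -> Option<Arc<Table>> {
        self.by_id.get(&id).cloned()
    }

    /// Map a table name within a namespace to its ID, if known.
    pub fn id_by_name(&self, namespace_id: i64, name: &str) -> Option<i64> {
        self.by_name.get(&(namespace_id, name.to_string())).copied()
    }
}
```

Returning `Option` from both lookups is what lets callers handle unknown tables explicitly rather than unwrapping.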
kodiakhq[bot] 26a7a61d0a
Merge pull request #4080 from influxdata/crepererum/issue3934d
refactor: dyn-dispatch chunks in query subsystem
2022-03-21 12:47:28 +00:00
Marco Neumann d1df95df87 refactor: dyn-dispatch chunks in query subsystem
- this is what DataFusion is doing as well; it's also fast enough
  because the number of chunks in a query is not THAT massive (it's not
  like we are doing row-level dyn dispatching)
- it simplifies abstracting over different databases
- it allows us to drop our enum-based dispatching that we have for
  `DbChunk` and that we would also need for the querier (e.g. depending
  on if a chunk is backed by a parquet file or ingester data)
- it likely speeds up compile times because `query` no longer
  contains massive amounts of generic code

For #3934.
2022-03-21 12:47:54 +01:00
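
As a rough illustration of the dynamic dispatch described in d1df95df87 above, the sketch below stores different chunk kinds behind `Arc<dyn Trait>` so that consumers need neither generics nor a dispatch enum. The trait is a heavily reduced, hypothetical stand-in; `ParquetChunk`, `IngesterChunk`, and `describe` are names invented for this example.

```rust
use std::sync::Arc;

/// Hypothetical, reduced chunk interface.
pub trait QueryChunk: Send + Sync {
    fn id(&self) -> u32;
    fn storage(&self) -> &'static str;
}

/// Chunk backed by a parquet file (illustrative only).
pub struct ParquetChunk {
    pub id: u32,
}

/// Chunk backed by ingester data (illustrative only).
pub struct IngesterChunk {
    pub id: u32,
}

impl QueryChunk for ParquetChunk {
    fn id(&self) -> u32 {
        self.id
    }
    fn storage(&self) -> &'static str {
        "parquet"
    }
}

impl QueryChunk for IngesterChunk {
    fn id(&self) -> u32 {
        self.id
    }
    fn storage(&self) -> &'static str {
        "ingester"
    }
}

/// Non-generic consumer: compiled once, works for every chunk type.
/// The virtual call happens per chunk, not per row.
pub fn describe(chunks: &[Arc<dyn QueryChunk>]) -> Vec<String> {
    chunks
        .iter()
        .map(|c| format!("{}:{}", c.id(), c.storage()))
        .collect()
}
```

Adding a new chunk kind in this sketch only requires a new `impl QueryChunk`; `describe` and any other consumer stay unchanged.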
dependabot[bot] cd36229e27
chore(deps): Bump synchronized-writer from 1.1.10 to 1.1.11 (#4075)
Bumps [synchronized-writer](https://github.com/magiclen/synchronized-writer) from 1.1.10 to 1.1.11.
- [Release notes](https://github.com/magiclen/synchronized-writer/releases)
- [Commits](https://github.com/magiclen/synchronized-writer/compare/v1.1.10...v1.1.11)

---
updated-dependencies:
- dependency-name: synchronized-writer
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-21 10:53:11 +00:00
dependabot[bot] 66cf34c2e2
chore(deps): Bump tokio-rustls from 0.23.2 to 0.23.3 (#4074)
Bumps [tokio-rustls](https://github.com/tokio-rs/tls) from 0.23.2 to 0.23.3.
- [Release notes](https://github.com/tokio-rs/tls/releases)
- [Commits](https://github.com/tokio-rs/tls/commits)

---
updated-dependencies:
- dependency-name: tokio-rustls
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-21 10:43:07 +00:00
kodiakhq[bot] c543b61f6c
Merge pull request #4079 from influxdata/crepererum/issue3934c
refactor: steps towards dynamic database type dispatch
2022-03-21 10:17:37 +00:00
Marco Neumann ca152e7934 refactor: avoid generics in `QueryDatabase`
A step to make this trait object-safe.

Ref #3934.
2022-03-21 10:45:05 +01:00
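
For context on ca152e7934 above: a trait with a generic method cannot be used as `dyn Trait`, so replacing the type parameter with a trait-object parameter is one common way to make it object-safe. The sketch below uses a hypothetical, much smaller trait that only shares its name with `QueryDatabase`; the methods and the `StaticDb` type are assumptions for illustration and do not reflect the real trait.

```rust
/// Hypothetical reduced trait. A generic method such as
/// `fn count_matching<F: Fn(&str) -> bool>(&self, pred: F) -> usize`
/// would rule out `dyn QueryDatabase`; taking `&dyn Fn` avoids that.
pub trait QueryDatabase {
    fn table_names(&self) -> Vec<String>;

    /// Count the tables whose name satisfies `pred`.
    fn count_matching(&self, pred: &dyn Fn(&str) -> bool) -> usize {
        self.table_names()
            .iter()
            .filter(|name| pred(name.as_str()))
            .count()
    }
}

/// Illustrative implementation backed by a fixed list of table names.
pub struct StaticDb {
    pub tables: Vec<String>,
}

impl QueryDatabase for StaticDb {
    fn table_names(&self) -> Vec<String> {
        self.tables.clone()
    }
}

/// Works with any database through a trait object.
pub fn count_system_tables(db: &dyn QueryDatabase) -> usize {
    db.count_matching(&|name: &str| name.starts_with("system."))
}
```

Because the trait has no generic methods, it can be boxed or passed as `&dyn QueryDatabase`, which is what dynamic dispatch over databases requires.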
Marco Neumann 0071b85c22 refactor: make `ExecutionContextProvider` object-safe
Ref #3934.
2022-03-21 10:40:53 +01:00
dependabot[bot] 836aecc7ad
chore(deps): Bump libc from 0.2.120 to 0.2.121 (#4076)
Bumps [libc](https://github.com/rust-lang/libc) from 0.2.120 to 0.2.121.
- [Release notes](https://github.com/rust-lang/libc/releases)
- [Commits](https://github.com/rust-lang/libc/compare/0.2.120...0.2.121)

---
updated-dependencies:
- dependency-name: libc
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-03-21 09:28:39 +00:00
kodiakhq[bot] 0e5dc716e3
Merge pull request #4061 from influxdata/crepererum/issue3934b
refactor: make `QueryChunk` object-safe
2022-03-19 07:05:51 +00:00
kodiakhq[bot] 67939fb37d
Merge branch 'main' into crepererum/issue3934b 2022-03-19 06:56:30 +00:00
kodiakhq[bot] c75be65a46
Merge pull request #4067 from influxdata/dom/router-precision
feat(router2): write timestamp precision
2022-03-18 17:45:02 +00:00