Commit Graph

499 Commits (eb6abb5d670799e9e1d9a1a4784305aadf5a6914)

Author SHA1 Message Date
Marco Neumann 840e4801b8
feat: make querier RAM pool split a proper feature (#5283)
* feat: make querier RAM pool split a proper feature

- use propre pool names
- expose sizing via CLI/env

Closes https://github.com/influxdata/conductor/issues/1102.

* refactor: improve naming and docs

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-03 15:27:23 +00:00
Nga Tran 471b8be92f
chore: Revert "refactor: bump batch size (#5251)" (#5288)
This reverts commit bb172f8fa8.
2022-08-03 14:23:45 +00:00
Marco Neumann bb172f8fa8
refactor: bump batch size (#5251)
This is what DataFusion uses by default and I don't see a reason why we
should use such small batch sizes.

The affect is probably only visible in certain filter-aggregate queries
that don't focus on a single series (because there we likely end up with
1 or 2 batches only, esp. after #5250) for coarse-grained filters, esp.
  when the filter key is not the first sort key.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-01 13:49:58 +00:00
Sam Arnold 3fbe860bb9
fix: interpret [MIN_NANO_TIME, MAX_NANO_TIME) range as all time for optimization (#5231)
InfluxQL queries can send (technically incorrect) ranges like this, meaning all time
but excluding the max nanosecond time.

Since this is an important case, we should handle it specially and use the optimized
'all time' handling for meta queries even though this is technically wrong in that
it does not filter out column names / measurement names at MAX_NANO_TIME exactly.

Closes: https://github.com/influxdata/conductor/issues/1072

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-28 12:24:26 +00:00
Andrew Lamb 9215a534d0
chore: Update datafusion and `arrow`/`parquet`/`arrow-flight` to `19.0.0` (#5229)
* chore: Update datafusion and `arrow`/`parquet`/`arrow-flight` to `19.0.0`

* chore: Run cargo hakari tasks

* fix: Update for API changes

* fix: clippy

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-28 08:10:47 +00:00
Marco Neumann 9a9a1a4777
feat: limit per-table chunk data for every query (#5223)
* feat: `QueryChunk::as_any`

* feat: allo `ChunkPruner::prune_chunks` to fail

* feat: limit per-table chunk data for every query

Closes #5211.

* fix: address review comments

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
2022-07-27 13:20:05 +00:00
Andrew Lamb 66af2bdd88
refactor: Split up `delete_three_delete_three_chunks.sql` test case (#5197)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-22 20:57:31 +00:00
Andrew Lamb 9fed013848
chore: Update datafusion pin (#5162)
* chore: Update datafusion pin

* fix: Update expected output

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-20 14:34:08 +00:00
Marko Mikulicic b8236e2b9d
fix: Fix SeriesKey sort order for special _measurement and _field (#5150)
* fix: Fix SeriesKey sort order for special _measurement and _field

* fix: Update expected test output

* fix: Update more tests

* fix: Re-sort tag key when using binary encoding

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
2022-07-20 08:45:17 +00:00
Marco Neumann b8d9799a26
feat: wire span all the way to `QuerierTable::chunks` (#5134)
* feat: pass context to `QueryDatabase::chunks`

* feat: wire span all the way to `QuerierTable::chunks`

This is required for #5129.
2022-07-19 14:12:55 +00:00
Andrew Lamb e2d871b00b
chore: Update datafusion and arrow/parquet/arrow-flight to `18.0.0` (#5079)
* chore: Update datafusion to 10.0.0, arrow/parquet/arrow-flight to 18

* chore: Run cargo hakari tasks

* fix: update cargo pin

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-18 15:01:03 +00:00
Marco Neumann 9c2b6cd96c
fix: always pass proper context to `InfluxRpcPlanner` (#5144)
There were some instances were we forgot to pass context (and therefore
tracing) information to `InfluxRpcPlanner`. This removes the `Default`
implementation requires to always pass a context when creating
`InfluxRpcPlanner` to prevent this type of bug.

Ref #5129.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-18 14:45:22 +00:00
dependabot[bot] 9b67de2f43
chore(deps): Bump tokio from 1.19.2 to 1.20.0
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.19.2 to 1.20.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.19.2...tokio-1.20.0)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-07-14 01:21:43 +00:00
Marco Neumann b1b2cb5d4a
feat: load read buffer on demand (#5091)
* refactor: extract `select_schema`

* refactor: improve `InternalLostInputField` error message

* test: improve SQL runner output

* feat: load read buffer on demand

Closes #5032.

* refactor: move `[Half]OwnedSelection` to `schema` crate`
2022-07-13 08:51:40 +00:00
Marco Neumann 96da584139
test: do NOT create expensive bloom filters when we do not need them (#5089)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-11 16:29:53 +00:00
Marco Neumann 607831585c
refactor: use less executors and threads during tests (#5086)
`Executor` is only used as a performance boundary, not as a correctness
or data boundary so let's try to re-use it. This also simplifies
profiling of tests since we don't end up with hundreds (or even
thousands) of threads.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-11 16:23:22 +00:00
Andrew Lamb c46e1c6347
chore: Update datafusion + arrow/parquet/arrow-flight to `17.0.0` (#5021)
* fix: correct nullability declaration of system tables

* chore: Update datafusion and arrow/parquet/arrow-flight

* chore: Run cargo hakari tasks

* fix: Update tests

* fix: Update tests

* fix: predicate pruning

* fix: add some tests

* fix: query_functions

* fix: fix read_buffer test

* fix: fix clippy

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-07 19:22:15 +00:00
Sam Arnold e193913ed3
fix: optimize field columns for all-time predicates (#5046)
* fix: optimize field columns for all-time predicates

Also fix timestamp range to allow selecting points at MAX_NANO_TIME

* fix: clamp end to MIN_NANO_TIME for safety

* refactor: add contains_all method to TimestampRange
2022-07-06 12:01:28 +00:00
Sam Arnold 03f456d8fd
fix: optimize tag_keys to go only to schema when predicate is empty (#4985)
* docs: fix comment

* test: add test for delete behaviour

* fix: tag_keys optimization for empty predicate

Also need to eliminate 'true' predicates from simplified predicate so
is_empty works correctly.

* refactor: use lit instead of spelling out literal true

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-05 12:45:25 +00:00
dependabot[bot] 68eff79594
chore(deps): Bump once_cell from 1.12.0 to 1.13.0 (#5033)
Bumps [once_cell](https://github.com/matklad/once_cell) from 1.12.0 to 1.13.0.
- [Release notes](https://github.com/matklad/once_cell/releases)
- [Changelog](https://github.com/matklad/once_cell/blob/master/CHANGELOG.md)
- [Commits](https://github.com/matklad/once_cell/compare/v1.12.0...v1.13.0)

---
updated-dependencies:
- dependency-name: once_cell
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-07-05 08:54:51 +00:00
Andrew Lamb c4c251129e
chore: Update datafusion (#5020)
* chore: Update datafusion

* fix: Update plan

* fix: update explain plans

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-01 19:59:41 +00:00
Marco Neumann 324eb3f797
fix: Fix test bug for InfluxRPC `<unknown field> != <value>` (#5006)
* fix: ignore InfluxRPC `<unknown field> != <value>`

Fixes #4786.

* refactor: remove special influxrpc "unknown" handling

See <https://github.com/influxdata/influxdb_iox/pull/5006#pullrequestreview-1025625442>.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-01 19:41:34 +00:00
Marco Neumann 016dd93d9c
feat: filter chunks before requesting read buffers (#4996)
Fixes #4976.
2022-07-01 08:59:07 +00:00
Carol (Nichols || Goulding) 3049479b78
feat: Implement new querier to ingester config design 2022-06-30 08:26:50 -04:00
Carol (Nichols || Goulding) 59da2dccb8
feat: Assert if no ingester addresses are found
Temporarily support `--ingester-addresses` (and always return all
ingesters) so that this PR can be deployed during the switchover.
2022-06-30 08:22:47 -04:00
Carol (Nichols || Goulding) f37f8013ec
feat: Assign a sequencer id to QuerierTables to know which ingester to query 2022-06-30 08:22:46 -04:00
Andrew Lamb 01fb2e132d
chore: Update datafusion pin (#4969)
* chore: Update datafusion pin

* fix: Update for api

* fix: Explicitly set coalsce batch size

* fix: Update batch size as well

* fix: update tests for new explain plan, and improved coercion
2022-06-29 17:52:37 +00:00
Dom Dwyer 75a3fd5e1e refactor: use propagated partition key in ingester
Changes the ingester to use the partition key derived in the router, and
transmitted over through the kafka API boundary.

This should have no observable behavioural change, but be more resilient
as we're no longer assuming the partitioning algorithm produces the same
value in both the router (where data is partitioned) and the ingester
(where data is persisted, segregated by partition key).

This is a pre-requisite to allowing the user to specify partitioning
schemes.
2022-06-21 15:57:30 +01:00
Marco Neumann 743c1692ea
refactor: stream query results from ingester to querier (#4875)
* refactor: stream partitions from ingester

Ref #4849.

* refactor: do not collect record batched on the ingester side

Ref #4849.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-16 12:58:50 +00:00
Marco Neumann 66c7d95312
refactor: use new ingester<>querier wire protocol (#4867)
* refactor: use new ingester<>querier wire protocol

Use and document the new and more flexible ingester<>querier wire
protocol.

Note that the ingester does NOT stream the response data yet, but the
internal data structures would allow that. A follow-up change will
adjust the ingester code to stream the data.

Ref #4849.

* fix: typos

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* refactor: clarify naming and public interface

* test: add schema assertion to `ingester_response_to_record_batches`

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
2022-06-16 08:02:28 +00:00
Dom Dwyer 4df2964566 refactor: store PartitionKey in DmlWrite
Carry the PartitionKey in the DmlWrite, allowing the batch to be
associated with a specific partition key.
2022-06-15 15:48:54 +01:00
Marco Neumann 7c60edd38c
refactor: prepare new ingester<>querier protocol on the querier side (#4863)
* refactor: prepare new ingester<>querier protocol on the querier side

This changes the querier internals to work with the new protocol. The
wire protocol stays the same (for now). There's a (somewhat hackish)
adapter in place on the querier side that converts the old to the new
protocol on-the-fly. This is an intermediate step before we actually
change the wire protocol (and in a step after that also take advantage
of the new possibilites on the ingester side).

Ref #4849.

* docs: explain adapter
2022-06-15 14:32:24 +00:00
Andrew Lamb e91d00b10c
chore: Update datafusion + `arrow`/`parquet`/`arrow-flight` to `16.0.0 (#4851)
* chore: TEMP Update DataFusion to pre-release

* chore: update arrow et al to 16.0.0

* chore: Run cargo hakari tasks

* fix: update reader read_dictionary API

* chore: Update to real Datafusion release

* fix: Update parquet API

* fix: update test

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2022-06-14 16:31:40 +00:00
Dom Dwyer b41ea1d718 refactor: PartitionKey type
This commit changes the code base to use a new reference-counted
PartitionKey type wrapper, instead of passing a bare String around.

This allows the compiler to type check & verify usage of the partition
key, instead of passing a bare string around. By reference counting the
underlying string, we reduce memory usage for some use cases.
2022-06-14 14:47:56 +01:00
Andrew Lamb 50697906b1
refactor: Make `DMLWrite::sequence_number` a `SequenceNumber` (#4817) 2022-06-09 19:36:37 +00:00
Andrew Lamb 2ec7764fdd
refactor: rename builder like predicate methods to be `with_` (#4808)
* refactor: rename builder like predicate methods to be `with_`

* fix: merge conflict

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-09 11:26:03 +00:00
Andrew Lamb f34282be2c
fix: Do not run DataFusion optimizer pass twice (#4809)
* fix: Do not run DataFusion optimizer pass twice

* docs: improve docstring and logging
2022-06-08 21:01:22 +00:00
Andrew Lamb afc1c12062
refactor: consolidate `PredicateBuilder` into `Predicate` (#4799)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-08 12:21:24 +00:00
Marco Neumann 82f6696516
chore: remove some unused deps (#4803)
Found by `cargo udeps`.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-08 11:15:18 +00:00
Andrew Lamb 8e96a2721d
chore: Update datafusion (again) (#4788)
* chore: Update datafusion

* chore: Update imports

* refactor: update API usage

* refactor: clean up some uses of binary_expr

* fix: remove unused export

* fix: update explain output

* chore: update more explain tests

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-07 08:17:56 +00:00
dependabot[bot] e03bf94420
chore(deps): Bump tokio from 1.18.2 to 1.19.1 (#4783)
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.18.2 to 1.19.1.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.18.2...tokio-1.19.1)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-06 14:15:12 +00:00
kodiakhq[bot] 412309e7b1
Merge branch 'main' into cn/read-buffer-cache 2022-06-06 12:52:48 +00:00
Carol (Nichols || Goulding) bfd537c853
docs: Remove comments referencing number of test scenarios created
These comments aren't near the code that affects how many scenarios get
created, so they were incorrect and are likely to be incorrect in
different ways in the future.
2022-06-03 16:29:30 -04:00
Carol (Nichols || Goulding) 5c6c086d26
docs: Improve description of ChunkStage in query test scenarios
Namely, that ChunkStage::Parquet probably doesn't correspond to
ParquetChunk; it means the data has been persisted to parquet and the
chunks are now managed by the querier.
2022-06-03 16:29:30 -04:00
Carol (Nichols || Goulding) c6cb594a6d
test: There are no more MUB chunk types, remove that from test helper fn 2022-06-03 16:29:29 -04:00
Carol (Nichols || Goulding) e1061ce623
docs: Don't attempt to list out chunk types exhaustively 2022-06-03 16:29:29 -04:00
Carol (Nichols || Goulding) 63b59f6470
test: Document current possibly-incorrect behavior in the test 2022-06-03 14:33:04 -04:00
Carol (Nichols || Goulding) 7daf680e76
test: Add nonexistent column not equal; this currently fails 2022-06-03 12:51:12 -04:00
Carol (Nichols || Goulding) d3df9db1a6
test: Validate SQL referencing nonexistent column returns an error 2022-06-03 09:16:04 -04:00
Andrew Lamb 3592aa52d8
chore: Update datafusion + `arrow`/`parquet`/`arrow-flight` to `15.0.0` (#4743)
* chore: Update datafusion + `arrow`/`parquet`/`arrow-flight` to `15.0.0`

* chore: Update APIs

* chore: Run cargo hakari tasks

* feat: normalize parquet file metadata

* chore: update size tests

* chore: add docs on metadata stripping

* chore: TEMP UPDATE TO DF BRANCH

* chore: Update for new API

* fix: Update to latest DF

* fix: cargo hakari

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: Raphael Taylor-Davies <r.taylordavies@googlemail.com>
2022-06-03 10:32:26 +00:00
Andrew Lamb 257aaa7e7b
fix: Support `_field != <name>` predicates (#4721)
* fix: Support `_field != <name>` predicates

* fix: update test

* fix: add negative test

* fix: improve comments

* refactor: make `add_include` and `add_exclude` infallible

* chore: add type annotations

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-01 16:04:53 +00:00
Marco Neumann c91dbe062e
test: "optimize" ingesterrecord batches in query tests (#4700)
* test: "optimize" ingesterrecord batches in query tests

It seems that I had the right idea in #4656 but wasn't able to trigger
https://github.com/influxdata/conductor/issues/955 because the query
tests do not "optimize" the record batches in the same way the actual
gRPC implementation does. If we apply the same transformation we indeed
end up with the same error.

* fix: all batches within the ingester flight response must have same schema

* refactor: simplify and reuse code

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-01 07:37:11 +00:00
Marco Neumann a08a91c5ba
fix: ensure querier cache is refreshed for partition sort key (#4660)
* test: call `maybe_start_logging` in auto-generated cases

* fix: ensure querier cache is refreshed for partition sort key

Fixes #4631.

* docs: explain querier sort key handling and test

* test: test another version of issue 4631

* fix: correctly invalidate partition sort keys

* fix: fix `table_not_found_on_ingester`
2022-05-25 10:44:42 +00:00
dependabot[bot] 76f7043417
chore(deps): Bump once_cell from 1.11.0 to 1.12.0 (#4666)
Bumps [once_cell](https://github.com/matklad/once_cell) from 1.11.0 to 1.12.0.
- [Release notes](https://github.com/matklad/once_cell/releases)
- [Changelog](https://github.com/matklad/once_cell/blob/master/CHANGELOG.md)
- [Commits](https://github.com/matklad/once_cell/compare/v1.11.0...v1.12.0)

---
updated-dependencies:
- dependency-name: once_cell
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-05-24 08:14:03 +00:00
Marco Neumann 47347bef9f
test: add query test scenario w/ missing columns in different chunks (#4656)
* test: do NOT filter out query test scenarios w/ unordered stages in different partitions

It should be possible to have two chunks in different partitions where
both are in the ingester stage or the first one is in the parquet stage
and the 2nd one in the ingester stage.

* test: add query test scenario w/ missing columns in different chunks

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-05-23 12:13:41 +00:00
dependabot[bot] 6bc0c74c7d
chore(deps): Bump once_cell from 1.10.0 to 1.11.0 (#4646)
* chore(deps): Bump once_cell from 1.10.0 to 1.11.0

Bumps [once_cell](https://github.com/matklad/once_cell) from 1.10.0 to 1.11.0.
- [Release notes](https://github.com/matklad/once_cell/releases)
- [Changelog](https://github.com/matklad/once_cell/blob/master/CHANGELOG.md)
- [Commits](https://github.com/matklad/once_cell/compare/v1.10.0...v1.11.0)

---
updated-dependencies:
- dependency-name: once_cell
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore: Run cargo hakari tasks

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-05-20 07:40:38 +00:00
Dom Dwyer baa86d846f refactor: use ParquetStore instead of ObjectStore
Changes the code paths that interact with Parquet files in the object
store to reference the ParquetStorage directly (DRY refactor).

This change takes us from a dependency graph of:

                    ┌─────────────────┐
                    │                 │
                    ▼                 │
            Parquet Consumer          │
                    │         ┌──────────────┐
                    ├────────▶│ParquetStorage│
                    ▼         └──────────────┘
            ┌──────────────┐
            │ ObjectStore  │
            └──────────────┘
                    │
               ┌────┴────┐
               ▼         ▼
             File       s3
            System    (etc)

to:

                Parquet Consumer
                        │
                        ▼
                ┌──────────────┐
                │ParquetStorage│
                └──────────────┘
                        │
                        ▼
                ┌──────────────┐
                │ ObjectStore  │
                └──────────────┘
                        │
                   ┌────┴────┐
                   ▼         ▼
                 File       s3
                System    (etc)

With the ParquetStorage being solely responsible for managing
interactions with the object store when dealing with Parquet files.
2022-05-19 13:52:51 +01:00
Marco Neumann 770293a973
feat: add LRU cache metrics (#4632)
* refactor: require `Resource`s to be convertible to `u64`

* refactor: require `Resource`s to have a unit name

* refactor: make LRU cache IDs static

* feat: add LRU cache metrics

* docs: improve type names in LRU doctest

* docs: epxlain `MeasuredT`

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* docs: explain `test_metrics`

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
2022-05-19 08:05:17 +00:00
Marco Neumann 52346642a0
ci: fix cargo deny (#4629)
* ci: fix cargo deny

* chore: downgrade `socket2`, version 0.4.5 was yanked

* chore: rename `query` to `iox_query`

`query` is already taken on crates.io and yanked and I am getting tired
of working around that.
2022-05-18 09:38:35 +00:00
Andrew Lamb 3a33e806c7
chore: Update datafusion + `arrow`/`parquet`/`arrow-flight` to `14.0.0` (#4619)
* chore: Update datafusion deps

* chore: update arrow/parquet/arrow flight deps

* chore: Run cargo hakari tasks

* chore: Update location of utils

* chore: Update some more APIs

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2022-05-17 14:13:03 +00:00
Marco Neumann 779f0e9cdf
feat: querier RAM pool (#4593)
* feat: `SortKey::size`

* feat: `FunctionEstimator`

* feat: querier RAM pool

Let's put all the caches into a single RAM pool, so we can at least
somewhat control RAM usage. Note that this does NOT limit the peak
memory during query execution though, but should at least stop unlimited
cache growth. A follow-up PR will add metrics.

* refactor: improve some size calculations

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-05-17 13:11:20 +00:00
kodiakhq[bot] 542ec97b66
Merge branch 'main' into cn/rename-no-ng 2022-05-13 13:47:48 +00:00
Andrew Lamb ff8241ea57
fix: Use one (shared) http2 connection per querier and ingester pair (#4583)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-05-13 11:15:10 +00:00
Nga Tran 4434ec6836
chore: convert some debug to trace to reduce noises (#4589) 2022-05-12 22:40:29 +00:00
Carol (Nichols || Goulding) 55313d290a
fix: Update or remove comments that mention NG or OG
Connects to #4450.
2022-05-12 16:09:08 -04:00
Carol (Nichols || Goulding) 025a775cc6
fix: Rename make_ng_chunk to make_chunk.
Connects to #4450.
2022-05-12 16:09:07 -04:00
Nga Tran 66fe4c54ec
fix: make test output deterministic (#4578)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-05-12 14:15:51 +00:00
Nga Tran 0d5e3c97f0
chore: dump out the diff for SQL tests (#4571)
* chore: dump out the diff for SQL tests

* refactor: more prety assertion to a specific test to avoid affecting other available tests

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-05-11 18:43:36 +00:00
Nga Tran 1913a0150f
test: explain tests for queries whose data come from both parquet and ingester (#4546)
* refactor: remove New from a test scenario setup

* test: add explain for 2 different chunk stages

* test: expplain for several chunks from both parquet and ingester
2022-05-10 13:17:18 +00:00
Nga Tran dec628d48b
refactor: rewrite all structs and functions to remove "new" (#4544)
* refactor: ChunkStageNew to ChunkStage and DeleteTimeNew to DeleteTime

* refactor: PredNew -> Pred

* refactor: ChunkDataNew -> ChunkData

* refactor: new name functions to remove _new
2022-05-09 17:58:49 +00:00
Jake Goulding e07bcd40c2 refactor: Remove unused dependencies
These were found by iterating over all of the dependencies of each
Cargo.toml, then grepping that crate for the dependency's name. If it
didn't show up, I attempted to remove it.

I left a few dependencies that this process flagged:

* generated_types
  - `pbjson`,`serde`. Apparently used by the generated code.

* grpc-router-test-gen
  - `prost`. Apparently used by the generated code.

* influxdb_iox
  - `heappy`. Doesn't appear used, but is behind enough feature
    flags that I don't care to reason about and it's already optional.
  - `tikv_jemalloc_sys`. Appears to be setting a feature flag of an
    indirect dependency.

* iox_gitops_adapter
  - `k8s_openapi`. Appears to be setting a feature flag of an indirect
    dependency.
2022-05-06 15:57:58 -04:00
Carol (Nichols || Goulding) 068096e7e1
fix: Rename data_types2 to data_types 2022-05-06 14:45:39 -04:00
Carol (Nichols || Goulding) 2ef44f2024
fix: Move timestamp types to data_types2 2022-05-06 14:45:38 -04:00
Carol (Nichols || Goulding) 3ab0788a94
fix: Move DeletePredicate types to data_types2 2022-05-06 14:45:37 -04:00
Carol (Nichols || Goulding) 485d6edb8f
refactor: Move IngesterQueryRequest to generated_types 2022-05-06 14:45:37 -04:00
Carol (Nichols || Goulding) 96f0c88b48
fix: Remove db crate from query_tests 2022-05-06 11:30:36 -04:00
Carol (Nichols || Goulding) 7286b4391a
fix: Remove db crate 2022-05-06 11:27:33 -04:00
Andrew Lamb 02893e598c
chore: Update datafusion and upgrade arrow/parquet/arrow-flight to 13 (#4516)
* chore: Tool for automating arrow version update

* chore: Update datafusion and arrow/parquet/arrow-flight

* fix: update for changes in Arrow API

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-05-05 00:21:02 +00:00
Carol (Nichols || Goulding) e015d3bafb
feat: Remove the server_benchmarks crate (#4506) 2022-05-02 18:20:01 +00:00
Nga Tran 799480d34e
refactor: query_tests - port a few OG to NG tests and remove many more that already ported (#4487)
* refactor: port a few OG to NG tests and remove many more that already ported

* chore: Apply suggestions from code review

* chore: address review comments

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-05-02 14:44:02 +00:00
Marco Neumann 6eed09a926
test: use "real" ingester in query tests (#4455)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-04-28 14:39:31 +00:00
dependabot[bot] 420c306caa
chore(deps): Bump tokio from 1.17.0 to 1.18.0 (#4453)
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.17.0 to 1.18.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.17.0...tokio-1.18.0)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-04-28 08:21:17 +00:00
Andrew Lamb 9e91af4501
refactor: Move IOx UDfs into a Function Registry (1/3) (#4428)
* refactor: Move all UDF implementations to query_function crate

* refactor: Move regex udf to query_functions

* refactor: Move functions out of query

* fix: lints, imports

* chore: Run cargo hakari tasks

* fix: clipy + benches

* fix: reduce borrowing and fix clippy

* fix: moar clippy

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-04-26 17:30:27 +00:00
Marco Neumann 2337935660
test: chunks in ingester stage (#4415)
* refactor: document and improve `MockIngesterConnection`

* refactor: split `OldOneMeasurementFourChunksWithDuplicates` for `EXPLAIN` queries

* fix: mark "IngsterPartition" chunks as unsorted

* fix: "group by" queries may require sorted comparison

* refactor: re-export a few more types from querier

* fix: ensure that test parquet files are de-duped

* test: chunks in ingester stage

* docs: explain test code
2022-04-26 07:55:19 +00:00
dependabot[bot] 4c94e46642
chore(deps): Bump croaring from 0.5.2 to 0.6.0 (#4408)
* chore(deps): Bump croaring from 0.5.2 to 0.6.0

Bumps [croaring](https://github.com/saulius/croaring-rs) from 0.5.2 to 0.6.0.
- [Release notes](https://github.com/saulius/croaring-rs/releases)
- [Commits](https://github.com/saulius/croaring-rs/compare/0.5.2...0.6.0)

---
updated-dependencies:
- dependency-name: croaring
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* fix: croaring 0.6.0 compat

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Marco Neumann <marco@crepererum.net>
2022-04-25 16:41:08 +00:00
Marco Neumann f444e63960
test: include materialized delete predicates in NG query tests (#4371)
* refactor: move `batch_filter` to `datafusion_util`

* fix: outdated docstring

* feat: allow passing record batches to `iox_tests` parquet files

* test: include materialized delete predicates in NG query tests

* docs: improve wording

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-04-21 13:00:13 +00:00
Andrew Lamb 73bed810da
chore: Update arrow, arrow-flight, parquet, tonic, prost, etc (#4357)
* chore: Update datafusion

* chore: Update arrow/arrow-flight/parquet to 12

* chore: update datafusion correctly

* chore: Update prost, tonic, and dependents

* fix: Fixup some api changes

* fix: Update test output in db

* fix: Update test output in parquet_file

* fix: remove old pbjson types

* fix: Add "--experimental_allow_proto3_optional" flag

* chore: Run cargo hakari tasks

* fix: compile error

* chore: Update heappy

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-04-20 11:12:17 +00:00
Andrew Lamb f6e6821276
feat: Add basic Querier <--> Ingester "Service Configuration" (#4259)
* feat: Add basic Querier <--> Ingester "Service Configuration"

* docs: update comments in test

* refactor: cleanup tests a little

* refactor: make trait more consistent

* docs: improve comments in IngesterPartition
2022-04-11 11:50:22 +00:00
Andrew Lamb bbbdcc75a8
feat: `QuerierDatabase::chunks` returns `Result` (#4260)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-04-08 18:54:17 +00:00
Marco Neumann b1af5b3f44
feat: query log system table for querier (#4157)
* feat: query log system table for querier

Closes #4084.

* fix: typo

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* docs: extend

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-30 15:38:11 +00:00
Marco Neumann 2b76c31157
refactor: make statistics null counts optional (#4160)
Min/max values and distinct counts are already optional, so let's make
the null counts optional as well. This will be helpful for NG to deal w/
partial statistics (e.g. we only populate stats for the time column).

Note that the total count is still mandatory, but we normally have the
chunk/file-level row count at hand.
2022-03-29 17:47:57 +00:00
Marco Neumann 7d947c79d5
refactor: small query tests clean up (#4156)
* refactor: make NG query test generation more flexible

* refactor: rename OG-specfic query tests

* docs: explain chunk stage generation in NG query tests

* fix: typo
2022-03-29 14:00:34 +00:00
Marco Neumann decd018a6a
refactor: remove querier sync loop (#4146)
Namespaces are now created on demand and contain their full schema.
Tombstones/chunks are created on demand during the query.

Closes #4123.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-29 11:47:23 +00:00
Andrew Lamb 58c630d709
chore: Update datafusion (#4133)
* chore: Update datafusion

* fix: typo

* fix: Update explain plan output

* fix: update Cargo.locl

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-25 15:08:39 +00:00
Marco Neumann 9886ff42cc refactor: clean up querier public interface 2022-03-25 11:54:52 +01:00
Andrew Lamb 5c69a3f43b
chore: Update deps: datafusion, arrow/arrow-flight/parquet to 11, zstd to 0.11 (#4119)
* chore: update datafusion

* chore(deps): Bump arrow from 10.0.0 to 11.0.0

Bumps [arrow](https://github.com/apache/arrow-rs) from 10.0.0 to 11.0.0.
- [Release notes](https://github.com/apache/arrow-rs/releases)
- [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG.md)
- [Commits](https://github.com/apache/arrow-rs/compare/10.0.0...11.0.0)

---
updated-dependencies:
- dependency-name: arrow
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore(deps): Bump arrow-flight from 10.0.0 to 11.0.0

Bumps [arrow-flight](https://github.com/apache/arrow-rs) from 10.0.0 to 11.0.0.
- [Release notes](https://github.com/apache/arrow-rs/releases)
- [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG.md)
- [Commits](https://github.com/apache/arrow-rs/compare/10.0.0...11.0.0)

---
updated-dependencies:
- dependency-name: arrow-flight
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore: update parquet to 11.0.0

* fix: error on create schema, test for same

* fix: upgrade zstd

* chore: Run cargo hakari tasks

* fix: fix logical merge conflict

* fix: hakari

* fix: hakari

* fix: update newly introduced dep

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-24 15:27:36 +00:00
Marco Neumann 8ca5c337b2
refactor: port more query tests to NG, some code clean up (#4125)
* refactor: inline function that is used once

* refactor: generalize multi-chunk creation for NG

* refactor: `TwoMeasurementsManyFieldsTwoChunks` is OG-specific

* refactor: generalize `OneMeasurementTwoChunksDifferentTagSet`

* refactor: port `OneMeasurementFourChunksWithDuplicates` to NG

* refactor: `TwoMeasurementsManyFieldsLifecycle` is OG-specific

* refactor: simplify NG chunk generation

* refactor: port `ThreeDeleteThreeChunks` to NG

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-24 15:07:09 +00:00
Marco Neumann cc7f744e8e
test: two-chunk scenarios for NG (#4113)
Add the generic components to create two-chunk scenarios. Includes small
scenario fixes for things like system tables that are not identical
between OG and NG (also see #4111.)

Ref #3934.
2022-03-24 09:50:57 +00:00
Marco Neumann 283d3dad5d
refactor: generalize query test scenarios a bit (#4103)
Some query test scenarios are duplicates and are very OG specific. Let's
use generic scenarios (i.e. the ones that contain all chunk stages
instead of a specific one) where applicable.

For #3934.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-24 09:30:19 +00:00
kodiakhq[bot] 93485a11ec
Merge branch 'main' into crepererum/issue3934h 2022-03-23 19:10:02 +00:00
Marco Neumann c33ef79375
test: improve query test runner output (#4112)
- prints more previous/expected values when failing (instead of just
  emitting an `Err` which will be debug-printed)
- fixed newline handling (i.e. do not add additional newlines in
  `PrintlnWriter::write`)

Before:

```text
Running scenario 'Two chunks: NG Chunk Parquet; NG Chunk Parquet'

SQL: '"SELECT * from information_schema.tables;"'

thread 'cases::test_cases_sql_information_schema_sql' panicked at 'test failed: ScenarioMismatch { scenario_name: "Two chunks: NG Chunk Parquet; NG Chunk Parquet", previous_results: ["+---------------+--------------------+---------------------+------------+", "| table_catalog | table_schema       | table_name          | table_type |", "+---------------+--------------------+---------------------+------------+", "| public        | information_schema | columns             | VIEW       |", "| public        | information_schema | tables              | VIEW       |", "| public        | iox                | h2o                 | BASE TABLE |", "| public        | iox                | o2                  | BASE TABLE |", "| public        | system             | chunk_columns       | BASE TABLE |", "| public        | system             | chunks              | BASE TABLE |", "| public        | system             | columns             | BASE TABLE |", "| public        | system             | operations          | BASE TABLE |", "| public        | system             | persistence_windows | BASE TABLE |", "| public        | system             | queries             | BASE TABLE |", "+---------------+--------------------+---------------------+------------+"], current_results: ["+---------------+--------------------+------------+------------+", "| table_catalog | table_schema       | table_name | table_type |", "+---------------+--------------------+------------+------------+", "| public        | information_schema | columns    | VIEW       |", "| public        | information_schema | tables     | VIEW       |", "| public        | iox                | h2o        | BASE TABLE |", "| public        | iox                | o2         | BASE TABLE |", "+---------------+--------------------+------------+------------+"] }', query_tests/src/cases.rs:169:10
stack backtrace:
   0: rust_begin_unwind
             at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/std/src/panicking.rs:498:5
```

After:

```text
Running scenario 'Two chunks: NG Chunk Parquet; NG Chunk Parquet'
SQL: '"SELECT * from information_schema.tables;"'
Answers produced by scenario Two chunks: NG Chunk Parquet; NG Chunk Parquet differ from previous answer

previous:
+---------------+--------------------+---------------------+------------+
| table_catalog | table_schema       | table_name          | table_type |
+---------------+--------------------+---------------------+------------+
| public        | information_schema | columns             | VIEW       |
| public        | information_schema | tables              | VIEW       |
| public        | iox                | h2o                 | BASE TABLE |
| public        | iox                | o2                  | BASE TABLE |
| public        | system             | chunk_columns       | BASE TABLE |
| public        | system             | chunks              | BASE TABLE |
| public        | system             | columns             | BASE TABLE |
| public        | system             | operations          | BASE TABLE |
| public        | system             | persistence_windows | BASE TABLE |
| public        | system             | queries             | BASE TABLE |
+---------------+--------------------+---------------------+------------+

current:
+---------------+--------------------+------------+------------+
| table_catalog | table_schema       | table_name | table_type |
+---------------+--------------------+------------+------------+
| public        | information_schema | columns    | VIEW       |
| public        | information_schema | tables     | VIEW       |
| public        | iox                | h2o        | BASE TABLE |
| public        | iox                | o2         | BASE TABLE |
+---------------+--------------------+------------+------------+

thread 'cases::test_cases_sql_information_schema_sql' panicked at 'test failed: ScenarioMismatch { scenario_name: "Two chunks: NG Chunk Parquet; NG Chunk Parquet", previous_results: ["+---------------+--------------------+---------------------+------------+", "| table_catalog | table_schema       | table_name          | table_type |", "+---------------+--------------------+---------------------+------------+", "| public        | information_schema | columns             | VIEW       |", "| public        | information_schema | tables              | VIEW       |", "| public        | iox                | h2o                 | BASE TABLE |", "| public        | iox                | o2                  | BASE TABLE |", "| public        | system             | chunk_columns       | BASE TABLE |", "| public        | system             | chunks              | BASE TABLE |", "| public        | system             | columns             | BASE TABLE |", "| public        | system             | operations          | BASE TABLE |", "| public        | system             | persistence_windows | BASE TABLE |", "| public        | system             | queries             | BASE TABLE |", "+---------------+--------------------+---------------------+------------+"], current_results: ["+---------------+--------------------+------------+------------+", "| table_catalog | table_schema       | table_name | table_type |", "+---------------+--------------------+------------+------------+", "| public        | information_schema | columns    | VIEW       |", "| public        | information_schema | tables     | VIEW       |", "| public        | iox                | h2o        | BASE TABLE |", "| public        | iox                | o2         | BASE TABLE |", "+---------------+--------------------+------------+------------+"] }', query_tests/src/cases.rs:169:10
stack backtrace:
   0: rust_begin_unwind
             at /rustc/9d1b2106e23b1abd32fce1f17267604a5102f57a/library/std/src/panicking.rs:498:5
```
2022-03-23 18:06:09 +00:00
Marco Neumann 5ae1e2fecf refactor: make query tests less OG-specific 2022-03-23 12:04:32 +01:00
Marco Neumann 89206e013c
test: run SOME query tests for querier (#4098)
This includes some type changes to dispatch between OG and NG and allows
some tests to be run against the NG querier. This only contains parquet
files though, so it's somewhat a limited scope.

For #3934.
2022-03-22 17:39:19 +00:00
Andrew Lamb b83b000590
chore: Update datafusion (#4071)
* chore: update to datafusion 5936edc2a94d5fb20702a41eab2b80695961b9dc

* chore: Update apis to match datafusion changes
2022-03-22 13:17:41 +00:00
Marco Neumann c9908b260c
refactor: dyn-dispatch database in query subsystem (#4083)
* refactor: dyn-dispatch database in query subsystem

This is similar to #4080 but concerns the database itself.

For #3934.

* docs: improve wording

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-22 09:15:52 +00:00
Marco Neumann d1df95df87 refactor: dyn-dispatch chunks in query subsystem
- this is what DataFusion is doing as well; it's also fast enough
  because the number of chunks in a query is not THAT massive (it's not
  like we are doing row-level dyn dispatching)
- it simplifies abstracting over different databases
- it allows us to drop our enum-based dispatching that we have for
  `DbChunk` and that we would also need for the querier (e.g. depending
  on if a chunk is backed by a parquet file or ingester data)
- it likely speeds up compile times because the `query` is no longer
  contains massive amounts of generic code

For #3934.
2022-03-21 12:47:54 +01:00
Marco Neumann ca152e7934 refactor: avoid generics in `QueryDatabase`
A step to make this trait object-safe.

Ref #3934.
2022-03-21 10:45:05 +01:00
Marco Neumann 0071b85c22 refactor: make `ExecutionContextProvider` object-safe
Ref #3934.
2022-03-21 10:40:53 +01:00
Marco Neumann 169fa2fb2f refactor: make `QueryChunk` object-safe
This makes it way easier to dyn-type database implementations. The only
real change is that we make `QueryChunk::Error` opaque. Nobody is going
to inspect that anyways, it's just printed to the user.

This is a follow-up of #4053.

Ref #3934.
2022-03-18 11:40:31 +01:00
Marco Neumann a122b1e2ca
refactor: dyn-typed DB for `query_tests` (#4053)
To test the `db::Db` as well as the `querier` with the same test
framework, they require a shared interface. Ideally this interface is
dynamically typed instead of static dispatched via generics because:

- `query_tests` already take ages to compile
- we often hold a list of scenarios and a single scenario should (in a
  future PR) be able to represent both OG as well as NG

The vision here is that we basically keep the whole test setup but add
new scenarios which are NG-specific later on.

Now the issue w/ many query-related types is that they are NOT
object-safe because methods that don't take `&self` or they have
associated types that we cannot specify in general for OG and NG at the
same time.

So we need a bunch of wrappers that make dynamic dispatch possible. They
mostly call to an internal "interface" crate which is the actual `dyn`
part. The interface is currently only implemented for OG.

The scenarios currently also only contain OG databases. However,
creating a dynamic interface that can be used in all `query_tests` is
already a huge step.

Note that there are two places where we downcast the dynamic/abstract
database to `db::Db` again:

1. To create one scenario based on another and where we need to
   manipulate `db::Db` with OG-specific semantics.
2. `server_benchmarks`. These contain OG databases only and there is no
   point in benchmarking throw the dynamic dispatch interface because
   prod (`influxdb_ioxd`) also uses static dispatch.

Ref #3934.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-18 10:11:17 +00:00
Marco Neumann 0850a93f20
refactor: make `QueryDatabase::chunks` async (#4047)
For OG we can determine the chunks w/o any IO, for NG however this might
require a few catalog queries.

This is likely not the last change of this sort, i.e. the whole schema
handling is currently sync as well.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-17 12:55:25 +00:00
Dom Dwyer 5585dd3c21 refactor: switch to using DynObjectStore
Changes all consumers of the object store to use the dynamically
dispatched DynObjectStore type, instead of using a hardcoded concrete
implementation type.
2022-03-15 16:32:52 +00:00
Dom Dwyer 1d5066c421 refactor: rename ObjectStore -> ObjectStoreImpl
Frees up the name for so we can use `dyn ObjectStore` throughout the
code instead of `ObjectStoreApi`.
2022-03-15 16:29:43 +00:00
Andrew Lamb 2c3d30ca32
chore: Update datafusion, arrow, flight and parquet (#4000)
* chore: Update datafusion, arrow, flight and parquet

* fix: api change

* fix: fmt

* fix: update test metadata size

* fix: Update sizes in parquet test

* fix: more metadata size update
2022-03-10 12:24:47 +00:00
Marco Neumann 77f6153f72
refactor: remove `QueryDatabase::chunk_summaries` (#3977)
- This is not used by the query engine at all.
- The query engine should not care about ALL chunks but only about the
  chunks it gets via `QueryDatabase::chunks` (which includes a table
  name and a predicate).
- All other users of that API are NOT really query-related.
2022-03-08 11:34:26 +00:00
dependabot[bot] 48908054d1
chore(deps): Bump once_cell from 1.9.0 to 1.10.0 (#3955)
* chore(deps): Bump once_cell from 1.9.0 to 1.10.0

Bumps [once_cell](https://github.com/matklad/once_cell) from 1.9.0 to 1.10.0.
- [Release notes](https://github.com/matklad/once_cell/releases)
- [Changelog](https://github.com/matklad/once_cell/blob/master/CHANGELOG.md)
- [Commits](https://github.com/matklad/once_cell/compare/v1.9.0...v1.10.0)

---
updated-dependencies:
- dependency-name: once_cell
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore: Run cargo hakari tasks

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2022-03-07 08:54:20 +00:00
Andrew Lamb 7b436d37cd
fix: correctly set nulls in string columns (#3939)
* fix: correctly set nulls in string columns

* fix: test

* test: fix

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-04 20:15:44 +00:00
Andrew Lamb 07fdbe7c6b
fix: apply `_field` restrictions that do not match any fields (#3903)
* fix: apply field restriction correctly

* refactor: Use Vec::retain
2022-03-03 15:04:05 +00:00
Edd Robinson 3d047073b9
feat: add tracing down to the chunk level (#3804)
* refactor: wire exectution context to Deduplicator

* feat: example trace to chunk read_filter

* refactor: make execution context required

* refactor: expose metadata API

* refactor: more span context for chunk read_filter

* refactor: fix build

* refactor: push context into result stream

* refactor: make executor optional
2022-03-02 19:08:22 +00:00
Marco Neumann 33851be3a5
chore: upgrade Rust to 1.59 (#3875)
Mostly a few new clippy crates around `flat_map`, `and_then`, and
"underscore locks" (!!!):
https://rust-lang.github.io/rust-clippy/master/index.html#let_underscore_lock

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-28 15:14:19 +00:00
Raphael Taylor-Davies 2a842fbb1a
feat: correctly sort data and store in catalog metadata (#3864)
* feat: respect sort order in ChunkTableProvider (#3214)

feat: persist sort order in catalog (#3845)

refactor: owned SortKey (#3845)

* fix: size tests

* refactor: immutable SortKey

* test: test sort order restart (#3845)

* chore: explicit None for sort key

* chore: test cleanup

* fix: handling of sort keys containing fields

* chore: remove unused selected_sort_key

* chore: more docs

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-25 17:56:27 +00:00
dependabot[bot] 3b7d31c88a
chore(deps): Bump arrow from 9.0.2 to 9.1.0 (#3826)
Bumps [arrow](https://github.com/apache/arrow-rs) from 9.0.2 to 9.1.0.
- [Release notes](https://github.com/apache/arrow-rs/releases)
- [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG.md)
- [Commits](https://github.com/apache/arrow-rs/compare/9.0.2...9.1.0)

---
updated-dependencies:
- dependency-name: arrow
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-02-23 09:25:46 +00:00
dependabot[bot] ad3868ed7c
chore(deps): Bump tokio from 1.16.1 to 1.17.0 (#3814)
* chore(deps): Bump tokio from 1.16.1 to 1.17.0

Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.16.1 to 1.17.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.16.1...tokio-1.17.0)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* build: update workspace-hack

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Dom Dwyer <dom@itsallbroken.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-22 16:27:43 +00:00
Raphael Taylor-Davies 69480eaa5c
feat: remove remaining table-granularity metrics (#3808)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-22 15:02:57 +00:00
Andrew Lamb 9588b43a90
fix: Make errors in rewriting return `Error` rather than a `panic` (#3767)
* test: add test for predicate errors

* fix: Return errors properly rather than panic

* fix: handle errors in influxrpc planner

* fix: appease clippy

* fix: tests

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-17 15:39:14 +00:00
Andrew Lamb a30803e692
chore: Update datafusion, update `arrow`/`parquet`/`arrow-flight` to 9.0 (#3733)
* chore: Update datafusion

* chore: Update arrow

* fix: missing updates

* chore: Update cargo.lock

* fix: update for smaller parquet size

* fix: update test for smaller parquet files

* test: ensure parquet_file tests write multiple row groups

* fix: update callsite

* fix: Update for tests

* fix: harkari

* fix: use IoxObjectStore::existing

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-15 12:10:24 +00:00
Andrew Lamb d9f331ba2a
chore: update datafusion, stop repartitioning so aggressively (#3633)
* chore: update datafusion

* fix: Update to use new datafusion api

* chore: update expected plans

* fix: support zero output partitions

* fix: update test

* fix: Update for new DataFusion API

* fix: newly added system table

* fix: update cargo lock
2022-02-09 19:53:41 +00:00
Andrew Lamb 85004831a3
refactor: extract predicate sql tests out of rust (#3683)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-09 11:24:34 +00:00
Andrew Lamb 49edfebdf6
refactor: Complete move of test scenarios into library.rs (#3682)
Co-authored-by: Edd Robinson <me@edd.io>
2022-02-09 10:46:00 +00:00
Raphael Taylor-Davies c18ad4ac97
feat: special case max timestamp range for table_names and field_columns (#3642) 2022-02-08 16:09:36 +00:00
Andrew Lamb 8d7865496d
refactor: remove out of date comments (#3653)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-07 15:14:34 +00:00
Carol (Nichols || Goulding) 2e30483f1f
refactor: Remove predicate module from predicate crate (#3648)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-07 14:54:07 +00:00
Andrew Lamb 52b2f2e606
refactor: Move more test scenarios into library.rs (#3639) 2022-02-04 10:53:57 +00:00
Andrew Lamb a5045de02c
fix: remove incorrect comment (#3622)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-03 11:06:29 +00:00
kodiakhq[bot] a2ed6a1b75
Merge branch 'main' into combine-non-overlapping-chunks 2022-02-02 20:47:51 +00:00
Andrew Lamb 766bf9826c
refactor: remove unecessary #allow lints (#3615)
* refactor: remove unecessary #allow lints

* fix: clippy

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-02 20:23:08 +00:00
Andrew Lamb c4a234e83c
feat: Allow sql test runner to compare sorted output (#3618)
* refactor: Add Query struct

* feat: Implement sorted checking

* refactor: port some sql tests over

* fix: fmt

* fix: Apply suggestions from code review

Co-authored-by: Edd Robinson <me@edd.io>

Co-authored-by: Edd Robinson <me@edd.io>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-02 19:59:52 +00:00
Raphael Taylor-Davies 8a8de19fb5 feat: combine non-overlapping chunks without deletes 2022-02-02 16:40:30 +00:00
Andrew Lamb 55daf95b48
refactor: Consolidate more query_test DbSetups (#3607)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-02 15:16:56 +00:00
Andrew Lamb 5d5310351b
refactor: Move query_test readme to standard location (#3608)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-02 15:06:13 +00:00
Andrew Lamb 62e8b41509
refactor: Start consolidating test scenarios in query_tests (#3580)
* refactor: Start consolidating test scenarios in query_tests

* fix: reference

* fix: fmt
2022-01-28 21:12:57 +00:00
Raphael Taylor-Davies 21c1824a7a
refactor: remove table_names from Predicate (#3545)
* refactor: remove table_names from Predicate

* chore: fix benchmarks

* chore: review feedback

Co-authored-by: Edd Robinson <me@edd.io>

* chore: review feedback

* chore: replace Default::default with InfluxRpcPredicate::default()

Co-authored-by: Edd Robinson <me@edd.io>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-01-27 14:44:49 +00:00
Andrew Lamb 5488c257d1
chore: Update datafusion, upgrade to arrow/parqet/arrow-flight 8.0.0 (#3517)
* chore: Update datafusion

* chore: update to arrow 8

* fix: update to use new DataFusion APIs

* fix: update case for sortedness

* fix: cargo hakari
2022-01-27 13:33:27 +00:00
Raphael Taylor-Davies 54ae5de9bf
feat: chunk pruning metrics (#3516)
Co-authored-by: Edd Robinson <me@edd.io>
2022-01-25 11:11:50 +00:00
Andrew Lamb 9615feacb3
fix(InfluxQL): Support RegEx with escape sequences not supported by Rust regex (#3502)
* fix(InfluxQL): Translate unsupported meta characters

* fix: remove debugging

* fix: clippy sacrifice

* docs: Add additional background and rationale for rewriting

* fix: doc link
2022-01-21 14:40:10 +00:00
Andrew Lamb 9c19cd6cc4
fix: clamp start/end of TimestampRange to min/max valid timestamp values (#3487)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-01-20 16:08:00 +00:00
Andrew Lamb f0d50f447a
fix: Special case tag_keys with max timestamp range (#3485)
* fix: Special case tag_keys with max timestamp range

* docs: comment

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-01-20 14:14:34 +00:00
Andrew Lamb 9b6e626626
chore: Update datafusion (and get fix for influxql test failure) (#3484)
* test: add tests for comparing dictionary arrays

* chore: update datafusion deps

* refactor: Update code for DataFusion API changes

* fix: update test

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-01-20 14:01:47 +00:00
Marco Neumann 168afb63ad feat: add `size` methods to DML-related types
This will be helpful when we want to batch DML operations in memory
(e.g. when using RSKafka).

This also ensures that `MBChunk` accounts for the column names that
are stored within `MutableBatch`.
2022-01-18 13:52:31 +01:00
Andrew Lamb dd23056efd
chore: update datafusion, arrow, prost, tonic, pbjson, etc (#3455)
* chore: update datafusion, arrow, prost, tonic, etc

* fix: update pprof as well

* chore: update hakari

* fix: update pbjson

* chore: update heappy

* fix: hakari

* fix: workaround https://github.com/influxdata/influxdb_iox/issues/3458

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-01-13 17:07:15 +00:00