Andrew Lamb
5c69a3f43b
chore: Update deps: datafusion, arrow/arrow-flight/parquet to 11, zstd to 0.11 ( #4119 )
...
* chore: update datafusion
* chore(deps): Bump arrow from 10.0.0 to 11.0.0
Bumps [arrow](https://github.com/apache/arrow-rs ) from 10.0.0 to 11.0.0.
- [Release notes](https://github.com/apache/arrow-rs/releases )
- [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG.md )
- [Commits](https://github.com/apache/arrow-rs/compare/10.0.0...11.0.0 )
---
updated-dependencies:
- dependency-name: arrow
dependency-type: direct:production
update-type: version-update:semver-major
...
Signed-off-by: dependabot[bot] <support@github.com>
* chore(deps): Bump arrow-flight from 10.0.0 to 11.0.0
Bumps [arrow-flight](https://github.com/apache/arrow-rs ) from 10.0.0 to 11.0.0.
- [Release notes](https://github.com/apache/arrow-rs/releases )
- [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG.md )
- [Commits](https://github.com/apache/arrow-rs/compare/10.0.0...11.0.0 )
---
updated-dependencies:
- dependency-name: arrow-flight
dependency-type: direct:production
update-type: version-update:semver-major
...
Signed-off-by: dependabot[bot] <support@github.com>
* chore: update parquet to 11.0.0
* fix: error on create schema, test for same
* fix: upgrade zstd
* chore: Run cargo hakari tasks
* fix: fix logical merge conflict
* fix: hakari
* fix: hakari
* fix: update newly introduced dep
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-24 15:27:36 +00:00
Andrew Lamb
b83b000590
chore: Update datafusion ( #4071 )
...
* chore: update to datafusion 5936edc2a94d5fb20702a41eab2b80695961b9dc
* chore: Update apis to match datafusion changes
2022-03-22 13:17:41 +00:00
Marco Neumann
c9908b260c
refactor: dyn-dispatch database in query subsystem ( #4083 )
...
* refactor: dyn-dispatch database in query subsystem
This is similar to #4080 but concerns the database itself.
For #3934 .
* docs: improve wording
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-22 09:15:52 +00:00
Marco Neumann
d1df95df87
refactor: dyn-dispatch chunks in query subsystem
...
- this is what DataFusion is doing as well; it's also fast enough
because the number of chunks in a query is not THAT massive (it's not
like we are doing row-level dyn dispatching)
- it simplifies abstracting over different databases
- it allows us to drop our enum-based dispatching that we have for
`DbChunk` and that we would also need for the querier (e.g. depending
on if a chunk is backed by a parquet file or ingester data)
- it likely speeds up compile times because the `query` is no longer
contains massive amounts of generic code
For #3934 .
2022-03-21 12:47:54 +01:00
Marco Neumann
ca152e7934
refactor: avoid generics in `QueryDatabase`
...
A step to make this trait object-safe.
Ref #3934 .
2022-03-21 10:45:05 +01:00
Marco Neumann
0071b85c22
refactor: make `ExecutionContextProvider` object-safe
...
Ref #3934 .
2022-03-21 10:40:53 +01:00
kodiakhq[bot]
67939fb37d
Merge branch 'main' into crepererum/issue3934b
2022-03-19 06:56:30 +00:00
Dom Dwyer
1b26bf1d78
refactor: reduce INFO verbosity
...
We're currently emitting ~5GB of logs every 30 minutes, and a quick scan
through the logs shows the lines this PR changes to be the most frequent
(multiple times per second).
I don't believe any of these are important enough to be INFO, but if
needed, an appropriate log filter will bring them back.
2022-03-18 15:57:55 +00:00
Marco Neumann
169fa2fb2f
refactor: make `QueryChunk` object-safe
...
This makes it way easier to dyn-type database implementations. The only
real change is that we make `QueryChunk::Error` opaque. Nobody is going
to inspect that anyways, it's just printed to the user.
This is a follow-up of #4053 .
Ref #3934 .
2022-03-18 11:40:31 +01:00
Marco Neumann
0850a93f20
refactor: make `QueryDatabase::chunks` async ( #4047 )
...
For OG we can determine the chunks w/o any IO, for NG however this might
require a few catalog queries.
This is likely not the last change of this sort, i.e. the whole schema
handling is currently sync as well.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-17 12:55:25 +00:00
Dom Dwyer
5585dd3c21
refactor: switch to using DynObjectStore
...
Changes all consumers of the object store to use the dynamically
dispatched DynObjectStore type, instead of using a hardcoded concrete
implementation type.
2022-03-15 16:32:52 +00:00
Dom Dwyer
1d5066c421
refactor: rename ObjectStore -> ObjectStoreImpl
...
Frees up the name for so we can use `dyn ObjectStore` throughout the
code instead of `ObjectStoreApi`.
2022-03-15 16:29:43 +00:00
Andrew Lamb
2c3d30ca32
chore: Update datafusion, arrow, flight and parquet ( #4000 )
...
* chore: Update datafusion, arrow, flight and parquet
* fix: api change
* fix: fmt
* fix: update test metadata size
* fix: Update sizes in parquet test
* fix: more metadata size update
2022-03-10 12:24:47 +00:00
Marco Neumann
77f6153f72
refactor: remove `QueryDatabase::chunk_summaries` ( #3977 )
...
- This is not used by the query engine at all.
- The query engine should not care about ALL chunks but only about the
chunks it gets via `QueryDatabase::chunks` (which includes a table
name and a predicate).
- All other users of that API are NOT really query-related.
2022-03-08 11:34:26 +00:00
Marco Neumann
5cc1c697fc
refactor: remove `QueryDatabase::partition_addr` ( #3976 )
...
- This was not actually used by the query engine.
- The query engine doesn't have a concept of a "partition", it only
cares about chunks.
- Unbound access to all partitions in the database is quite expensive
(esp. on NG).
2022-03-08 11:17:31 +00:00
Marco Neumann
db3f1e8db7
feat: wire up tombstones into querier ( #3962 )
...
* feat: `TombstoneRepo::list_by_namespace`
* test: model sequencer properly
* feat: wire up tombstones into querier
Closes #3932 .
* refactor: `override_delete_predicates` => `set_delete_predicates`
2022-03-08 10:06:22 +00:00
Raphael Taylor-Davies
688c1de5af
feat: add distinct count to chunk summary table ( #3966 )
...
* feat: add distinct count to chunk summary table
* chore: fmt
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-07 18:16:08 +00:00
dependabot[bot]
48908054d1
chore(deps): Bump once_cell from 1.9.0 to 1.10.0 ( #3955 )
...
* chore(deps): Bump once_cell from 1.9.0 to 1.10.0
Bumps [once_cell](https://github.com/matklad/once_cell ) from 1.9.0 to 1.10.0.
- [Release notes](https://github.com/matklad/once_cell/releases )
- [Changelog](https://github.com/matklad/once_cell/blob/master/CHANGELOG.md )
- [Commits](https://github.com/matklad/once_cell/compare/v1.9.0...v1.10.0 )
---
updated-dependencies:
- dependency-name: once_cell
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
* chore: Run cargo hakari tasks
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2022-03-07 08:54:20 +00:00
Raphael Taylor-Davies
e304613546
feat: include trace ID in query log ( #3912 ) ( #3923 )
...
* feat: include trace ID in query log (#3912 )
* chore: fmt
* chore: lint
2022-03-03 17:50:49 +00:00
Andrew Lamb
07fdbe7c6b
fix: apply `_field` restrictions that do not match any fields ( #3903 )
...
* fix: apply field restriction correctly
* refactor: Use Vec::retain
2022-03-03 15:04:05 +00:00
Edd Robinson
de7c46c9bb
feat: add read_window_aggregate tracing
2022-03-03 14:30:27 +00:00
Edd Robinson
787a848bf5
feat: add tracing for tag_values
2022-03-03 14:27:01 +00:00
Edd Robinson
6a6fbf73ae
feat: add tracing support tag_keys
2022-03-03 14:27:01 +00:00
Edd Robinson
301ae886ce
feat: add tracing down to the chunk level ( #3804 )
...
* refactor: wire exectution context to Deduplicator
* feat: example trace to chunk read_filter
* refactor: make execution context required
* refactor: expose metadata API
* refactor: more span context for chunk read_filter
* refactor: fix build
* refactor: push context into result stream
* refactor: make executor optional
2022-03-03 14:27:00 +00:00
Edd Robinson
f2126c76d3
refactor: address PR feedback ( #3906 )
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-03 11:04:05 +00:00
Marco Neumann
bbeba73345
feat: sync partitions in querier ( #3900 )
...
* feat: sync partitions in querier
* docs: explain per-table partition grouping
2022-03-03 09:28:44 +00:00
Edd Robinson
3d047073b9
feat: add tracing down to the chunk level ( #3804 )
...
* refactor: wire exectution context to Deduplicator
* feat: example trace to chunk read_filter
* refactor: make execution context required
* refactor: expose metadata API
* refactor: more span context for chunk read_filter
* refactor: fix build
* refactor: push context into result stream
* refactor: make executor optional
2022-03-02 19:08:22 +00:00
Andrew Lamb
286d5f7b2b
feat: add `success` column to system.queries ( #3891 )
...
* feat: add `success` column to system.queries
* refactor: Remove lifetime from QueryCompletedToken and thread through flight
* test: update test to make incomplete query clearer
* refactor: use better patter to set complete
* fix: logical merge conflict
2022-03-02 15:05:06 +00:00
Marco Neumann
af57664f53
feat: wire up flight into the querier ( #3889 )
2022-03-02 09:20:26 +00:00
Marco Neumann
6e2470bf5f
feat: create CatalogChunk in querier ( #3862 )
...
* feat: `NamespaceRepo::get_by_id`
* feat: create `CatalogChunk` in querier
2022-02-28 17:20:38 +00:00
Marco Neumann
33851be3a5
chore: upgrade Rust to 1.59 ( #3875 )
...
Mostly a few new clippy crates around `flat_map`, `and_then`, and
"underscore locks" (!!!):
https://rust-lang.github.io/rust-clippy/master/index.html#let_underscore_lock
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-28 15:14:19 +00:00
Raphael Taylor-Davies
2a842fbb1a
feat: correctly sort data and store in catalog metadata ( #3864 )
...
* feat: respect sort order in ChunkTableProvider (#3214 )
feat: persist sort order in catalog (#3845 )
refactor: owned SortKey (#3845 )
* fix: size tests
* refactor: immutable SortKey
* test: test sort order restart (#3845 )
* chore: explicit None for sort key
* chore: test cleanup
* fix: handling of sort keys containing fields
* chore: remove unused selected_sort_key
* chore: more docs
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-25 17:56:27 +00:00
Marco Neumann
f966f4c7a4
feat: create `ParquetChunk` in querier ( #3857 )
...
Adds a small adapter that is able to produce `ParquetChunk`s for NG.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-25 08:54:16 +00:00
Raphael Taylor-Davies
a3a628c783
refactor: use ChunkMetadata in catalog interface ( #3845 ) ( #3858 )
...
* refactor: use ChunkMetadata in catalog interface (#3845 )
* chore: review feedback
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-24 18:05:07 +00:00
Marco Neumann
49d1be30e7
feat: wire up `ParquetFilePath` for NG ( #3853 )
...
It's a bit of a duck-type hack, but if we wanna just `ParquetFileChunk`
in the new architecture, we somehow need it to accept new-gen paths.
Also path handling should be somewhat centralized since
ingester/compactor/querier all need to construct them. So having a
`ParquetFilePath` that supports both path styles seems to be a
not-to-bad solution. This should obviously be cleaned up in some
not-to-distant future.
2022-02-24 16:05:38 +00:00
Marco Neumann
d62a052394
feat: extend catalog so we can recover `ParquetChunk`s from it ( #3852 )
...
* refactor: less parquet data copying
* feat: `PartitionRepo::get_by_id`
* feat: `TableRepo::get_by_id`
* feat: `ParquetFile::file_size_bytes`
* feat: `ParquetFile::parquet_metadata`
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-24 13:16:15 +00:00
Raphael Taylor-Davies
ade6475570
feat: add sort ordinal to chunk_columns system table ( #2035 ) ( #3831 )
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-23 12:51:56 +00:00
dependabot[bot]
b63f920d4c
chore(deps): Bump parquet from 9.0.2 to 9.1.0 ( #3828 )
...
* chore(deps): Bump parquet from 9.0.2 to 9.1.0
Bumps [parquet](https://github.com/apache/arrow-rs ) from 9.0.2 to 9.1.0.
- [Release notes](https://github.com/apache/arrow-rs/releases )
- [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG.md )
- [Commits](https://github.com/apache/arrow-rs/compare/9.0.2...9.1.0 )
---
updated-dependencies:
- dependency-name: parquet
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
* chore: update chunk size test
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Raphael Taylor-Davies <r.taylordavies@googlemail.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-23 11:25:15 +00:00
dependabot[bot]
3b7d31c88a
chore(deps): Bump arrow from 9.0.2 to 9.1.0 ( #3826 )
...
Bumps [arrow](https://github.com/apache/arrow-rs ) from 9.0.2 to 9.1.0.
- [Release notes](https://github.com/apache/arrow-rs/releases )
- [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG.md )
- [Commits](https://github.com/apache/arrow-rs/compare/9.0.2...9.1.0 )
---
updated-dependencies:
- dependency-name: arrow
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-02-23 09:25:46 +00:00
dependabot[bot]
ad3868ed7c
chore(deps): Bump tokio from 1.16.1 to 1.17.0 ( #3814 )
...
* chore(deps): Bump tokio from 1.16.1 to 1.17.0
Bumps [tokio](https://github.com/tokio-rs/tokio ) from 1.16.1 to 1.17.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases )
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.16.1...tokio-1.17.0 )
---
updated-dependencies:
- dependency-name: tokio
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
* build: update workspace-hack
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Dom Dwyer <dom@itsallbroken.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-22 16:27:43 +00:00
Raphael Taylor-Davies
9ff44ce90b
feat: sequencer metrics during replay ( #3806 )
...
* feat: sequencer metrics during replay
* feat: add replay metrics test
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-22 16:01:27 +00:00
Raphael Taylor-Davies
69480eaa5c
feat: remove remaining table-granularity metrics ( #3808 )
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-22 15:02:57 +00:00
Marco Neumann
44ee0166a0
fix: start Kafka write buffer stream at "earliest" offset, not at "0" ( #3748 )
2022-02-15 13:36:59 +00:00
Andrew Lamb
a30803e692
chore: Update datafusion, update `arrow`/`parquet`/`arrow-flight` to 9.0 ( #3733 )
...
* chore: Update datafusion
* chore: Update arrow
* fix: missing updates
* chore: Update cargo.lock
* fix: update for smaller parquet size
* fix: update test for smaller parquet files
* test: ensure parquet_file tests write multiple row groups
* fix: update callsite
* fix: Update for tests
* fix: harkari
* fix: use IoxObjectStore::existing
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-15 12:10:24 +00:00
dependabot[bot]
89105ccfab
chore(deps): Bump tokio-util from 0.6.9 to 0.7.0 ( #3743 )
...
Bumps [tokio-util](https://github.com/tokio-rs/tokio ) from 0.6.9 to 0.7.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases )
- [Commits](https://github.com/tokio-rs/tokio/commits )
---
updated-dependencies:
- dependency-name: tokio-util
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-02-15 11:33:41 +00:00
Marco Neumann
4db27eec68
fix: defined behaviour when seeking to an unknown sequence number ( #3745 )
...
* chore: upgrade rskafka
* refactor: less cloning
* fix: defined behaviour when seeking to an unknown sequence number
The new, defined behavior is: "return an error once and then end the
stream".
Co-authored-by: Edd Robinson <me@edd.io>
Co-authored-by: Edd Robinson <me@edd.io>
2022-02-15 11:08:01 +00:00
Raphael Taylor-Davies
b1190262b7
feat: restartable `Database` ( #3368 ) ( #3711 )
...
* feat: restartable `Database` (#3368 )
* chore: fmt
* fix: wipe_preserved_catalog
* chore: review feedback
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-10 18:32:05 +00:00
Andrew Lamb
d9f331ba2a
chore: update datafusion, stop repartitioning so aggressively ( #3633 )
...
* chore: update datafusion
* fix: Update to use new datafusion api
* chore: update expected plans
* fix: support zero output partitions
* fix: update test
* fix: Update for new DataFusion API
* fix: newly added system table
* fix: update cargo lock
2022-02-09 19:53:41 +00:00
Raphael Taylor-Davies
ca331503a5
feat: add WriteBufferErrorKind ( #3664 )
...
* feat: add WriteBufferErrorKind
* fix: test_offset_after_broken_message
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-08 15:34:05 +00:00
Raphael Taylor-Davies
d986c04421
feat: lazy system tables ( #3661 )
...
* feat: lazy system tables
* chore: review feedback
* chore: fmt
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-08 13:48:44 +00:00