Commit Graph

60 Commits (208c3673ada1f6250a968b580cfc00b01685ab8f)

Author SHA1 Message Date
Edd Robinson f2126c76d3
refactor: address PR feedback (#3906)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-03 11:04:05 +00:00
Marco Neumann bbeba73345
feat: sync partitions in querier (#3900)
* feat: sync partitions in querier

* docs: explain per-table partition grouping
2022-03-03 09:28:44 +00:00
Edd Robinson 3d047073b9
feat: add tracing down to the chunk level (#3804)
* refactor: wire exectution context to Deduplicator

* feat: example trace to chunk read_filter

* refactor: make execution context required

* refactor: expose metadata API

* refactor: more span context for chunk read_filter

* refactor: fix build

* refactor: push context into result stream

* refactor: make executor optional
2022-03-02 19:08:22 +00:00
Andrew Lamb 286d5f7b2b
feat: add `success` column to system.queries (#3891)
* feat: add `success` column to system.queries

* refactor: Remove lifetime from QueryCompletedToken and thread through flight

* test: update test to make incomplete query clearer

* refactor: use better patter to set complete

* fix: logical merge conflict
2022-03-02 15:05:06 +00:00
Marco Neumann af57664f53
feat: wire up flight into the querier (#3889) 2022-03-02 09:20:26 +00:00
Marco Neumann 6e2470bf5f
feat: create CatalogChunk in querier (#3862)
* feat: `NamespaceRepo::get_by_id`

* feat: create `CatalogChunk` in querier
2022-02-28 17:20:38 +00:00
Marco Neumann 33851be3a5
chore: upgrade Rust to 1.59 (#3875)
Mostly a few new clippy crates around `flat_map`, `and_then`, and
"underscore locks" (!!!):
https://rust-lang.github.io/rust-clippy/master/index.html#let_underscore_lock

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-28 15:14:19 +00:00
Raphael Taylor-Davies 2a842fbb1a
feat: correctly sort data and store in catalog metadata (#3864)
* feat: respect sort order in ChunkTableProvider (#3214)

feat: persist sort order in catalog (#3845)

refactor: owned SortKey (#3845)

* fix: size tests

* refactor: immutable SortKey

* test: test sort order restart (#3845)

* chore: explicit None for sort key

* chore: test cleanup

* fix: handling of sort keys containing fields

* chore: remove unused selected_sort_key

* chore: more docs

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-25 17:56:27 +00:00
Marco Neumann f966f4c7a4
feat: create `ParquetChunk` in querier (#3857)
Adds a small adapter that is able to produce `ParquetChunk`s for NG.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-25 08:54:16 +00:00
Raphael Taylor-Davies a3a628c783
refactor: use ChunkMetadata in catalog interface (#3845) (#3858)
* refactor: use ChunkMetadata in catalog interface (#3845)

* chore: review feedback

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-24 18:05:07 +00:00
Marco Neumann 49d1be30e7
feat: wire up `ParquetFilePath` for NG (#3853)
It's a bit of a duck-type hack, but if we wanna just `ParquetFileChunk`
in the new architecture, we somehow need it to accept new-gen paths.
Also path handling should be somewhat centralized since
ingester/compactor/querier all need to construct them. So having a
`ParquetFilePath` that supports both path styles seems to be a
not-to-bad solution. This should obviously be cleaned up in some
not-to-distant future.
2022-02-24 16:05:38 +00:00
Marco Neumann d62a052394
feat: extend catalog so we can recover `ParquetChunk`s from it (#3852)
* refactor: less parquet data copying

* feat: `PartitionRepo::get_by_id`

* feat: `TableRepo::get_by_id`

* feat: `ParquetFile::file_size_bytes`

* feat: `ParquetFile::parquet_metadata`

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-24 13:16:15 +00:00
Raphael Taylor-Davies ade6475570
feat: add sort ordinal to chunk_columns system table (#2035) (#3831)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-23 12:51:56 +00:00
dependabot[bot] b63f920d4c
chore(deps): Bump parquet from 9.0.2 to 9.1.0 (#3828)
* chore(deps): Bump parquet from 9.0.2 to 9.1.0

Bumps [parquet](https://github.com/apache/arrow-rs) from 9.0.2 to 9.1.0.
- [Release notes](https://github.com/apache/arrow-rs/releases)
- [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG.md)
- [Commits](https://github.com/apache/arrow-rs/compare/9.0.2...9.1.0)

---
updated-dependencies:
- dependency-name: parquet
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore: update chunk size test

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Raphael Taylor-Davies <r.taylordavies@googlemail.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-23 11:25:15 +00:00
dependabot[bot] 3b7d31c88a
chore(deps): Bump arrow from 9.0.2 to 9.1.0 (#3826)
Bumps [arrow](https://github.com/apache/arrow-rs) from 9.0.2 to 9.1.0.
- [Release notes](https://github.com/apache/arrow-rs/releases)
- [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG.md)
- [Commits](https://github.com/apache/arrow-rs/compare/9.0.2...9.1.0)

---
updated-dependencies:
- dependency-name: arrow
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-02-23 09:25:46 +00:00
dependabot[bot] ad3868ed7c
chore(deps): Bump tokio from 1.16.1 to 1.17.0 (#3814)
* chore(deps): Bump tokio from 1.16.1 to 1.17.0

Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.16.1 to 1.17.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.16.1...tokio-1.17.0)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* build: update workspace-hack

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Dom Dwyer <dom@itsallbroken.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-22 16:27:43 +00:00
Raphael Taylor-Davies 9ff44ce90b
feat: sequencer metrics during replay (#3806)
* feat: sequencer metrics during replay

* feat: add replay metrics test

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-22 16:01:27 +00:00
Raphael Taylor-Davies 69480eaa5c
feat: remove remaining table-granularity metrics (#3808)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-22 15:02:57 +00:00
Marco Neumann 44ee0166a0
fix: start Kafka write buffer stream at "earliest" offset, not at "0" (#3748) 2022-02-15 13:36:59 +00:00
Andrew Lamb a30803e692
chore: Update datafusion, update `arrow`/`parquet`/`arrow-flight` to 9.0 (#3733)
* chore: Update datafusion

* chore: Update arrow

* fix: missing updates

* chore: Update cargo.lock

* fix: update for smaller parquet size

* fix: update test for smaller parquet files

* test: ensure parquet_file tests write multiple row groups

* fix: update callsite

* fix: Update for tests

* fix: harkari

* fix: use IoxObjectStore::existing

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-15 12:10:24 +00:00
dependabot[bot] 89105ccfab
chore(deps): Bump tokio-util from 0.6.9 to 0.7.0 (#3743)
Bumps [tokio-util](https://github.com/tokio-rs/tokio) from 0.6.9 to 0.7.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/commits)

---
updated-dependencies:
- dependency-name: tokio-util
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-02-15 11:33:41 +00:00
Marco Neumann 4db27eec68
fix: defined behaviour when seeking to an unknown sequence number (#3745)
* chore: upgrade rskafka

* refactor: less cloning

* fix: defined behaviour when seeking to an unknown sequence number

The new, defined behavior is: "return an error once and then end the
stream".

Co-authored-by: Edd Robinson <me@edd.io>

Co-authored-by: Edd Robinson <me@edd.io>
2022-02-15 11:08:01 +00:00
Raphael Taylor-Davies b1190262b7
feat: restartable `Database` (#3368) (#3711)
* feat: restartable `Database` (#3368)

* chore: fmt

* fix: wipe_preserved_catalog

* chore: review feedback

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-10 18:32:05 +00:00
Andrew Lamb d9f331ba2a
chore: update datafusion, stop repartitioning so aggressively (#3633)
* chore: update datafusion

* fix: Update to use new datafusion api

* chore: update expected plans

* fix: support zero output partitions

* fix: update test

* fix: Update for new DataFusion API

* fix: newly added system table

* fix: update cargo lock
2022-02-09 19:53:41 +00:00
Raphael Taylor-Davies ca331503a5
feat: add WriteBufferErrorKind (#3664)
* feat: add WriteBufferErrorKind

* fix: test_offset_after_broken_message

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-08 15:34:05 +00:00
Raphael Taylor-Davies d986c04421
feat: lazy system tables (#3661)
* feat: lazy system tables

* chore: review feedback

* chore: fmt

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-08 13:48:44 +00:00
Raphael Taylor-Davies be662ec731
feat: lazy query log! (#3654)
* feat: lazy query log

* chore: fmt

* chore: review feedback

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-08 13:07:28 +00:00
Raphael Taylor-Davies 4e0b7a20fa
feat: add timeouts to write chunk (#3662)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-07 21:07:43 +00:00
Carol (Nichols || Goulding) 2e30483f1f
refactor: Remove predicate module from predicate crate (#3648)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-07 14:54:07 +00:00
Marco Neumann e2db1df11f
refactor: improve writer buffer consumer interface (#3631)
* refactor: improve writer buffer consumer interface

The change looks huge but is actually rather simple. To
understand the interface change, let me first explain what we want:

- be able to fetch watermarks for any sequencer
- have streams:
  - each streams tracks a sequencer and has an offset state (no read
    multiplexing)
  - we can seek a stream
  - seeking and streaming cannot be done at the same time (that would be
    weird and likely leads to many bugs both in write buffer and in the
    user code)
- ideally we don't need to create streams of all sequencers but can
  choose a subset

Before this change we had one mutable consumer struct where you can get
all streams and watermark functions (this mutable-borrows the consumer)
or you can seek a single stream (this also mutable-borrows the
consumer). This is a bit weird for multiple reasons:

- you cannot seek a single stream without dropping all of them
- the mutable-borrow construct makes it really difficult to pass the
  streams into separate threads
- the consumer is boxed (because its mutable) which makes it more
  difficult to handle in a large-scale application

What this change does is the following:

- you have an immutable consumer (similar to the producer)
- the consumer offers the following methods:
  - get the set of sequencer IDs
  - get watermark for any sequencer
  - get a stream handler (see next point) for any sequencer
- the stream handler captures the stream state (offset) and provides you
  a standard `Stream<_>` interface as well as a seek function.
  Mutable-borrows ensure that you cannot use both at the same time.

The stream handler provides you the stream via `handler.stream()`. It
doesn't implement `Stream<_>` itself because the way boxing, dynamic
dispatch work, and pinning interact (i.e. I couldn't get it to work
without the indirection).

As a bonus point (which we don't use however) you can now create
multiple streams for the same sequencer and they all have their own
offset.

* fix: review comments

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-07 12:24:17 +00:00
Andrew Lamb a63a617cca
test: Add logging to make `db` tests more debuggable (#3643)
* test: enable logging in db tests

* test: log when check passed

* fix: facepalm

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-04 21:25:25 +00:00
Andrew Lamb 429d59f1b6
feat: Simplify predicates in the `InfluxRpcFrontend` before using them (#3588)
* feat: normalize + simplify RPC predicates before using them

* docs: Update predicate/src/rpc_predicate.rs

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-02 19:46:57 +00:00
Marco Neumann 22778a3a80
chore: upgrade rskafka and parking_lot (#3592) 2022-02-01 11:50:42 +00:00
Carol (Nichols || Goulding) 0f72a881ef
refactor: Rename Rust struct parquet_file::IoxMetadata to be IoxMetadataOld 2022-01-31 10:36:33 -05:00
Raphael Taylor-Davies 442d63e65b
feat: catalog timestamp pruning (#3571)
* feat: catalog timestamp pruning

* chore: test
2022-01-28 13:45:13 +00:00
Raphael Taylor-Davies d8685888c8
fix: chunk ordering (#3560)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-01-28 11:36:08 +00:00
Raphael Taylor-Davies 5efc42494c
feat: add chunk order to chunk columns table (#3556)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-01-27 17:14:28 +00:00
Raphael Taylor-Davies d1d45fe818
feat: columnar predicate pruning across `Chunks` (#3553)
* feat: columnar predicate pruning

* fix: doc

* chore: review feedback

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-01-27 17:02:46 +00:00
Andrew Lamb 2062267d0f
chore: Update hashbrown (#3551)
* chore: Update hashbrown

* fix: hakari

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-01-27 15:34:10 +00:00
Raphael Taylor-Davies 21c1824a7a
refactor: remove table_names from Predicate (#3545)
* refactor: remove table_names from Predicate

* chore: fix benchmarks

* chore: review feedback

Co-authored-by: Edd Robinson <me@edd.io>

* chore: review feedback

* chore: replace Default::default with InfluxRpcPredicate::default()

Co-authored-by: Edd Robinson <me@edd.io>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-01-27 14:44:49 +00:00
Paul Dix 16d584b2ff
feat: Add db_name/namespace to DmlWrite and DmlDelete (#3531)
* feat: Add db_name/namespace to DmlWrite and DmlDelete

This is required for the new ingester to be able to work with the write buffer. The protobuf that gets serialized over Kafka already includes the database name, it just wasn't getting carried through to the marshaled Dml operation.

* fix: database != namespace, propagation through write buffer

Co-authored-by: Marco Neumann <marco@crepererum.net>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-01-27 14:12:20 +00:00
Andrew Lamb 5488c257d1
chore: Update datafusion, upgrade to arrow/parqet/arrow-flight 8.0.0 (#3517)
* chore: Update datafusion

* chore: update to arrow 8

* fix: update to use new DataFusion APIs

* fix: update case for sortedness

* fix: cargo hakari
2022-01-27 13:33:27 +00:00
Edd Robinson 0a0b8b2150
feat: decouple read buffer row group size from Datafusion batch size (#3538)
* feat: add chunk builder

* test: test coverage for chunk builder

* refactor: apply suggestions from code review

* refactor: address PR feedback
2022-01-26 12:39:29 +00:00
Nga Tran d559561fd7
refactor: have the deduplicate work without chunk statistics (#3519)
* refactor: have the deduplicate work without chunk statistics

* test: more tests for duplicates data on different combinations of record batches

* refactor: address review comments
2022-01-25 17:00:25 +00:00
Raphael Taylor-Davies 54ae5de9bf
feat: chunk pruning metrics (#3516)
Co-authored-by: Edd Robinson <me@edd.io>
2022-01-25 11:11:50 +00:00
Andrew Lamb 9c19cd6cc4
fix: clamp start/end of TimestampRange to min/max valid timestamp values (#3487)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-01-20 16:08:00 +00:00
Andrew Lamb 9b6e626626
chore: Update datafusion (and get fix for influxql test failure) (#3484)
* test: add tests for comparing dictionary arrays

* chore: update datafusion deps

* refactor: Update code for DataFusion API changes

* fix: update test

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-01-20 14:01:47 +00:00
Marco Neumann 168afb63ad feat: add `size` methods to DML-related types
This will be helpful when we want to batch DML operations in memory
(e.g. when using RSKafka).

This also ensures that `MBChunk` accounts for the column names that
are stored within `MutableBatch`.
2022-01-18 13:52:31 +01:00
Andrew Lamb 1843476651
chore: Update datafusion deps (#3471)
* chore: Update datafusion

* refactor: Update to use new Exec plan APIs

* fix: error message

* fix: fixup last bit

* fix: clippy

* fix: doclink

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-01-17 15:26:19 +00:00
Edd Robinson cdb4f43d62 refactor: address feedback 2022-01-14 10:41:27 +00:00