Raphael Taylor-Davies
a3a628c783
refactor: use ChunkMetadata in catalog interface ( #3845 ) ( #3858 )
...
* refactor: use ChunkMetadata in catalog interface (#3845 )
* chore: review feedback
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-24 18:05:07 +00:00
Marco Neumann
49d1be30e7
feat: wire up `ParquetFilePath` for NG ( #3853 )
...
It's a bit of a duck-type hack, but if we wanna just `ParquetFileChunk`
in the new architecture, we somehow need it to accept new-gen paths.
Also path handling should be somewhat centralized since
ingester/compactor/querier all need to construct them. So having a
`ParquetFilePath` that supports both path styles seems to be a
not-to-bad solution. This should obviously be cleaned up in some
not-to-distant future.
2022-02-24 16:05:38 +00:00
Marco Neumann
d62a052394
feat: extend catalog so we can recover `ParquetChunk`s from it ( #3852 )
...
* refactor: less parquet data copying
* feat: `PartitionRepo::get_by_id`
* feat: `TableRepo::get_by_id`
* feat: `ParquetFile::file_size_bytes`
* feat: `ParquetFile::parquet_metadata`
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-24 13:16:15 +00:00
Raphael Taylor-Davies
ade6475570
feat: add sort ordinal to chunk_columns system table ( #2035 ) ( #3831 )
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-23 12:51:56 +00:00
dependabot[bot]
b63f920d4c
chore(deps): Bump parquet from 9.0.2 to 9.1.0 ( #3828 )
...
* chore(deps): Bump parquet from 9.0.2 to 9.1.0
Bumps [parquet](https://github.com/apache/arrow-rs ) from 9.0.2 to 9.1.0.
- [Release notes](https://github.com/apache/arrow-rs/releases )
- [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG.md )
- [Commits](https://github.com/apache/arrow-rs/compare/9.0.2...9.1.0 )
---
updated-dependencies:
- dependency-name: parquet
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
* chore: update chunk size test
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Raphael Taylor-Davies <r.taylordavies@googlemail.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-23 11:25:15 +00:00
dependabot[bot]
3b7d31c88a
chore(deps): Bump arrow from 9.0.2 to 9.1.0 ( #3826 )
...
Bumps [arrow](https://github.com/apache/arrow-rs ) from 9.0.2 to 9.1.0.
- [Release notes](https://github.com/apache/arrow-rs/releases )
- [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG.md )
- [Commits](https://github.com/apache/arrow-rs/compare/9.0.2...9.1.0 )
---
updated-dependencies:
- dependency-name: arrow
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-02-23 09:25:46 +00:00
dependabot[bot]
ad3868ed7c
chore(deps): Bump tokio from 1.16.1 to 1.17.0 ( #3814 )
...
* chore(deps): Bump tokio from 1.16.1 to 1.17.0
Bumps [tokio](https://github.com/tokio-rs/tokio ) from 1.16.1 to 1.17.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases )
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.16.1...tokio-1.17.0 )
---
updated-dependencies:
- dependency-name: tokio
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
* build: update workspace-hack
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Dom Dwyer <dom@itsallbroken.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-22 16:27:43 +00:00
Raphael Taylor-Davies
9ff44ce90b
feat: sequencer metrics during replay ( #3806 )
...
* feat: sequencer metrics during replay
* feat: add replay metrics test
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-22 16:01:27 +00:00
Raphael Taylor-Davies
69480eaa5c
feat: remove remaining table-granularity metrics ( #3808 )
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-22 15:02:57 +00:00
Marco Neumann
44ee0166a0
fix: start Kafka write buffer stream at "earliest" offset, not at "0" ( #3748 )
2022-02-15 13:36:59 +00:00
Andrew Lamb
a30803e692
chore: Update datafusion, update `arrow`/`parquet`/`arrow-flight` to 9.0 ( #3733 )
...
* chore: Update datafusion
* chore: Update arrow
* fix: missing updates
* chore: Update cargo.lock
* fix: update for smaller parquet size
* fix: update test for smaller parquet files
* test: ensure parquet_file tests write multiple row groups
* fix: update callsite
* fix: Update for tests
* fix: harkari
* fix: use IoxObjectStore::existing
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-15 12:10:24 +00:00
dependabot[bot]
89105ccfab
chore(deps): Bump tokio-util from 0.6.9 to 0.7.0 ( #3743 )
...
Bumps [tokio-util](https://github.com/tokio-rs/tokio ) from 0.6.9 to 0.7.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases )
- [Commits](https://github.com/tokio-rs/tokio/commits )
---
updated-dependencies:
- dependency-name: tokio-util
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-02-15 11:33:41 +00:00
Marco Neumann
4db27eec68
fix: defined behaviour when seeking to an unknown sequence number ( #3745 )
...
* chore: upgrade rskafka
* refactor: less cloning
* fix: defined behaviour when seeking to an unknown sequence number
The new, defined behavior is: "return an error once and then end the
stream".
Co-authored-by: Edd Robinson <me@edd.io>
Co-authored-by: Edd Robinson <me@edd.io>
2022-02-15 11:08:01 +00:00
Raphael Taylor-Davies
b1190262b7
feat: restartable `Database` ( #3368 ) ( #3711 )
...
* feat: restartable `Database` (#3368 )
* chore: fmt
* fix: wipe_preserved_catalog
* chore: review feedback
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-10 18:32:05 +00:00
Andrew Lamb
d9f331ba2a
chore: update datafusion, stop repartitioning so aggressively ( #3633 )
...
* chore: update datafusion
* fix: Update to use new datafusion api
* chore: update expected plans
* fix: support zero output partitions
* fix: update test
* fix: Update for new DataFusion API
* fix: newly added system table
* fix: update cargo lock
2022-02-09 19:53:41 +00:00
Raphael Taylor-Davies
ca331503a5
feat: add WriteBufferErrorKind ( #3664 )
...
* feat: add WriteBufferErrorKind
* fix: test_offset_after_broken_message
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-08 15:34:05 +00:00
Raphael Taylor-Davies
d986c04421
feat: lazy system tables ( #3661 )
...
* feat: lazy system tables
* chore: review feedback
* chore: fmt
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-08 13:48:44 +00:00
Raphael Taylor-Davies
be662ec731
feat: lazy query log! ( #3654 )
...
* feat: lazy query log
* chore: fmt
* chore: review feedback
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-08 13:07:28 +00:00
Raphael Taylor-Davies
4e0b7a20fa
feat: add timeouts to write chunk ( #3662 )
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-07 21:07:43 +00:00
Carol (Nichols || Goulding)
2e30483f1f
refactor: Remove predicate module from predicate crate ( #3648 )
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-07 14:54:07 +00:00
Marco Neumann
e2db1df11f
refactor: improve writer buffer consumer interface ( #3631 )
...
* refactor: improve writer buffer consumer interface
The change looks huge but is actually rather simple. To
understand the interface change, let me first explain what we want:
- be able to fetch watermarks for any sequencer
- have streams:
- each streams tracks a sequencer and has an offset state (no read
multiplexing)
- we can seek a stream
- seeking and streaming cannot be done at the same time (that would be
weird and likely leads to many bugs both in write buffer and in the
user code)
- ideally we don't need to create streams of all sequencers but can
choose a subset
Before this change we had one mutable consumer struct where you can get
all streams and watermark functions (this mutable-borrows the consumer)
or you can seek a single stream (this also mutable-borrows the
consumer). This is a bit weird for multiple reasons:
- you cannot seek a single stream without dropping all of them
- the mutable-borrow construct makes it really difficult to pass the
streams into separate threads
- the consumer is boxed (because its mutable) which makes it more
difficult to handle in a large-scale application
What this change does is the following:
- you have an immutable consumer (similar to the producer)
- the consumer offers the following methods:
- get the set of sequencer IDs
- get watermark for any sequencer
- get a stream handler (see next point) for any sequencer
- the stream handler captures the stream state (offset) and provides you
a standard `Stream<_>` interface as well as a seek function.
Mutable-borrows ensure that you cannot use both at the same time.
The stream handler provides you the stream via `handler.stream()`. It
doesn't implement `Stream<_>` itself because the way boxing, dynamic
dispatch work, and pinning interact (i.e. I couldn't get it to work
without the indirection).
As a bonus point (which we don't use however) you can now create
multiple streams for the same sequencer and they all have their own
offset.
* fix: review comments
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-07 12:24:17 +00:00
Andrew Lamb
a63a617cca
test: Add logging to make `db` tests more debuggable ( #3643 )
...
* test: enable logging in db tests
* test: log when check passed
* fix: facepalm
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-04 21:25:25 +00:00
Andrew Lamb
429d59f1b6
feat: Simplify predicates in the `InfluxRpcFrontend` before using them ( #3588 )
...
* feat: normalize + simplify RPC predicates before using them
* docs: Update predicate/src/rpc_predicate.rs
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-02 19:46:57 +00:00
Marco Neumann
22778a3a80
chore: upgrade rskafka and parking_lot ( #3592 )
2022-02-01 11:50:42 +00:00
Carol (Nichols || Goulding)
0f72a881ef
refactor: Rename Rust struct parquet_file::IoxMetadata to be IoxMetadataOld
2022-01-31 10:36:33 -05:00
Raphael Taylor-Davies
442d63e65b
feat: catalog timestamp pruning ( #3571 )
...
* feat: catalog timestamp pruning
* chore: test
2022-01-28 13:45:13 +00:00
Raphael Taylor-Davies
d8685888c8
fix: chunk ordering ( #3560 )
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-01-28 11:36:08 +00:00
Raphael Taylor-Davies
5efc42494c
feat: add chunk order to chunk columns table ( #3556 )
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-01-27 17:14:28 +00:00
Raphael Taylor-Davies
d1d45fe818
feat: columnar predicate pruning across `Chunks` ( #3553 )
...
* feat: columnar predicate pruning
* fix: doc
* chore: review feedback
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-01-27 17:02:46 +00:00
Andrew Lamb
2062267d0f
chore: Update hashbrown ( #3551 )
...
* chore: Update hashbrown
* fix: hakari
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-01-27 15:34:10 +00:00
Raphael Taylor-Davies
21c1824a7a
refactor: remove table_names from Predicate ( #3545 )
...
* refactor: remove table_names from Predicate
* chore: fix benchmarks
* chore: review feedback
Co-authored-by: Edd Robinson <me@edd.io>
* chore: review feedback
* chore: replace Default::default with InfluxRpcPredicate::default()
Co-authored-by: Edd Robinson <me@edd.io>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-01-27 14:44:49 +00:00
Paul Dix
16d584b2ff
feat: Add db_name/namespace to DmlWrite and DmlDelete ( #3531 )
...
* feat: Add db_name/namespace to DmlWrite and DmlDelete
This is required for the new ingester to be able to work with the write buffer. The protobuf that gets serialized over Kafka already includes the database name, it just wasn't getting carried through to the marshaled Dml operation.
* fix: database != namespace, propagation through write buffer
Co-authored-by: Marco Neumann <marco@crepererum.net>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-01-27 14:12:20 +00:00
Andrew Lamb
5488c257d1
chore: Update datafusion, upgrade to arrow/parqet/arrow-flight 8.0.0 ( #3517 )
...
* chore: Update datafusion
* chore: update to arrow 8
* fix: update to use new DataFusion APIs
* fix: update case for sortedness
* fix: cargo hakari
2022-01-27 13:33:27 +00:00
Edd Robinson
0a0b8b2150
feat: decouple read buffer row group size from Datafusion batch size ( #3538 )
...
* feat: add chunk builder
* test: test coverage for chunk builder
* refactor: apply suggestions from code review
* refactor: address PR feedback
2022-01-26 12:39:29 +00:00
Nga Tran
d559561fd7
refactor: have the deduplicate work without chunk statistics ( #3519 )
...
* refactor: have the deduplicate work without chunk statistics
* test: more tests for duplicates data on different combinations of record batches
* refactor: address review comments
2022-01-25 17:00:25 +00:00
Raphael Taylor-Davies
54ae5de9bf
feat: chunk pruning metrics ( #3516 )
...
Co-authored-by: Edd Robinson <me@edd.io>
2022-01-25 11:11:50 +00:00
Andrew Lamb
9c19cd6cc4
fix: clamp start/end of TimestampRange to min/max valid timestamp values ( #3487 )
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-01-20 16:08:00 +00:00
Andrew Lamb
9b6e626626
chore: Update datafusion (and get fix for influxql test failure) ( #3484 )
...
* test: add tests for comparing dictionary arrays
* chore: update datafusion deps
* refactor: Update code for DataFusion API changes
* fix: update test
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-01-20 14:01:47 +00:00
Marco Neumann
168afb63ad
feat: add `size` methods to DML-related types
...
This will be helpful when we want to batch DML operations in memory
(e.g. when using RSKafka).
This also ensures that `MBChunk` accounts for the column names that
are stored within `MutableBatch`.
2022-01-18 13:52:31 +01:00
Andrew Lamb
1843476651
chore: Update datafusion deps ( #3471 )
...
* chore: Update datafusion
* refactor: Update to use new Exec plan APIs
* fix: error message
* fix: fixup last bit
* fix: clippy
* fix: doclink
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-01-17 15:26:19 +00:00
Edd Robinson
cdb4f43d62
refactor: address feedback
2022-01-14 10:41:27 +00:00
Edd Robinson
9283432a0f
refactor: display as ns
2022-01-14 10:26:11 +00:00
Edd Robinson
0b343bcf19
feat: add RAII token to time query completion
2022-01-14 10:26:11 +00:00
Edd Robinson
6a842fc105
feat: add completed duration to system table
2022-01-14 10:26:11 +00:00
Edd Robinson
211bee5886
feat: add support for setting complete time
2022-01-14 10:26:11 +00:00
Andrew Lamb
dd23056efd
chore: update datafusion, arrow, prost, tonic, pbjson, etc ( #3455 )
...
* chore: update datafusion, arrow, prost, tonic, etc
* fix: update pprof as well
* chore: update hakari
* fix: update pbjson
* chore: update heappy
* fix: hakari
* fix: workaround https://github.com/influxdata/influxdb_iox/issues/3458
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-01-13 17:07:15 +00:00
Marco Neumann
f3f6f335a9
chore: upgrade to snafu 0.7 ( #3440 )
2022-01-11 19:22:36 +00:00
Marco Neumann
37bb7f2120
chore: `cargo update`
...
dependabot currently doesn't work due to
https://github.com/dependabot/dependabot-core/issues/4574
Excluded `quote` due to
https://github.com/dtolnay/quote/issues/204
2022-01-11 14:57:51 +01:00
Andrew Lamb
336ffd1966
refactor: Remove `Result` in QueryDatabase trait (none of the functions can fail) ( #3422 )
...
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-01-06 22:03:08 +00:00
Andrew Lamb
a93ae739a9
feat: Add table_name to Partition API ( #3421 )
2022-01-06 16:38:39 +00:00