423 Commits (61917c107f3ef0f6b7db8aa6dfae8fb409069d15)

Author SHA1 Message Date
Jacob Marble 0779b0d9bd
feat: add gRPC listener for new write protocol (#1842)
* feat: add gRPC listener for new write protocol

* chore: clippy happy

* chore: lint

* chore: cargo fmt --all

* chore: cargo clippy

* chore: protobuf-lint

* chore: more formatting

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-01 16:15:12 +00:00
Marco Neumann 4204127b05 refactor: use protobuf for in-parquet metadata 2021-06-30 16:51:37 +02:00
Andrew Lamb fef160e24f
feat: Implement data driven query_tests and port explain tests (#1814)
* feat: Implement data driven query testing and port explain tests

* fix: do not fmt the auto generated cases

* refactor: split setup and parser into separate modules

* refactor: Add log to runner, add end to end tests

* docs: fixup comments
2021-06-29 16:09:51 +00:00
Andrew Lamb 5cc773ad80
chore: update deps to get new arrow (#1831) 2021-06-28 20:31:31 +00:00
kodiakhq[bot] 1bde983d66
Merge branch 'main' into cn/kafka-write 2021-06-24 12:42:44 +00:00
Marko Mikulicic 69b0bb1510
feat: Implement grpc-router crate
This PR implements the main building block for the gRPC StorageService router.
2021-06-23 17:21:46 +02:00
Carol (Nichols || Goulding) c66f9e5aeb feat: Write entries to Kafka when configured as the write buffer 2021-06-23 10:48:18 -04:00
kodiakhq[bot] 3fd45d987e
Merge branch 'main' into crepererum/fix_server_startup 2021-06-23 07:17:19 +00:00
Andrew Lamb 4c5007f961
fix: Select the correct timestamp for min/max selectors (#1771)
* test: Reproducer showing that the min/max selectors are order dependent

* fix: pick correct timestamp for first/last selectors

* refactor: remove println

* docs: Fixup comments and add link to arrow-datafusion/issues/600

* fix: Add debug if timestamp is null

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-22 17:53:54 +00:00
Marco Neumann 55c546baff feat: eagerly check object store during CLI `run`
Instead of waiting for the server ID to be set and then marking the server
as errored, directly check the object store on startup. This is
important so that we fail fast when Istio isn't up and running yet.
2021-06-22 18:21:30 +02:00
Carol (Nichols || Goulding) b4644e6108 test: Start of Kafka Write Buffer integration tests 2021-06-21 09:41:35 -04:00
Andrew Lamb ab052c0501
fix: fix flaky test by updating datafusion dep (#1758)
* chore: update DataFusion dependencies

* fix: Re-enable previously flaking tests

This reverts commit c63ad0ea31.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-18 20:17:18 +00:00
kodiakhq[bot] 1d8469951f
Merge branch 'main' into smaller-cache 2021-06-18 18:50:10 +00:00
Marko Mikulicic b612c3af4e
chore: Switch to smaller cache dep 2021-06-18 09:43:28 +02:00
Andrew Lamb ec43a87909
chore: Update itertools deps (#1750) 2021-06-17 17:56:44 +00:00
Andrew Lamb c5eea9af6a
feat: Implement DeduplicateExec (#1733)
* feat: Implement DeduplicateExec

* fix: Doc comments

* fix: fix comment

* fix: Update with arrow ticket references and use datafusion coalesce batches impl

* refactor: rename inner.rs to algo.rs

* docs: Add additional documentation on rationale for last field value

* docs: Apply suggestions from code review

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

* docs: Update query/src/provider/deduplicate/algo.rs

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

* docs: Apply suggestions from code review

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

* refactor: do not use pub(crate)

* docs: fix test comments

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2021-06-17 14:17:52 +00:00
Paul Dix 802ecacf61 feat: Add persistence windows ingestion tracking
This adds a new module, persistence_windows, to the mutable buffer crate. Later PRs will add this into the mutable buffer chunk where it can be used to track when the lifecycle for persistence should be triggered.
2021-06-16 15:28:37 -04:00
Marko Mikulicic 760bcde3f0
feat: Factor out tracing/logging CLI options
This PR factors out the tracing/logging CLI options into the `trogging` utility crate,
so that multiple binaries from the IOx suite (such as conductor) can use the same (and quite complex)
logging/tracing configuration options (flags and env vars).

Closes influxdata/conductor#343
2021-06-16 00:54:11 +02:00
Raphael Taylor-Davies 38d17a3093
chore: remove unused query dependency (#1731)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-15 22:06:13 +00:00
Andrew Lamb 2c8060160f
chore: update datafusion deps (#1721)
* chore: update datafusion deps

* chore: update deps

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-15 19:06:15 +00:00
Raphael Taylor-Davies bf54ab51f2
refactor: split lifecycle into separate crate (#1730) 2021-06-15 15:57:47 +00:00
kodiakhq[bot] 4dc338e1bc
Merge branch 'main' into trogging 2021-06-15 13:04:06 +00:00
Marko Mikulicic bde35cf5be
chore: Pull tracing+logging setup in its own crate
1. so that it can be reused by other binaries (e.g. conductor)
2. so that it's faster to build when working on it.
2021-06-15 14:52:04 +02:00
Raphael Taylor-Davies dd422492e2
feat: sort order in schema (#1357) (#1667)
* feat: sort order in schema (#1357)

* chore: review feedback

* chore: review feedback

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-14 18:10:41 +00:00
kodiakhq[bot] fc1b5ea165
Merge branch 'main' into crepererum/parquet_metadata_wrapper 2021-06-14 11:20:39 +00:00
Marco Neumann 518f7c6f15 refactor: wrap upstream parquet MD into struct + clean up interface
This prevents users of `parquet_file::metadata` from also having to depend on
`parquet` directly. Furthermore, they don't need to import dozens of
functions and can instead just use `IoxParquetMetaData` directly.
2021-06-14 13:17:01 +02:00
Andrew Lamb 0d8d32fd8f
chore: Update deps to get latest arrow (#1708)
* chore: Update deps to get latest arrow

* fix: Update to rust 1.52

* fix: clippy
2021-06-14 11:08:09 +00:00
Marco Neumann 898c638630 feat: wire up catalog checkpointing
Closes #1381.
2021-06-14 10:08:32 +02:00
Marko Mikulicic 5a68abaa53
feat: Implement LayeredTracing
__Rationale__

We currently use the `tracing` framework to output to both log outputs (e.g. stdout for k8s) and distributed tracing collectors (e.g. opentelemetry jaeger).

However, due to a limitation in the `tracing` SDK, we can only have one "filter" level that applies
to both logs and tracing outputs. This is impractical because tracing collectors are designed
to receive high verbosity data (which will then be sampled within the opentelemetry library),
while logs are generally limited to the DEBUG level in production.

This PR adds a `FilteredLayer` tracing subscriber layer that wraps a subscriber layer with an independent
filter, which can filter events going to the wrapped subscriber layer more aggressively than the global layer.

This will allow us to emit logs at INFO or DEBUG level while passing all events to opentelemetry at TRACE
level (and the opentelemetry SDK will then sample the events so that only a small part will be sent to the
opentelemetry collector).

__Note__

This PR just implements the `FilteredLayer` and a test. Another PR will integrate this with
our log/tracing setup code.
2021-06-11 04:32:47 +02:00
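The per-layer filtering idea described in commit 5a68abaa53 can be illustrated with a small, std-only model. This is a sketch under stated assumptions, not the code from that PR and not the `tracing` crate API: the `Level` enum, the `Layer` trait, `Collector`, `dispatch`, and `run_demo` are all hypothetical names invented here, and only the name `FilteredLayer` comes from the commit message. It shows one wrapper per output, each with its own maximum verbosity, so logs can stop at INFO while the trace output receives everything.

```rust
// Std-only model of a per-layer filter. This is NOT the `tracing` crate API;
// every type here is a hypothetical stand-in for illustration.

/// Verbosity levels, declared from least to most verbose so that the
/// derived ordering gives Error < Warn < Info < Debug < Trace.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Level {
    Error,
    Warn,
    Info,
    Debug,
    Trace,
}

/// Hypothetical stand-in for a subscriber layer: it receives events.
trait Layer {
    fn on_event(&mut self, level: Level, message: &str);
}

/// Stores every event it is handed (stands in for a log or trace output).
#[derive(Default)]
struct Collector {
    events: Vec<(Level, String)>,
}

impl Layer for Collector {
    fn on_event(&mut self, level: Level, message: &str) {
        self.events.push((level, message.to_string()));
    }
}

/// Wraps a layer with its own maximum verbosity, independent of any
/// other layer's filter.
struct FilteredLayer<L: Layer> {
    inner: L,
    max_level: Level,
}

impl<L: Layer> Layer for FilteredLayer<L> {
    fn on_event(&mut self, level: Level, message: &str) {
        // Forward only events at or below this layer's verbosity.
        if level <= self.max_level {
            self.inner.on_event(level, message);
        }
    }
}

/// Sends one event to both outputs; each output applies its own filter.
fn dispatch(
    logs: &mut FilteredLayer<Collector>,
    traces: &mut FilteredLayer<Collector>,
    level: Level,
    message: &str,
) {
    logs.on_event(level, message);
    traces.on_event(level, message);
}

/// Emits one event per level and returns (events kept by logs, events kept by traces).
fn run_demo() -> (usize, usize) {
    let mut logs = FilteredLayer { inner: Collector::default(), max_level: Level::Info };
    let mut traces = FilteredLayer { inner: Collector::default(), max_level: Level::Trace };
    let events = [
        (Level::Error, "write failed"),
        (Level::Warn, "retrying"),
        (Level::Info, "server started"),
        (Level::Debug, "chunk closed"),
        (Level::Trace, "entered span"),
    ];
    for (level, message) in events {
        dispatch(&mut logs, &mut traces, level, message);
    }
    (logs.inner.events.len(), traces.inner.events.len())
}

fn main() {
    let (log_count, trace_count) = run_demo();
    println!("logs kept {} events, traces kept {} events", log_count, trace_count);
}
```

In this model the filter lives in the wrapper rather than in a single global setting, which is the property the commit message asks for. Sampling of the high-verbosity stream, which the message attributes to the opentelemetry SDK, is deliberately left out.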
Raphael Taylor-Davies 11b25b3aaf
refactor: swap order of partition and table in in-memory catalog (#1678)
* refactor: swap order of partition and table in in-memory catalog

* chore: review feedback

* chore: validate panic message

* chore: review feedback

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-10 16:40:30 +00:00
Marco Neumann 876f642860 test: add benchmark for catalog loading 2021-06-10 15:42:21 +02:00
Andrew Lamb 29dadba4f3
chore: update dependencies (#1673)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-10 13:01:31 +00:00
Andrew Lamb a614fef5bc
chore: remove more unused dependencies (#1658)
* chore: remove more unused deps

* refactor: move benchmarks into server_benchmarks crate
2021-06-09 10:17:20 +00:00
Raphael Taylor-Davies 07c4277ca7
refactor: schema merge to give more control over field merging (#1653)
* refactor: schema merge to give more control over field merging

* chore: review feedback
2021-06-09 06:30:45 +00:00
Andrew Lamb e9834a907c
feat: Prune on boolean column predicates too (#1629)
* chore: update deps to get latest DataFusion

* fix: enable boolean pruning tests

* fix: update explain plan tests

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-07 16:51:30 +00:00
kodiakhq[bot] 87297f7db4
Merge branch 'main' into cn/delete 2021-06-07 13:32:42 +00:00
Raphael Taylor-Davies 5749a2c119
chore: cleanup legacy TSM -> parquet code (#1639)
* chore: cleanup legacy parquet code

* chore: remove tests of removed functionality

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-07 12:59:33 +00:00
Carol (Nichols || Goulding) f4a9a5ae56 fix: Remove write buffer 2021-06-04 14:40:17 -04:00
Andrew Lamb 42f26b609b
refactor: Move `query_tests` and `server_benchmarks` into their own crate --> smaller `server` (#1628)
* refactor: Separate query_tests into its own crate

* fix: references

* refactor: break out server benchmarks

* fix: Update query_tests/src/lib.rs

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2021-06-04 17:31:19 +00:00
Marco Neumann bbd73e59be feat: jitter background clean-up job + wait on first job 2021-06-03 11:23:29 +02:00
Marco Neumann 0a625b50e6 feat: store transaction timestamp in preserved catalog 2021-06-02 09:41:19 +02:00
Andrew Lamb 83b2eacea6
chore: update deps for datafusion (#1601) 2021-06-01 20:00:39 +00:00
Andrew Lamb c0e4e6951a
chore: update datafusion (#1594) 2021-06-01 15:55:07 +00:00
Andrew Lamb d3711a5591
refactor: Use ParquetExec from DataFusion to read parquet files (#1580)
* refactor: use ParquetExec to read parquet files

* fix: test

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-01 14:44:07 +00:00
Andrew Lamb 3338ddcca9
chore: update dependencies (#1593) 2021-06-01 14:05:56 +00:00
Andrew Lamb 73cedd2f88
chore: remove unused dependency (#1587)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-05-31 14:22:11 +00:00
Andrew Lamb d50c7c8919
chore: remove unused dependency (#1581) 2021-05-31 09:58:10 +00:00
Andrew Lamb 00e735ef0d
chore: remove unused dependencies (#1583) 2021-05-29 10:31:57 +00:00
Raphael Taylor-Davies d8f19348bf
feat: per-column dictionaries in MUB (#1570)
* feat: per-column dictionaries in MUB

* chore: fmt

* refactor: remove chunk-level dictionary

* chore: remove redundant sort

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-05-28 13:51:56 +00:00
kodiakhq[bot] 166851d952
Merge branch 'main' into crepererum/in_file_metadata 2021-05-26 07:39:53 +00:00