443 Commits (dd93b2cdecb13f941c84056cb29672a0f85f79d6)

Author SHA1 Message Date
Andrew Lamb 4da8a16c18
chore: update to arrow 5.0 and master datafusion (#2049)
* chore: update to arrow 5.0 and master datafusion

* fix: Update test for change in object size
2021-07-19 12:49:51 +00:00
Marko Mikulicic 06399e88e0
chore: Add some debug logs to write buffer 2021-07-15 22:18:03 +02:00
Andrew Lamb 74b8bb76e6
chore: Update to correct pre-release version of DF (#2023)
Co-authored-by: Edd Robinson <me@edd.io>
2021-07-15 18:13:42 +00:00
Marco Neumann b5428e53a5 refactor: write buffer testing + better mocking
This refactors the write buffer a bit for:

- **Testing:** Add generic tests for the Kafka and the mocking
  implementation. The same interface can be used to easily add new
  implementations (e.g. via Redis, filesystem, ...).
- **Partition on Write:** The caller of the writer operation must now
  specify the partition/sequencer ID. The implicit partitioning of the
  Kafka writer would have led to broken data since we must never spill
  entries w/ the same primary key over multiple partitions. At the
  moment we will only use partition 0 but we can easily implement
  better logic in the future.
- **Improved Mocking:** The mocked implementation now simulates a system
  that feels more real. Especially the handling around multiple streams
  and "write while read" has been improved. This will be helpful for
  testing and for new features like seeking (during replay). A solid
  realistic mock also helps us to ensure that the tests using the mock
  do not rely on unrealistic behavior too much.
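
The "Partition on Write" point above can be sketched as follows. This is a minimal illustration only: the commit message does not show the crate's real types, so `Entry`, `WriteBufferWriting`, `store_entry`, and `MockWriter` are hypothetical names chosen for this sketch.

```rust
use std::collections::BTreeMap;

/// Hypothetical entry type; the real entry type is not shown in the commit message.
#[derive(Debug, Clone, PartialEq)]
pub struct Entry(pub String);

/// Hypothetical writer interface: the caller names the sequencer (partition)
/// explicitly instead of relying on implicit partitioning by the backend.
pub trait WriteBufferWriting {
    /// Stores `entry` in the given sequencer and returns its sequence number there.
    fn store_entry(&mut self, entry: &Entry, sequencer_id: u32) -> Result<u64, String>;
}

/// In-memory mock that keeps one ordered stream per sequencer.
#[derive(Default)]
pub struct MockWriter {
    streams: BTreeMap<u32, Vec<Entry>>,
}

impl MockWriter {
    /// Creates a mock with sequencers `0..n`, all initially empty.
    pub fn with_sequencers(n: u32) -> Self {
        Self {
            streams: (0..n).map(|id| (id, Vec::new())).collect(),
        }
    }

    /// Returns the entries written to one sequencer, in write order.
    pub fn entries(&self, sequencer_id: u32) -> &[Entry] {
        self.streams
            .get(&sequencer_id)
            .map(|v| v.as_slice())
            .unwrap_or(&[])
    }
}

impl WriteBufferWriting for MockWriter {
    fn store_entry(&mut self, entry: &Entry, sequencer_id: u32) -> Result<u64, String> {
        // Writing to an unknown sequencer is an error rather than a silent re-route.
        let stream = self
            .streams
            .get_mut(&sequencer_id)
            .ok_or_else(|| format!("unknown sequencer {}", sequencer_id))?;
        stream.push(entry.clone());
        Ok(stream.len() as u64 - 1)
    }
}
```

With an interface of this shape, a caller that always passes sequencer `0` keeps all entries for a primary key in one stream, which matches the "only use partition 0" behavior described above.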
2021-07-15 17:20:45 +02:00
Raphael Taylor-Davies d71f38f27c
feat: compute PartitionCheckpoint from PersistenceWindows (#2011)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-15 12:17:23 +00:00
Raphael Taylor-Davies cbeeb97cff
feat: flush open window on persist (#1985)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-14 16:58:20 +00:00
Edd Robinson 0e5276ed20
Merge branch 'main' into alamb/go_go_go_go 2021-07-14 13:56:35 +01:00
Marco Neumann 9cb9ae0874 chore: move write buffer into its own crate 2021-07-14 14:09:18 +02:00
Andrew Lamb 4800b36949 chore: Update IOx to a pre-release version of arrow and datafusion to test out performance improvement 2021-07-13 15:44:57 -04:00
Andrew Lamb ef3269ee1d
chore: Update datafusion deps (#1984)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-13 17:29:29 +00:00
Marco Neumann 157a0cc98c chore: update flatbuffers to 2.0 2021-07-13 15:44:45 +02:00
Marco Neumann f5c63b2ae6 chore: update nom to version 6
Triggered some changes from `Fn` to `FnMut`.
2021-07-13 15:28:42 +02:00
Marco Neumann f11be17523 chore: update mockito to 0.30 2021-07-13 15:17:28 +02:00
Marco Neumann 2e391deb34 chore: update croaring to 0.5.0
Upstream changelog:

- CRoaring updated to 0.3.1
- `-march=native` is not a default for croaring-sys anymore
- Impl Default for `Bitmap` and `Treemap`
2021-07-13 15:15:41 +02:00
Marco Neumann 5c16bd2085 chore: cargo update 2021-07-13 15:13:37 +02:00
Andrew Lamb 8b9b369189
chore: Update DataFusion again (#1930)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-08 15:14:45 +00:00
Andrew Lamb 7602bde850
chore: Update datafusion deps (#1799)
* chore: Update datafusion deps + rework code

* refactor: remove workaround as it has been contributed upstream

* fix: Update query/src/exec/split.rs

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-08 10:58:32 +00:00
Marco Neumann d6cff911b6 test: ensure that query tests don't rebuild all the time
Beforehand:

```text
❯ env CARGO_LOG=cargo::core::compiler::fingerprint=info cargo test -p query_tests
[2021-07-05T08:52:13Z INFO  cargo::core::compiler::fingerprint] stale: changed "/home/mneumann/src/influxdb_iox/query_tests/cases"
[2021-07-05T08:52:13Z INFO  cargo::core::compiler::fingerprint]           (vs) "/home/mneumann/src/influxdb_iox/target/debug/build/query_tests-0e8f741dfb84437f/output"
[2021-07-05T08:52:13Z INFO  cargo::core::compiler::fingerprint]                FileTime { seconds: 1625474716, nanos: 436081357 } != FileTime { seconds: 1625474752, nanos: 52625167 }
[2021-07-05T08:52:13Z INFO  cargo::core::compiler::fingerprint] fingerprint error for query_tests v0.1.0 (/home/mneumann/src/influxdb_iox/query_tests)/Test/TargetInner { ..: lib_target("query_tests", ["lib"], "/home/mneumann/src/influxdb_iox/query_tests/src/lib.rs", Edition2018) }
[2021-07-05T08:52:13Z INFO  cargo::core::compiler::fingerprint]     err: current filesystem status shows we're outdated
[2021-07-05T08:52:13Z INFO  cargo::core::compiler::fingerprint] fingerprint error for query_tests v0.1.0 (/home/mneumann/src/influxdb_iox/query_tests)/RunCustomBuild/TargetInner { ..: custom_build_target("build-script-build", "/home/mneumann/src/influxdb_iox/query_tests/build.rs", Edition2018) }
[2021-07-05T08:52:13Z INFO  cargo::core::compiler::fingerprint]     err: current filesystem status shows we're outdated
[2021-07-05T08:52:13Z INFO  cargo::core::compiler::fingerprint] fingerprint error for query_tests v0.1.0 (/home/mneumann/src/influxdb_iox/query_tests)/Build/TargetInner { ..: lib_target("query_tests", ["lib"], "/home/mneumann/src/influxdb_iox/query_tests/src/lib.rs", Edition2018) }
[2021-07-05T08:52:13Z INFO  cargo::core::compiler::fingerprint]     err: current filesystem status shows we're outdated
   Compiling query_tests v0.1.0 (/home/mneumann/src/influxdb_iox/query_tests)
```

The issue is that both the input and the test output files are located
under `cases/`. `build.rs` used `cargo:rerun-if-changed=cases` which per
Cargo doc will scan ALL files in that directory. Note that the normal
`exclude` directive in `Cargo.toml` does NOT work, see
https://github.com/rust-lang/cargo/issues/4587 .

So we need to split input and output files into separate directories
(`cases/{in,out}`).
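
A minimal sketch of the resulting `build.rs` logic, assuming the `cases/in` layout named above. The helper name `rerun_directive` is hypothetical; only the `cargo:rerun-if-changed=` directive itself is Cargo's.

```rust
use std::path::Path;

/// Builds the `cargo:rerun-if-changed` directive for the input directory only,
/// so that files written under `cases/out` by the tests no longer mark the
/// build script as stale. (Hypothetical helper, not taken from the repository.)
pub fn rerun_directive(cases_root: &Path) -> String {
    let input_dir = cases_root.join("in");
    format!("cargo:rerun-if-changed={}", input_dir.display())
}

// In `build.rs`, the directive would be printed from `main`, for example:
//
//     fn main() {
//         println!("{}", rerun_directive(Path::new("cases")));
//     }
```

Because Cargo scans every file beneath a directory passed to `rerun-if-changed`, pointing it at the input subdirectory is what keeps generated output from triggering rebuilds.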
2021-07-05 15:30:10 +02:00
Marco Neumann 4ca2d3e148 chore: move persistence windows related code into own crate
The entire persistence windows data structures (including the
checkpoints) have nothing to do with the mutable buffer per se. So let's
move them into their own crate. This also makes `parquet_file` no
longer depend on `mutable_buffer`.
2021-07-05 10:23:58 +02:00
Marco Neumann cdab1bed05 feat: persist part+db checkpoint in parquets and catalog
This will be required for replay on server startup.
2021-07-05 09:42:46 +02:00
Jacob Marble 0779b0d9bd
feat: add gRPC listener for new write protocol (#1842)
* feat: add gRPC listener for new write protocol

* chore: clippy happy

* chore: lint

* chore: cargo fmt --all

* chore: cargo clippy

* chore: protobuf-lint

* chore: more formatting

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-01 16:15:12 +00:00
Marco Neumann 4204127b05 refactor: use protobuf for in-parquet metadata 2021-06-30 16:51:37 +02:00
Andrew Lamb fef160e24f
feat: Implement data driven query_tests and port explain tests (#1814)
* feat: Implement data driven query testing and port explain tests

* fix: do not fmt the auto generated cases

* refactor: split setup and parser into separate modules

* refactor: Add log to runner, add end to end tests

* docs: fixup comments
2021-06-29 16:09:51 +00:00
Andrew Lamb 5cc773ad80
chore: update deps to get new arrow (#1831) 2021-06-28 20:31:31 +00:00
kodiakhq[bot] 1bde983d66
Merge branch 'main' into cn/kafka-write 2021-06-24 12:42:44 +00:00
Marko Mikulicic 69b0bb1510
feat: Implement grpc-router crate
This PR implements the main building block for implementing the gRPC StorageService router.
2021-06-23 17:21:46 +02:00
Carol (Nichols || Goulding) c66f9e5aeb feat: Write entries to Kafka when configured as the write buffer 2021-06-23 10:48:18 -04:00
kodiakhq[bot] 3fd45d987e
Merge branch 'main' into crepererum/fix_server_startup 2021-06-23 07:17:19 +00:00
Andrew Lamb 4c5007f961
fix: Select the correct timestamp for min/max selectors (#1771)
* test: Reproducer showing that the min/max selectors are order dependent

* fix: pick correct timestamp for first/last selectors

* refactor: remove println

* docs: Fixup comments and add link to arrow-datafusion/issues/600

* fix: Add debug if timestamp is null

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-22 17:53:54 +00:00
Marco Neumann 55c546baff feat: eagerly check object store during CLI `run`
Instead of waiting for the server ID to be set and then mark the server
as errored, directly check the object store on startup. This is
important so that we fail fast when Istio isn't up and running yet.
2021-06-22 18:21:30 +02:00
Carol (Nichols || Goulding) b4644e6108 test: Start of Kafka Write Buffer integration tests 2021-06-21 09:41:35 -04:00
Andrew Lamb ab052c0501
fix: fix flaky test by updating datafusion dep (#1758)
* chore: update DataFusion dependencies

* fix: Re-enable previously flaking tests

This reverts commit c63ad0ea31.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-18 20:17:18 +00:00
kodiakhq[bot] 1d8469951f
Merge branch 'main' into smaller-cache 2021-06-18 18:50:10 +00:00
Marko Mikulicic b612c3af4e
chore: Switch to smaller cache dep 2021-06-18 09:43:28 +02:00
Andrew Lamb ec43a87909
chore: Update itertools deps (#1750) 2021-06-17 17:56:44 +00:00
Andrew Lamb c5eea9af6a
feat: Implement DeduplicateExec (#1733)
* feat: Implement DeduplicateExec

* fix: Doc comments

* fix: fix comment

* fix: Update with arrow ticket references and use datafusion coalesce batches impl

* refactor: rename inner.rs to algo.rs

* docs: Add additional documentation on rationale for last field value

* docs: Apply suggestions from code review

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

* docs: Update query/src/provider/deduplicate/algo.rs

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

* docs: Apply suggestions from code review

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

* refactor: do not use pub(crate)

* docs: fix test comments

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2021-06-17 14:17:52 +00:00
Paul Dix 802ecacf61 feat: Add persistence windows ingestion tracking
This adds a new module, persistence_windows, to the mutable buffer crate. Later PRs will add this into the mutable buffer chunk where it can be used to track when the lifecycle for persistence should be triggered.
2021-06-16 15:28:37 -04:00
Marko Mikulicic 760bcde3f0
feat: Factor out tracing/logging CLI options
This PR factors out the tracing/logging CLI options into the `trogging` utility crate,
so that multiple binaries from the IOx suite (such as conductor) can use the same (and quite complex)
logging/tracing configuration options (flags and env vars).

Closes influxdata/conductor#343
2021-06-16 00:54:11 +02:00
Raphael Taylor-Davies 38d17a3093
chore: remove unused query dependency (#1731)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-15 22:06:13 +00:00
Andrew Lamb 2c8060160f
chore: update datafusion deps (#1721)
* chore: update datafusion deps

* chore: update deps

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-15 19:06:15 +00:00
Raphael Taylor-Davies bf54ab51f2
refactor: split lifecycle into separate crate (#1730) 2021-06-15 15:57:47 +00:00
kodiakhq[bot] 4dc338e1bc
Merge branch 'main' into trogging 2021-06-15 13:04:06 +00:00
Marko Mikulicic bde35cf5be
chore: Pull tracing+logging setup in its own crate
1. so that it can be reused by other binaries (e.g. conductor)
2. so that it's faster to build when working on it.
2021-06-15 14:52:04 +02:00
Raphael Taylor-Davies dd422492e2
feat: sort order in schema (#1357) (#1667)
* feat: sort order in schema (#1357)

* chore: review feedback

* chore: review feedback

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-14 18:10:41 +00:00
kodiakhq[bot] fc1b5ea165
Merge branch 'main' into crepererum/parquet_metadata_wrapper 2021-06-14 11:20:39 +00:00
Marco Neumann 518f7c6f15 refactor: wrap upstream parquet MD into struct + clean up interface
This saves users of `parquet_file::metadata` from also depending on
`parquet` directly. Furthermore they don't need to import dozens of
functions and can instead just use `IoxParquetMetaData` directly.
2021-06-14 13:17:01 +02:00
Andrew Lamb 0d8d32fd8f
chore: Update deps to get latest arrow (#1708)
* chore: Update deps to get latest arrow

* fix: Update to rust 1.52

* fix: clippy
2021-06-14 11:08:09 +00:00
Marco Neumann 898c638630 feat: wire up catalog checkpointing
Closes #1381.
2021-06-14 10:08:32 +02:00
Marko Mikulicic 5a68abaa53
feat: Implement LayeredTracing
__Rationale__

We currently use the `tracing` framework to output to both log outputs (e.g. stdout for k8s) and distributed tracing collectors (e.g. opentelemetry jaeger).

However, due to a limitation in the `tracing` SDK, we can only have one "filter" level that applies
to both logs and tracing outputs. This is impractical because tracing collectors are designed
to receive high verbosity data (which will be then sampled within the opentelemetry library),
while logs generally are limited to the DEBUG level in production.

This PR adds a `FilteredLayer` tracing subscriber layer, that wraps a subscriber layer with an independent
filter, which can filter events going to the wrapped subscriber layer more aggressively than the global layer.

This will allow us to emit logs at INFO or DEBUG level while passing all events to opentelemetry at TRACE
level (and opentelemetry SDK will then sample the events so that only a small part will be sent to the
ot collector)

__Note__

This PR just implements the `FilteredLayer` and a test. Another PR will integrate this with
our log/tracing setup code.
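
The idea of one independent filter per output can be sketched with the standard library alone. This sketch does not use the `tracing` crate's API and is not the PR's `FilteredLayer`: `Level`, `Sink`, `MemorySink`, and `Filtered` are hypothetical stand-ins for illustration.

```rust
/// Verbosity levels, ordered from least to most verbose
/// (the derived ordering follows declaration order).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Level {
    Error,
    Warn,
    Info,
    Debug,
    Trace,
}

/// Hypothetical stand-in for a subscriber layer: something that receives events.
pub trait Sink {
    fn on_event(&mut self, level: Level, message: &str);
}

/// Collects messages in memory so the filtering can be inspected.
#[derive(Default)]
pub struct MemorySink {
    pub messages: Vec<String>,
}

impl Sink for MemorySink {
    fn on_event(&mut self, _level: Level, message: &str) {
        self.messages.push(message.to_string());
    }
}

/// Wraps a sink with its own maximum level, independent of any other sink.
pub struct Filtered<S: Sink> {
    max_level: Level,
    inner: S,
}

impl<S: Sink> Filtered<S> {
    pub fn new(max_level: Level, inner: S) -> Self {
        Self { max_level, inner }
    }

    /// Gives read access to the wrapped sink.
    pub fn inner(&self) -> &S {
        &self.inner
    }
}

impl<S: Sink> Sink for Filtered<S> {
    fn on_event(&mut self, level: Level, message: &str) {
        // Forward only events at or below this wrapper's own verbosity limit.
        if level <= self.max_level {
            self.inner.on_event(level, message);
        }
    }
}
```

Wrapping the log output in `Filtered::new(Level::Info, ...)` and the tracing output in `Filtered::new(Level::Trace, ...)` lets the same event stream feed both, with each output applying its own limit.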
2021-06-11 04:32:47 +02:00
Raphael Taylor-Davies 11b25b3aaf
refactor: swap order of partition and table in in-memory catalog (#1678)
* refactor: swap order of partition and table in in-memory catalog

* chore: review feedback

* chore: validate panic message

* chore: review feedback

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-10 16:40:30 +00:00