Commit Graph

550 Commits (b8f7319704ea7f802843a5e383769d917c9c29ac)

Author SHA1 Message Date
Carol (Nichols || Goulding) 8d1d877196 feat: Record first/last write times for RUB chunks 2021-07-22 11:35:22 -04:00
Raphael Taylor-Davies 8c974beba0
feat: add access timestamps to CatalogChunk (#2075) (#2081)
* feat: add access timestamps to CatalogChunk (#2075)

* chore: review feedback

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-22 12:19:30 +00:00
Raphael Taylor-Davies ffe6e62aee
feat: add instant to datetime conversion (#2078)
* feat: add instant to datetime conversion

* chore: review feedback
2021-07-21 11:43:27 +00:00
Andrew Lamb 387667330a
chore: Update datafusion deps (#2073)
* chore: Update datafusion deps

* fix: update tests
2021-07-21 08:27:03 +00:00
Andrew Lamb 1c16988a51
chore: Update datafusion references (#2056) 2021-07-19 18:09:06 +00:00
Carol (Nichols || Goulding) ff6b0ac030
fix: Cargo git deps use rev rather than ref (#2050)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-19 13:31:24 +00:00
Edd Robinson a852dce450 refactor: default to num_cpu 2021-07-19 14:00:10 +01:00
Andrew Lamb 4da8a16c18
chore: update to arrow 5.0 and master datafusion (#2049)
* chore: update to arrow 5.0 and master datafusion

* fix: Update test for change in object size
2021-07-19 12:49:51 +00:00
Marko Mikulicic 06399e88e0
chore: Add some debug logs to write buffer 2021-07-15 22:18:03 +02:00
Andrew Lamb 74b8bb76e6
chore: Update to correct pre-release version of DF (#2023)
Co-authored-by: Edd Robinson <me@edd.io>
2021-07-15 18:13:42 +00:00
Marco Neumann b5428e53a5 refactor: write buffer testing + better mocking
This refactors the write buffer a bit for:

- **Testing:** Add generic tests for the Kafka and the mocking
  implementation. The same interface can be used easily add new
  implementations (e.g. via Redis, filesystem, ...).
- **Partition on Write:** The caller of the writer operation must now
  specify the partition/sequencer ID. The implicit partitioning of the
  Kafka writer would have lead to broken data since we must never spill
  entries w/ the same primary key over multiple partitions. At the
  moment we will only use partition 0 but we can easily implement
  better logic in the future.
- **Improved Mocking:** The mocked implementation now simulates a system
  that feels more real. Especially the handling around multiple streams
  and "write while read" has been improved. This will be helpful for
  testing and for new features like seeking (during replay). A solid
  realistic mock also helps us to ensure that the tests using the mock
  do not rely on unrealistic behavior too much.
2021-07-15 17:20:45 +02:00
Raphael Taylor-Davies d71f38f27c
feat: compute PartitionCheckpoint from PersistenceWindows (#2011)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-15 12:17:23 +00:00
Raphael Taylor-Davies cbeeb97cff
feat: flush open window on persist (#1985)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-14 16:58:20 +00:00
Edd Robinson 0e5276ed20
Merge branch 'main' into alamb/go_go_go_go 2021-07-14 13:56:35 +01:00
Marco Neumann 9cb9ae0874 chore: move write buffer into its own crate 2021-07-14 14:09:18 +02:00
Andrew Lamb 4800b36949 chore: Update IOx to a pre-release version of arrow and datafusion to test out performance improvement 2021-07-13 15:44:57 -04:00
Andrew Lamb ef3269ee1d
chore: Update datafusion deps (#1984)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-13 17:29:29 +00:00
Marco Neumann 157a0cc98c chore: update flatbuffers to 2.0 2021-07-13 15:44:45 +02:00
Marco Neumann f5c63b2ae6 chore: update nom to version 6
Triggered some changes from `Fn` to `FnMut`.
2021-07-13 15:28:42 +02:00
Marco Neumann f11be17523 chore: update mockito to 0.30 2021-07-13 15:17:28 +02:00
Marco Neumann 2e391deb34 chore: update croaring to 0.5.0
Upstreame changelog:

- CRoaring updated to 0.3.1
- `-march=native` is not a default for croaring-sys anymore
- Impl Default for `Bitmap` and `Treemap`
2021-07-13 15:15:41 +02:00
Marco Neumann 5c16bd2085 chore: cargo update 2021-07-13 15:13:37 +02:00
Andrew Lamb 8b9b369189
chore: Update DataFusion again (#1930)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-08 15:14:45 +00:00
Andrew Lamb 7602bde850
chore: Update datafusion deps (#1799)
* chore: Update datafusion deps + rework code

* refactor: remove workaround as it has been contributed upstream

* fix: Update query/src/exec/split.rs

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-08 10:58:32 +00:00
Marco Neumann d6cff911b6 test: ensure that query tests don't rebuild all the time
Beforehand:

```text
❯ env CARGO_LOG=cargo::core::compiler::fingerprint=info cargo test -p query_tests
[2021-07-05T08:52:13Z INFO  cargo::core::compiler::fingerprint] stale: changed "/home/mneumann/src/influxdb_iox/query_tests/cases"
[2021-07-05T08:52:13Z INFO  cargo::core::compiler::fingerprint]           (vs) "/home/mneumann/src/influxdb_iox/target/debug/build/query_tests-0e8f741dfb84437f/output"
[2021-07-05T08:52:13Z INFO  cargo::core::compiler::fingerprint]                FileTime { seconds: 1625474716, nanos: 436081357 } != FileTime { seconds: 1625474752, nanos: 52625167 }
[2021-07-05T08:52:13Z INFO  cargo::core::compiler::fingerprint] fingerprint error for query_tests v0.1.0 (/home/mneumann/src/influxdb_iox/query_tests)/Test/TargetInner { ..: lib_target("query_tests", ["lib"], "/home/mneumann/src/influxdb_iox/query_tests/src/lib.rs", Edition2018) }
[2021-07-05T08:52:13Z INFO  cargo::core::compiler::fingerprint]     err: current filesystem status shows we're outdated
[2021-07-05T08:52:13Z INFO  cargo::core::compiler::fingerprint] fingerprint error for query_tests v0.1.0 (/home/mneumann/src/influxdb_iox/query_tests)/RunCustomBuild/TargetInner { ..: custom_build_target("build-script-build", "/home/mneumann/src/influxdb_iox/query_tests/build.rs", Edition2018) }
[2021-07-05T08:52:13Z INFO  cargo::core::compiler::fingerprint]     err: current filesystem status shows we're outdated
[2021-07-05T08:52:13Z INFO  cargo::core::compiler::fingerprint] fingerprint error for query_tests v0.1.0 (/home/mneumann/src/influxdb_iox/query_tests)/Build/TargetInner { ..: lib_target("query_tests", ["lib"], "/home/mneumann/src/influxdb_iox/query_tests/src/lib.rs", Edition2018) }
[2021-07-05T08:52:13Z INFO  cargo::core::compiler::fingerprint]     err: current filesystem status shows we're outdated
   Compiling query_tests v0.1.0 (/home/mneumann/src/influxdb_iox/query_tests)
```

The issue is that both the input and the test output files are located
under `cases/`. `build.rs` used `cargo:rerun-if-changed=cases` which per
Cargo doc will scan ALL files in that directory. Note that the normal
`exclude` directive in `Cargo.toml` does NOT work, see
https://github.com/rust-lang/cargo/issues/4587 .

So we need to split input and output files into separate directories
(`cases/{in,out}`).
2021-07-05 15:30:10 +02:00
Marco Neumann 4ca2d3e148 chore: move persistence windows related code into own crate
The entire persistence windows data structures (including the
checkpoints) have nothing to do with the mutable buffer per se. So lets
move them into their own crate. This also makes `parquet_file` not
longer depend on `mutable_buffer`.
2021-07-05 10:23:58 +02:00
Marco Neumann cdab1bed05 feat: persist part+db checkpoint in parquets and catalog
This will be required for replay on server startup.
2021-07-05 09:42:46 +02:00
Jacob Marble 0779b0d9bd
feat: add gRPC listener for new write protocol (#1842)
* feat: add gRPC listener for new write protocol

* chore: clippy happy

* chore: lint

* chore: cargo fmt --all

* chore: cargo clippy

* chore: protobuf-lint

* chore: more formatting

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-01 16:15:12 +00:00
Marco Neumann 4204127b05 refactor: use protobuf for in-parquet metadata 2021-06-30 16:51:37 +02:00
Andrew Lamb fef160e24f
feat: Implement data driven query_tests and port explain tests (#1814)
* feat: Implment data driven query testing and port explain tests

* fix: do not fmt the auto generated cases

* refactor: split setup and parser into separate modules

* refactor: Add log to runner, add end to end tests

* docs: fixu cpmments
2021-06-29 16:09:51 +00:00
Andrew Lamb 5cc773ad80
chore: update deps to get new arrow (#1831) 2021-06-28 20:31:31 +00:00
kodiakhq[bot] 1bde983d66
Merge branch 'main' into cn/kafka-write 2021-06-24 12:42:44 +00:00
Marko Mikulicic 69b0bb1510
feat: Implement grpc-router crate
This PR implements the main building block for implementing the gRPC StorageService router.
2021-06-23 17:21:46 +02:00
Carol (Nichols || Goulding) c66f9e5aeb feat: Write entries to Kafka when configured as the write buffer 2021-06-23 10:48:18 -04:00
kodiakhq[bot] 3fd45d987e
Merge branch 'main' into crepererum/fix_server_startup 2021-06-23 07:17:19 +00:00
Andrew Lamb 4c5007f961
fix: Select the correct timestamp for min/max selectors (#1771)
* test: Reproducer showing that the min/max selectors are order dependent

* fix: pick correct timestamp for first/last selectors

* refactor: remove println

* docs: Fixup comments and add to link to arrow-datafusion/issues/600

* fix: Add debug if timestamp is null

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-22 17:53:54 +00:00
Marco Neumann 55c546baff feat: eagerly check object store during CLI `run`
Instead of waiting for the server ID to be set and then mark the server
as errored, directly check the object store on startup. This is
important so that we fail fast when Istio isn't up and running yet.
2021-06-22 18:21:30 +02:00
Carol (Nichols || Goulding) b4644e6108 test: Start of Kafka Write Buffer integration tests 2021-06-21 09:41:35 -04:00
Andrew Lamb ab052c0501
fix: fix flaky test by updating datafusion dep (#1758)
* chore: update DataFusion dependencies

* fix: Re-enable previously flaking tests

This reverts commit c63ad0ea31.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-18 20:17:18 +00:00
kodiakhq[bot] 1d8469951f
Merge branch 'main' into smaller-cache 2021-06-18 18:50:10 +00:00
Marko Mikulicic b612c3af4e
chore: Switch to smaller cache dep 2021-06-18 09:43:28 +02:00
Andrew Lamb ec43a87909
chore: Update itertools deps (#1750) 2021-06-17 17:56:44 +00:00
Andrew Lamb c5eea9af6a
feat: Implement DeduplicateExec (#1733)
* feat: Implement DeduplicateExec

* fix: Doc comments

* fix: fix comment

* fix: Update with arrow ticket references and use datafusion coalsce batches impl

* refactor: rename inner.rs to algo.rs

* docs: Add additional documentation on rationale for last field value

* docs: Apply suggestions from code review

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

* docs: Update query/src/provider/deduplicate/algo.rs

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

* docs: Apply suggestions from code review

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

* refactor: do not use pub(crate)

* docs: fix test comments

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2021-06-17 14:17:52 +00:00
Paul Dix 802ecacf61 feat: Add persistence windows ingestion tracking
This adds a new module, persistence_windows, to the mutable buffer crate. Later PRs will add this into the mutable buffer chunk where it can be used to track when the lifecycle for persistence should be triggered.
2021-06-16 15:28:37 -04:00
Marko Mikulicic 760bcde3f0
feat: Factor out tracing/logging CLI options
This PR factors out the tracing/logging CLI optinos into the `trogging` utility crate,
so that multiple binaries from the IOx suite (such as conductor) can use the same (and quite complex)
logging/tracing configuration options (flags and env vars).

Closes influxdata/conductor#343
2021-06-16 00:54:11 +02:00
Raphael Taylor-Davies 38d17a3093
chore: remove unused query dependency (#1731)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-15 22:06:13 +00:00
Andrew Lamb 2c8060160f
chore: update datafusion deps (#1721)
* chore: update datafusion deps

* chore: update deps

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-15 19:06:15 +00:00
Raphael Taylor-Davies bf54ab51f2
refactor: split lifecycle into separate crate (#1730) 2021-06-15 15:57:47 +00:00
kodiakhq[bot] 4dc338e1bc
Merge branch 'main' into trogging 2021-06-15 13:04:06 +00:00
Marko Mikulicic bde35cf5be
chore: Pull tracing+logging setup in its own crate
1. so that it can be reused by other binaries (e.g. conductor)
2. so that it's faster to build when working on it.
2021-06-15 14:52:04 +02:00
Raphael Taylor-Davies dd422492e2
feat: sort order in schema (#1357) (#1667)
* feat: sort order in schema (#1357)

* chore: review feedback

* chore: review feedback

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-14 18:10:41 +00:00
kodiakhq[bot] fc1b5ea165
Merge branch 'main' into crepererum/parquet_metadata_wrapper 2021-06-14 11:20:39 +00:00
Marco Neumann 518f7c6f15 refactor: wrap upstream parquet MD into struct + clean up interface
This prevents users from `parquet_file::metadata` to also depend on
`parquet` directly. Furthermore they don't need to important dozend of
functions and can instead just use `IoxParquetMetaData` directly.
2021-06-14 13:17:01 +02:00
Andrew Lamb 0d8d32fd8f
chore: Update deps to get latest arrow (#1708)
* chore: Update deps to get latest arrow

* fix: Update to rust 1.52

* fix: clippy
2021-06-14 11:08:09 +00:00
Marco Neumann 898c638630 feat: wire up catalog checkpointing
Closes #1381.
2021-06-14 10:08:32 +02:00
Marko Mikulicic 5a68abaa53
feat: Implement LayeredTracing
__Rationale__

We currently use the `tracing` framework to output to both log outputs (e.g. stdout for k8s) and distributed tracing collectors (e.g. opentelemetry jaeger).

However, due to a limitation in the `tracing` SDK, we can only have one "filter" level that applies
to both logs and tracing outputs. This is unpractical because tracing collectors are designed
to receive high verbosity data (which will be then sampled within the opentelemetry library),
while logs generally are limited to the DEBUG level on production.

This PR adds a `FilteredLayer` tracing subscriber layer, that wraps a subscriber layer with a independent
filter, which can filter events goint to the wrapper subscriber layer more agressively than the global layer.

This will allow us to emit logs at INFO or DEBUG level while passing all events to opentelemetry at TRACE
level (and opentelemetry SDK will then sample the events so that only a small part will be sent to the
ot collector)

__Note__

This PR just implements the `FilteredLayer` and a test. Another PR will integrate this with
our log/tracing setup code.
2021-06-11 04:32:47 +02:00
Raphael Taylor-Davies 11b25b3aaf
refactor: swap order of partition and table in in-memory catalog (#1678)
* refactor: swap order of partition and table in in-memory catalog

* chore: review feedback

* chore: validate panic message

* chore: review feedback

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-10 16:40:30 +00:00
Marco Neumann 876f642860 test: add benchmark for catalog loading 2021-06-10 15:42:21 +02:00
Andrew Lamb 29dadba4f3
chore: update dependencies (#1673)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-10 13:01:31 +00:00
Andrew Lamb a614fef5bc
chore: remove more unused dependencies (#1658)
* chore: remove more unused deps

* refactor: move benchmarks into server_benchmarks crate
2021-06-09 10:17:20 +00:00
Raphael Taylor-Davies 07c4277ca7
refactor: schema merge to give more control over field merging (#1653)
* refactor: schema merge to give more control over field merging

* chore: review feedback
2021-06-09 06:30:45 +00:00
Andrew Lamb e9834a907c
feat: Prune on boolean column predicates too (#1629)
* chore: update deps to get latest DataFusion

* fix: enable boolean pruning tests

* fix: update explain plan tests

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-07 16:51:30 +00:00
kodiakhq[bot] 87297f7db4
Merge branch 'main' into cn/delete 2021-06-07 13:32:42 +00:00
Raphael Taylor-Davies 5749a2c119
chore: cleanup legacy TSM -> parquet code (#1639)
* chore: cleanup legacy parquet code

* chore: remove tests of removed functionality

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-07 12:59:33 +00:00
Carol (Nichols || Goulding) f4a9a5ae56 fix: Remove write buffer 2021-06-04 14:40:17 -04:00
Andrew Lamb 42f26b609b
refactor: Move `query_tests` and `server_benchmarks` into their own crate --> smaller `server` (#1628)
* refactor: Separate query_tests into its own crate

* fix: references

* refactor: break out server benchmarks

* fix: Update query_tests/src/lib.rs

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2021-06-04 17:31:19 +00:00
Marco Neumann bbd73e59be feat: jitter background clean-up job + wait on first job 2021-06-03 11:23:29 +02:00
Marco Neumann 0a625b50e6 feat: store transaction timestamp in preserved catalog 2021-06-02 09:41:19 +02:00
Andrew Lamb 83b2eacea6
chore: update deps for datafusion (#1601) 2021-06-01 20:00:39 +00:00
Andrew Lamb c0e4e6951a
chore: update datafusion (#1594) 2021-06-01 15:55:07 +00:00
Andrew Lamb d3711a5591
refactor: Use ParquetExec from DataFusion to read parquet files (#1580)
* refactor: use ParquetExec to read parquet files

* fix: test

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-01 14:44:07 +00:00
Andrew Lamb 3338ddcca9
chore: update dependencies (#1593) 2021-06-01 14:05:56 +00:00
Andrew Lamb 73cedd2f88
chore: remove unused dependency (#1587)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-05-31 14:22:11 +00:00
Andrew Lamb d50c7c8919
chore: remove unused dependency (#1581) 2021-05-31 09:58:10 +00:00
Andrew Lamb 00e735ef0d
chore: remove unused dependencies (#1583) 2021-05-29 10:31:57 +00:00
Raphael Taylor-Davies d8f19348bf
feat: per-column dictionaries in MUB (#1570)
* feat: per-column dictionaries in MUB

* chore: fmt

* refactor: remove chunk-level dictionary

* chore: remove redundant sort

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-05-28 13:51:56 +00:00
kodiakhq[bot] 166851d952
Merge branch 'main' into crepererum/in_file_metadata 2021-05-26 07:39:53 +00:00
Andrew Lamb 638d754e0f
chore: Update datafusion + other deps (#1557) 2021-05-25 22:32:14 +00:00
Raphael Taylor-Davies c2fd85209c
feat: wait for task shutdown on DedicatedExecutor (#1537) 2021-05-25 11:33:55 +00:00
Marco Neumann 19a2733d30 feat: preserve transaction metadata in parquets 2021-05-25 09:56:12 +02:00
Andrew Lamb 14ba25f86d
chore: Update datafusion and use released version of arrow crates (#1546)
* chore: Update datafusion and use released version of arrow crate

* fix: Update for change in API
2021-05-24 15:37:22 +00:00
Edd Robinson 634ceb886b feat: add Dictionary Values type 2021-05-20 10:49:49 +01:00
Marko Mikulicic ce2f8351be
fix: Cache outbound gRPC connections 2021-05-19 18:28:45 +02:00
Marko Mikulicic 40b21dbca1
feat: Add /debug/pprof/profile 2021-05-19 13:08:42 +02:00
Andrew Lamb 133ce12827
chore: update deps (#1499)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-05-14 20:22:16 +00:00
Raphael Taylor-Davies f9178dbb5f
feat: push metrics into catalog (#1488)
* feat: push metrics into catalog

* chore: minor cleanup

* fix: include db labels in chunk metric domains

* chore: fmt

* fix: don't allow dropping moving chunks

* chore: further tweaks

* chore: review feedback

* feat: use new_unregistered() for metric instruments instead of default

* chore: use &[KeyValue] instead of &Vec<KeyValue>

* refactor: make GauageValue non default constructible
2021-05-14 17:37:39 +00:00
Andrew Lamb ebea554e5a
feat: Report num_cpus seen by IOx on startup (#1484) 2021-05-12 13:42:16 +00:00
Marco Neumann 0b1ef52481
chore: upgrade arrow and datafusion (#1483) 2021-05-12 13:08:37 +00:00
Raphael Taylor-Davies b02105e47b
feat: construct StringDictionary from PackedStringArray (#1475)
* feat: construct StringDictionary from PackedStringArray

* chore: fix formatting

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-05-11 20:16:25 +00:00
Raphael Taylor-Davies 4409d2c8af
feat: instrument catalog locks (#1464)
* feat: instrument catalog locks (#1355)

* chore: add metrics test

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-05-11 18:59:11 +00:00
Raphael Taylor-Davies c85d1574eb
feat: move dictionary and bitset into arrow_utils (#1459)
* feat: move dictionary and bitset into arrow_utils

* chore: review feedback

* chore: remove redundant dictionary methods

* chore: consistent type parameter name in PackedStringArray

* chore: review comments

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-05-11 16:43:38 +00:00
Edd Robinson 3622a92c8b feat: wire in rb column metrics 2021-05-11 13:00:52 +01:00
Andrew Lamb b6290f1ff3
chore: update deps (#1466) 2021-05-10 22:14:08 +00:00
Raphael Taylor-Davies e2e3c9f77c
feat: workaround bug/limitation in OT handling of observers (#1457)
* feat: workardound bug/limitation in OT handling of observers

* chore: fix lints

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-05-07 18:53:28 +00:00
Edd Robinson 8ccc359cab refactor: address PR feedback 2021-05-07 13:48:44 +01:00
Edd Robinson eae3fec571 feat: wire up regex UDF as predicate filter expr 2021-05-07 13:44:51 +01:00
Edd Robinson 3fc2c9fc04 feat: add DataFusion regex match operator
This commit adds a new custom UDF to IOx that provide a regex operator to Datafusion plans.
Effectively it allows predicates to contain regex operators that are applied as filters, only allowing rows that satisfy the regex to be returned.

I did not use the Arrow regex kernel for this work because that does not return a boolean array indicating which rows matched a regex, but instead returns a new string array of results. This doesn't work well with DF's approach to filtering.
2021-05-07 13:44:51 +01:00
Raphael Taylor-Davies 216903a949
refactor: move protobuf conversion logic to generated_types (#1437)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-05-06 15:49:27 +00:00
Raphael Taylor-Davies 10f89a3e8d
refactor: split entry out into separate crate (#1428)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-05-06 11:36:23 +00:00
Raphael Taylor-Davies b153fc9ef3
chore: remove unused dependency from influxdb_line_protocol (#1435) 2021-05-06 10:13:00 +00:00
Raphael Taylor-Davies 01ba48c1ba
feat: instrumentable RwLocks (#1355) (#1421) 2021-05-06 08:36:30 +00:00
Raphael Taylor-Davies 7e2e2c4caf
refactor: initial split of observability_deps (#1429) 2021-05-06 08:25:09 +00:00
Andrew Lamb 86771ea629
chore: update arrow/datafusion deps (#1433)
* chore: update datafusion deps

* chore: update arrow deps

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-05-05 22:37:31 +00:00
Raphael Taylor-Davies ca1c698fd0
chore: update hashbrown (#1430)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-05-05 22:32:46 +00:00
kodiakhq[bot] 7e01650c53
Merge branch 'main' into reflection 2021-05-05 20:29:13 +00:00
Marko Mikulicic 0d6d94dc00
feat: Enable gRPC reflection 2021-05-05 22:04:47 +02:00
Raphael Taylor-Davies 411cf134e9
refactor: explode arrow_deps (#1425)
* refactor: explode arrow_deps

* chore: workaround doctest bug
2021-05-05 16:59:12 +00:00
Marco Neumann 1f42eb89cd feat: implement parquet metadata handling
Closes #1379 and contributes to #1380.
2021-05-05 13:29:16 +02:00
Marco Neumann 80d78c2a67 feat: re-export matching `parquet_format` in `arrow_deps` 2021-05-05 13:29:16 +02:00
Marco Neumann 34754ebcdb refactor: move MemoryStream to arrow_deps 2021-05-05 13:29:16 +02:00
kodiakhq[bot] ce568b4b2f
Merge branch 'main' into crepererum/issue1253 2021-05-03 11:11:42 +00:00
Marko Mikulicic b579ef8646
feat: Add jemalloc stats 2021-05-03 12:10:48 +02:00
Marco Neumann 136c35cb88 feat: implement transaction handling for catalog
Closes #1253.
2021-05-03 10:04:35 +02:00
Marko Mikulicic 234c899631
feat: Add gauge metric 2021-04-30 14:10:42 +02:00
Edd Robinson 13fbf2e68d refactor: plumb registry to gRPC server 2021-04-29 14:00:05 +01:00
Andrew Lamb 74d35ce9a4
chore: update deps (#1365) 2021-04-29 10:52:43 +00:00
Andrew Lamb a64e622f6e
feat: Add interactive SQL repl (#1332)
* feat: Add interactive SQL repl

* fix: try and fix test

* fix: remove test for prompt

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-04-28 20:41:02 +00:00
Raphael Taylor-Davies 7ca1da3fcd
feat: pushdown table and partition key predicates to catalog (#736) (#1327)
* feat: catalog predicate pushdown (#736)

* chore: fix lints

* chore: review comments

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-04-27 15:31:47 +00:00
Raphael Taylor-Davies 0a835436ac
feat: use bitmasks within MUB (#1274) (#1289)
* feat: use bitmasks within MUB (#1274)

* chore: review feedback

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-04-26 18:00:16 +00:00
Raphael Taylor-Davies e393d16e36
chore: update arrow versions (#1300) 2021-04-26 10:11:25 +00:00
Edd Robinson 87de656d23 feat: metrics test helpers 2021-04-23 15:58:48 +00:00
Edd Robinson 97b2369140 refactor: swap existing metrics for THE NEW WAY 2021-04-23 15:58:48 +00:00
Edd Robinson ea909f45ad feat: RED metrics 2021-04-23 15:58:48 +00:00
Raphael Taylor-Davies 74c25f541d
feat: fast MUB dictionary arrow conversion (#1273)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-04-22 20:31:44 +00:00
Raphael Taylor-Davies fe4fa29930
feat: MUB benchmarks (#635) (#1271) 2021-04-22 11:15:32 +00:00
Marko Mikulicic 83d6550316 feat: Implement write_entry_downstream 2021-04-21 20:50:46 +00:00
Carol (Nichols || Goulding) f0b93c5c8c refactor: Rename the wal crate to write_buffer 2021-04-21 17:43:03 +00:00
Raphael Taylor-Davies fdb22c9d9b
chore: update tonic to 0.4.2 (#1268) 2021-04-21 11:54:03 +00:00
kodiakhq[bot] dc6637b448
Merge branch 'main' into jgm-tracing-logging 2021-04-20 20:16:40 +00:00
Edd Robinson 895827aa98 chore: disable SIMD in Arrow 2021-04-20 17:30:50 +00:00
Jacob Marble 5539887cd6 chore: put env_logger back 2021-04-19 15:48:30 -07:00
Jacob Marble 87396edc56 chore: incremental dependency update
This fixes a metrics CI error, and introduces two new CI errors.
2021-04-19 15:48:30 -07:00
Jacob Marble df9dd15ab8 feat(tracing): improve logs and tracing
Logs and traces are emitted via one pipeline. For now, it is not
possible to emit both, but it should be possible in a few weeks, as
tokio/tracing/tracing-subscriber is going through some refactoring recently.

All affected flags are well-documented, and I have tested all but the
OTLP output flags.

chore: clippy happy

chore: revert instrumentation changes

feat: add log format logfmt, log destinations stderr, stdout

chore: clippy happy
2021-04-19 15:48:30 -07:00
Marco Neumann fd0da7e74a chore: upgrade arrow and Rust
See https://github.com/apache/arrow/pull/10082 for upstream PR.
2021-04-19 14:00:04 +02:00
Edd Robinson 4b706141de refactor: log new row groups added to RB 2021-04-19 10:25:57 +00:00
Nga Tran 4c23ca8888 feat: full implementation of parquet's read_filter for review 2021-04-16 16:03:24 -04:00
Marko Mikulicic 878b1b318e feat: Initial scaffolding for routing layer
Part of #916

Adding first-class concept of ShardId in shard config, fixes #1156

NEXT:

- [ ] implement sharder
- [ ] implement `write_entry_downstream`
- [ ] add tests
2021-04-15 09:02:47 +00:00
Edd Robinson a3fc5e2474 refactor: change sync::RwLock to parking_lot 2021-04-14 19:18:03 +00:00
Andrew Lamb f5f768d750
feat: Add a dedicated threadpool for running queries (#1191)
* feat: use a dedicated tokio threadpool for running queries

* feat: plumb number of executor threads through to command line

thread through command line

* fix: Logical merge conflict

* fix: another logical conflict

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-04-14 10:48:09 +00:00
Edd Robinson 9834c845db test: add influxrpc tag_values benches
The initial benchmarks look like this on my i9 MBP:

```
Data in one open chunk and one closed chunk of mutable buffer/tag0/no_pred           1.00     91.0±2.55ms        ? ?/sec
Data in one open chunk and one closed chunk of mutable buffer/tag0/with_pred         1.00     11.5±0.72ms        ? ?/sec
Data in one open chunk and one closed chunk of mutable buffer/tag1/no_pred           1.00    120.3±5.10ms        ? ?/sec
Data in one open chunk and one closed chunk of mutable buffer/tag1/with_pred         1.00     11.2±0.22ms        ? ?/sec
Data in one open chunk and one closed chunk of mutable buffer/tag2/no_pred           1.00    203.2±8.45ms        ? ?/sec
Data in one open chunk and one closed chunk of mutable buffer/tag2/with_pred         1.00     11.2±0.21ms        ? ?/sec
Data in open chunk of mutable buffer, and one chunk of read buffer/tag0/no_pred      1.00    100.3±3.73ms        ? ?/sec
Data in open chunk of mutable buffer, and one chunk of read buffer/tag0/with_pred    1.00     31.2±1.80ms        ? ?/sec
Data in open chunk of mutable buffer, and one chunk of read buffer/tag1/no_pred      1.00    126.7±2.29ms        ? ?/sec
Data in open chunk of mutable buffer, and one chunk of read buffer/tag1/with_pred    1.00     33.0±1.70ms        ? ?/sec
Data in open chunk of mutable buffer, and one chunk of read buffer/tag2/no_pred      1.00    212.0±6.86ms        ? ?/sec
Data in open chunk of mutable buffer, and one chunk of read buffer/tag2/with_pred    1.00     18.1±0.99ms        ? ?/sec
Data in single open chunk of mutable buffer/tag0/no_pred                             1.00     98.7±6.08ms        ? ?/sec
Data in single open chunk of mutable buffer/tag0/with_pred                           1.00     11.2±0.37ms        ? ?/sec
Data in single open chunk of mutable buffer/tag1/no_pred                             1.00    118.9±3.97ms        ? ?/sec
Data in single open chunk of mutable buffer/tag1/with_pred                           1.00     11.7±0.64ms        ? ?/sec
Data in single open chunk of mutable buffer/tag2/no_pred                             1.00    202.1±8.49ms        ? ?/sec
Data in single open chunk of mutable buffer/tag2/with_pred                           1.00     11.1±0.27ms        ? ?/sec
Data in two read buffer chunks/tag0/no_pred                                          1.00    109.2±5.20ms        ? ?/sec
Data in two read buffer chunks/tag0/with_pred                                        1.00     44.2±1.83ms        ? ?/sec
Data in two read buffer chunks/tag1/no_pred                                          1.00    132.9±3.79ms        ? ?/sec
Data in two read buffer chunks/tag1/with_pred                                        1.00     41.7±2.43ms        ? ?/sec
Data in two read buffer chunks/tag2/no_pred                                          1.00    222.4±7.00ms        ? ?/sec
Data in two read buffer chunks/tag2/with_pred                                        1.00     27.9±0.92ms        ? ?/sec
```
2021-04-14 09:36:39 +00:00
kodiakhq[bot] 8e0ee48018
Merge branch 'main' into ntran/query_local_parquet 2021-04-13 22:38:56 +00:00
Andrew Lamb 518df742df
chore: update arrow deps (#1195) 2021-04-13 18:05:03 +00:00
Nga Tran 4a6d6bd7ad feat: initial work for querying data from parquet file in object store 2021-04-13 13:57:46 -04:00
Raphael Taylor-Davies 55a77914b1
feat: basic snapshot caching (#1184) 2021-04-13 17:10:28 +00:00
Nga Tran 7f77a01e61 chore: merged main to branch and resolved conflicts 2021-04-12 12:09:04 -04:00
Nga Tran 453aeaf1a0 feat: Add tests for writing RB chunks to Object Store 2021-04-09 17:39:23 -04:00
Jake Goulding 16cc37e7f3
chore: update cloud-storage to 0.9 (#1165)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-04-09 14:37:51 +00:00
Marko Mikulicic e76980928b feat: Implement Update API 2021-04-08 22:25:36 +00:00
Carol (Nichols || Goulding) 1552c0113a Merge remote-tracking branch 'origin/main' into feature-query 2021-04-08 14:40:12 -04:00
Edd Robinson a34de76c49 refactor: wire read buffer tracker in 2021-04-08 18:20:37 +00:00