Commit Graph

8223 Commits (6417e7dc2a9ee17e5bca13b1faf97f801091adce)

Author SHA1 Message Date
Carol (Nichols || Goulding) 6417e7dc2a
feat: Extract sharder to its own crate 2022-06-15 10:01:45 -04:00
Marco Neumann 3bd24b67ba
feat: extend flight client to accept multiple (changing) schemas (#4853)
* feat: extend flight client to accept multiple (changing) schemas

See #4849.

Originally I intended not to use Flight at all for the new
ingester<>querier protocol. However since flight also deals with
dictionary batches and multiple batches and the gRPC protocol that I
would write would look very similar, I will use Flight with a bit more
flexible message types.

The rough idea for the protocol is the following stream:

- for each partition:
  1. "none" message with partition metadata
  2. for each chunk (can have different schemas under certain
     circumstances):
     1. "schema" message (resets dictionary state)
     2. (optional) dictionary batch messages
     3. one or more "record batch" messages

The nice thing about it is that the same arrow client works also for the
existing client<>querier protocol since there we just send:

1. "schema" message (no app metadata)
2. (optional) dictionary batch messages
3. zero, one or more "record batch" messages (no app metadata)
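
The message ordering for the ingester<>querier stream can be sketched as a small state machine. This is a minimal illustration, not the actual client code: the `Message` enum, the `State` enum, and `validate_ingester_stream` are hypothetical names, and the sketch assumes that dictionary batches only appear before the first record batch of a chunk and that a partition may contain zero chunks.

```rust
/// Kinds of message in the stream, named after the list above (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Message {
    /// "none" message carrying partition metadata.
    PartitionMetadata,
    /// "schema" message; starts a chunk and resets dictionary state.
    Schema,
    /// Optional dictionary batch.
    DictionaryBatch,
    /// Record batch carrying row data.
    RecordBatch,
}

/// Position within the stream (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State {
    Start,
    AfterPartition,
    AfterSchema,
    AfterDictionary,
    AfterRecordBatch,
}

/// Checks that `messages` follows the per-partition, per-chunk ordering.
pub fn validate_ingester_stream(messages: &[Message]) -> Result<(), String> {
    let mut state = State::Start;
    for (idx, msg) in messages.iter().enumerate() {
        state = match (state, *msg) {
            // A new partition may start at the beginning or after a finished chunk.
            (
                State::Start | State::AfterPartition | State::AfterRecordBatch,
                Message::PartitionMetadata,
            ) => State::AfterPartition,
            // A new chunk starts with a schema message.
            (State::AfterPartition | State::AfterRecordBatch, Message::Schema) => {
                State::AfterSchema
            }
            // Dictionary batches follow the schema (assumption: not interleaved later).
            (State::AfterSchema | State::AfterDictionary, Message::DictionaryBatch) => {
                State::AfterDictionary
            }
            // One or more record batches complete the chunk.
            (
                State::AfterSchema | State::AfterDictionary | State::AfterRecordBatch,
                Message::RecordBatch,
            ) => State::AfterRecordBatch,
            (state, msg) => {
                return Err(format!(
                    "unexpected {msg:?} at index {idx} in state {state:?}"
                ))
            }
        };
    }
    match state {
        // Every chunk needs at least one record batch.
        State::AfterSchema | State::AfterDictionary => {
            Err("chunk ended without a record batch".to_string())
        }
        _ => Ok(()),
    }
}
```

A stream that starts with a schema message but no partition metadata is rejected by this sketch, as is a chunk that ends right after its schema message.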

* refactor: separate high- and low-level flight client

It is very unlikely that a user will use the high-level batch-producing
functionality and the low-level stuff within the same session. So let's
split this into two clients (high-level uses the low-level one
internally) to avoid confusion.

Also add documentation on our protocol handling.

* refactor: enumerate all variants in match statement to better catch errors in the future
2022-06-15 11:38:08 +00:00
Andrew Lamb 005610b172
refactor: remove some `&` use in iox_catalog (#4862)
* refactor: remove some `&` use in iox_catalog

* fix: Update data_types/src/lib.rs
2022-06-15 11:31:49 +00:00
Andrew Lamb 394c84f3e8
chore: Update CI checks to verify data generator build (#4857)
* chore: Update CI checks to verify data generator build

* fix: bench verify test

* docs: Update .circleci/config.yml

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-15 10:31:14 +00:00
Andrew Lamb 164e75f328
refactor: Remove unused `Option` (#4839)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-15 10:24:51 +00:00
dependabot[bot] 232dc897df
chore(deps): Bump clap from 3.2.1 to 3.2.4 (#4860)
Bumps [clap](https://github.com/clap-rs/clap) from 3.2.1 to 3.2.4.
- [Release notes](https://github.com/clap-rs/clap/releases)
- [Changelog](https://github.com/clap-rs/clap/blob/master/CHANGELOG.md)
- [Commits](https://github.com/clap-rs/clap/compare/clap_complete-v3.2.1...v3.2.4)

---
updated-dependencies:
- dependency-name: clap
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-06-15 07:28:53 +00:00
Nga Tran b682dbbc2e
chore: Add debug info of sort_key for ingester (#4859)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-14 20:39:17 +00:00
Andrew Lamb 7eed3ba0b7
fix: fix feature flags for iox_data_generator build (#4858) 2022-06-14 19:43:22 +00:00
Andrew Lamb c8f70b8933
feat: log query from querier to ingester at `info` level (#4856)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-14 18:35:50 +00:00
Andrew Lamb eca3b6b9a1
fix: reduce memory usage in ingester with less buffering prior to query engine (#4830)
* refactor: remove another buffer copy in ingester

* docs: Update arrow_util/src/util.rs

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-14 18:22:55 +00:00
Carol (Nichols || Goulding) e875a92cf8
feat: Log time spent requesting ingester partitions (#4806)
* feat: Log time spent requesting ingester partitions

Fixes #4558.

* feat: Record a metric for the duration queriers wait on ingesters

* fix: Use DurationHistogram instead of U64 Histogram

* test: Add a test for the ingester ms metric

* feat: Add back the logging to provide both logging and metrics for ingester duration

* refactor: Use sample_count method on metrics

* feat: Record ingester duration separately for success or failure

* fix: Create a separate test for the ingester metrics

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-14 17:58:19 +00:00
Andrew Lamb 7d2a5c299f
refactor: remove one buffer copy in the ingester (#4855)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-14 17:15:36 +00:00
Andrew Lamb e91d00b10c
chore: Update datafusion + `arrow`/`parquet`/`arrow-flight` to `16.0.0` (#4851)
* chore: TEMP Update DataFusion to pre-release

* chore: update arrow et al to 16.0.0

* chore: Run cargo hakari tasks

* fix: update reader read_dictionary API

* chore: Update to real Datafusion release

* fix: Update parquet API

* fix: update test

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2022-06-14 16:31:40 +00:00
dependabot[bot] 23c9e38ea7
chore(deps): Bump clap from 3.1.18 to 3.2.1 (#4848)
* chore(deps): Bump clap from 3.1.18 to 3.2.1

Bumps [clap](https://github.com/clap-rs/clap) from 3.1.18 to 3.2.1.
- [Release notes](https://github.com/clap-rs/clap/releases)
- [Changelog](https://github.com/clap-rs/clap/blob/master/CHANGELOG.md)
- [Commits](https://github.com/clap-rs/clap/compare/v3.1.18...clap_complete-v3.2.1)

---
updated-dependencies:
- dependency-name: clap
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore: fix clap deprecations

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Marco Neumann <marco@crepererum.net>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-14 15:42:18 +00:00
Marco Neumann f7fbc67b00
feat: extend duration histograms down to 1ms (#4854)
5ms is quite long considering that many requests take way below 100ms in
total. Let's add two more levels on the lower end of the spectrum.

Since we do not use data-dependent histograms (i.e. do not include
table or namespace names), the overhead should be acceptable.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-14 15:30:00 +00:00
Andrew Lamb 34e8659876
refactor: consolidate plan creation from `QueryChunk`s in `iox_query` (#4837)
* refactor: consolidate plan creation from Chunks

* docs: update docstrings

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-14 14:36:07 +00:00
Dom 8260e58a4a
Merge pull request #4852 from influxdata/dom/partition-key-type
refactor: PartitionKey type
2022-06-14 15:11:05 +01:00
Dom Dwyer b41ea1d718 refactor: PartitionKey type
This commit changes the code base to use a new reference-counted
PartitionKey type wrapper, instead of passing a bare String around.

This allows the compiler to type check & verify usage of the partition
key, instead of passing a bare string around. By reference counting the
underlying string, we reduce memory usage for some use cases.
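
A reference-counted newtype of this kind could look roughly like the sketch below. It is only an illustration: wrapping an `Arc<str>`, the `as_str` accessor, and the `shares_allocation` helper are assumptions made for the example, not the definition used in the code base.

```rust
use std::fmt;
use std::sync::Arc;

/// Newtype around a shared string (assumed representation: `Arc<str>`).
#[derive(Debug, Clone, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct PartitionKey(Arc<str>);

impl PartitionKey {
    /// Borrows the underlying key text.
    pub fn as_str(&self) -> &str {
        &self.0
    }

    /// Returns true if both keys point at the same heap allocation
    /// (hypothetical helper, only here to make the sharing visible).
    pub fn shares_allocation(&self, other: &Self) -> bool {
        Arc::ptr_eq(&self.0, &other.0)
    }
}

impl From<&str> for PartitionKey {
    fn from(s: &str) -> Self {
        Self(Arc::from(s))
    }
}

impl From<String> for PartitionKey {
    fn from(s: String) -> Self {
        Self(Arc::from(s))
    }
}

impl fmt::Display for PartitionKey {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.0)
    }
}
```

Cloning such a key only bumps a reference count, so many holders of the same partition key share one string allocation, and a function that takes `PartitionKey` can no longer be handed an arbitrary `String` by accident.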
2022-06-14 14:47:56 +01:00
dependabot[bot] 4a94a21b4a
chore(deps): Bump getrandom from 0.2.6 to 0.2.7 (#4847)
Bumps [getrandom](https://github.com/rust-random/getrandom) from 0.2.6 to 0.2.7.
- [Release notes](https://github.com/rust-random/getrandom/releases)
- [Changelog](https://github.com/rust-random/getrandom/blob/master/CHANGELOG.md)
- [Commits](https://github.com/rust-random/getrandom/compare/v0.2.6...v0.2.7)

---
updated-dependencies:
- dependency-name: getrandom
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-06-14 07:46:23 +00:00
Andrew Lamb 78ae4caf06
chore: Update DataFusion (#4842)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-14 07:36:55 +00:00
dependabot[bot] 7381d4f489
chore(deps): Bump reqwest from 0.11.10 to 0.11.11 (#4843)
Bumps [reqwest](https://github.com/seanmonstar/reqwest) from 0.11.10 to 0.11.11.
- [Release notes](https://github.com/seanmonstar/reqwest/releases)
- [Changelog](https://github.com/seanmonstar/reqwest/blob/master/CHANGELOG.md)
- [Commits](https://github.com/seanmonstar/reqwest/compare/v0.11.10...v0.11.11)

---
updated-dependencies:
- dependency-name: reqwest
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-06-14 07:29:33 +00:00
Carol (Nichols || Goulding) 6b9a712ba8
fix: Use debug rather than dbg in test helpers (#4841) 2022-06-13 20:17:24 +00:00
Marco Neumann 2b84e5c087
feat: measure "probably reloaded" cache loads (#4813)
To roughly gauge how much data we re-load into the cache (i.e. data that
was already loaded but was later evicted due to LRU pressure or TTL
eviction) this change introduces a new metric that estimates if a cache
entry that is requested from the loader was already seen before (using a
probabilistic filter).
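
The idea can be sketched with a small Bloom-style filter built on the standard library hasher. Everything here is an assumption for illustration: the `ReloadEstimator` type, its parameters, and the choice of a Bloom filter are not taken from the change itself, which only says that a probabilistic filter is used.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Counts loader invocations for keys that were probably loaded before
/// (hypothetical type; false positives are possible, false negatives are not).
pub struct ReloadEstimator {
    bits: Vec<bool>,
    hashes: u64,
    probably_reloaded: u64,
}

impl ReloadEstimator {
    /// `num_bits` is the filter size, `hashes` the number of hash functions.
    pub fn new(num_bits: usize, hashes: u64) -> Self {
        assert!(num_bits > 0 && hashes > 0);
        Self {
            bits: vec![false; num_bits],
            hashes,
            probably_reloaded: 0,
        }
    }

    /// Derives the bit position for `key` under the hash function `seed`.
    fn index<K: Hash>(&self, key: &K, seed: u64) -> usize {
        let mut hasher = DefaultHasher::new();
        seed.hash(&mut hasher);
        key.hash(&mut hasher);
        (hasher.finish() % self.bits.len() as u64) as usize
    }

    /// Call whenever the loader is asked for `key`.
    /// Returns true if the key was probably requested from the loader before.
    pub fn observe_load<K: Hash>(&mut self, key: &K) -> bool {
        let mut seen = true;
        for seed in 0..self.hashes {
            let idx = self.index(key, seed);
            if !self.bits[idx] {
                seen = false;
                self.bits[idx] = true;
            }
        }
        if seen {
            self.probably_reloaded += 1;
        }
        seen
    }

    /// Current value of the "probably reloaded" counter.
    pub fn probably_reloaded(&self) -> u64 {
        self.probably_reloaded
    }
}
```

Because the filter never forgets a key, a second load of the same key is always counted. An unrelated key may occasionally be counted as well, which is why the result is only an estimate.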
2022-06-13 13:51:45 +00:00
Marco Neumann 66623fe0cd
feat: expose query semaphore metrics (#4836)
The groundwork for that was already done, just needed a bit of wiring.
This might help us to judge timeouts.
2022-06-13 09:36:50 +00:00
Andrew Lamb ddf61c5e98
refactor: Consolidate `Selection` creation, add tests (#4832)
* refactor: Consolidate Selection --> DataFusion projection

* fix: remove now unused function
2022-06-10 18:30:43 +00:00
Andrew Lamb 9fdbfb05e7
refactor: Use scan_and_filter in ReorgPlanner (#4822)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-10 17:31:25 +00:00
Nga Tran 99f1f0a10c
chore: Revert "feat: compact all overlapped files no matter how large they are (#4779)" (#4831)
This reverts commit 3e89daa0d4.
2022-06-10 15:52:00 +00:00
kodiakhq[bot] e437dff4c1
Merge pull request #4820 from influxdata/cn/duration
refactor: Use DurationHistogram in more places
2022-06-10 14:29:26 +00:00
kodiakhq[bot] dd8d44e24f
Merge branch 'main' into cn/duration 2022-06-10 14:23:09 +00:00
Nga Tran 13c57d524a
feat: Change data type of catalog partition's sort_key from a string to an array of string (#4801)
* feat: Change data type of catalog Postgres partition's sort_key from a string to an array of string

* test: add column with comma

* fix: use new protobuf field to avoid incompatibility

* fix: ensure sort_key is an empty array rather than NULL

* refactor: address review comments

* refactor: address more comments

* chore: clearer comments

* chore: Update iox_catalog/migrations/20220607102200_change_sort_key_type_to_array.sql

* chore: Update iox_catalog/migrations/20220607102200_change_sort_key_type_to_array.sql

* fix: Rename migration so it will be applied after

Co-authored-by: Marko Mikulicic <mkm@influxdata.com>
2022-06-10 13:31:31 +00:00
Andrew Lamb dc992209be
test: account for active writes when reporting readable status (#4782)
* test: account for active writes when reporting readable status

* fix: logical merge conflict
2022-06-10 12:59:09 +00:00
Marko Mikulicic c09f6f6bc9
chore: Incrementally migrate sort_key to array type (#4826)
This PR is the first step where we add a new column sort_key_arr whose content we'll manually migrate from sort_key.

When we're done with this, we'll merge https://github.com/influxdata/influxdb_iox/pull/4801/ (whose migration script must be adapted slightly to rename the `sort_key_arr` column back to `sort_key`).

All this must be done while we shut down the ingesters and the compactors.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-10 11:35:43 +00:00
Andrew Lamb 900ab16293
refactor: remove unused error variants (#4819)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-10 11:23:28 +00:00
Andrew Lamb 11cec18edc
refactor: Move `scan_and_filter` into a `common` module for reuse (#4823)
* refactor: remove unused error variants

* refactor: move scan_and_filter into a module so it can be reused

* docs: update comments about pruning
2022-06-10 11:15:47 +00:00
dependabot[bot] c4bd52ee6f
chore(deps): Bump handlebars from 4.3.0 to 4.3.1 (#4824)
Bumps [handlebars](https://github.com/sunng87/handlebars-rust) from 4.3.0 to 4.3.1.
- [Release notes](https://github.com/sunng87/handlebars-rust/releases)
- [Changelog](https://github.com/sunng87/handlebars-rust/blob/master/CHANGELOG.md)
- [Commits](https://github.com/sunng87/handlebars-rust/compare/v4.3.0...v4.3.1)

---
updated-dependencies:
- dependency-name: handlebars
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-10 11:04:19 +00:00
Luke Bond aa97c918b3
docs: fix README for influxdb_iox_client (#4825)
Closes #4816
2022-06-10 09:13:17 +00:00
Andrew Lamb 50697906b1
refactor: Make `DMLWrite::sequence_number` a `SequenceNumber` (#4817) 2022-06-09 19:36:37 +00:00
Carol (Nichols || Goulding) 1c7cbaf5ae
refactor: Use DurationHistogram in more places 2022-06-09 14:20:51 -04:00
Marco Neumann faaa2c9823
feat: add LRU cache eviction counter (#4814)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-09 13:33:45 +00:00
Marco Neumann 9accb5912e
chore: lower job count in `build_dev` (#4815) 2022-06-09 13:26:35 +00:00
Marco Neumann 4e5842dec7
feat: expose hit-miss metrics for querier caches (#4811)
* feat: `MetricsCache`

* feat: expose hit-miss metrics for querier caches

* refactor: `MetricsCache` -> `CacheWithMetrics`

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-09 13:07:40 +00:00
Dom b5879894b7
Merge pull request #4812 from influxdata/dom/no-wraparound-tests
refactor(data_types): no Timestamp wraparound
2022-06-09 13:45:11 +01:00
Dom Dwyer d1436c9f06 refactor(data_types): no Timestamp wraparound
This commit changes addition/subtraction of Timestamp values to panic if
they would trigger under/overflow rather than silently wrapping around.
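
A minimal sketch of the behaviour, assuming a `Timestamp` newtype over `i64` with `Add<i64>` and `Sub<i64>` implementations (the exact operator signatures, the `new` and `get` helpers, and the panic messages are assumptions for this example):

```rust
use std::ops::{Add, Sub};

/// Newtype over a signed 64-bit timestamp value (assumed representation).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Timestamp(i64);

impl Timestamp {
    pub fn new(v: i64) -> Self {
        Self(v)
    }

    pub fn get(self) -> i64 {
        self.0
    }
}

impl Add<i64> for Timestamp {
    type Output = Self;

    /// Panics instead of wrapping when the result does not fit in an i64.
    fn add(self, rhs: i64) -> Self {
        Self(self.0.checked_add(rhs).expect("timestamp wraparound"))
    }
}

impl Sub<i64> for Timestamp {
    type Output = Self;

    /// Panics instead of wrapping when the result does not fit in an i64.
    fn sub(self, rhs: i64) -> Self {
        Self(self.0.checked_sub(rhs).expect("timestamp wraparound"))
    }
}
```

With this shape, `Timestamp::new(i64::MAX) + 1` panics in both debug and release builds, whereas a `wrapping_add` would silently produce a large negative timestamp.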
2022-06-09 13:23:03 +01:00
Andrew Lamb 2ec7764fdd
refactor: rename builder like predicate methods to be `with_` (#4808)
* refactor: rename builder like predicate methods to be `with_`

* fix: merge conflict

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-09 11:26:03 +00:00
Andrew Lamb d8331e8679
fix: do not return 'readable' until a write is completely readable (#4778)
* fix: do not return readable until a write is completely readable

* docs: Add diagram with partially buffered write

* refactor: account for actively buffering during update rather than fixup

* fix: fixup

* fix: use checked_sub

Co-authored-by: Marco Neumann <marco@crepererum.net>

* fix: checked_sub calculation

Co-authored-by: Marco Neumann <marco@crepererum.net>
2022-06-09 11:15:15 +00:00
Andrew Lamb 107e5f7284
docs: Add some docs about `StreamSplit` (#4810)
* docs: Add some docs about `StreamSplit`

* docs: fix struct name
2022-06-09 10:53:34 +00:00
Andrew Lamb 5e4fcfaa4d
refactor: reduce mut usage in Predicate (#4807)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-09 10:46:01 +00:00
Marco Neumann 2e3ba83795
refactor: expose `CacheGetStatus` (and improve tests) (#4804)
* refactor: expose `CacheGetStatus` (and improve tests)

- add a `CacheGetStatus` which tells the user if the request was a hit
  or miss (or something in between)
- adapt some tests to use the status (only the tests where this could be
  relevant)
- move the test suite from using `sleep` to proper barriers (more stable
  under high load, more correct, potentially faster)
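
To illustrate what such a status distinguishes, here is a simplified, synchronous sketch. The variant names `Hit`, `Miss`, and `MissAlreadyLoading` follow the names mentioned in this commit; the `SketchCache` type and its `begin_get`, `finish_load`, and `value` methods are hypothetical stand-ins for the real asynchronous cache.

```rust
use std::collections::HashMap;

/// Outcome of a cache lookup.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CacheGetStatus {
    /// The value was already cached.
    Hit,
    /// The value was absent and this request triggered the load.
    Miss,
    /// The value was absent, but another request is already loading it.
    MissAlreadyLoading,
}

/// State of a single key (hypothetical type).
enum Entry {
    Loading,
    Loaded(String),
}

/// Toy cache that models loads as two explicit steps (hypothetical type).
#[derive(Default)]
pub struct SketchCache {
    entries: HashMap<String, Entry>,
}

impl SketchCache {
    /// Registers a request for `key` and reports how it will be served.
    pub fn begin_get(&mut self, key: &str) -> CacheGetStatus {
        match self.entries.get(key) {
            Some(Entry::Loaded(_)) => CacheGetStatus::Hit,
            Some(Entry::Loading) => CacheGetStatus::MissAlreadyLoading,
            None => {
                // First request for this key: mark it as being loaded.
                self.entries.insert(key.to_string(), Entry::Loading);
                CacheGetStatus::Miss
            }
        }
    }

    /// Stores the loaded value, completing a load started by `begin_get`.
    pub fn finish_load(&mut self, key: &str, value: String) {
        self.entries.insert(key.to_string(), Entry::Loaded(value));
    }

    /// Returns the cached value if the load has finished.
    pub fn value(&self, key: &str) -> Option<&str> {
        match self.entries.get(key) {
            Some(Entry::Loaded(v)) => Some(v.as_str()),
            _ => None,
        }
    }
}
```

Two requests for the same key before the load finishes report `Miss` and then `MissAlreadyLoading`. Tests can assert on that difference directly instead of inferring it from timing.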

* refactor: improve `abort_and_wait` checks

* docs: typos and improve wording

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* refactor: `FutureExt2` -> `EnsurePendingExt`

* refactor: `Queried` -> `MissAlreadyLoading`

* docs: explain `abort_or_wait` more

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
2022-06-09 07:32:46 +00:00
Andrew Lamb f34282be2c
fix: Do not run DataFusion optimizer pass twice (#4809)
* fix: Do not run DataFusion optimizer pass twice

* docs: improve docstring and logging
2022-06-08 21:01:22 +00:00
Andrew Lamb 46de8d6cb3
refactor: remove redundant code in predicate (#4805) 2022-06-08 15:03:26 +00:00