Commit Graph

8663 Commits (b21799acaea38be920b1cc5942a3243abd37dbbf)

Author SHA1 Message Date
Andrew Lamb b21799acae
chore: Update datafusion, get `date_bin` (#5340)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-09 11:01:37 +00:00
Raphael Taylor-Davies dfa862fd53
chore: temporarily allow http always (#5357) 2022-08-09 10:54:42 +00:00
Andrew Lamb 7219f512c3
fix: update sort key in catalog before adding parquet file to catalog (#5333)
* fix: update sort key before parquet file

* fix: Remove left over debugging

* fix: fix bug, improve logging

* chore: move debug log after catalog update, improve args and docs
2022-08-09 10:27:51 +00:00
Raphael Taylor-Davies ccb45d7bac
chore: update to rusoto-less object_store (#5342)
* chore: update to rusoto-less object_store

* chore: Run cargo hakari tasks

* chore: further fixes

* chore: document workaround

* chore: review feedback

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2022-08-09 09:06:03 +00:00
kodiakhq[bot] a4923a709a
Merge pull request #5341 from influxdata/dom/instrument-kafka
feat: instrument Kafka produce latency
2022-08-09 08:27:05 +00:00
kodiakhq[bot] eebeae0fc6
Merge branch 'main' into dom/instrument-kafka 2022-08-09 08:19:54 +00:00
Andrew Lamb 172f893368
fix: fix logging typo in querier (#5345)
* fix: fix logging typo

* fix: fix type in typo fix ;(
2022-08-09 06:34:06 +00:00
Marco Neumann 3f55335d91
fix: create proper storage regex request via CLI (#5343)
The literals for regex comparisons within the storage API are NOT
strings but regex nodes. This was a mistake in #5281.
2022-08-08 15:41:24 +00:00
Dom Dwyer 87e4290e1f refactor(write_buffer): database_name -> topic_name
Previously IOx mapped a single database to a single kafka topic - this
is no longer the case, so referring to the kafka topic name as the
"database name" is confusing.
2022-08-08 15:24:35 +02:00
Dom Dwyer c133cf22c6 refactor: use kafka produce instrumentation
This commit changes the IOx write buffer initialisation code to add the
KafkaProducerMetrics instrumentation to the per-partition Kafka clients.
2022-08-08 15:24:35 +02:00
Dom Dwyer 284a3069ce feat: Kafka client produce() instrumentation
Adds a decorator over the underlying kafka client to capture the latency
distribution of the low-level kafka writes, independent of the
aggregation/DML batching framework that sits "above" this client.

The latency measurements include the serialisation overhead, protocol
overhead, and actual network I/O.
2022-08-08 15:24:35 +02:00
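The decorator described in 284a3069ce can be illustrated with a minimal sketch. Everything named here is hypothetical: the `Producer` trait, `InstrumentedProducer`, and `FakeProducer` are illustrative stand-ins and are not the rskafka or IOx APIs, and a plain `Vec<Duration>` stands in for whatever histogram metric the real code records into.

```rust
use std::time::{Duration, Instant};

/// Hypothetical minimal producer interface (NOT the rskafka API).
pub trait Producer {
    /// Writes a payload and returns the offset it was stored at.
    fn produce(&mut self, payload: &[u8]) -> Result<u64, String>;
}

/// Decorator: wraps any `Producer` and records the latency of every call.
pub struct InstrumentedProducer<P> {
    inner: P,
    // Assumption: a Vec stands in for a real latency histogram.
    latencies: Vec<Duration>,
}

impl<P: Producer> InstrumentedProducer<P> {
    pub fn new(inner: P) -> Self {
        Self { inner, latencies: Vec::new() }
    }

    /// Latencies observed so far, one entry per `produce()` call.
    pub fn latencies(&self) -> &[Duration] {
        &self.latencies
    }
}

impl<P: Producer> Producer for InstrumentedProducer<P> {
    fn produce(&mut self, payload: &[u8]) -> Result<u64, String> {
        let started = Instant::now();
        let result = self.inner.produce(payload);
        // Record the elapsed time whether the write succeeded or failed.
        self.latencies.push(started.elapsed());
        result
    }
}

/// Hypothetical in-memory producer used only to exercise the decorator.
#[derive(Default)]
pub struct FakeProducer {
    next_offset: u64,
}

impl Producer for FakeProducer {
    fn produce(&mut self, _payload: &[u8]) -> Result<u64, String> {
        let offset = self.next_offset;
        self.next_offset += 1;
        Ok(offset)
    }
}
```

Because the wrapper implements the same trait as the client it wraps, callers such as a batching layer would not need to change; they only see a `Producer`.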
Dom Dwyer d003fe0047 refactor: const KafkaPartition::new()
Allows this fn to be called from const contexts (useful in test setups).
2022-08-08 14:56:03 +02:00
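As an illustration of what d003fe0047 enables, the sketch below shows a `const fn` constructor being used to initialise a constant. The struct layout (a single `i32` field), the `get` accessor, and the `TEST_PARTITION` constant are assumptions made for the example, not taken from the IOx source.

```rust
/// Assumed newtype layout; the real `KafkaPartition` may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct KafkaPartition(i32);

impl KafkaPartition {
    /// `const fn` lets the constructor run at compile time.
    pub const fn new(v: i32) -> Self {
        Self(v)
    }

    /// Hypothetical accessor for the wrapped value.
    pub const fn get(&self) -> i32 {
        self.0
    }
}

/// Only possible because `new` is a `const fn`.
pub const TEST_PARTITION: KafkaPartition = KafkaPartition::new(42);
```

Without `const` on `new`, the `TEST_PARTITION` declaration would be rejected by the compiler, and test setups would have to build the value at run time instead.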
Dom Dwyer 323788767d refactor: impl TimeProvider for Arc<TimeProvider>
This allows the MockProvider to be used in tests with consuming code
that uses generics/static dispatch instead of a dyn TimeProvider, while
still retaining a ref to the MockProvider instance.
2022-08-08 14:56:03 +02:00
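The pattern in 323788767d can be sketched as follows. The trait and types below are simplified, hypothetical stand-ins (the real `TimeProvider` and `MockProvider` have their own signatures); `Stamper` is an invented consumer that uses static dispatch.

```rust
use std::sync::{Arc, Mutex};
use std::time::{Duration, SystemTime};

/// Simplified stand-in for the IOx trait; the real one may differ.
pub trait TimeProvider {
    fn now(&self) -> SystemTime;
}

/// Mock clock whose time can be moved forward by the test.
pub struct MockProvider {
    now: Mutex<SystemTime>,
}

impl MockProvider {
    pub fn new(start: SystemTime) -> Self {
        Self { now: Mutex::new(start) }
    }

    pub fn advance(&self, by: Duration) {
        let mut now = self.now.lock().unwrap();
        *now += by;
    }
}

impl TimeProvider for MockProvider {
    fn now(&self) -> SystemTime {
        *self.now.lock().unwrap()
    }
}

/// Forward through the `Arc`, so `Arc<T>` is itself a `TimeProvider`.
impl<T: TimeProvider + ?Sized> TimeProvider for Arc<T> {
    fn now(&self) -> SystemTime {
        (**self).now()
    }
}

/// Hypothetical consumer that is generic over its clock (static dispatch).
pub struct Stamper<T: TimeProvider> {
    clock: T,
}

impl<T: TimeProvider> Stamper<T> {
    pub fn new(clock: T) -> Self {
        Self { clock }
    }

    pub fn stamp(&self) -> SystemTime {
        self.clock.now()
    }
}
```

A test can then hand `Arc::clone(&mock)` to `Stamper::new` and keep its own `Arc<MockProvider>` to call `advance` on, which is the "retaining a ref" part of the commit message.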
Marco Neumann cd0dc42b4a
refactor: use a single chunk filter/pruning step in querier (#5338)
We already prune all chunks in the query-access layer. There's no need
to do that another time (which is actually the first time) in
`QuerierTable::chunks`. The time savings we get from feeding fewer chunks
into the state reconciling should be negligible. On the plus side, however,
we get a more streamlined data flow and actually correct chunk pruning
metrics. Also see #5336.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-08 12:55:14 +00:00
Andrew Lamb f3913f89e3
chore: Update datafusion (to get fix for pruning bug) (#5339)
* chore: Update datafusion

* chore: Update AggregateSelector API
2022-08-08 12:28:21 +00:00
Marco Neumann 5f407ec8cd
chore: ignore a few profiling-related files (#5337)
* chore: git-ignore heaptrack output

* chore: git-ignore perf outputs
2022-08-08 12:04:06 +00:00
Andrew Lamb f9d0e37144
chore: reduce h2 and hyper logging level in tests (#5332)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-08 09:39:26 +00:00
dependabot[bot] 3a697df261
chore(deps): Bump sqlparser from 0.19.0 to 0.20.0 (#5335)
Bumps [sqlparser](https://github.com/sqlparser-rs/sqlparser-rs) from 0.19.0 to 0.20.0.
- [Release notes](https://github.com/sqlparser-rs/sqlparser-rs/releases)
- [Changelog](https://github.com/sqlparser-rs/sqlparser-rs/blob/main/CHANGELOG.md)
- [Commits](https://github.com/sqlparser-rs/sqlparser-rs/compare/v0.19.0...v0.20.0)

---
updated-dependencies:
- dependency-name: sqlparser
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-08-08 08:23:31 +00:00
Andrew Lamb 7066c4e679
fix: make it clear rpc_predicates are only ever specialized when a schema is known (#5315)
* fix: make it clear rpc_predicates are only ever specialized when a schema is known

* fix: handle case of no schema

* fix: Update predicate/src/rpc_predicate.rs
2022-08-06 10:56:53 +00:00
Nga Tran b71c1a09ea
feat: only sleep when there are neither hot nor cold partitions to compact (#5329)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-05 16:36:36 +00:00
Andrew Lamb 38a0cdbb4a
fix: Install cargo deny in ci image (#5317)
* fix: install cargo deny in ci image

* fix: Update docker/Dockerfile.ci

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

* fix: Apply suggestions from code review

Co-authored-by: Marco Neumann <marco@crepererum.net>

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
Co-authored-by: Marco Neumann <marco@crepererum.net>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-05 15:51:35 +00:00
Marco Neumann fc1870ff76
fix: chunk pruning stats (#5319)
- emit a warning if we cannot even attempt to prune chunks due to an
  error. This is always either a missing feature or a bug (even though
  it does not impact correctness but _only_ performance). Also see
  https://github.com/influxdata/conductor/issues/1107
- change metrics to clearly differentiate between "could not prune" and
  "not pruned"
- add new "not pruned" observer hook (this was missing for some reason,
  the "pruned" hook existed though)
2022-08-05 10:50:31 +00:00
kodiakhq[bot] 4898a7f1e3
Merge pull request #5303 from influxdata/cn/upgrade-cold-nonoverlapping-l0
feat: Compact cold partitions; upgrade a single non-overlapping level 0 file to level 1 without running compaction
2022-08-04 21:02:28 +00:00
Carol (Nichols || Goulding) facc967320
fix: Specify hot or cold in more log messages 2022-08-04 16:55:48 -04:00
Carol (Nichols || Goulding) c9d66c30b1
fix: Make this field name consistent
With the other fields on this struct and with the corresponding field on
the clap block struct.
2022-08-04 16:55:48 -04:00
Carol (Nichols || Goulding) da0b031c44
feat: Add parameters to limit total memory usage of cold partition compaction 2022-08-04 16:55:48 -04:00
Carol (Nichols || Goulding) 9d8f94d0d7
fix: Remove an unneeded sleep
The cold case won't make a hot busy loop (hah), we'll just go back to
working on the hot partitions if there are no cold partitions to do.
2022-08-04 16:55:48 -04:00
Carol (Nichols || Goulding) e1c45e836a
test: Remove copypastaed assertions that duplicate a different test 2022-08-04 16:55:48 -04:00
Carol (Nichols || Goulding) cb6442018e
test: Add more test cases varying number of partitions per sequencer 2022-08-04 16:55:48 -04:00
Carol (Nichols || Goulding) d55f45a5c2
feat: Run compaction of hot partitions a configurable number of times more than cold 2022-08-04 16:55:48 -04:00
Carol (Nichols || Goulding) 827e82cfb8
feat: Upgrade one level 0, non-overlapping file without compacting
Fixes #1078.
2022-08-04 16:55:47 -04:00
Carol (Nichols || Goulding) c1d016a00a
feat: Upgrade cold level 0 files when they have no overlaps 2022-08-04 16:55:47 -04:00
Carol (Nichols || Goulding) 9052eabe50
feat: Separate out hot/cold partition compaction and filtering
Cold partition compaction will (in the next commit) upgrade a level 0
file without any overlaps rather than running compaction.

Cold partition filtering gathers all level 0 files in the (already
deemed cold) partition with all overlapping level 1 files, and does not
limit the set of files being compacted by their number or size.
2022-08-04 16:55:47 -04:00
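One possible reading of the cold-partition behaviour described in 9052eabe50 and the commits that follow it is sketched below. This is an interpretation, not the compactor's actual code: `FileMeta`, `ColdPlan`, and `plan_cold_partition` are hypothetical names, and the rule "exactly one level 0 file with no overlapping level 1 file gets upgraded" is an assumption drawn from the commit titles.

```rust
/// Hypothetical, simplified parquet file metadata.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FileMeta {
    pub id: u64,
    pub level: u8,
    pub min_time: i64,
    pub max_time: i64,
}

/// What to do with a partition already deemed cold.
#[derive(Debug, PartialEq, Eq)]
pub enum ColdPlan {
    /// No level 0 files, so there is nothing to do.
    Nothing,
    /// Mark this file as level 1 without rewriting it.
    UpgradeToLevel1(u64),
    /// Compact these files together (level 0 plus overlapping level 1).
    Compact(Vec<u64>),
}

/// Inclusive time-range overlap check.
fn overlaps(a: &FileMeta, b: &FileMeta) -> bool {
    a.min_time <= b.max_time && b.min_time <= a.max_time
}

pub fn plan_cold_partition(files: &[FileMeta]) -> ColdPlan {
    let level0: Vec<&FileMeta> = files.iter().filter(|f| f.level == 0).collect();
    if level0.is_empty() {
        return ColdPlan::Nothing;
    }

    // All level 1 files that overlap at least one level 0 file.
    let overlapping_l1: Vec<&FileMeta> = files
        .iter()
        .filter(|f| f.level == 1 && level0.iter().any(|l0| overlaps(l0, f)))
        .collect();

    // Assumed rule: a lone level 0 file with no overlaps is simply upgraded.
    if level0.len() == 1 && overlapping_l1.is_empty() {
        return ColdPlan::UpgradeToLevel1(level0[0].id);
    }

    // No limit on count or size here, per the commit message.
    let ids: Vec<u64> = level0
        .iter()
        .chain(overlapping_l1.iter())
        .map(|f| f.id)
        .collect();
    ColdPlan::Compact(ids)
}
```

The upgrade path avoids reading and rewriting the file's data, which is the saving the commit titles point at.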
Carol (Nichols || Goulding) fc62c82722
feat: Select cold partitions 2022-08-04 16:55:47 -04:00
Carol (Nichols || Goulding) 6e9c752230
refactor: Extract current compaction into a fn for 'hot' partitions 2022-08-04 16:55:47 -04:00
Andrew Lamb e82214ed38
chore: fix `cargo audit`, update deps to get new chrono (#5316)
* chore: update deps to get new chrono

* chore: Run cargo hakari tasks

* chore: migrate away from deprecated API

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2022-08-04 20:49:28 +00:00
Andrew Lamb e0ea335b70
fix: Support RegExMatch and RegExNotMatch predicates on `_field` (#5301)
* test: add tests for regex_match_on_field

* feat: more general `_field` predicate handling

* fix: remove old comment

* fix: update tests

* fix: improve test a little more

* fix: fmt

* fix: Update predicate/src/rpc_predicate/field_rewrite.rs

Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>

* fix: Handle predicates that can not be evaluated

Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-04 19:42:16 +00:00
Raphael Taylor-Davies b5ea7fe441
chore: add libprotobuf-dev to CI image (#5269)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-04 18:10:24 +00:00
Marco Neumann 9851a5e357
docs: extend profiling docs with stuff I've learned (#5313)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-04 16:34:07 +00:00
Marco Neumann 0d714878ca
feat: chunk pruning metrics (#5273)
* refactor: make could-not-prune reason a static string

* refactor: introduce `QuerierTableArgs`

* feat: chunk pruning metrics

Closes #4974.

* refactor: address review comments

* refactor: use static typing for not-pruned reason

* refactor: pass chunk to not-pruned observer and use it for some metrics

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-04 15:29:21 +00:00
kodiakhq[bot] 96419b78e0
Merge pull request #5311 from influxdata/dom/instrument-kafka-produce
build: bump rskafka to latest
2022-08-04 15:20:45 +00:00
kodiakhq[bot] 0ba3ae1e0d
Merge branch 'main' into dom/instrument-kafka-produce 2022-08-04 15:13:49 +00:00
kodiakhq[bot] 600617ec08
Merge pull request #5307 from influxdata/dom/instrument-agg
feat: instrument kafka aggregated DML batch size
2022-08-04 14:57:40 +00:00
Dom Dwyer 36d36c507c ci: bump redpanda version 2022-08-04 16:57:28 +02:00
kodiakhq[bot] 76d3a12dab
Merge branch 'main' into dom/instrument-agg 2022-08-04 14:49:10 +00:00
Dom Dwyer 77fd967517 feat: instrument kafka aggregated DML batch size
The Kafka write buffer implementation (and only the Kafka impl) merges
together successive DML writes for the same namespace & partition within
a window of time.

This commit records the number of DML writes that have been merged
together to form a single batched op before it is dispatched to Kafka.
2022-08-04 16:48:56 +02:00
dependabot[bot] e8231b2986
chore(deps): Bump serde_json from 1.0.82 to 1.0.83 (#5297)
* chore(deps): Bump serde_json from 1.0.82 to 1.0.83

Bumps [serde_json](https://github.com/serde-rs/json) from 1.0.82 to 1.0.83.
- [Release notes](https://github.com/serde-rs/json/releases)
- [Commits](https://github.com/serde-rs/json/compare/v1.0.82...v1.0.83)

---
updated-dependencies:
- dependency-name: serde_json
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore: Run cargo hakari tasks

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-04 14:28:29 +00:00
Marco Neumann e24cecd926
fix: buffer allocation while reading parquet files (#5312)
Work around https://github.com/apache/arrow-rs/issues/2321 by limiting
reader batch size to number of rows (based on file-level metadata).

Fixes https://github.com/influxdata/conductor/issues/1103 .
2022-08-04 14:21:05 +00:00
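The workaround in e24cecd926 amounts to capping the batch size at the file's row count. A minimal sketch, assuming the row count has already been read from the file-level metadata; `effective_batch_size` is a hypothetical helper, not a function from IOx or arrow-rs.

```rust
/// Caps the configured reader batch size at the number of rows in the file,
/// so buffers are not sized for more rows than the file can yield.
pub fn effective_batch_size(configured: usize, file_num_rows: usize) -> usize {
    configured.min(file_num_rows)
}
```

For a file with 10 rows and a configured batch size of 1024, this yields 10; for a file with more rows than the configured size, the configured size is used unchanged.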
Andrew Lamb 3989ac1386
refactor: remove `split_members` and use `split_conjunction` from upstream DataFusion (#5308)
* refactor: remove split_members and use split_conjunction from datafusion

* fix: clippy
2022-08-04 13:58:59 +00:00
Marco Neumann eea8270e83
fix: `compute_split_time` with small step sizes (#5309)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-04 13:40:30 +00:00