315 Commits (e7d75a5513bfca6e414ffe4e7b3a7570189c3879)

Author SHA1 Message Date
Marco Neumann c59dd01742
refactor: use concrete inner type in `CacheWithMetrics` (#5522)
The API user still CAN use dynamic dispatch but doesn't have to. This
also simplifies the generics a bit.

This is similar to #5520.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-01 06:05:59 +00:00
Marco Neumann c0dda14cef
refactor: use concrete backend type in `CacheDriver` (#5520)
This removes some `Box<dyn ...>` indirection when the user doesn't want
it (you still can, but don't have to) and makes the whole type handling
easier to understand.
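
A minimal sketch of the idea described above, not the actual `CacheDriver` code: the driver is generic over a concrete backend type, and a boxed trait object still works because the box implements the trait too. The names `CacheBackend`, `Driver`, `lookup`, `store` and `get_or_insert` are hypothetical and chosen only for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified backend trait (an assumption, not the IOx trait).
pub trait CacheBackend {
    fn lookup(&mut self, k: &str) -> Option<u64>;
    fn store(&mut self, k: String, v: u64);
}

impl CacheBackend for HashMap<String, u64> {
    fn lookup(&mut self, k: &str) -> Option<u64> {
        self.get(k).copied()
    }

    fn store(&mut self, k: String, v: u64) {
        self.insert(k, v);
    }
}

/// Boxed trait objects implement the trait as well, so callers who want
/// dynamic dispatch can still opt in.
impl CacheBackend for Box<dyn CacheBackend> {
    fn lookup(&mut self, k: &str) -> Option<u64> {
        (**self).lookup(k)
    }

    fn store(&mut self, k: String, v: u64) {
        (**self).store(k, v)
    }
}

/// Driver that owns a concrete backend `B` directly, without a `Box`.
pub struct Driver<B: CacheBackend> {
    backend: B,
}

impl<B: CacheBackend> Driver<B> {
    pub fn new(backend: B) -> Self {
        Self { backend }
    }

    /// Returns the cached value, or stores and returns `v` on a miss.
    pub fn get_or_insert(&mut self, k: &str, v: u64) -> u64 {
        if let Some(hit) = self.backend.lookup(k) {
            return hit;
        }
        self.backend.store(k.to_string(), v);
        v
    }
}
```

With this shape, `Driver::new(HashMap::<String, u64>::new())` is statically dispatched, while `Driver::new(Box::new(...) as Box<dyn CacheBackend>)` keeps the indirection for users who prefer it.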

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-31 14:58:25 +00:00
Andrew Lamb 6669d85fb4
chore: Update datafusion + arrow/parquet to `21.0.0` (#5519)
* chore: Update arrow/arrow-flight/parquet to 21.0.0

* chore: Update datafusion pin

* chore: Fix arrow update script

* chore: Update Cargo.lock

* chore: Update for new API
2022-08-31 13:30:47 +00:00
Marco Neumann fecbbd9fa1
refactor: improve namespace caching in querier (#5492)
1. Cache converted schema instead of catalog schema. This saves a bunch
   of memcopies during conversion.
2. Simplify creation of new chunks, we now only need a `CachedTable`
   instead of a namespace and a table schema.

In an artificial benchmark, this removed around 10ms from the query
(although that was prior to #5467 which moved schema conversion one
level up). Still I think it is the cleaner cache design.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-30 11:42:21 +00:00
Marco Neumann 430536f05f
refactor: use a single timestamp in policy backend (#5508)
* refactor: use a single timestamp in policy backend

Prior to this PR we had at least 1 `TimeProvider::now` call per GET
request (for caches that only used LRU) and up to 3 calls (caches with
LRU + refresh + TTL). Let's instead use a single timestamp that is
created by the policy backend itself (instead of the policies). This has
the following consequences:

- **efficiency:** `SystemProvider::now` is not free: even though it does
  not result in a syscall under Linux, it uses the stdlib time system,
  which also checks for monotonicity
- **consistency:** All changes for a single trigger (e.g. a
  GET cache call) now use a single timestamp instead of slightly
  increasing ones. I argue this is the better semantic, simpler to
  understand and better to debug.

In a (slightly artificial) local performance experiment, this shaves
off around 2ms per single-table SQL query. However, I expect that there might
be more degenerate cases (e.g. multi-table SQL queries or some
InfluxRPC requests that hit multiple tables).

The majority of this patch is moving the `TimeProvider` from the
policies into the policy backend.
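
The single-timestamp idea can be sketched as follows. This is a toy model under stated assumptions, not the code from the PR: `CountingClock`, `Policy`, `LruPolicy`, `TtlPolicy` and `PolicyBackend::get` are hypothetical stand-ins, and the TTL policy here simply records a deadline the first time it sees a key.

```rust
use std::cell::Cell;
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical clock that counts how often it is queried.
#[derive(Default)]
pub struct CountingClock {
    calls: Cell<usize>,
}

impl CountingClock {
    pub fn now(&self) -> Instant {
        self.calls.set(self.calls.get() + 1);
        Instant::now()
    }
}

/// Policies get `now` handed in instead of owning a time provider.
pub trait Policy {
    fn on_get(&mut self, key: &str, now: Instant);
}

#[derive(Default)]
pub struct LruPolicy {
    pub last_used: HashMap<String, Instant>,
}

impl Policy for LruPolicy {
    fn on_get(&mut self, key: &str, now: Instant) {
        self.last_used.insert(key.to_string(), now);
    }
}

pub struct TtlPolicy {
    pub ttl: Duration,
    pub expires_at: HashMap<String, Instant>,
}

impl Policy for TtlPolicy {
    fn on_get(&mut self, key: &str, now: Instant) {
        // Toy behaviour: set the deadline the first time a key is seen.
        self.expires_at
            .entry(key.to_string())
            .or_insert(now + self.ttl);
    }
}

/// The backend owns the clock and reads it once per request.
pub struct PolicyBackend {
    clock: CountingClock,
    pub lru: LruPolicy,
    pub ttl: TtlPolicy,
}

impl PolicyBackend {
    pub fn new(ttl: Duration) -> Self {
        Self {
            clock: CountingClock::default(),
            lru: LruPolicy::default(),
            ttl: TtlPolicy { ttl, expires_at: HashMap::new() },
        }
    }

    pub fn get(&mut self, key: &str) {
        let now = self.clock.now(); // the only clock read for this request
        self.lru.on_get(key, now);
        self.ttl.on_get(key, now);
    }

    pub fn clock_calls(&self) -> usize {
        self.clock.calls.get()
    }
}
```

Because both policies see the same `now`, the recorded expiry is exactly `ttl` after the recorded last-use time, which is the consistency property argued for above.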

* docs: explain `now` parameter
2022-08-30 11:23:25 +00:00
Carol (Nichols || Goulding) 1b49ad25f7
refactor: Rename KafkaTopicId to TopicId 2022-08-29 14:27:02 -04:00
Carol (Nichols || Goulding) 58f0b63cdc
refactor: Rename KafkaTopic to Topic or TopicMetadata or topic name as appropriate 2022-08-29 14:27:02 -04:00
Carol (Nichols || Goulding) cb52683a1a
fix: Redo uses after rebase 2022-08-29 14:08:33 -04:00
Carol (Nichols || Goulding) 74c9529062
fix: Rename KafkaPartition to ShardIndex 2022-08-29 14:07:18 -04:00
Carol (Nichols || Goulding) 6443858870
fix: Rename compactor option from sequencer to shard 2022-08-29 14:06:45 -04:00
Carol (Nichols || Goulding) 95b7529079
fix: Rename more test values to shard 2022-08-29 14:06:45 -04:00
Carol (Nichols || Goulding) fe9c474620
fix: rustfmt 2022-08-29 14:06:45 -04:00
Carol (Nichols || Goulding) 952a3ea498
fix: Return querier sharding to use sequencer ID 2022-08-29 14:06:44 -04:00
Carol (Nichols || Goulding) 698f1a47ff
refactor: Rename test structures from sequencer to shard where appropriate 2022-08-29 14:06:44 -04:00
Jake Goulding 4abf21c724
refactor: Rename Sequencer (and its entourage) to Shard 2022-08-29 14:06:43 -04:00
Sam Arnold 05657ea068
fix: optimizations for metadata fetch and chunk pruning (#5467)
* fix: hoist repeated computation out of chunk creation

We have hundreds of chunks per table, so it is beneficial to only
do common work once.

* chore: remove TableCache as it is no longer used

* fix: prune chunks both before and after metadata fetch

Fetching the metadata for all the chunks in a table is expensive,
especially when we have a narrow time range query that only
needs a few chunks.
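
A rough sketch of the two-stage pruning described above, assuming that a chunk's time range is known cheaply before its metadata is fetched. `ChunkAddr`, `ChunkMeta`, `select_chunks` and the `max_value` statistic are hypothetical names for illustration, not the querier's types.

```rust
use std::ops::RangeInclusive;

/// Cheap information assumed to be available without a metadata fetch.
pub struct ChunkAddr {
    pub id: u32,
    pub min_time: i64,
    pub max_time: i64,
}

/// Expensive-to-fetch metadata (here: a single column statistic).
pub struct ChunkMeta {
    pub id: u32,
    pub max_value: i64,
}

/// Returns the IDs of chunks that survive both pruning passes.
pub fn select_chunks(
    chunks: &[ChunkAddr],
    query_time: RangeInclusive<i64>,
    min_value: i64,
    mut fetch_meta: impl FnMut(&ChunkAddr) -> ChunkMeta,
) -> Vec<u32> {
    chunks
        .iter()
        // Pass 1: prune on the time range before touching any metadata.
        .filter(|c| c.max_time >= *query_time.start() && c.min_time <= *query_time.end())
        // The expensive fetch now only runs for the survivors.
        .map(|c| fetch_meta(c))
        // Pass 2: prune again using the fetched statistics.
        .filter(|m| m.max_value >= min_value)
        .map(|m| m.id)
        .collect()
}
```

For a narrow time range, most chunks are dropped in the first pass, so the number of metadata fetches scales with the chunks that overlap the query rather than with the table size.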

* chore: fix clippy

* fix: fix up some last tests

* fix: review comments

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-29 14:59:05 +00:00
Marco Neumann 3a4a17a48e
feat: refresh namespace cache before expiration (#5449)
Closes #5318.
2022-08-29 11:52:18 +00:00
Dom Dwyer abf26767c1 refactor: infallible JumpHash initialisation
This doesn't really need to be fallible but forces propagation of a ton
of error handling - no shards is always a sign of something being very
wrong, and can be caught in the caller if it's for some reason an
acceptable state / can be recovered from.
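
As an illustration only, an infallible constructor could look like the sketch below. It assumes the sharder uses the well-known jump consistent hash over an integer key; the struct layout, the `shard_for` method and the panic message are hypothetical and not taken from the commit.

```rust
/// Jump consistent hash: maps `key` to a bucket in `0..buckets`.
/// `buckets` must be at least 1.
fn jump_hash(mut key: u64, buckets: usize) -> usize {
    let mut b: i64 = -1;
    let mut j: i64 = 0;
    while j < buckets as i64 {
        b = j;
        key = key.wrapping_mul(2862933555777941757).wrapping_add(1);
        j = ((b + 1) as f64 * ((1u64 << 31) as f64 / ((key >> 33) + 1) as f64)) as i64;
    }
    b as usize
}

/// Hypothetical sharder holding a non-empty list of shards.
pub struct JumpHash<T> {
    shards: Vec<T>,
}

impl<T> JumpHash<T> {
    /// Infallible constructor.
    ///
    /// # Panics
    ///
    /// Panics if `shards` is empty; callers for whom "no shards" is an
    /// acceptable state can check for that before calling.
    pub fn new(shards: impl IntoIterator<Item = T>) -> Self {
        let shards: Vec<T> = shards.into_iter().collect();
        assert!(!shards.is_empty(), "cannot initialise sharder with no shards");
        Self { shards }
    }

    /// Picks the shard for `key`; the same key always maps to the same shard.
    pub fn shard_for(&self, key: u64) -> &T {
        &self.shards[jump_hash(key, self.shards.len())]
    }
}
```

Callers no longer need to thread a `Result` through every layer just to handle a configuration that should never occur.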
2022-08-24 13:18:57 +02:00
Marco Neumann f34f99c5ed
refactor: port LRU cache backend to policy framework (#5406)
* refactor: port LRU cache backend to policy framework

Closes #5320.

* test: extend `test_oversized_entries`

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-17 14:43:24 +00:00
Andrew Lamb 7f0ae53d6f
chore: Update to (almost) released object_store 0.4.0 (#5419)
* chore: update object_store

* chore: update hakari config

* chore: Run cargo hakari tasks

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2022-08-17 13:44:48 +00:00
Marco Neumann 49ab568ca8
refactor: convert `remove_if` feature to policy framework (#5398)
* refactor: allow `ChangeRequest` to carry a lifetime

Let's not restrict our change functions to `'static` because this would
require us to clone loads of data to achieve predicate-based
`remove_if`.
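
To illustrate why the lifetime matters, here is a small sketch with a hypothetical `ChangeRequest<'a>` that wraps a boxed closure over a plain `HashMap` backend. The types and the `remove_if`/`apply` signatures are assumptions for this example and differ from the real policy framework.

```rust
use std::collections::HashMap;

type Backend = HashMap<String, u64>;

/// A deferred change to the backend. The `'a` bound lets the closure
/// borrow caller data instead of requiring `'static` (and thus clones).
pub struct ChangeRequest<'a> {
    f: Box<dyn FnOnce(&mut Backend) + 'a>,
}

impl<'a> ChangeRequest<'a> {
    /// Removes `key` if it is present and `predicate` accepts its value.
    pub fn remove_if(key: &'a str, predicate: impl FnOnce(&u64) -> bool + 'a) -> Self {
        Self {
            f: Box::new(move |backend: &mut Backend| {
                if backend.get(key).map_or(false, predicate) {
                    backend.remove(key);
                }
            }),
        }
    }

    /// Executes the change against the backend.
    pub fn apply(self, backend: &mut Backend) {
        (self.f)(backend)
    }
}
```

A predicate such as `|v| limits.iter().all(|l| v < l)` can then borrow a local `limits` vector directly; with a `'static` bound the vector would have to be cloned or moved into the closure.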

* refactor: convert `remove_if` feature to policy framework

Decided to drop the "shared" functionality. We only use the small
`remove_if` bit which is way easier to reason about.

For #5320.

* refactor: address review comments

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-16 08:23:27 +00:00
Marco Neumann 0ccefa0d0c
refactor: port TTL backend to policy framework (#5396)
* refactor: port TTL backend to policy framework

Note that this is "just" a port, it does NOT change how TTL works. This
will be done in #5318.

Helps with #5320.

* fix: ensure inner backend is empty

* test: add some smoke test
2022-08-15 16:48:16 +00:00
Carol (Nichols || Goulding) b982bdaf2f
fix: Derive Eq when we derive PartialEq and members can derive Eq
Allow this in generated code that we don't control, though.

Recommended by clippy now. https://rust-lang.github.io/rust-clippy/master/index.html#derive_partial_eq_without_eq
2022-08-11 15:04:06 -04:00
Andrew Lamb b834bc630c
chore: more readability improvements to sort keys (#5366)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-10 17:59:25 +00:00
Andrew Lamb 16ddc5efc6
chore: Update datafusion / arrow/parquet/arrow-flight and prost/tonic ecosystem (#5360)
* chore: Update datafusion and arrow

* chore: Update Cargo.lock

* chore: update to Decimal128

* chore: Update tonic/prost/pbjson/etc

* chore: Run cargo hakari tasks

* fix: doctest in generated types

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2022-08-09 17:30:44 +00:00
Andrew Lamb 172f893368
fix: fix logging typo in querier (#5345)
* fix: fix logging typo

* fix: fix type in typo fix ;(
2022-08-09 06:34:06 +00:00
Marco Neumann cd0dc42b4a
refactor: use a single chunk filter/pruning step in querier (#5338)
We already prune all chunks in the query-access layer. There's no need
to do that another time (which is actually the first time) in
`QuerierTable::chunks`. The time savings we get from feeding fewer chunks
into the state reconciling should be negligible. On the pro-side however
we get a more streamlined data flow and actually correct chunk pruning
metrics. Also see #5336.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-08 12:55:14 +00:00
Marco Neumann fc1870ff76
fix: chunk pruning stats (#5319)
- emit a warning if we cannot even attempt to prune chunks due to an
  error. This is always either a missing feature or a bug (even though
  it does not impact correctness but _only_ performance). Also see
  https://github.com/influxdata/conductor/issues/1107
- change metrics to clearly differentiate between "could not prune" and
  "not pruned"
- add new "not pruned" observer hook (this was missing for some reason,
  the "pruned" hook existed though)
2022-08-05 10:50:31 +00:00
Marco Neumann 0d714878ca
feat: chunk pruning metrics (#5273)
* refactor: make could-not-prune reason a static string

* refactor: introduce `QuerierTableArgs`

* feat: chunk pruning metrics

Closes #4974.

* refactor: address review comments

* refactor: use static typing for not-pruned reason

* refactor: pass chunk to not-pruned observer and use it for some metrics

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-04 15:29:21 +00:00
Nga Tran 34ccc9c7f5 chore: Revert "chore: Revert "refactor: bump batch size (#5251)" (#5288)" (#5300)
This reverts commit 471b8be92f.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-04 13:19:46 +00:00
Marco Neumann 840e4801b8
feat: make querier RAM pool split a proper feature (#5283)
* feat: make querier RAM pool split a proper feature

- use proper pool names
- expose sizing via CLI/env

Closes https://github.com/influxdata/conductor/issues/1102.

* refactor: improve naming and docs

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-03 15:27:23 +00:00
Marco Neumann 663a20d743
refactor: remove `--ingster-address` (#5255)
Closes #5002.
2022-08-03 15:05:01 +00:00
Nga Tran 471b8be92f
chore: Revert "refactor: bump batch size (#5251)" (#5288)
This reverts commit bb172f8fa8.
2022-08-03 14:23:45 +00:00
Marco Neumann 8e2443d879
feat: use two RAM pools in querier (#5271)
Quick&Dirty implementation of a RAM-pool split to see if this has any
effect. I expect the querier performance to improve due to this because
large read buffers can no longer evict precious metadata.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-02 15:14:26 +00:00
Marco Neumann ee491cbbfc
fix: re-enable querier read buffer cache (#5268)
This reverts commit 82913743f1 / #5252.

I misjudged the cache hit ratio for the RB, see
https://github.com/influxdata/k8s-infra/pull/4548

So let's bring back the RB cache until we have some form of parquet
cache in place.
2022-08-02 08:37:30 +00:00
Marco Neumann a8f6d579c8
feat: add metric for predicate-based cache entry removal (#5257) 2022-08-02 07:44:53 +00:00
Marco Neumann fec6b18d80
feat: add metric for TTL cache expiration (#5256)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-02 07:00:30 +00:00
Marco Neumann 82913743f1
refactor: disable querier read buffer cache (#5252)
Let's try and see how this performs in prod.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-01 15:43:22 +00:00
Marco Neumann bb172f8fa8
refactor: bump batch size (#5251)
This is what DataFusion uses by default and I don't see a reason why we
should use such small batch sizes.

The effect is probably only visible in certain filter-aggregate queries
that don't focus on a single series (because there we likely end up with
1 or 2 batches only, esp. after #5250) for coarse-grained filters, esp.
when the filter key is not the first sort key.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-01 13:49:58 +00:00
dependabot[bot] fbd39844d8
chore(deps): Bump async-trait from 0.1.56 to 0.1.57 (#5247)
Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.56 to 0.1.57.
- [Release notes](https://github.com/dtolnay/async-trait/releases)
- [Commits](https://github.com/dtolnay/async-trait/compare/0.1.56...0.1.57)

---
updated-dependencies:
- dependency-name: async-trait
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-08-01 08:30:33 +00:00
Andrew Lamb 9215a534d0
chore: Update datafusion and `arrow`/`parquet`/`arrow-flight` to `19.0.0` (#5229)
* chore: Update datafusion and `arrow`/`parquet`/`arrow-flight` to `19.0.0`

* chore: Run cargo hakari tasks

* fix: Update for API changes

* fix: clippy

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-28 08:10:47 +00:00
Marco Neumann 9a9a1a4777
feat: limit per-table chunk data for every query (#5223)
* feat: `QueryChunk::as_any`

* feat: allow `ChunkPruner::prune_chunks` to fail

* feat: limit per-table chunk data for every query

Closes #5211.

* fix: address review comments

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
2022-07-27 13:20:05 +00:00
Marco Neumann 85c186f5b8
feat: cache projected chunk schemas in querier (#5213)
* feat: cache projected chunk schemas in querier

Ref #5202.

* refactor: simplify size calculations

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-27 08:23:20 +00:00
Andrew Lamb 495bbe48f2
refactor: Reduce boiler plate calling `SpanRecorder::child` (#5180)
* refactor: call SpanRecorder::child

* refactor: update more locations

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-22 11:11:45 +00:00
Marco Neumann 0f54281d24 feat: trace namespace cache
For #5129.
2022-07-21 16:10:06 +02:00
Marco Neumann 9031ed390b feat: trace parquet_file cache
For #5129.
2022-07-21 16:10:06 +02:00
Marco Neumann 4c5227292f feat: trace partition cache
For #5129.
2022-07-21 16:10:06 +02:00
Marco Neumann ff88702749
feat: wire up cache tracing (1/2) (#5170)
* feat: trace tombstone cache

For #5129.

* feat: trace table cache

For #5129.

* feat: trace read buffer cache

For #5129.

* feat: trace processed_tombstones cache

For #5129.

* refactor: improve span name

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-21 13:59:55 +00:00
Nga Tran 69cb3f2b19
refactor: remove min_sequence_number from Compactor and Querier, add `count_by_overlaps_with_level_0` and `count_by_overlaps_with_level_1` to catalog (#5151)
* refactor: remove min_sequence_number

* fix: typos

* fix: remove min_sequencer_number from new files from merging main

* fix: add back throwing error if the compactor compacts files persisted by the ingester after the ingester sends max seq_num back to querier

* test: add test_compactor_collision back but modify the input to make it work with the new changes

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-21 13:51:54 +00:00
Marco Neumann b35502ce61
feat: cache tracing (#5164)
* feat: cache tracing

Add tracing to the metrics cache wrapper. The extra arguments for GET
and PEEK make this quite simple, because the wrapper can just extend the
inner args with the trace information.

We currently terminate the span in `querier::cache` (i.e. only pass in
`None`, so no tracing will occur) to keep this PR rather small. This
will be changed in subsequent PRs.
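
The "extend the inner args" approach might look roughly like this. It is a synchronous toy version under assumptions: the `Cache` trait, the `Span` struct, `Doubler` and `Traced` are hypothetical stand-ins, whereas the real wrapper works with the project's own cache and tracing types.

```rust
/// Hypothetical cache trait with an extra, per-call argument for GET.
pub trait Cache {
    type K;
    type V;
    type GetExtra;

    fn get(&mut self, k: Self::K, extra: Self::GetExtra) -> Self::V;
}

/// Hypothetical stand-in for a tracing span.
#[derive(Debug)]
pub struct Span {
    pub name: &'static str,
    pub events: Vec<&'static str>,
}

impl Span {
    pub fn new(name: &'static str) -> Self {
        Self { name, events: Vec::new() }
    }
}

/// Trivial inner "cache" that needs no extra arguments.
pub struct Doubler;

impl Cache for Doubler {
    type K = u32;
    type V = u32;
    type GetExtra = ();

    fn get(&mut self, k: u32, _extra: ()) -> u32 {
        k * 2
    }
}

/// Wrapper that adds an optional span to whatever the inner cache expects.
pub struct Traced<C> {
    inner: C,
    pub finished: Vec<Span>,
}

impl<C> Traced<C> {
    pub fn new(inner: C) -> Self {
        Self { inner, finished: Vec::new() }
    }
}

impl<C: Cache> Cache for Traced<C> {
    type K = C::K;
    type V = C::V;
    // The wrapper's extra args are the inner args plus the trace information.
    type GetExtra = (C::GetExtra, Option<Span>);

    fn get(&mut self, k: Self::K, (inner_extra, span): Self::GetExtra) -> Self::V {
        let v = self.inner.get(k, inner_extra);
        // Passing `None` disables tracing for this call.
        if let Some(mut span) = span {
            span.events.push("get");
            self.finished.push(span);
        }
        v
    }
}
```

Calling `cache.get(21, ((), None))` performs the lookup without tracing, which mirrors passing `None` as described above; passing `Some(span)` records the call on that span.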

For #5129.

* fix: typo

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-21 11:54:22 +00:00