543 Commits (71625043e2b393eecc803e7c30fb3554c7a7881c)

Author SHA1 Message Date
Marko Mikulicic b5faa37152
fix: Plumb tracing header name env/flag to client (#8189)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-07 21:07:29 +00:00
dependabot[bot] 26a6113a37
chore(deps): Bump async-trait from 0.1.70 to 0.1.71 (#8163)
Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.70 to 0.1.71.
- [Release notes](https://github.com/dtolnay/async-trait/releases)
- [Commits](https://github.com/dtolnay/async-trait/compare/0.1.70...0.1.71)

---
updated-dependencies:
- dependency-name: async-trait
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-06 09:58:51 +00:00
Marco Neumann 35d93f9475
fix: include `PartitionHashId` in size estimations (#8153)
As for the other types: size estimations are conservative, so we assume
the value behind the `Arc` is owned by the estimating party.
2023-07-05 10:42:39 +00:00
dependabot[bot] b5c9628f0f
chore(deps): Bump async-trait from 0.1.69 to 0.1.70 (#8148)
Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.69 to 0.1.70.
- [Release notes](https://github.com/dtolnay/async-trait/releases)
- [Commits](https://github.com/dtolnay/async-trait/compare/0.1.69...0.1.70)

---
updated-dependencies:
- dependency-name: async-trait
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-05 09:05:13 +00:00
Carol (Nichols || Goulding) b76fdab1a4
refactor: Move querier::df_stats to iox_query::chunk_statistics so it can be shared with ingester 2023-07-03 17:24:55 +02:00
Marco Neumann ce6a2fb613
refactor: remove `QueryChunk::column_values` (#8111)
Similar to #8109.

This was once implemented by the RUB but as it stands right now, no
chunk implements this anymore.

If we ever want to bring this back, we should use the output of
`QueryChunk::data` instead (i.e. use a data-based implementation instead
of a per-chunk one).

Closes #8096.
2023-07-03 09:03:21 +00:00
Marco Neumann 1b8b3ae4c3
refactor: bundle projection schema calculation (#8108)
* refactor: convert projection mask earlier

* refactor: bundle projection schema calculation

Same as #8102 but for the projected schema. This now has a nice side
effect:

1. there is no longer a per chunk cache lookup
2. there is no longer ANY per chunk async computation
3. we no longer need an early pruning stage for the chunks (we used
   to do that so we could throw away chunks before doing the more
   expensive part of the chunk creation)

This nicely streamlines and simplifies the code.

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-30 08:27:30 +00:00
Marco Neumann b982ee180e
refactor: remove `QueryChunk::column_names` (#8109)
This interface was once specially implemented by the RUB. The only
actual implementation of it is within the querier that just forwards it
to a simple schema scan. Lift this semantic to `iox_query_influxrpc`
instead so all the chunks can use it.

If we ever want to optimize this again, we should use `QueryChunk::data`
instead (i.e. instead of implementing it within the chunk it should use
the data method and do something smart based on that).

First half of #8096.
2023-06-29 13:43:10 +00:00
Marco Neumann dcb4a9bb5c
refactor: fuse `QueryChunk` and `QueryChunkMeta` (#8107)
Closes #8095.
2023-06-29 11:02:48 +00:00
Marco Neumann 4638b89d93
refactor: migrate retention to proper predicates (#8092)
Do not (ab)use per-chunk delete predicates for the retention policy.
Instead use a per-table predicate.

This makes the code way cleaner, since the scoping is correct (i.e.
delete predicates are a table-wide attribute, not a chunk-based one) and
it is consistent with the time predicates that the user provides (e.g. via
`WHERE time > x`).

It also allows us to remove delete predicates (in their current,
non-scalable form) from the query path. A potential future version would
likely not use per chunk predicates (and "is processed" markers) but use
the timestamp / chunk order to determine to which data the predicate
should be applied.

Note that the lowering of the retention policy changed slightly from

```text
(time > (now() - retention)) AND (time < MAX)
```

to

```text
time > (now() - retention)
```

Since the `MAX` cut is just an artifact of the lowering and was unnecessary.

Closes #7409.
Closes #7410.
2023-06-29 08:36:37 +00:00
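The simplified retention lowering described in #8092 can be illustrated with a small sketch. This is not the IOx code: the function names and the use of plain nanosecond integers are assumptions made for illustration, and the real predicate is expressed as a query expression rather than a Rust function.

```rust
/// Exclusive lower bound for row timestamps, i.e. `now() - retention`.
/// Hypothetical helper; timestamps are nanoseconds since the epoch.
fn retention_cutoff(now_ns: i64, retention_ns: i64) -> i64 {
    // Saturate instead of overflowing for extreme inputs.
    now_ns.saturating_sub(retention_ns)
}

/// Per-table predicate `time > (now() - retention)`.
/// There is no upper `time < MAX` bound, matching the new lowering.
fn is_retained(time_ns: i64, now_ns: i64, retention_ns: i64) -> bool {
    time_ns > retention_cutoff(now_ns, retention_ns)
}
```

Because the predicate depends only on the row timestamp and table-wide settings, it can be evaluated once per table instead of being attached to each chunk.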
Marco Neumann ac236b5553
refactor: bundle partition cache requests (#8102)
* test: add regression test for high number of partition cache accesses

* refactor: bundle partition cache requests

Instead of accessing the partition cache for every single ingester
partition and parquet file, just collect all the partitions first and
request every partition only once. Since the cache system needs to do
some locking and some bookkeeping (e.g. for LRU), this alone should be a
minimal perf win (the cache is quite efficient, so this might not be
measurable). However it also enables batching for catalog requests in
the future, see #8089.

* fix: typo

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-29 08:13:48 +00:00
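The bundling idea from #8102 (collect all partitions first, then request each one once) might look roughly like the sketch below. `CountingCache`, `fetch_partitions`, and the string values are hypothetical stand-ins, not the querier's actual `PartitionCache` API.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical stand-in for the partition cache that counts lookups.
struct CountingCache {
    lookups: usize,
}

impl CountingCache {
    /// Pretend lookup; a real cache would do locking and LRU bookkeeping here.
    fn get(&mut self, partition_id: i64) -> String {
        self.lookups += 1;
        format!("sort-key-for-{partition_id}")
    }
}

/// Deduplicate the partition IDs referenced by all chunks and files, then
/// perform exactly one cache lookup per distinct partition.
fn fetch_partitions(
    cache: &mut CountingCache,
    chunk_partitions: &[i64],
) -> HashMap<i64, String> {
    let unique: BTreeSet<i64> = chunk_partitions.iter().copied().collect();
    unique.into_iter().map(|id| (id, cache.get(id))).collect()
}
```

With six chunks spread over three partitions, this performs three lookups instead of six, and the deduplicated set is also the natural input for a batched catalog request.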
dependabot[bot] b15c6062a9
chore(deps): Bump tokio from 1.28.2 to 1.29.0 (#8100)
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.28.2 to 1.29.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.28.2...tokio-1.29.0)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-06-28 13:18:08 +00:00
Marco Neumann 9775e150b2
refactor: single entry point for partition cache (#8093)
For #8089 I would like to request each partition only once. Since
internally we store both the sort key and the column ranges in one cache
value anyways, there is no reason to offer two different methods to look
them up.

This only changes the `PartitionCache` interface. The actual lookups are
still separate, but will be changed in a follow-up.
2023-06-27 16:22:13 +00:00
Marco Neumann 9d8b620cd2
refactor: gather column ranges after decoding (#8090)
We need to decode the ingester data in a serial fashion (since it is a
data stream). Cache access during that phase is costly since we cannot
parallelize that. To avoid that, we gather the column ranges AFTER
decoding and calculate the chunk statistics accordingly.

This refactoring also removes the partition sort key from ingester
partitions since they are not required anymore. They are a leftover of
the old physical query planning. They were not marked as "unused" since
they were used by some test code.

Required for #8089.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-27 14:44:06 +00:00
Marco Neumann 1d101bde5f
fix: panics in querier->ingester circuit breaker (#8080)
The circuit breaker needs to act on concurrent requests to the same
ingester. To do that, it performs the following steps per request:

1. check current circuit state (if open, then exit here)
2. perform request (if closed or as a half-open test request)
3. change circuit state based on results

Now only step 1 and step 3 hold locks to allow concurrency. This means
that in the meantime, the circuit state might change. To check that, the
circuit state has a generation counter.

The bug now was an overly strong assumption on the generation counter /
state change. Namely that if we are in step 3 and the state is
"half-open", then nobody else could have changed the state in the
meantime because for a single ingester, there can only be one test
request for the half-open state. While the latter part of this is
correct, the former is wrong: we could have started in step 1
with a closed circuit and ended in a half-open one. This occurs if the
following sequence happens:

1. request, blocks on upstream
2. circuit breaks
3. some time passes
4. a half-open request starts, blocks on upstream
5. request from step 1 returns, finds itself confused

This now fixes the assertion (both in case that the request from step 1
succeeds and fails).

Includes tests for the two scenarios (`test_late_failure_after_half_open`,
`test_late_ok_after_half_open`) and an additional one that I came up with
while thinking about the issue (`test_late_failure_after_recovery`, was
passing on `main` but still good to have).

Fixes #8065.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-27 14:09:18 +00:00
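The three-step protocol and the generation counter from #8080 can be sketched as follows. This is a simplified illustration, not the querier's implementation: all type and method names are hypothetical, the circuit opens on the first failure, `start_test` stands in for "some time passes", and a late result is simply ignored, which is one plausible way to handle it and may differ from the actual fix.

```rust
use std::sync::Mutex;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State {
    Closed,
    Open,
    HalfOpen,
}

struct Inner {
    state: State,
    /// Bumped on every state change so late results can be detected.
    generation: u64,
}

/// Token handed out in step 1 and handed back in step 3.
#[derive(Debug, Clone, Copy)]
struct Ticket {
    generation: u64,
}

struct Circuit {
    inner: Mutex<Inner>,
}

impl Circuit {
    fn new() -> Self {
        Self { inner: Mutex::new(Inner { state: State::Closed, generation: 0 }) }
    }

    fn state(&self) -> State {
        self.inner.lock().unwrap().state
    }

    /// Step 1 for a normal request. The lock is released on return, so
    /// the request itself (step 2) runs without holding it.
    fn start(&self) -> Option<Ticket> {
        let inner = self.inner.lock().unwrap();
        (inner.state == State::Closed).then(|| Ticket { generation: inner.generation })
    }

    /// Step 1 for the single half-open test request.
    fn start_test(&self) -> Option<Ticket> {
        let mut inner = self.inner.lock().unwrap();
        if inner.state != State::Open {
            return None;
        }
        inner.state = State::HalfOpen;
        inner.generation += 1;
        Some(Ticket { generation: inner.generation })
    }

    /// Step 3. Returns `false` when the result is stale and was ignored.
    fn finish(&self, ticket: Ticket, ok: bool) -> bool {
        let mut inner = self.inner.lock().unwrap();
        if ticket.generation != inner.generation {
            // The state changed while the request was in flight. It may
            // even be half-open now although this request started closed.
            return false;
        }
        let next = match (inner.state, ok) {
            (State::Closed, true) => return true,
            (State::Closed, false) | (State::HalfOpen, false) => State::Open,
            (State::HalfOpen, true) => State::Closed,
            // No ticket is ever issued for the current generation of an
            // open circuit, so there is nothing to apply.
            (State::Open, _) => return false,
        };
        inner.state = next;
        inner.generation += 1;
        true
    }
}
```

Replaying the sequence from the commit message: a request takes a ticket while the circuit is closed, a second request fails and opens the circuit, `start_test` moves it to half-open, and the first request's late `finish` then returns `false` without touching the state.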
dependabot[bot] 6e7b838b52
chore(deps): Bump insta from 1.29.0 to 1.30.0 (#8059)
Bumps [insta](https://github.com/mitsuhiko/insta) from 1.29.0 to 1.30.0.
- [Changelog](https://github.com/mitsuhiko/insta/blob/master/CHANGELOG.md)
- [Commits](https://github.com/mitsuhiko/insta/compare/1.29.0...1.30.0)

---
updated-dependencies:
- dependency-name: insta
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-23 07:45:41 +00:00
Carol (Nichols || Goulding) d991e12fbb
feat: Send PartitionHashId from ingesters to queriers 2023-06-22 09:01:22 -04:00
Carol (Nichols || Goulding) 62ba18171a
feat: Add a new hash column on the partition and parquet file tables
This will hold the deterministic ID for partitions.

Until all existing partitions have this value, this is optional/nullable.

The row ID still exists and is used as the main foreign key in the
parquet_file and skipped_compaction tables.

The hash_id has a unique index so that we can look up records based on
it (if it's available).

If the parquet file record has a partition_hash_id value, use that to
generate the object storage path instead of the partition_id.
2023-06-22 09:01:22 -04:00
Marco Neumann 4e18a5f9e8
refactor: remove querier state reconciler (#8046)
The reconciler is a leftover from the Kafka-based write path. It doesn't
do anything anymore.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-22 09:03:46 +00:00
Marco Neumann e72566e0e5
refactor: clean up querier server interface (#8045)
Move all the gRPC assembly into one single place: `ioxd_querier`. This
way `querier` no longer depends on `service_*` (except for
`service_common` which doesn't really implement gRPC but only the
namespace/database entry point).
2023-06-22 08:57:24 +00:00
Marco Neumann c9349a685f
refactor: remove pointless handler abstraction (#8044)
If your abstraction has one implementation, it ain't an abstraction.
2023-06-22 08:30:42 +00:00
Marco Neumann 686aa51b43
refactor: remove dead querier code (#8034)
Mostly leftovers from previous designs / iterations.
2023-06-22 07:33:18 +00:00
Marco Neumann 83a5037e61
feat: query support for custom partitioning (#8025)
* feat: querier-specific stat creation routine

* feat: prune querier chunks using partition col ranges

* feat: add table client

* test: custom partitioning

* fix: correctly set up stats for chunks with col subsets

* fix: flaky test

* refactor: remove obsolete dead_code markers

* feat: add partition template to `create_namespace`

* test: extend custom partitioning end2end tests

* fix: explain shuffling, make it actually deterministic
2023-06-21 09:03:19 +00:00
Andrew Lamb 5889c96501
chore: Update `datafusion` and other dependencies (#7981)
* chore: Update DataFusion pin

* chore: Update other dependencies

* chore: Update hakari

* fix: Update for API changes

* fix: Update explain plan

* fix: Update influxql plans

* fix: rustdoc links

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-16 09:48:55 +00:00
Marco Neumann 93ecb78ab9
feat: cache decoded partition value ranges (#8002)
Currently this only works for tags. We may want to decode the time
template as well at some point.

For #7974.
2023-06-16 09:38:34 +00:00
Marco Neumann 64f573c13f
feat: cache partition template in querier (#7987)
* feat: impl `Eq` for `TablePartitionTemplateOverride`

* feat: `TablePartitionTemplateOverride::size`

* feat: cache partition template in querier

Required for #7974.
2023-06-15 10:30:56 +00:00
Marco Neumann 3e26567e05
refactor: cache slices instead of vecs (#7989)
Immutable `Box<Vec<T>>`/`Arc<Vec<T>>` are better stored as
`Box<[T]>`/`Arc<[T]>` because:

- allocation always exact (no need for `shrink_to_fit`)
- smaller (the fat pointer is just the memory address and the length, no
  capacity required)
- less allocation (`Box`/`Arc` -> slice instead of `Box`/`Arc` -> `Vec`
  -> buffer); in fact the vector itself was often missing in the
  accounting code

Found while I was working on #7987.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-15 08:01:55 +00:00
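The points made in #7989 can be checked with a few lines of standard-library Rust. The helper names below are hypothetical, and the three-word size of `Vec` reflects the current standard library layout rather than a documented guarantee.

```rust
use std::mem::size_of;
use std::sync::Arc;

/// Freeze a vector into an immutable, shareable slice. The resulting
/// allocation holds the reference counts and exactly `len` elements, so
/// any spare capacity of the vector is not carried over.
fn freeze<T>(values: Vec<T>) -> Arc<[T]> {
    Arc::from(values)
}

/// Handle sizes in machine words: (`Box<[u64]>`, `Vec<u64>`).
/// The boxed slice is a fat pointer (address + length), while the vector
/// additionally stores its capacity.
fn handle_words() -> (usize, usize) {
    let word = size_of::<usize>();
    (size_of::<Box<[u64]>>() / word, size_of::<Vec<u64>>() / word)
}
```

A vector built with `Vec::with_capacity(100)` and three elements becomes a three-element `Arc<[T]>` after `freeze`, with no `shrink_to_fit` call and no intermediate `Vec` header to account for.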
Marco Neumann 453a361d3c
feat: catalog parquet file cache TTL (#7975)
Prevent the querier from accessing files that were flagged for deletion a
long time ago. This would happen if the following conditions hold:

- we have very long-running querier pods (e.g. over holidays)
- the table doesn't receive any writes (or the partition if we ever
  change the cache granularity), hence the querier is never informed
  that its state is out-of-date
- a compactor runs a cold compaction, and by doing so flags a file for
  deletion
- the GC finally wants to delete it

This is mostly a safety measure to prevent weird internal server errors
that should nearly never happen. On the other hand I do not want to hunt
Heisenbugs.
2023-06-12 14:02:47 +00:00
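A time-to-live bound like the one added in #7975 can be sketched as below. This is a minimal illustration under assumptions: `TtlCache` and its methods are hypothetical, the key is a plain `i64`, and the current time is passed in explicitly so the behaviour is easy to test. The querier's real cache system is considerably more involved.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical cache whose entries expire a fixed time after insertion.
struct TtlCache<V> {
    ttl: Duration,
    entries: HashMap<i64, (Instant, V)>,
}

impl<V: Clone> TtlCache<V> {
    fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    fn len(&self) -> usize {
        self.entries.len()
    }

    fn insert(&mut self, key: i64, value: V, now: Instant) {
        self.entries.insert(key, (now, value));
    }

    /// Return the cached value unless it is older than the TTL, in which
    /// case the entry is dropped and the caller has to refetch it.
    fn get(&mut self, key: i64, now: Instant) -> Option<V> {
        let expired = match self.entries.get(&key) {
            Some((inserted, _)) => now.duration_since(*inserted) >= self.ttl,
            None => return None,
        };
        if expired {
            self.entries.remove(&key);
            return None;
        }
        self.entries.get(&key).map(|(_, value)| value.clone())
    }
}
```

Even if no write ever invalidates the entry, a long-running process stops serving the stale file list once the TTL has elapsed.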
Andrew Lamb 17c0d837b3
chore: Update DataFusion, arrow, object_store pins (#7942)
* chore: Update DataFusion, arrow, object_store pins

* chore: Update for hakari

* chore: Update for new APIs

* fix: update test

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-07 17:08:31 +00:00
Marco Neumann fa5011197c
refactor: migrate `iox_query` to use DataFusion statistics (#7908)
This is the major part of #7470. Additional clean ups (e.g. to remove
the actual types from `data_types`) will follow.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-02 09:18:59 +00:00
Andrew Lamb 1ff76b7bf2 chore: use workspace dependencies for `object_store` 2023-05-26 07:03:42 -04:00
Andrew Lamb c1a448e930
feat: Add decoded payload type and size to querier <--> ingester tracing (#7870)
* feat: Add decoded payload type and size to querier <--> ingester tracing

* feat: add aggregate sizes

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-05-26 10:05:14 +00:00
Andrew Lamb d68a399a7b
fix: fix span name (#7868)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-05-25 17:40:43 +00:00
Dom Dwyer 928a4d163e
build: remove unused dependencies from crates
This commit fixes loads of crates (47!) that had unused dependencies, or
mis-configured dependencies (test deps as normal deps).

I added the "unused_crate_dependencies" to all crates to help prevent
this mess from growing again!

    https://doc.rust-lang.org/beta/nightly-rustc/rustc_lint_defs/builtin/static.UNUSED_CRATE_DEPENDENCIES.html

This has the minor downside of false-positives when specifying
dev-dependencies for test/bench binaries - these are files in /test or
/benches (not normal tests). This commit includes a workaround,
importing them in lib.rs (gated by a feature flag). I think the
trade-off of better dependency management is worth it!
2023-05-23 14:55:43 +02:00
Marco Neumann 31b8813760
feat: hide `system.queries` table from prod by default (#7810)
Introduce a new header called `iox-debug` which when set enables certain
debug features. The first one will be the `system.queries` table which
is a process-local, namespace-scoped query log. In most prod setups this
is only useful for debugging and will confuse the user a lot because
when multiple queriers are deployed then the K8s routing decides which
pod/process the user hits. This leads to an inconsistent view. However
the log is still useful for debugging.

This also wires the "debug header set" flag through the Flight ticket,
because JDBC proved (integration tests FTW!) that headers are only
passed to `GetFlightInfo` but not to `DoGet` and the ticket must encode
all the relevant information.

Closes #7119.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-05-22 12:29:24 +00:00
Andrew Lamb 6344fe8c3f
chore: Add rationale for `clippy::future_not_send` (#7822)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-05-18 16:58:56 +00:00
Dom 6aa634c1b9
Merge branch 'main' into cn/move-peas 2023-05-15 13:29:42 +01:00
dependabot[bot] fba9836f2a
chore(deps): Bump pin-project from 1.0.12 to 1.1.0
Bumps [pin-project](https://github.com/taiki-e/pin-project) from 1.0.12 to 1.1.0.
- [Release notes](https://github.com/taiki-e/pin-project/releases)
- [Changelog](https://github.com/taiki-e/pin-project/blob/main/CHANGELOG.md)
- [Commits](https://github.com/taiki-e/pin-project/compare/v1.0.12...v1.1.0)

---
updated-dependencies:
- dependency-name: pin-project
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-05-15 02:02:32 +00:00
Carol (Nichols || Goulding) 1770d0f4d8
fix: Move ingester-querier gRPC communication to its own crate 2023-05-12 13:28:30 -04:00
Carol (Nichols || Goulding) 92e5036943
fix: Size of ColumnSet shouldn't be using ChunkId (#7786) 2023-05-12 14:58:03 +00:00
Carol (Nichols || Goulding) cc41216382
fix: Undo the addition of a TableInfo type; store partition_template on TableSchema 2023-05-09 14:54:59 +02:00
Carol (Nichols || Goulding) 596673d515
refactor: Create a new ColumnsByName type to abstract over TableSchema columns
And allow usage of just the columns when that's all that's needed
without leaking the BTreeMap implementation detail everywhere
2023-05-09 14:54:58 +02:00
Carol (Nichols || Goulding) 1f1dcc947d
fix: Don't change how the compactor gets the table schema 2023-05-09 14:54:58 +02:00
Carol (Nichols || Goulding) 58d9c40ffd
feat: If namespace or table partition templates are specified, use those 2023-05-09 14:54:57 +02:00
Carol (Nichols || Goulding) 56916cf942
fix: Rename ingester2 to ingester 2023-05-08 12:03:05 -04:00
Andrew Lamb 2860d87fe1
chore: Update DataFusion (#7756)
* chore: Update DataFusion pin

* chore: Update explain plans

* chore: Run cargo hakari tasks

---------

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2023-05-05 18:58:18 +00:00
Carol (Nichols || Goulding) 621caab2e9
fix: Remove unused parquet_max_sequence_number metadata 2023-05-03 10:57:27 -04:00
Carol (Nichols || Goulding) dfa184e296
fix: Make ingester UUID an expected, required field of IngesterPartition 2023-05-03 10:45:02 -04:00
Marco Neumann 0556fdae53
refactor: remove `QueryChunk::partition_sort_key` (#7680)
As of #7250 / #7449 the partition sort key is no longer required for
query planning. Instead we use a combination of
`QueryChunk::partition_id` and `QueryChunk::sort_key` which is more
robust and easier to reason about.

Removing it simplifies the querier code a lot since we no longer need to
have a sort key for the ingester chunks and also don't need to "sync"
the sort key between chunks for consistency.
2023-04-27 10:54:41 +00:00
Marco Neumann 2bf867ea0a
refactor: do not block on querier cache warm-up (#7679)
Warming up a cache should not block the planning, it is a mere signal to
the cache system to start to fetch data. See code comment for more
details.

This lowers the query latency in a few cases. I've seen at least one
trace where this would have been useful. This will never make things
worse (because the cache system drives the request to completion anyways).

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-04-27 08:57:55 +00:00