Commit Graph

568 Commits (0e478af470c719e8b697e1638f35ac7baa97e261)

Author SHA1 Message Date
Carol (Nichols || Goulding) 92ae8e4084
refactor: Extract a convenience constructor for Deterministic transition ids 2023-08-02 10:17:23 -04:00
Carol (Nichols || Goulding) a9b0daef8e
fix: Make partition identifier a oneof protobuf field 2023-08-02 10:17:23 -04:00
Carol (Nichols || Goulding) 308d7f3d4b
feat: Use TransitionPartitionId everywhere in the querier 2023-08-02 10:17:22 -04:00
Carol (Nichols || Goulding) ffa09b0911
fix: Update impl QueryChunk for QuerierParquetChunk 2023-08-02 10:17:22 -04:00
Carol (Nichols || Goulding) ea33c06946
fix: Update IngesterChunk's implementation of QueryChunk to return TransitionPartitionId
This doesn't get querier compiling yet.
2023-08-02 10:17:22 -04:00
Carol (Nichols || Goulding) e4b9455344
feat: Have QueryChunk return a reference from partition_id() 2023-08-02 10:17:22 -04:00
Carol (Nichols || Goulding) 13d51f40df
fix: Make partition_id optionally sent from ingesters to queriers 2023-08-02 10:17:21 -04:00
Marco Neumann aa7a38be55
fix: re-design LRU cache to be deadlock-free (#8345)
* fix: re-design LRU cache to be deadlock-free

Fixes #8334.

* test: explain test

* test: add regression test

* docs: extend "overdelete" section
2023-07-31 13:04:34 +00:00
Carol (Nichols || Goulding) 4a9e76b8b7
feat: Make parquet_file.partition_id optional in the catalog (#8339)
* feat: Make parquet_file.partition_id optional in the catalog

This will acquire a short lock on the table in postgres, per:
<https://stackoverflow.com/questions/52760971/will-making-column-nullable-lock-the-table-for-reads>

This allows us to persist data for new partitions and associate the
Parquet file catalog records with the partition records using only the
partition hash ID, rather than both that are used now.

* fix: Support transition partition ID in the catalog service

* fix: Use transition partition ID in import/export

This commit also removes support for the `--partition-id` flag of the
`influxdb_iox remote store get-table` command, which Andrew approved.

The `--partition-id` filter was getting the results of the catalog gRPC
service's query for Parquet files of a table and then keeping only the
files whose partition IDs matched. The gRPC query is no longer returning
the partition ID from the Parquet file table, and really, this command
should instead be using `GetParquetFilesByPartitionId` to only request
what's needed rather than filtering.

* feat: Support looking up Parquet files by either kind of Partition id

Regardless of which is actually stored on the Parquet file record.

That is, say there's a Partition in the catalog with:

Partition {
    id: 3,
    hash_id: abcdefg,
}

and a Parquet file that has:

ParquetFile {
    partition_hash_id: abcdefg,
}

calling `list_by_partition_not_to_delete(PartitionId(3))` should still
return this Parquet file because it is associated with the partition
that has ID 3.

This is important for the compactor, which is currently only dealing in
PartitionIds, and I'd like to keep it that way for now to avoid having
to change Even More in this PR.

* fix: Use and set new partition ID fields everywhere they want to be

---------

Co-authored-by: Dom <dom@itsallbroken.com>
2023-07-31 12:40:56 +00:00
Marco Neumann edf77c73d8
fix: avoid panic when clock goes backwards (#8322)
I've seen at least one case in prod where the UTC clock goes backwards.
The `TimeProvider` and `Time` interface even warns about that. However
there was a `Sub` impl that would panic if that happens and even though
this was documented, I think we can do better and just not offer a
panicky interface at all.

So this removes the `Sub` impl. and replaces all uses with
`checked_duration_since`.
2023-07-24 12:10:41 +00:00
dependabot[bot] cd31492e5b
chore(deps): Bump async-trait from 0.1.71 to 0.1.72 (#8317)
Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.71 to 0.1.72.
- [Release notes](https://github.com/dtolnay/async-trait/releases)
- [Commits](https://github.com/dtolnay/async-trait/compare/0.1.71...0.1.72)

---
updated-dependencies:
- dependency-name: async-trait
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-24 10:07:18 +00:00
Marco Neumann 748e66731c
feat: batch partition catalog requests in querier (take 2) (#8299)
* feat: batch partition catalog requests in querier

This is mostly wiring that builds on top of the other PRs linked to #8089.

I think we eventually could make the batching code nicer by adding
better wrappers / helpers, but lets do that if we have other batched
caches and this patterns proofs to be useful.

Closes #8089.

* test: extend `test_multi_get`

* test: regression test for #8286

* fix: prevent auto-flush CPU looping

* fix: panic when loading different tables at the same time

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-24 08:24:10 +00:00
Marco Neumann d3432198b6
revert: batch partition catalog requests in querier (#8269) (#8283)
Panics in prod.

This reverts commit 0c347e8e64.
2023-07-20 09:42:40 +00:00
Marco Neumann 0c347e8e64
feat: batch partition catalog requests in querier (#8269)
This is mostly wiring that builds on top of the other PRs linked to #8089.

I think we eventually could make the batching code nicer by adding
better wrappers / helpers, but lets do that if we have other batched
caches and this patterns proofs to be useful.

Closes #8089.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-20 08:31:49 +00:00
kodiakhq[bot] ebba032399
Merge branch 'main' into cn/all-over-again 2023-07-17 14:46:48 +00:00
Carol (Nichols || Goulding) cf046d0b3e
refactor: Extract a from implementation for creating TransitionPartitionId 2023-07-17 10:34:01 -04:00
Carol (Nichols || Goulding) a9b788b58f
feat: Collate chunks based on their partition hash id if they have it 2023-07-17 10:34:01 -04:00
dependabot[bot] 4c0e5db3a5
chore(deps): Bump insta from 1.30.0 to 1.31.0 (#8242)
Bumps [insta](https://github.com/mitsuhiko/insta) from 1.30.0 to 1.31.0.
- [Changelog](https://github.com/mitsuhiko/insta/blob/master/CHANGELOG.md)
- [Commits](https://github.com/mitsuhiko/insta/compare/1.30.0...1.31.0)

---
updated-dependencies:
- dependency-name: insta
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-17 14:01:21 +00:00
Carol (Nichols || Goulding) 313baca8b6
fix: Use sort_by rather than sort_by_key to use references
These places are sorting by `PartitionId` currently, which implements
`Copy`, but are about to be changed to be sorted on `PartitionHashId`,
which does not implement `Copy`.
2023-07-17 09:56:55 -04:00
kodiakhq[bot] 699fb70616
Merge branch 'main' into savage/propagate-tracing-spans-from-router-to-ingester 2023-07-14 12:28:56 +00:00
Dom Dwyer 7f7d1f2ee7
fix(ingester): projection without time column
The ingester can project arbitrary columns at query time, and has no
special requirement that the "time" column be part of that projection.

Because the timestamp summary generation explicitly requires the time
column to exist, it panics when there's no "time" column in the
projection - this is a bit of a modelling mismatch more than anything.
2023-07-13 14:22:48 +02:00
kodiakhq[bot] e73116a122
Merge branch 'main' into cn/query-catalog-with-either-partition-identifier 2023-07-12 14:51:02 +00:00
Fraser Savage 458b1bf1a6
feat(ingester): Extract SpanContext from RPC write request
Ensure that if a `SpanContext` type is present in the request that the
trace ID is used for spans in the RPC write path.
2023-07-12 14:22:58 +01:00
Andrew Lamb b24f9c81ba
chore: Update DataFusion pin, updates for API changed (#8199) 2023-07-11 13:36:38 +00:00
Carol (Nichols || Goulding) eec31b7f00
feat: Abstract over which partition ID type we're using to get a partition from the catalog 2023-07-10 10:43:20 -04:00
Marko Mikulicic b5faa37152
fix: Plumb tracing header name env/flag to client (#8189)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-07 21:07:29 +00:00
dependabot[bot] 26a6113a37
chore(deps): Bump async-trait from 0.1.70 to 0.1.71 (#8163)
Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.70 to 0.1.71.
- [Release notes](https://github.com/dtolnay/async-trait/releases)
- [Commits](https://github.com/dtolnay/async-trait/compare/0.1.70...0.1.71)

---
updated-dependencies:
- dependency-name: async-trait
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-06 09:58:51 +00:00
Marco Neumann 35d93f9475
fix: include `PartitionHashId` in size estimations (#8153)
As for the other types: size estimations are conservative, so we assume
the value behind the `Arc` is owned by the estimating party.
2023-07-05 10:42:39 +00:00
dependabot[bot] b5c9628f0f
chore(deps): Bump async-trait from 0.1.69 to 0.1.70 (#8148)
Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.69 to 0.1.70.
- [Release notes](https://github.com/dtolnay/async-trait/releases)
- [Commits](https://github.com/dtolnay/async-trait/compare/0.1.69...0.1.70)

---
updated-dependencies:
- dependency-name: async-trait
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-07-05 09:05:13 +00:00
Carol (Nichols || Goulding) b76fdab1a4
refactor: Move querier::df_stats to iox_query::chunk_statistics so it can be shared with ingester 2023-07-03 17:24:55 +02:00
Marco Neumann ce6a2fb613
refactor: remove `QueryChunk::column_values` (#8111)
Similar to #8109.

This was once implemented by the RUB but as it stands right now, no
chunk implements this anymore.

If we ever want to bring this back, we should use the output of
`QueryChunk::data` instead (i.e. use a data-based implementation instead
of a per-chunk one).

Closes #8096.
2023-07-03 09:03:21 +00:00
Marco Neumann 1b8b3ae4c3
refactor: bundle projection schema calculation (#8108)
* refactor: convert projection mask earlier

* refactor: bundle projection schema calculation

Same as #8102 but for the projected schema. This now has a nice side
effect:

1. there is no longer a per chunk cache lookup
2. there is no longer ANY per chunk async computation
3. we no longer need an early pruning stage for the chunks (we've used
   to do that so we can throw away chunks before doing the more
   expensive part of the chunk creation)

This nicely streamlines and simplifies the code.

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-30 08:27:30 +00:00
Marco Neumann b982ee180e
refactor: remove `QueryChunk::column_names` (#8109)
This interface was once specially implemented by the RUB. The only
actual implementation of it is within the querier that just forwards it
to a simple schema scan. Lift this semantic to `iox_query_influxrpc`
instead so all the chunks can use it.

If we ever want to optimize this again, we should use `QueryChunk::data`
instead (i.e. instead of implementing it within the chunk it should use
the data method and do something smart based on that).

First half of #8096.
2023-06-29 13:43:10 +00:00
Marco Neumann dcb4a9bb5c
refactor: fuse `QueryChunk` and `QueryChunkMeta` (#8107)
Closes #8095.
2023-06-29 11:02:48 +00:00
Marco Neumann 4638b89d93
refactor: migrate retention to proper predicates (#8092)
Do not (ab)use per-chunk delete predicates for the retention policy.
Instead use a per-table predicate.

This makes the code way cleaner, since the scoping is correct (i.e.
delete predicates are a table-wide attribute, not a chunk-based one) and
it is consistent time predicates that the user providers (e.g. via
`WHERE time > x`).

It also allows us to remove delete predicates (in their current,
non-scalable form) from the query path. A potential future version would
likely not use per chunk predicates (and "is processed" markers) but use
the timestamp / chunk order to determine to which data the predicate
should be applied.

Note that the lowering of the retention policy changed slightly from

```text
(time > (now() - retention)) AND (time < MAX)
```

to

```text
time > (now() - retention)
```

Since the `MAX` cut is just an artifact of the lowering and was unnecessary.

Closes #7409.
Closes #7410.
2023-06-29 08:36:37 +00:00
Marco Neumann ac236b5553
refactor: bundle partition cache requests (#8102)
* test: add regression test for high number of partition cache accesses

* refactor: bundle partition cache requests

Instead of accessing the partition cache for every single ingester
partition and parquet file, just collect all the partitions first and
request every partition only ones. Since the cache system needs to do
some locking and some bookkeeping (e.g. for LRU), this alone should be a
minimal perf win (the cache is quite efficient, so this might not be
measurable). However it also enables batching for catalog requests in
the future, see #8089.

* fix: typo

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-29 08:13:48 +00:00
dependabot[bot] b15c6062a9
chore(deps): Bump tokio from 1.28.2 to 1.29.0 (#8100)
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.28.2 to 1.29.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.28.2...tokio-1.29.0)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-06-28 13:18:08 +00:00
Marco Neumann 9775e150b2
refactor: single entry point for partition cache (#8093)
For #8089 I would like to request each partition only once. Since
internally we store both the sort key and the column ranges in one cache
value anyways, there is no reason to offer two different methods to look
them up.

This only changes the `PartitionCache` interface. The actual lookups are
still separate, but will be changed in a follow-up.
2023-06-27 16:22:13 +00:00
Marco Neumann 9d8b620cd2
refactor: gather column ranges after decoding (#8090)
We need to decode the ingester data in a serial fashion (since it is a
data stream). Cache access during that phase is costly since we cannot
parallize that. To avoid that, we gather the column ranges AFTER
decoding and calculate the chunk statistics accordingly.

This refactoring also removes the partition sort key from ingester
partitions since they are not required anymore. They are a leftover of
the old physical query planning. They were not marked as "unused" since
they were used by some test code.

Required for #8089.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-27 14:44:06 +00:00
Marco Neumann 1d101bde5f
fix: panics in querier->ingester circuit breaker (#8080)
The circuit breaker needs to act on concurrent requests to the same
ingester. To do that, it performs the following steps per request:

1. check current circuit state (if open, then exit here)
2. perform request (if closed or as a half-open test request)
3. change circuit state based on results

Now only step 1 and step 3 hold locks to allow concurrency. This means
that in the meantime, the circuit state might change. To check that, the
circuit state has a generation counter.

The bug now was an overly strong assumption on the generation counter /
state change. Namely that if we are in step 3 and the state is
"half-open", then nobody else could have changed the state in the
meantime because for a single ingester, there can only be one test
request for the half-open state. While the latter part of this is
correct, the former is wrong. Namely we could have started in step 1
with a closed circuit and ended in a half-open one. Namely if the
following sequence happen:

1. request, blocks on upstream
2. circuit breaks
3. some time passes
4. a half-open requests starts, blocks on upstream
5. request from step 1 returns, finds itself confused

This now fixes the assertion (both in case that the request from step 1
succeeds and fails).

Includes tests for the two scenarios (`test_late_failure_after_half_open`,
`test_late_ok_after_half_open`) and an additional one that I came up with
while thinking about the issue (`test_late_failure_after_recovery`, was
passing on `main` but still good to have).

Fixes #8065.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-27 14:09:18 +00:00
dependabot[bot] 6e7b838b52
chore(deps): Bump insta from 1.29.0 to 1.30.0 (#8059)
Bumps [insta](https://github.com/mitsuhiko/insta) from 1.29.0 to 1.30.0.
- [Changelog](https://github.com/mitsuhiko/insta/blob/master/CHANGELOG.md)
- [Commits](https://github.com/mitsuhiko/insta/compare/1.29.0...1.30.0)

---
updated-dependencies:
- dependency-name: insta
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-23 07:45:41 +00:00
Carol (Nichols || Goulding) d991e12fbb
feat: Send PartitionHashId from ingesters to queriers 2023-06-22 09:01:22 -04:00
Carol (Nichols || Goulding) 62ba18171a
feat: Add a new hash column on the partition and parquet file tables
This will hold the deterministic ID for partitions.

Until all existing partitions have this value, this is optional/nullable.

The row ID still exists and is used as the main foreign key in the
parquet_file and skipped_compaction tables.

The hash_id has a unique index so that we can look up records based on
it (if it's available).

If the parquet file record has a partition_hash_id value, use that to
generate the object storage path instead of the partition_id.
2023-06-22 09:01:22 -04:00
Marco Neumann 4e18a5f9e8
refactor: remove querier state reconciler (#8046)
The reconciler is a leftover from the Kafka-based write path. It doesn't
do anything anymore.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-22 09:03:46 +00:00
Marco Neumann e72566e0e5
refactor: clean up querier server interface (#8045)
Move all the gRPC assembly into one single place: `ioxd_querier`. This
way `querier` no longer depends on `service_*` (except for
`service_common` which doesn't really implement gRPC but only the
namespace/database entry point).
2023-06-22 08:57:24 +00:00
Marco Neumann c9349a685f
refactor: remove pointless handler abstraction (#8044)
If your abstraction has one implementation, it ain't an abstraction.
2023-06-22 08:30:42 +00:00
Marco Neumann 686aa51b43
refactor: remove dead querier code (#8034)
Mostly leftovers from previous designs / iterations.
2023-06-22 07:33:18 +00:00
Marco Neumann 83a5037e61
feat: query support for custom partitioning (#8025)
* feat: querier-specific stat creation routine

* feat: prune querier chunks using partition col ranges

* feat: add table client

* test: custom partitioning

* fix: correctly set up stats for chunks with col subsets

* fix: flaky test

* refactor: remove obsolete dead_code markers

* feat: add partition template to `create_namespace`

* test: extend custom partitioning end2end tests

* fix: explain shuffling, make it actual deterministic
2023-06-21 09:03:19 +00:00
Andrew Lamb 5889c96501
chore: Update `datafusion` and other dependencies (#7981)
* chore: Update DatFaFusion pin

* chore: Update other dependencies

* chore: Update hakari

* fix: Update for API changes

* fix: Update explain plan

* fix: Update influxql plans

* fix: rustdoc links

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-16 09:48:55 +00:00
Marco Neumann 93ecb78ab9
feat: cache decoded partition value ranges (#8002)
Currently this only works for tags. We may want to decode the time
template as well at some point.

For #7974.
2023-06-16 09:38:34 +00:00