* refactor: make partition key parsing more flexible
* feat: decode time portion of the partition key
Helpful for #8705 because we can prune partitions earlier during
query planning, w/o having to consider their parquet files at all.
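A sketch of the decoding idea, using chrono and the default `%Y-%m-%d` day-partition format; the helper name and shape are illustrative, not the actual IOx parser:

```rust
use chrono::NaiveDate;

/// Hypothetical helper (not the actual IOx code): decode the day encoded
/// in a partition key like "2023-08-28" into the half-open nanosecond
/// timestamp range it covers.
fn decode_time_range(partition_key: &str) -> Option<(i64, i64)> {
    let day = NaiveDate::parse_from_str(partition_key, "%Y-%m-%d").ok()?;
    let start = day.and_hms_opt(0, 0, 0)?.timestamp_nanos_opt()?;
    let end = day.succ_opt()?.and_hms_opt(0, 0, 0)?.timestamp_nanos_opt()?;
    Some((start, end))
}

fn main() {
    // A query constrained to `time >= '2023-09-01'` can now prune this
    // partition without opening any of its parquet files.
    let (start, end) = decode_time_range("2023-08-28").unwrap();
    println!("partition covers [{start}, {end})");
}
```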
* refactor: "projected schema" cache inputs must be normalized
Normalizing under the hood and returning normalized schemas w/o the user
knowing about it is a common source of subtle bugs.
* refactor: do not normalize projected schema by name
Normalizing makes it harder to predict the output and potentially
requires additional string lookups just to work with the schema.
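A minimal sketch of the boundary these two refactors imply (names and the normalization rule are illustrative, not the actual cache API): the cache asserts its input is already normalized instead of quietly rewriting it:

```rust
/// Illustrative rule: a projection is "normalized" if its column indices
/// are strictly increasing (sorted, no duplicates).
fn is_normalized(projection: &[usize]) -> bool {
    projection.windows(2).all(|w| w[0] < w[1])
}

/// Sketch of the cache boundary: fail loudly on unnormalized input
/// instead of rewriting it under the hood, so callers can never rely on
/// an implicit normalization they do not see.
fn projected_schema_cache_get(projection: &[usize]) {
    assert!(
        is_normalized(projection),
        "projection must be normalized: {projection:?}"
    );
    // ...look up / build the projected schema...
}

fn main() {
    projected_schema_cache_get(&[0, 2, 5]); // ok
    // projected_schema_cache_get(&[2, 0, 0]); // would panic
}
```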
* fix: typos
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: Martin Hilton <mhilton@influxdata.com>
---------
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: Martin Hilton <mhilton@influxdata.com>
Even though all subfields of `CachedPartition` are `Arc`ed, the size of
this structure grows and copying more and more fields around for every
cache access gets quite expensive. `Arc` the whole thing and simplify
management a bit.
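Roughly the before/after shape, with illustrative field names:

```rust
use std::sync::Arc;

// Illustrative stand-ins for the real cached data.
struct SortKey;
struct ColumnRanges;

/// Before: a cache hit clones every `Arc` field individually, and the
/// struct keeps growing as fields are added.
#[derive(Clone)]
struct CachedPartitionBefore {
    sort_key: Arc<SortKey>,
    column_ranges: Arc<ColumnRanges>,
    // ...more Arc'ed fields over time...
}

/// After: the whole struct sits behind a single `Arc`, so a cache hit is
/// one pointer clone no matter how many fields are added later.
struct CachedPartition {
    sort_key: Arc<SortKey>,
    column_ranges: Arc<ColumnRanges>,
}

fn cache_get() -> Arc<CachedPartition> {
    Arc::new(CachedPartition {
        sort_key: Arc::new(SortKey),
        column_ranges: Arc::new(ColumnRanges),
    })
}

fn main() {
    let a = cache_get();
    let b = Arc::clone(&a); // the cheap copy the cache now hands out
    assert_eq!(Arc::strong_count(&b), 2);
}
```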
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* chore: Update DataFusion pin
* chore: Update for new API
* fix: Update for API
* fix: update compactor test
* fix: Update to patched version of arrow 46.0.0
* fix: map `DataFusionError::Configuration` to an internal error (see the sketch after this list)
* fix: do not use deprecated API
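A sketch of the `DataFusionError::Configuration` mapping mentioned above, assuming DataFusion's `Configuration(String)` variant; the target `Error` enum is illustrative, not the actual IOx error type:

```rust
use datafusion::error::DataFusionError;

/// Illustrative error type; the real mapping targets the IOx error enum.
#[derive(Debug)]
enum Error {
    Internal(String),
    External(DataFusionError),
}

fn map_error(e: DataFusionError) -> Error {
    match e {
        // A bad session configuration is a bug on our side, not something
        // the user can fix, so surface it as an internal error.
        DataFusionError::Configuration(msg) => Error::Internal(msg),
        other => Error::External(other),
    }
}

fn main() {
    let e = map_error(DataFusionError::Configuration("bad setting".into()));
    println!("{e:?}");
}
```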
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: teach querier to use sort_key_ids
* chore: add an assert to capture bugs
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
`Predicate` is InfluxRPC-specific and contains far more than just
filter expressions.
Ref #8097.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
This should prevent the CPU-bound DataFusion runtime from stalling our
main IO runtime.
This is similar to:
- `iox_query::exec::cross_rt_stream`
- https://github.com/apache/arrow-rs/pull/4015
- https://github.com/apache/arrow-rs/pull/4040
Note: I currently have no concrete evidence that this is an issue, but
worker stalling in tokio is really hard to debug and I would rather be
safe than sorry.
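The underlying pattern, reduced to plain tokio primitives (the real implementation is the stream-based `cross_rt_stream`): run the CPU-heavy work on a dedicated runtime and hand results back over a channel, so the IO runtime only ever awaits:

```rust
use tokio::runtime::Builder;
use tokio::sync::oneshot;

fn main() {
    // Main IO runtime plus a dedicated runtime for CPU-bound work.
    let io_rt = Builder::new_multi_thread().enable_all().build().unwrap();
    let cpu_rt = Builder::new_multi_thread()
        .worker_threads(2)
        .thread_name("cpu-bound")
        .build()
        .unwrap();

    let (tx, rx) = oneshot::channel();

    // Run the expensive computation on the CPU runtime...
    cpu_rt.spawn(async move {
        let result: u64 = (0..10_000_000u64).sum(); // stand-in for query execution
        let _ = tx.send(result);
    });

    // ...while the IO runtime only awaits the channel and stays responsive.
    let result = io_rt.block_on(async { rx.await.unwrap() });
    println!("result = {result}");
}
```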
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Remove LIST operations from `CachedObjectStore` because:
- **not used/desired:** The querier should NEVER use LIST operations
on the object store. All the planning is done using catalog data.
- **misleading interface:** The `CachedObjectStore` -- which stores
  parquet data -- should not implement uncached LIST operations,
  because that is misleading: this operation will never be cached.
Or in other words: less code, fewer potential bugs.
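The shape of the change, reduced to an illustrative mini-trait (the real code implements `object_store::ObjectStore`):

```rust
/// Illustrative reduction of the object store interface.
trait Store {
    fn get(&self, path: &str) -> Result<Vec<u8>, String>;
    fn list(&self, prefix: &str) -> Result<Vec<String>, String>;
}

struct CachedObjectStore<S: Store> {
    inner: S,
    // ...cache state elided...
}

impl<S: Store> Store for CachedObjectStore<S> {
    fn get(&self, path: &str) -> Result<Vec<u8>, String> {
        // GET goes through the cache (lookup elided) and falls back to
        // the inner store on a miss.
        self.inner.get(path)
    }

    fn list(&self, _prefix: &str) -> Result<Vec<String>, String> {
        // Refuse rather than silently pass through uncached: the querier
        // plans from catalog data and must never LIST the object store.
        Err("LIST is not supported by CachedObjectStore".to_owned())
    }
}
```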
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: fill catalog sort_key_ids for partition with incoming data
* test: sort_key_ids has empty array for newly created partition
* test: name of non-existing column
* chore: add comments to ask Andrew about the code
* chore: make comments clearer
* chore: fix a comment to avoid failure in doc
* chore: add comment for the panic if column name of sort key not found
* fix: during file import, the partition has to be created with an empty sort key first; then, after its files are created, the partition will be updated with the sort key
* chore: remove no longer needed comments after the bug in build_catalog test is fixed
* chore: address review comments
* refactor: Use ColumnSet type
* chore: Apply suggestions from code review
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
* chore: fix a clippy warning
---------
Co-authored-by: Carol (Nichols || Goulding) <carol.nichols@gmail.com>
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
* refactor: replace `Predicate` w/ `&[Expr]` in querier internals
First step towards #8097. This replaces most internal usages of
`Predicate` with the more appropriate `&[Expr]` within the querier code.
This is also triggered by #8443 because the new ingester protocol shall
not use `Predicate` anymore.
Note that the querier still uses `Predicate` for a few interfaces. These
will be fixed later:
- the current ingester RPC version
- chunk pruning
- `QuerierNamespace::chunks`
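A sketch of the flavor of this change; the function names are illustrative, not the exact querier traits:

```rust
use datafusion::prelude::{col, lit, Expr};

// Before (illustrative): the InfluxRPC-specific `Predicate` leaks into
// internal interfaces that only ever need the filter expressions:
//
//     fn chunks(&self, predicate: &Predicate) -> Vec<Chunk>;
//
// After: plain DataFusion expressions.
fn chunks(filters: &[Expr]) {
    // ...prune and fetch chunks using `filters`...
    for f in filters {
        println!("filter: {f}");
    }
}

fn main() {
    let filters = vec![col("host").eq(lit("a")), col("time").gt(lit(0i64))];
    chunks(&filters);
}
```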
* fix: docs
* feat: `RemoveIfHandle::remove_if_and_get_with_status`
* fix: avoid tracing flood
Do not create a span for every partition that we get from the cache
system.
Ref https://github.com/influxdata/idpe/issues/17884.
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: Make parquet_file.partition_id optional in the catalog
This will acquire a short lock on the table in postgres, per:
<https://stackoverflow.com/questions/52760971/will-making-column-nullable-lock-the-table-for-reads>
This allows us to persist data for new partitions and associate the
Parquet file catalog records with the partition records using only the
partition hash ID, rather than both IDs as is done now.
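In the catalog's Rust types this amounts to making the field optional; a sketch with illustrative ID shapes:

```rust
/// Illustrative ID shapes.
struct PartitionId(i64);
struct PartitionHashId(Vec<u8>);

/// Sketch of the catalog record: new files may carry only the hash ID,
/// so the old numeric partition ID becomes optional.
struct ParquetFile {
    partition_id: Option<PartitionId>,
    partition_hash_id: Option<PartitionHashId>,
    // ...other columns elided...
}
```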
* fix: Support transition partition ID in the catalog service
* fix: Use transition partition ID in import/export
This commit also removes support for the `--partition-id` flag of the
`influxdb_iox remote store get-table` command, which Andrew approved.
The `--partition-id` filter was getting the results of the catalog gRPC
service's query for Parquet files of a table and then keeping only the
files whose partition IDs matched. The gRPC query is no longer returning
the partition ID from the Parquet file table, and really, this command
should instead be using `GetParquetFilesByPartitionId` to only request
what's needed rather than filtering.
* feat: Support looking up Parquet files by either kind of Partition id
Regardless of which is actually stored on the Parquet file record.
That is, say there's a Partition in the catalog with:

    Partition {
        id: 3,
        hash_id: abcdefg,
    }

and a Parquet file that has:

    ParquetFile {
        partition_hash_id: abcdefg,
    }

calling `list_by_partition_not_to_delete(PartitionId(3))` should still
return this Parquet file because it is associated with the partition
that has ID 3.
This is important for the compactor, which is currently only dealing in
PartitionIds, and I'd like to keep it that way for now to avoid having
to change Even More in this PR.
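A sketch of that lookup rule with illustrative types (the real code goes through the transition partition ID machinery):

```rust
struct Partition {
    id: i64,
    hash_id: Option<String>,
}

struct ParquetFile {
    partition_id: Option<i64>,
    partition_hash_id: Option<String>,
}

/// A file matches a partition if *either* identifier stored on the file
/// agrees with the partition row, so a caller holding only a numeric
/// `PartitionId` still finds files that were written with just a hash ID.
fn file_belongs_to(file: &ParquetFile, partition: &Partition) -> bool {
    file.partition_id == Some(partition.id)
        || (file.partition_hash_id.is_some()
            && file.partition_hash_id == partition.hash_id)
}

fn main() {
    let partition = Partition { id: 3, hash_id: Some("abcdefg".into()) };
    let file = ParquetFile { partition_id: None, partition_hash_id: Some("abcdefg".into()) };
    // Looking up by PartitionId(3) still returns this file.
    assert!(file_belongs_to(&file, &partition));
}
```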
* fix: Use and set new partition ID fields everywhere they want to be
---------
Co-authored-by: Dom <dom@itsallbroken.com>
I've seen at least one case in prod where the UTC clock goes backwards.
The `TimeProvider` and `Time` interfaces even warn about that. However,
there was a `Sub` impl that would panic if that happens, and even though
this was documented, I think we can do better and just not offer a
panicky interface at all.
So this removes the `Sub` impl and replaces all uses with
`checked_duration_since`.
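A sketch of the non-panicking shape, with an illustrative stand-in for the real `Time` type:

```rust
use std::time::Duration;

/// Illustrative stand-in for the IOx `Time` type (nanoseconds since epoch).
#[derive(Clone, Copy, PartialEq, PartialOrd)]
struct Time(i64);

impl Time {
    /// Returns `None` instead of panicking when `earlier` is actually
    /// later, e.g. because the UTC clock stepped backwards.
    fn checked_duration_since(self, earlier: Time) -> Option<Duration> {
        self.0
            .checked_sub(earlier.0)
            .filter(|d| *d >= 0)
            .map(|d| Duration::from_nanos(d as u64))
    }
}

fn main() {
    let t1 = Time(2_000_000_000);
    let t0 = Time(1_000_000_000);
    assert_eq!(t1.checked_duration_since(t0), Some(Duration::from_secs(1)));
    // Clock went backwards: callers get `None` and decide what to do,
    // instead of the old `t0 - t1` panicking.
    assert_eq!(t0.checked_duration_since(t1), None);
}
```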
* feat: batch partition catalog requests in querier
This is mostly wiring that builds on top of the other PRs linked to #8089.
I think we eventually could make the batching code nicer by adding
better wrappers / helpers, but let's do that if we have other batched
caches and this pattern proves to be useful. (A sketch of the batching
idea follows the list below.)
Closes #8089.
* test: extend `test_multi_get`
* test: regression test for #8286
* fix: prevent auto-flush CPU looping
* fix: panic when loading different tables at the same time
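The promised sketch of the batching idea, with illustrative names and all cache bookkeeping elided:

```rust
use std::collections::HashMap;

/// Illustrative stand-ins.
type PartitionId = i64;
#[derive(Clone)]
struct CachedPartition;

/// Sketch of the batching idea: one catalog round trip for N partitions
/// instead of N round trips (deduplication and cache state elided).
fn get_batch(
    ids: &[PartitionId],
    fetch_many: impl Fn(&[PartitionId]) -> HashMap<PartitionId, CachedPartition>,
) -> Vec<Option<CachedPartition>> {
    let fetched = fetch_many(ids); // single batched catalog request
    ids.iter().map(|id| fetched.get(id).cloned()).collect()
}

fn main() {
    let fetched = get_batch(&[1, 2, 3], |ids| {
        ids.iter().map(|&id| (id, CachedPartition)).collect()
    });
    println!("got {} partitions", fetched.len());
}
```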
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
These places currently sort by `PartitionId`, which implements `Copy`,
but are about to be changed to sort by `PartitionHashId`, which does
not implement `Copy`.
The ingester can project arbitrary columns at query time, and has no
special requirement that the "time" column be part of that projection.
Because the timestamp summary generation explicitly requires the time
column to exist, it panics when there is no "time" column in the
projection; this is more of a modelling mismatch than anything else.
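The fix this suggests is to treat the time column as optional in the summary path; a sketch against arrow directly, with an illustrative function name:

```rust
use arrow::record_batch::RecordBatch;

/// Sketch: only compute a timestamp summary when the "time" column is
/// actually part of the projection, instead of panicking on lookup.
fn maybe_timestamp_summary(batch: &RecordBatch) -> Option<()> {
    let idx = batch.schema().index_of("time").ok()?;
    let _time_col = batch.column(idx);
    // ...aggregate min/max timestamps from `_time_col`...
    Some(())
}
```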