The motivations are:
1. The API uses a SINGLE predicate and adds that to many chunks. With
`Arc<Vec<...>>` you gain nothing, with `Vec<Arc<...>>` the predicate
is only stored once (in many vectors)
2. While we currently add predicates blindly to all chunks, we can be way
smarter in the future and prune out tables, partitions or even single
chunks (based on statistics). With that, it will be rare that many
chunks share the exact same set of predicates.
3. It would be nice if we could de-duplicate predicates when writing them
to the preserved catalog without needing to repeat the pruning
discussed in point 2. This is way easier to implement whan chunks
exists in `Arc`s.
4. As a side-note: the `Arc<Vec<...>>` wasn't really cloned around but
instead was created many time. So the new version should be more
memory efficient out of the box.
* fix: Capture query execution traces for storage gRPC queries as well
* refactor: remove debugging droppings
* refactor: do not Box::pin within TracedStream
* refactor: Use Futures::TryStreamExt rather than custom collect function
* fix: remove wild println
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: Translate DataFusion execution metrics to IOx Spans
* fix: add end to end test to ensure plumbing is hookedup
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Two reasons:
1. I wanna decouple `parquet_file` from `query` (nearly done, needs a
small follow-up PR).
2. `predicate` will have more and more features (like serialization)
which justifies a new home
The query processing was implicitly relying on the order provided by the
catalog. This had two issues:
- this ordering was not defined in the API contract (neither via docs
nor via typing)
- the order was based on chunk IDs which is not adequate in some cases
(e.g. when chunks are created while a persistence operations is in
progress)
Now we explicitly sort chunks by `(order, ID)`.
Fixes#1963.
* chore: Update datafusion deps to pre-release
* refactor: Update IOx to use new datafusion Statistics
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* chore: update datafusion and other deps
* fix: Update InfluxRPC frontend with new op types
* fix: Update test output for new column names
* fix: typos and unintended changes
* fix: Update query_tests
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* fix: nocache feature code rot
The MBChunk::snapshot code when using the "nocache" option no longer
compiles - this commit updates it to match the not(nocache) code.
* build: use updated broken_intra_doc_links name
The broken_intra_doc_links lint was renamed
rustdoc::broken_intra_doc_links
https://doc.rust-lang.org/rustdoc/lints.html