- Extract some shared values
- Remove an unneeded Arc::clone
- Change `expect`s that don't provide much clarity to `unwrap`s
- Give the test a more distinctive and less redundant name
Isolate the actual client from the query planning parts
(`Ingester{Chunk,Partition}`) so we can hook up the V2 client in #8350.
The PR looks large, but it just moves code around and decouples the
error handling.
Adds initialisation code to the routers to instantiate an
AntiEntropyActor, pre-populate the Merkle Search Tree during schema
warmup, and maintain it at runtime.
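A minimal sketch of that wiring, assuming a tokio mpsc-based actor; `Op`, `spawn_anti_entropy_actor`, `warmup`, and the `BTreeMap` standing in for the Merkle Search Tree are illustrative names, not the real router code:

```rust
use tokio::sync::mpsc;

/// Messages handled by the anti-entropy actor (hypothetical shape).
enum Op {
    /// Record or update the content hash of one namespace's schema.
    Upsert { namespace: String, schema_hash: u64 },
}

/// Spawn the actor and hand back the sender the router keeps next to its
/// schema cache.
fn spawn_anti_entropy_actor() -> mpsc::Sender<Op> {
    let (tx, mut rx) = mpsc::channel(128);
    tokio::spawn(async move {
        // Stand-in for the Merkle Search Tree owned by the actor.
        let mut mst = std::collections::BTreeMap::<String, u64>::new();
        while let Some(Op::Upsert { namespace, schema_hash }) = rx.recv().await {
            // Runtime maintenance: fold every observed schema change in.
            mst.insert(namespace, schema_hash);
        }
    });
    tx
}

/// Schema warmup: push every pre-fetched schema into the tree so the
/// router starts from a populated state.
async fn warmup(handle: &mpsc::Sender<Op>, schemas: Vec<(String, u64)>) {
    for (namespace, schema_hash) in schemas {
        handle
            .send(Op::Upsert { namespace, schema_hash })
            .await
            .expect("anti-entropy actor stopped");
    }
}
```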
Allow an owned, compact content summary snapshot of the merkle search
tree state to be read from the MST actor.
This snapshot describes the structure of the MST in a compact/efficient
representation suitable for exchanging over the network between peers.
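Roughly how such a read could look from the caller's side, using a request/reply message carrying a oneshot sender; `Snapshot`, `MstOp`, and the page-hash layout are assumptions for illustration, not the actual actor API:

```rust
use tokio::sync::{mpsc, oneshot};

/// Owned, compact summary of the tree: one hash per page of keys
/// (illustrative layout, not the real wire representation).
#[derive(Clone, Debug)]
struct Snapshot {
    /// (largest key in the page, hash over the page contents)
    page_hashes: Vec<(String, u64)>,
}

/// Request/reply message handled by the MST actor.
enum MstOp {
    Snapshot(oneshot::Sender<Snapshot>),
}

/// Ask the actor for a snapshot and await the reply; the returned value is
/// owned by the caller and can be serialised and exchanged with a peer.
async fn read_snapshot(handle: &mpsc::Sender<MstOp>) -> Snapshot {
    let (tx, rx) = oneshot::channel();
    handle
        .send(MstOp::Snapshot(tx))
        .await
        .expect("MST actor stopped");
    rx.await.expect("MST actor dropped the reply channel")
}
```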
- take span ctx directly instead of the execution context (see point
below)
- use the original trace ID (i.e. the one we get via the HTTP header),
NOT an internal span/trace ID, because the latter is only available for
sampled requests while the former is generally available (we also do that
for the stdout logs btw.); see the sketch below
- minor code clean ups
This is prep work for #8774.
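A sketch of the trace-ID preference from the second bullet; `SpanContext` is a stand-in struct and `query_log_trace_id` a hypothetical helper, not the actual code:

```rust
/// Stand-in for the tracing span context type; in practice this only
/// exists for sampled requests.
struct SpanContext {
    trace_id: u128,
}

/// Prefer the trace ID taken from the incoming HTTP header: it is present
/// for (almost) every request, while the span context only exists when the
/// request was sampled.
fn query_log_trace_id(
    header_trace_id: Option<&str>,
    span_ctx: Option<&SpanContext>,
) -> Option<String> {
    header_trace_id
        .map(str::to_owned)
        .or_else(|| span_ctx.map(|ctx| format!("{:x}", ctx.trace_id)))
}
```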
* chore: Update DataFusion pin again
* chore: update for different type
* fix: statistics
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat(querier): convert timezone sent from ingester
To facilitate changing the default timezone from None to UTC, make the
querier able to convert the timezone sent from the ingester into its
preferred type. This can convert from None to UTC or from UTC to None,
and should allow interaction between ingesters and queriers that have
differing settings for the default timezone.
To allow testing of both conversions, the type checking has been
made more liberal when converting an arrow schema to an IOx one.
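A hedged sketch of the kind of conversion this enables, assuming the `arrow` crate; `coerce_timestamp_tz` is a hypothetical helper, not the querier's actual code path:

```rust
use arrow::datatypes::{DataType, Field, Schema};

/// Rewrite timestamp fields so their timezone matches the preferred
/// setting (`None` <-> `Some("UTC")`), leaving all other fields alone.
fn coerce_timestamp_tz(schema: &Schema, preferred_tz: Option<&str>) -> Schema {
    let fields: Vec<Field> = schema
        .fields()
        .iter()
        .map(|f| match f.data_type() {
            DataType::Timestamp(unit, _) => f.as_ref().clone().with_data_type(
                DataType::Timestamp(unit.clone(), preferred_tz.map(Into::into)),
            ),
            _ => f.as_ref().clone(),
        })
        .collect();
    Schema::new(fields)
}
```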
* fix: fmt
* fix: lint
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
For #8350 we won't have all the record batches from the ingester during
planning; instead we'll stream them during execution. Technically the DF
plan is already based on streams; it's just `QueryChunkData` that
required a materialized `Vec<RecordBatch>`. This change moves the stream
creation up so a chunk can either use `QueryChunkData::in_mem` (which
conveniently creates the stream) or provide its own stream.
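A rough sketch of the shape this enables, assuming DataFusion's `MemoryStream`; the enum is trimmed down here (the real `QueryChunkData` has more than one variant):

```rust
use arrow::datatypes::SchemaRef;
use arrow::record_batch::RecordBatch;
use datafusion::physical_plan::memory::MemoryStream;
use datafusion::physical_plan::SendableRecordBatchStream;

/// Trimmed-down sketch: a chunk hands the planner a stream, not a Vec.
enum QueryChunkData {
    RecordBatches(SendableRecordBatchStream),
    // ... e.g. a parquet variant in the real code
}

impl QueryChunkData {
    /// Convenience constructor for chunks that already hold their batches
    /// in memory: wrap them in a stream so everything downstream is
    /// uniformly stream-based.
    fn in_mem(batches: Vec<RecordBatch>, schema: SchemaRef) -> Self {
        let stream =
            MemoryStream::try_new(batches, schema, None).expect("batches match schema");
        Self::RecordBatches(Box::pin(stream))
    }
}
```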
There were three layers (metrics, observer, pruner) that all only had a
single implementation. IIRC this is a leftover from older code where
`iox_query` was more involved in query pruning. With #8705, however,
chunk pruning is pushed even closer to the source (i.e. the querier
code) and it is simply more practical to perform the metric management
directly in the querier code (this was already the case, it was just
somewhat hidden by the interfaces). This also allows us to add metrics
for #8705 more easily.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Use the output of #8725 within the column ranges of the querier.
Currently this won't have any effect since the column ranges are only
used to prune parquet files, and parquet files come with their own, more
precise time range (which takes priority). However, for #8705 we want to
use the ranges to prune partitions before needing to deal with the
parquet files.
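For illustration, the partition-level pruning that #8705 is after boils down to an overlap check along these lines (hypothetical helper, simplified to i64 nanosecond timestamps):

```rust
/// Min/max summary for the time column of a partition, as produced by the
/// column-range machinery referenced above (simplified).
#[derive(Clone, Copy)]
struct ColumnRange {
    min: i64,
    max: i64,
}

/// Returns true if the partition can be skipped entirely because its time
/// range does not overlap the query's [start, end] interval.
fn prune_partition(time_range: Option<ColumnRange>, start: i64, end: i64) -> bool {
    match time_range {
        // No range known: we must keep the partition.
        None => false,
        // Skip only when the ranges are disjoint.
        Some(r) => r.max < start || r.min > end,
    }
}
```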
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>