* chore: Make it clear there is only a single DataFusion memory pool
* fix: assert there is a single bridge
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
`Predicate` is InfluxRPC specific and contains way more than just
filter expression.
Ref #8097.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* refactor: replace `Predicate` w/ `&[Expr]` in querier internals
First step towards #8097. This replaces most internal usages of
`Predicate` with the more appropriate `&[Expr]` within the querier code.
This is also triggered by #8443 because the new ingester protocol shall
not use `Predicate` anymore.
Note that the querier still uses `Predicate` for a few interfaces. These
will be fixed later:
- the current ingester RPC version
- chunk pruning
- `QuerierNamespace::chunks`
* fix: docs
* feat: push sort through union
* test: add a few more tests based on review feedback
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Thus using the PartitionHashId if one is available. This does not
compile yet because of all the uses of QueryChunk, but iox_query
compiles and passes its tests.
* chore: Update datafusion pin
* fix: Update for change in API
* chore: Update plan
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
When a long running query is in process and the querier is shutting
down, it might happen that the executor (= thread pool and tokio
executor responsible for the CPU-bound DataFusion execution) is shut
down while the query is running. From a "systems interaction" PoV I
think this is totally fine and I would like to avoid some weird
ref-counting. Or in other words: if the system is shutting down, shut it
down.
However the error was treated as "internal" which is not useful. The
client should rather be informed that its server was gone and that it is
OK (and desired) to retry. So as per
<https://grpc.github.io/grpc/core/md_doc_statuscodes.html> I think this
should signal "unavailable".
This change wires the error code in such a way that the gRPC service
layer can properly inspect it and then changes the error mapping.
Ref https://github.com/influxdata/idpe/issues/17917 .
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
These places are sorting by `PartitionId` currently, which implements
`Copy`, but are about to be changed to be sorted on `PartitionHashId`,
which does not implement `Copy`.
The ingester can project arbitrary columns at query time, and has no
special requirement that the "time" column be part of that projection.
Because the timestamp summary generation explicitly requires the time
column to exist, it panics when there's no "time" column in the
projection - this is a bit of a modelling mismatch more than anything.
Similar to #8109.
This was once implemented by the RUB but as it stands right now, no
chunk implements this anymore.
If we ever want to bring this back, we should use the output of
`QueryChunk::data` instead (i.e. use a data-based implementation instead
of a per-chunk one).
Closes#8096.
This interface was once specially implemented by the RUB. The only
actual implementation of it is within the querier that just forwards it
to a simple schema scan. Lift this semantic to `iox_query_influxrpc`
instead so all the chunks can use it.
If we ever want to optimize this again, we should use `QueryChunk::data`
instead (i.e. instead of implementing it within the chunk it should use
the data method and do something smart based on that).
First half of #8096.
* feat: metrics for main tokio runtime
* feat: instrument executor tokio runtime
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Do not (ab)use per-chunk delete predicates for the retention policy.
Instead use a per-table predicate.
This makes the code way cleaner, since the scoping is correct (i.e.
delete predicates are a table-wide attribute, not a chunk-based one) and
it is consistent time predicates that the user providers (e.g. via
`WHERE time > x`).
It also allows us to remove delete predicates (in their current,
non-scalable form) from the query path. A potential future version would
likely not use per chunk predicates (and "is processed" markers) but use
the timestamp / chunk order to determine to which data the predicate
should be applied.
Note that the lowering of the retention policy changed slightly from
```text
(time > (now() - retention)) AND (time < MAX)
```
to
```text
time > (now() - retention)
```
Since the `MAX` cut is just an artifact of the lowering and was unnecessary.
Closes#7409.
Closes#7410.
* chore: Update datafusion + arrow/arrow-flight/parquet to version `42.0.0`
* chore: Update for new APIs
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* chore: Update DataFusion pin
* chore: Update API changes
* chore: Don't use deprecated API
* chore: Run cargo hakari tasks
* chore: Update tests due to changes in logical plan nodes from DF update
* chore: Fix broken links in docs
* chore: Adjust changes to expected output
---------
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
See optimizer pipeline here:
5d0bb68c5b/iox_query/src/physical_optimizer/mod.rs (L33-L35)
After generating the naive initial plan w/ many partitions, we must
consider more than 100 partitions to split the key space.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* chore: Update DataFusion pin
* chore: Update cargo
* fix: update for API changes
* fix: Update plans
* chore: Update for new api
* fix: Update plans
* chore: Update for API changes more
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
This is the major part of #7470. Additional clean ups (e.g. to remove
the actual types from `data_types`) will follow.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>