* chore: Update DataFusion pin to get median fix
* chore: Update for new Expr node
* test: add test for median
* test: add test for coercion of strings to timestamps
* chore: Run cargo hakari tasks
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* fix: account for memory allocations in InfluxRPC group outputs
This should prevent the querier from OOMing.
See https://github.com/influxdata/idpe/issues/16614 .
* docs: improve
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
* refactor: pull out constant
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
* refactor: DF-driven on-demand mem limit instead of ahead-of-time heuristics
Closes #6310.
* refactor: rename and tune default exec mem limits
* fix: ingester2 bits after rebase
* fix: ignore fields when considering tag predicates
* chore: update test to not use time column in predicate
* chore: update with review feedback
* chore: update tests to avoid fields refs in RPC preds
This is more like what would be coming off the wire from
Influx RPC.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: create namespace API call in router
Co-authored-by: Nga Tran <nga-tran@live.com>
* chore: treat retention as ns except in CLI
* fix: overflow in nanosecond calc
* fix: retention test after changing it from hours to ns
* chore: comment clarification in cli; better response type for error in ns API
* fix: correct some rebase mistakes
* chore: merge namespace create & create_with_retention; renamed ns create test helper fn & const
* fix: ns autocreation test was wrong after rebase
* fix: mem catalog has default 1hr retention, accidentally removed in rebase
* chore: remove mem catalog's default 1hr retention; make it settable in sets & router
Co-authored-by: Luke Bond <luke.n.bond@gmail.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
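
The commits above move retention from hours to nanoseconds and fix an overflow in that calculation, but they do not say how the fix works. As a minimal sketch of one common approach, assuming a signed 64-bit nanosecond value, checked multiplication reports an overflow instead of wrapping. The names `NANOS_PER_HOUR` and `retention_hours_to_nanos` are hypothetical and not taken from the codebase.

```rust
/// Nanoseconds in one hour (hypothetical constant, for illustration only).
const NANOS_PER_HOUR: i64 = 60 * 60 * 1_000_000_000;

/// Convert a retention period given in hours (as a CLI might accept it)
/// into nanoseconds. Returns `None` if the result does not fit in an
/// `i64`, rather than silently wrapping or panicking.
pub fn retention_hours_to_nanos(hours: i64) -> Option<i64> {
    hours.checked_mul(NANOS_PER_HOUR)
}
```

With this shape the caller decides how to surface an out-of-range value, for example as a validation error in the namespace API.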
* fix: only push safe select expression through de-dup
Fixes #6066.
* docs: improve
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
* fix: rebase
* test: ensure we do not split ORs
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
* chore: Update datafusion pin + api code
* chore: Run cargo hakari tasks
* refactor: combine_sort_key is more idiomatic and add rationale comments
* refactor: satisfy borrow checker and updated comments
* fix: Add test case for combine_sort_key
* fix: Apply suggestions from code review
Co-authored-by: Marco Neumann <marco@crepererum.net>
* fix: Add back test for deeply nested expression
* fix: Update output ordering
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: Marco Neumann <marco@crepererum.net>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* test: tests in the reorg planner and query tests for merging parquet files
* fix: use 20 files
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Removes the need to leak the PartitionProvider outside of the ingester
crate.
This will allow the PartitionProvider to utilise a
DeferredLoad<TableName> without having to make the DeferredLoad and
TableName pub.
* test: use `EXPLAIN ANALYZE` for SQL metric tests
Needs a bit more infra (due to normalization), but this seems to be
worth it so we can easily hook up more metrics in the future.
* docs: explain regexes
It should always be clear from the context which table a chunk
belongs to.
I think having a table name bound to a chunk goes back to a time when
chunks had multiple tables.
Helps with #6049.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Changes the DmlDelete to contain the NamespaceId for which it should be
applied, propagating this value over the wire.
Like the existing IDs within the DmlWrite, these values are marked
unsafe to use, to stop consumers utilising them accidentally
during deployment. Unlike DmlWrite, the DmlDelete is completely unused,
so this is less of an issue.
This commit is part of a two-part change in order to add the table &
namespace IDs to the write buffer wire format. This commit forms the
first half; changing the producer to send the IDs.
In this commit the new ID values are never read on the consumer side,
ensuring there is no consumer dependency on them. This ensures they
remain operational during a rollout, where the consumer may be updated
to the latest code dependent on the IDs before the producer is updated
to send them. This also ensures we have a window of time where the
consumers can be rolled back after being updated, and still handle
replaying messages in Kafka.
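
The first half of the two-part change can be sketched roughly as follows. This is an illustration only: `WireDelete`, `produce` and `consume` are hypothetical names, and the real wire format is not shown in this message. The point is that the producer starts populating the new ID while the consumer keeps ignoring it, so old and new messages are handled identically.

```rust
/// Hypothetical, simplified wire message (not the real write buffer format).
#[derive(Debug, Clone, PartialEq)]
pub struct WireDelete {
    pub namespace_name: String,
    /// New in part one: set by updated producers, absent in older messages.
    pub namespace_id: Option<i64>,
}

/// Updated producer: always sends the ID alongside the name.
pub fn produce(namespace_name: &str, namespace_id: i64) -> WireDelete {
    WireDelete {
        namespace_name: namespace_name.to_string(),
        namespace_id: Some(namespace_id),
    }
}

/// Part-one consumer: deliberately never reads `namespace_id`, so it
/// behaves the same for messages written before and after the rollout.
pub fn consume(msg: &WireDelete) -> &str {
    &msg.namespace_name
}
```

Only once every message that may still be replayed carries the ID would a second change let the consumer depend on it.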
Changes the DmlWrite type to require a PartitionKey be specified,
instead of accepting an Option.
This requirement was already in place - the write buffer upheld an
invariant that all writes contained a partition key value (was not
"None") or it panicked at runtime when attempting to enqueue the write.
It is now possible to encode this invariant in the type system, which is
what this change does.
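
A minimal sketch of the idea, using simplified stand-in types rather than the real `DmlWrite` (the struct and function names below are hypothetical): with an `Option`, the invariant is only checked when the write is enqueued; with a mandatory field, a write without a key cannot be constructed at all.

```rust
/// Simplified stand-in for the real partition key type.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PartitionKey(String);

impl From<&str> for PartitionKey {
    fn from(s: &str) -> Self {
        Self(s.to_string())
    }
}

/// Before: the key is optional, so the invariant lives in runtime checks.
pub struct WriteBefore {
    pub partition_key: Option<PartitionKey>,
}

/// After: the key is mandatory, so the compiler enforces the invariant.
pub struct WriteAfter {
    pub partition_key: PartitionKey,
}

/// Runtime-checked access: panics if the invariant was violated upstream.
pub fn key_before(write: &WriteBefore) -> &PartitionKey {
    write
        .partition_key
        .as_ref()
        .expect("write must have a partition key")
}

/// Type-checked access: there is no failure path left to handle.
pub fn key_after(write: &WriteAfter) -> &PartitionKey {
    &write.partition_key
}
```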
We basically assume everywhere that a column falls into one of the three
known categories (time, tag, field), so let's encode this in our type
system instead of defining "unknown" as "undefined behavior, may or may
not crash".
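
As a rough illustration (the enum and function names here are hypothetical, not the actual schema types): once the "unknown" case is removed, every `match` is exhaustive over the three categories, and anything unrecognised has to be rejected at the boundary where it is parsed.

```rust
/// Hypothetical, simplified column classification with no "unknown" variant.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ColumnKind {
    Time,
    Tag,
    Field,
}

/// Unrecognised input is rejected here, at the edge, instead of flowing
/// through the system as an "unknown" column.
pub fn parse_kind(s: &str) -> Option<ColumnKind> {
    match s {
        "time" => Some(ColumnKind::Time),
        "tag" => Some(ColumnKind::Tag),
        "field" => Some(ColumnKind::Field),
        _ => None,
    }
}

/// Exhaustive match without a wildcard arm: adding a variant later
/// becomes a compile error here until it is handled.
pub fn label(kind: ColumnKind) -> &'static str {
    match kind {
        ColumnKind::Time => "time",
        ColumnKind::Tag => "tag",
        ColumnKind::Field => "field",
    }
}
```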
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* refactor: `IOxReadFilterNode` can always accumulate statistics
`IOxReadFilterNode` used to not emit statistics if one chunk has
duplicates or delete predicates. This is wrong (or at least overly
conservative), because the node itself (or the chunks themselves) do NOT
perform dedup or delete predicate filtering. Instead this is done
by parent nodes (`DeduplicateExec` and `FilterExec`) and it's their
job to propagate statistics correctly.
Helps w/ #5897.
* test: explain setup
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Use the proper top-level DataFusion context and register the object
store there.
Note that we still hide the `ParquetExec` behind an opaque record batch
stream. Fixing that is next on my list.
Helps with #5897.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>