We basically assume everywhere that a column falls into one of the three
known categories (time, tag, field), so lets encode this in our type
system instead of defining "unknown" as "undefined behavior, may or may
not crash".
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Since we log trace IDs to allow easier correlation of logs no matter
what the `sampled` flag says, we should also parse these logs if we
don't have a tace collector at all.
In practice, this won't make a difference since we always deploy with a
trace collector, but it also makes the code easier to reason about.
Helps with #5975.
* feat: have a logical plan that is aware of no-deduplication
* feat: build physical scan plan that does not do deduplication
* chore: cleaup
* test: logical plans for scan with and without deduplication
* chore: clean up and a small refactor
* refactor: remove asserts on plan and rename make enable_deduplication default
* refactor: rename disable_deduplication to enable_deduplication
* chore: temporarily disable circle filter to build & push PRs
* chore: allow build & push of container image for branches using param
* chore: indentation fix in circle config
* chore: rename build_perf to release_branch
We don't support non-null tags, non-null fields, or nullable timestamps. Let's
just remove this from `schema` so that this never happens on accident.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Databases are NOT required to project chunks (in practice this is only
done by the querier for ingester-based chunks). Instead `iox_query`
should (and already does) add the right stream adapters to project
chunks or to create NULL-columns. Removing the special handling from the
test setup makes it easier to understand and also less likely that
`iox_query` starts to rely on this behavior.
Helps with #5897.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
- remove generic that is basically unused (`group_potential_duplicates`
is always called w/ `Arc<dyn QueryChunk>`)
- remove half-baked `impl` that is unused
Helps w/ #5897.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* fix: cardinality of each batch should use row count of the batch
* chore: cleanup
* fix: auto-merge conflict
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* refactor: `IOxReadFilterNode` can always accumulate statistics
`IOxReadFilterNode` used to not emit statistics if one chunk has
duplicates or delete predicates. This is wrong (or at least overly
conservative), because the node itself (or the chunks themselves) do NOT
perform dedup or delete predicate filtering. Instead this is done is
done by parent nodes (`DeduplicateExec` and `FilterExec`) and its their
job to propagate statistics correctly.
Helps w/ #5897.
* test: explain setup
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Now that we read throw `ParquetExec`, `ROW_GROUP_READ_SIZE` is no longer
used.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
This reverts commit c63312ce12.
This change fixed a low-priority alert when there was no traffic flowing
through the system. The loss in TTBR value fidelity due to bucketing is
a greater concern as it affects live, high-volume clusters and hinders
operational insight.
Use the proper top-level DataFusion context and register the object
store there.
Note that we still hide the `ParquetExec` behind an opaque record batch
stream. Fixing that is next on my list.
Helps with #5897.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>