* refactor: Move `flightsql` code into its own module
* fix: get schema from LogicalPlan
* refactor: use arrow_flight::sql::Any instead of prost_types::any
* fix: cleanup docs and avoid as_ref
* fix: Use Bytes
* fix: use Any::pack
* fix: doclink
* refactor: Drop Expr::UnaryOp to simplify tree traversal
The UnaryOp doesn't provide any additional value and complicates
walking the AST, as literal values wrapped in a UnaryOp(Minus, ...)
require extra handling when reducing time range expressions, etc.
This change is also true to the InfluxQL Go implementation,
which represents whole number literals as signed integers unless
they exceed i64::MAX.
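A minimal sketch of the parse-time folding this enables (illustrative names, not the crate's actual API; edge cases around `i64::MIN` elided):

```rust
/// Illustrative literal type; the real InfluxQL AST differs.
#[derive(Debug, PartialEq)]
enum Literal {
    Integer(i64),
    Float(f64),
}

/// Fold a leading minus into the literal at parse time instead of
/// emitting a `UnaryOp(Minus, ...)` wrapper node, mirroring the Go
/// implementation's signed i64 representation of whole numbers.
fn parse_signed_integer(negative: bool, digits: &str) -> Result<Literal, std::num::ParseIntError> {
    let n: i64 = digits.parse()?;
    Ok(Literal::Integer(if negative { -n } else { n }))
}

fn main() {
    assert_eq!(parse_signed_integer(true, "5"), Ok(Literal::Integer(-5)));
}
```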
* chore: Refactor all usages of format!("{}", ?) to ?.to_string()
Per https://github.com/influxdata/influxdb_iox/pull/6600#discussion_r1072028895
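For example:

```rust
fn main() {
    let id = 42;
    let before = format!("{}", id); // pattern being replaced
    let after = id.to_string();     // preferred: clearer intent, same result
    assert_eq!(before, after);
}
```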
* refactor: remove unused code
* refactor: make fn private
* feat: safely stream data from one tokio runtime to another
Closes #6577.
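A sketch of the general pattern (hand-off through a bounded channel; the names and exact shape here are assumptions, not the code from the PR):

```rust
use tokio::sync::mpsc;
use tokio_stream::wrappers::ReceiverStream;

/// Produce items on a dedicated runtime while the caller consumes an
/// ordinary `Stream` on its own runtime. The bounded channel provides
/// the synchronization, so neither runtime blocks the other's workers.
fn stream_from_runtime<T, I>(dedicated: tokio::runtime::Handle, items: I) -> ReceiverStream<T>
where
    T: Send + 'static,
    I: IntoIterator<Item = T> + Send + 'static,
    I::IntoIter: Send + 'static,
{
    let (tx, rx) = mpsc::channel(8);
    dedicated.spawn(async move {
        for item in items {
            // Stop producing once the consumer side is dropped.
            if tx.send(item).await.is_err() {
                break;
            }
        }
    });
    ReceiverStream::new(rx)
}
```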
* refactor: review comments
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
* docs: improve
* test: explain
* test: make tests more tricky
* refactor: improve error message
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: Parse IANA timezone strings to chrono_tz::Tz
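For example:

```rust
use chrono_tz::Tz;

fn main() {
    // `Tz` implements `FromStr`, so IANA names parse directly;
    // unknown names yield an `Err`.
    let tz: Tz = "Australia/Hobart".parse().expect("valid IANA timezone");
    println!("parsed: {}", tz.name());
}
```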
* feat: Visitors can customise the return error type
This avoids having to remap errors from `&'static str` to the caller's
error type, and will be used in a future PR for time range expressions.
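A sketch of the shape this gives the visitor trait (illustrative; the real trait has more visit methods):

```rust
/// Placeholder AST node for the sketch.
struct Expr;

trait Visitor: Sized {
    /// Each visitor picks its own error type instead of being forced
    /// to remap from `&'static str`.
    type Error;

    fn pre_visit_expr(self, expr: &Expr) -> Result<Self, Self::Error>;
}

/// A visitor whose failures are domain-specific.
struct TimeRangeVisitor;

impl Visitor for TimeRangeVisitor {
    type Error = String;

    fn pre_visit_expr(self, _expr: &Expr) -> Result<Self, Self::Error> {
        Ok(self)
    }
}
```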
* chore: Run cargo hakari tasks
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
- name exec driver thread (instead of using the default that `thread::spawn`
gives us)
- provide a number to every worker thread (both for the dedicated executor
and for the main runtime)
- shorten thread names (the current names are too long for most debug tools)
* feat: InfluxQL learns how to plan some queries
Also added a means to test the planner and execution
* chore: Update module docs
* chore: Document the planner functions
* chore: Update end_to_end_cases crate
* chore: Clarify why `SLIMIT` and `SOFFSET` return `NotImplemented`
* chore: Address lint issues
* chore: Fix rustdoc link issue
* chore: Remove InfluxQL tests from query_tests crate
Will follow conventions established by @carols10cents when
new query_tests crate is merged.
* chore: `now` field
`now` is a DataFusion built-in scalar function
* chore: remove unused code
* chore: Add additional arithmetic expression tests
* chore: Establish pattern for identifying and tracking InfluxQL issues
* chore: Add tests for case sensitivity issues
* chore: group tests into modules and functions
This avoids mass rewriting of insta snapshots as new
tests are added to each function. When tests are added in the middle,
existing snapshots are renamed (-N+1, -N+2, etc) resulting in
having to review numerous additional snapshots.
The current version is barely readable because the logged schema w/ all
its metadata is so long.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: Add timestamp data type
* feat: Add with_quiet API to suppress output to STDOUT
* fix: Field name resolution to match InfluxQL
* refactor: Allow TestChunks to be directly accessed
This will be useful when testing the InfluxQL planner.
* fix: Add Timestamp case to var_ref module
* feat: Add InfluxQL compatible column naming
* chore: Add doc comment.
* fix: keywords may be followed by a `!` such as `!=`
* fix: field_name improvements
* No longer clones expressions
* Explicitly handle all Expr enumerated items
* more tests
* fix: collision with explicitly aliased column
Fixes case where column is explicitly aliased to an auto-named variant.
Test case added to validate.
* chore: Move logic to context, in line with DataFusion SQL
* chore: Add ordering for InfluxQL data types
Ordering is used to determine automatic casting operations. If two
field columns are present in an expression, one float and one integer,
the integer should be cast to a float, such that the final expression
will be a float.
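A sketch of how an `Ord` impl captures that rule (variant order supplies the ranking; illustrative names):

```rust
use std::cmp::max;

/// Later variants rank higher, so `max` picks the type to cast to.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum FieldType {
    Integer,
    Float,
}

fn binary_result_type(lhs: FieldType, rhs: FieldType) -> FieldType {
    // One float and one integer => the integer side is cast to float.
    max(lhs, rhs)
}

fn main() {
    assert_eq!(
        binary_result_type(FieldType::Integer, FieldType::Float),
        FieldType::Float
    );
}
```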
* chore: Add DerefMut trait to collection types
Will allow these collections to be mutated when traversing the InfluxQL
AST.
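A sketch of the pattern on a hypothetical newtype collection:

```rust
use std::ops::{Deref, DerefMut};

/// Hypothetical newtype over a list of statements.
struct Statements(Vec<String>);

impl Deref for Statements {
    type Target = Vec<String>;

    fn deref(&self) -> &Self::Target {
        &self.0
    }
}

impl DerefMut for Statements {
    fn deref_mut(&mut self) -> &mut Self::Target {
        &mut self.0
    }
}

fn main() {
    let mut stmts = Statements(vec!["select 1".to_string()]);
    // `DerefMut` lets AST traversal mutate elements in place.
    stmts[0].make_ascii_uppercase();
    assert_eq!(stmts[0], "SELECT 1");
}
```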
* chore: Add influxql module with initial AST normalisation implementation
* chore: Add more unit tests and docs
* chore: Run cargo hakari tasks
* chore: Fix link
* chore: Support regular expression expansion and Call expressions
* chore: Add tests for walk_expr functions
* chore: Add insta snapshot files
* chore: Add docs and make API accessible to the crate
* chore: Move to Arc<dyn SchemaProvider> for use in influxql planner
* chore: Move code back; it is better encapsulated here
* chore: Remove redundant attribute
* chore: Improve regex compatibility with InfluxQL / Go
* chore: Style improvement.
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
This commit changes the behaviour of the persist system to enable
optimal parallelism of persist operations, and improve the accuracy of
the outstanding job bound / back-pressure.
Previously all persist operations for a given partition were
consistently hashed to a single worker task. This serialised persistence
per partition, ensuring all updates to the partition sort key were
serialised. However, this also unnecessarily serialises persist
operations that do not need to update the sort key, reducing the
potential throughput of the system; in the worst case of a single
partition receiving all the writes, only one worker would be persisting,
and the other N-1 workers would be idle.
After this change, the sort key is inspected when enqueuing the persist
operation and if it can be determined that no sort key update is
necessary (the typical case), then the persist task is placed into a
global work queue from which all workers consume. This allows for
maximal parallelisation of these jobs, and removes the per-worker
head-of-line blocking.
In the case that the sort key does need updating, these jobs continue to
be consistently hashed to a single worker, ensuring serialised sort key
updates only where necessary.
To support these changes, the back-pressure system has been changed to
account for all outstanding persist jobs in the system, regardless of
type or assigned worker - a logical, bounded queue is composed together
of a semaphore limiting the number of persist tasks overall, and a
series of physical, unbounded queues - one to each worker & the global
queue. The overall system remains bounded by the
INFLUXDB_IOX_PERSIST_QUEUE_DEPTH value, and is now simpler to reason
about (it is independent of the number of workers, etc).
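A rough sketch of the routing and bounding described above (assumed shapes, not the actual implementation):

```rust
use std::sync::Arc;
use tokio::sync::{mpsc, OwnedSemaphorePermit, Semaphore};

/// A persist job carries its permit; dropping the job after completion
/// releases capacity in the logical, bounded queue.
struct Job {
    _permit: OwnedSemaphorePermit,
    // payload elided
}

struct PersistQueue {
    /// One semaphore bounds ALL outstanding jobs
    /// (INFLUXDB_IOX_PERSIST_QUEUE_DEPTH permits).
    sem: Arc<Semaphore>,
    /// Shared queue consumed by every worker.
    global: mpsc::UnboundedSender<Job>,
    /// Per-worker queues for serialised sort-key updates.
    workers: Vec<mpsc::UnboundedSender<Job>>,
}

impl PersistQueue {
    async fn enqueue(&self, partition_id: u64, needs_sort_key_update: bool) {
        // Applies back-pressure before touching any physical queue.
        let permit = Arc::clone(&self.sem).acquire_owned().await.expect("semaphore open");
        let job = Job { _permit: permit };
        let tx = if needs_sort_key_update {
            // Consistently hashed: sort-key updates stay serialised.
            &self.workers[(partition_id as usize) % self.workers.len()]
        } else {
            // Typical case: any idle worker may pick this up.
            &self.global
        };
        let _ = tx.send(job);
    }
}
```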
* fix: account for memory allocations in InfluxRPC group outputs
This should prevent the querier from OOMing.
See https://github.com/influxdata/idpe/issues/16614 .
* docs: improve
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
* refactor: pull out constant
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
* fix: gRPC errors regarding group cols
- a missing group col previously produced an "internal error" but should be
"invalid argument"
- duplicate group cols produced a panic but should also be "invalid
argument"
* docs: clarify
* refactor: DF-driven on-demand mem limit instead of ahead-of-time heuristics
Closes #6310.
* refactor: rename and tune default exec mem limits
* fix: ingester2 bits after rebase
* fix: check schemas in `pretty_print_batches`
I think most users of this function (and `assert_batches_eq`) assume
that all batches have the same schema. If not, `pretty_print_batches`
may either fail to produce an actual table (some rows may have more or
fewer columns) or silently produce a table that looks "alright".
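A sketch of the added guard (assumed shape):

```rust
use arrow::record_batch::RecordBatch;

/// Refuse to pretty-print batches whose schemas disagree instead of
/// rendering a broken or misleadingly "alright" table.
fn check_same_schema(batches: &[RecordBatch]) -> Result<(), String> {
    if let Some(first) = batches.first() {
        for batch in &batches[1..] {
            if batch.schema() != first.schema() {
                return Err("all batches must share one schema".to_string());
            }
        }
    }
    Ok(())
}
```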
* fix: equalize schemas where it is required/desired
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Have a single global test executor w/ reasonable defaults. Also don't
require tests to join/await executor shutdowns (most tests forget this
anyway and will get a runtime warning).
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
`RecordBatch` offers zero-copy slicing, so there is no need to store the
row range manually. This makes #6216 simpler.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
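For illustration, the zero-copy slicing in question (using the `arrow` crate):

```rust
use std::sync::Arc;

use arrow::array::{ArrayRef, Int64Array};
use arrow::record_batch::RecordBatch;

fn main() {
    let batch = RecordBatch::try_from_iter([(
        "a",
        Arc::new(Int64Array::from(vec![1, 2, 3, 4])) as ArrayRef,
    )])
    .unwrap();

    // Only offsets into the shared buffers are adjusted; no data copy.
    let window = batch.slice(1, 2);
    assert_eq!(window.num_rows(), 2);
}
```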
`None` was only used for testing and even then we should probably have a
proper executor instead of panicking for some methods.
Found while working on #6216.
* fix: ignore fields when considering tag predicates
* chore: update test to not use time column in predicate
* chore: update with review feedback
* chore: update tests to avoid fields refs in RPC preds
This is more like what would be coming off the wire from
Influx RPC.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: Introduce InfluxQL to Flight
All InfluxQL queries will fail with an error
* chore: Temper protobuf lint
* chore: Finalize flight.proto changes; fix tests
* chore: Add tests for InfluxQL planner
* chore: Update docs
* chore: Update docs
* chore: Rename back to original
* chore: Use .into() rather than cast
* chore: Use function rather than field
* chore: Improved InfluxQL planner name
* chore: Restore `impl Into<String>` argument
* chore: Add a comment that Go clients are unable to execute InfluxQL
* chore: Add a test for the `--lang` argument and InfluxQL
* fix: only push safe select expression through de-dup
Fixes #6066.
* docs: improve
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
* fix: rebase
* test: ensure we do not split ORs
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
* chore: Update datafusion pin + api code
* chore: Run cargo hakari tasks
* refactor: combine_sort_key is more idiomatic and add rationale comments
* refactor: satisfy borrow checker and updated comments
* fix: Add test case for combine_sort_key
* fix: Apply suggestions from code review
Co-authored-by: Marco Neumann <marco@crepererum.net>
* fix: Add back test for deeply nested expression
* fix: Update output ordering
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: Marco Neumann <marco@crepererum.net>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* test: tests in the reorg planner and query tests for merging parquet files
* fix: use 20 files
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* test: use `EXPLAIN ANALYZE` for SQL metric tests
Needs a bit more infra (due to normalization), but this seems to be
worth it so we can easily hook up more metrics in the future.
* docs: explain regexes
It should be always clear from the context to which table a chunk
belongs.
I think having a table name bound to a chunk goes back to a time where
chunks had multiple tables.
Helps with #6049.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* refactor: simplify `QueryChunk` data access
We have only two types for chunks (now that the RUB is gone):
1. In-memory RecordBatches
2. Parquet files
Loads of logic is duplicated in the different `read_filter`
implementations. Also `read_filter` hides a solid amount of logic from
DataFusion, which will prevent certain (future) optimizations. To enable #5897
and to simplify the interface, let the chunks return the data (batches
or metadata for parquet files) directly and let `iox_query` perform the
actual heavy-lifting.
* docs: improve
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
* docs: improve
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
With #5963 merged, all chunks now provide a summary (even though it may
not contain data for all columns). So let's make it mandatory, which
also removes a few 🙈-style `.expect(...)` calls.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Use the table summary instead. This allows us to have a single mechanism
that both IOx and DataFusion understand. This basically lifts the "basic
table summary" mechanism that the querier uses to `iox_query` and let
the compactor and ingester use the same mechanism.
While not strictly necessary, simplifying the `QueryChunk[Meta]`
interface helps with #5897.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
This commit makes use of the partition buffer state machine introduced
in https://github.com/influxdata/influxdb_iox/pull/5943.
This commit significantly changes the buffering, and querying, of data
from a partition, swapping out the existing "DataBuffer" for the new
state machine implementation (itself simplified due to temporary lack of
incremental snapshot generation, see #5944).
This commit simplifies the query path, removing multiple types that
wrapped one another to pass around various state necessary to perform a
query, with various query functions needing different types or
combinations of types. The query path now operates using a single type
(named "QueryAdaptor") that provides a queryable interface over the set
of RecordBatch returned from a partition.
There is significantly increased testing of the PartitionData itself,
covering data in various states and the ordering of returned RecordBatch
(to ensure correct materialisation of updates). There are also
invariants upheld by the type system / compiler to minimise the
complexities of working with empty batches & states, and many asserts
that ensure (mostly existing!) invariants are upheld.
We basically assume everywhere that a column falls into one of the three
known categories (time, tag, field), so let's encode this in our type
system instead of defining "unknown" as "undefined behavior, may or may
not crash".
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
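A sketch of the resulting closed set (close to, but not necessarily identical to, the real type):

```rust
/// Every column is exactly one of these; "unknown" is unrepresentable.
enum InfluxColumnType {
    /// The `time` column.
    Timestamp,
    /// Dictionary-encoded string tag.
    Tag,
    /// A field of one of the supported value types.
    Field(InfluxFieldType),
}

enum InfluxFieldType {
    Float,
    Integer,
    UInteger,
    String,
    Boolean,
}
```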
* feat: have a logical plan that is aware of no-deduplication
* feat: build physical scan plan that does not do deduplication
* chore: cleanup
* test: logical plans for scan with and without deduplication
* chore: clean up and a small refactor
* refactor: remove asserts on plan and make enable_deduplication the default
* refactor: rename disable_deduplication to enable_deduplication
Databases are NOT required to project chunks (in practice this is only
done by the querier for ingester-based chunks). Instead `iox_query`
should (and already does) add the right stream adapters to project
chunks or to create NULL-columns. Removing the special handling from the
test setup makes it easier to understand and also less likely that
`iox_query` starts to rely on this behavior.
Helps with #5897.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
- remove generic that is basically unused (`group_potential_duplicates`
is always called w/ `Arc<dyn QueryChunk>`)
- remove half-baked `impl` that is unused
Helps w/ #5897.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* refactor: `IOxReadFilterNode` can always accumulate statistics
`IOxReadFilterNode` used to not emit statistics if one chunk has
duplicates or delete predicates. This is wrong (or at least overly
conservative), because the node itself (or the chunks themselves) do NOT
perform dedup or delete predicate filtering. Instead this is done by
parent nodes (`DeduplicateExec` and `FilterExec`) and it is their job to
propagate statistics correctly.
Helps w/ #5897.
* test: explain setup
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Now that we read through `ParquetExec`, `ROW_GROUP_READ_SIZE` is no longer
used.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Use the proper top-level DataFusion context and register the object
store there.
Note that we still hide the `ParquetExec` behind an opaque record batch
stream. Fixing that is next on my list.
Helps with #5897.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* refactor: replace `croaring` with `roaring`
With the read buffer gone, roaring bitmaps are only used to calculate
series sets and these calculations are pretty much possible with the
pure-Rust version. Also I don't deem that performance-critical
(compared to the roaring bitmaps in the read buffer core).
This removes a bunch of dependencies, mostly because `bindgen` is gone.
This also removes our "croaring architecture detection" hack.
* refactor: replace manual roaring sets with arrow
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* test: Failing tests for unsupported queries
* fix: Catch unsupported SQL operations and error rather than return nothing
* test: Document a few more error messages that come through DataFusion
* refactor: Extract a Step to make query error tests nicer to read and write
* fix: update tests for new error codes
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: initial step to identify where the projection should be provided
* feat: start getting columns of all expressions
* chore: format
* test: test for the table_chunk_stream
* fix: fix a compile error. Thanks @alamb
* test: full tests for table_chunk_stream
* chore: cleanup
* fix: do not cut any columns in case all fields are needed
* test: add one more test case of reading all columns
* refactor: move code that identifies columns to push down to a function. Add the use of field_columns
* chore: cleanup
* refactor: make stream_from_batch support empty batches
* chore: cleanup
* chore: fix clippy after auto merge
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
- treat OOM protection as "resource exhausted"
- use `DataFusionError` in more places instead of opaque `Box<dyn Error>`
- improve conversion from/into `DataFusionError` to preserve more
semantics
Overall, this improves our error handling. DF can now return errors like
"resource exhausted" and gRPC should now automatically generate a
sensible status code for it.
Fixes #5799.
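A sketch of the kind of mapping this enables (assumed, simplified):

```rust
use datafusion::error::DataFusionError;
use tonic::{Code, Status};

/// Surface OOM protection as RESOURCE_EXHAUSTED instead of an opaque
/// internal error; everything else stays internal here for brevity.
fn to_grpc_status(e: DataFusionError) -> Status {
    match e {
        DataFusionError::ResourcesExhausted(msg) => {
            Status::new(Code::ResourceExhausted, msg)
        }
        other => Status::new(Code::Internal, other.to_string()),
    }
}
```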
* chore: Upgrade to Rust 1.64
* fix: Use iter find instead of a for loop, thanks clippy
* fix: Remove some needless borrows, thanks clippy
* fix: Use then_some rather than then with a closure, thanks clippy (see the sketch below)
* fix: Use iter retain rather than filter collect, thanks clippy
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
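For illustration, the `then_some` change noted above:

```rust
fn main() {
    let enabled = true;
    let before = enabled.then(|| 42);  // closure is needlessly lazy
    let after = enabled.then_some(42); // value is already computed
    assert_eq!(before, after);
}
```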
* refactor: concurrent table scan in "field columns"
Similar to #5647 and #5649.
* docs: improve
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* refactor: concurrent table planning in InfluxRPC
Some InfluxRPC requests can scan multiple tables. Prior to this PR we were always
scanning the tables in sequence, adding up potential latencies (catalog,
ingester, object store). There is no reason we need to do this,
"ordinary" SQL queries would not serialize this way either.
So let's scan tables concurrently. This adds concurrency to:
- read filter
- read group
- read window aggregate
There are other query types that could benefit from a similar treatment.
They will be changed in a follow-up.
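A sketch of the concurrency pattern (stubbed types; the real planner differs):

```rust
use futures::stream::{self, StreamExt, TryStreamExt};

const CONCURRENT_TABLE_JOBS: usize = 10;

struct Plan;  // stub
struct Error; // stub

async fn plan_one_table(_table: String) -> Result<Plan, Error> {
    Ok(Plan) // catalog / ingester / object store work happens here
}

/// Plan all tables with bounded concurrency instead of sequentially,
/// so per-table latencies overlap rather than add up.
async fn plan_tables(tables: Vec<String>) -> Result<Vec<Plan>, Error> {
    stream::iter(tables)
        .map(plan_one_table)
        .buffered(CONCURRENT_TABLE_JOBS)
        .try_collect()
        .await
}
```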
* docs: improve
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* test: explain `Send` assertion
* refactor: change `CONCURRENT_TABLE_JOBS` to 10
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Only store context, settings (if any) and the schema interner within the
de-duplicator. Extract a new `Chunks` type that handles the chunk
classification and can be passed around in a somewhat clean fashion.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
In our data model, a chunk always belongs to a partition[^1], so let's
not make this attribute optional. The optional value only leads to
-- mostly surprising -- conditional behavior, ranging from "do not equalize
the partition sort key" (querier) to "always consider the chunk overlapping"
(iox_query when dealing with ingester chunks).
[^1]: This is even true when the chunk belongs to a parquet file that is not
yet added to the catalog, contrary to what a comment in the ingester
stated. The catalog and data model used by the querier are two totally
different things.
* fix: apply selection in `TestChunk::read_filter`
TBH I have no idea how this worked so well before, but the chunks are
expected to apply the given selection. This is because
`IOxReadFilterNode::execute` will wrap the `QueryChunk::read_filter`
output into a `SchemaAdapterStream` and this one expects that there are
no input columns that are absent in the output schema (i.e. it will only
add null columns, it won't remove any). Funnily the `SchemaAdapterStream`
error will blame DataFusion for the mess.
* test: make `test_storage_rpc_tag_values_grouped_by_measurement_and_tag_key` a bit harder
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* refactor: remove dead code
* refactor: `Deduplicator::build_scan_plan` consumes `self`
There is no good reason to use the same `Deduplicator` twice. In
contrast I'm quite sure that this would lead to nasty bugs, because
`split_overlapped_chunks` exits early in some cases so the 2nd plan
would have old and new chunks mixed together.
* ci: use same feature set in `build_dev` and `build_release`
* ci: also enable unstable tokio for `build_dev`
* chore: update tokio to 1.21 (to fix console-subscriber 0.1.8)
* fix: "must use"
* fix: hoist repeated computation out of chunk creation
We have hundreds of chunks per table, so it is beneficial to only
do common work once.
* chore: remove TableCache as it is no longer used
* fix: prune chunks both before and after metadata fetch
Fetching the metadata for all the chunks in a table is expensive,
especially when we have a narrow time range query that only
needs a few chunks.
* chore: fix clippy
* fix: fix up some last tests
* fix: review comments
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* chore: make terminology in iox_query::Provider consistent (remove super notation)
* refactor: be more specific about *which* sort key is meant
* refactor: rename another sort_key --> output_sort_key
* refactor: rename additional sort_key to output_sort_key
* refactor: rename sort_key --> chunk_sort_key
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
- emit a warning if we cannot even attempt to prune chunks due to an
error. This is always either a missing feature or a bug (even though
it does not impact correctness but _only_ performance). Also see
https://github.com/influxdata/conductor/issues/1107
- change metrics to clearly differentiate between "could not prune" and
"not pruned"
- add new "not pruned" observer hook (this was missing for some reason,
the "pruned" hook existed though)
* chore: update deps to get new chrono
* chore: Run cargo hakari tasks
* chore: migrate away from deprecated API
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
* refactor: make could-not-prune reason a static string
* refactor: introduce `QuerierTableArgs`
* feat: chunk pruning metrics
Closes #4974.
* refactor: address review comments
* refactor: use static typing for not-pruned reason
* refactor: pass chunk to not-pruned observer and use it for some metrics
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
This is what DataFusion uses by default and I don't see a reason why we
should use such small batch sizes.
The effect is probably only visible in certain filter-aggregate queries
that don't focus on a single series (because there we likely end up with
1 or 2 batches only, esp. after #5250) for coarse-grained filters, esp.
when the filter key is not the first sort key.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: `QueryChunk::as_any`
* feat: allow `ChunkPruner::prune_chunks` to fail
* feat: limit per-table chunk data for every query
Closes #5211.
* fix: address review comments
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
- remove `IOxSessionContext::default()` because untracked contexts
should only be created by tests
- remove `Option<IOxSessionContext>` because it is a typed workaround
for `IOxSessionContext::default`
Tests should use `IOxSessionContext::testing` and all _normal_ users
should create proper contexts.
I suspect this will help tracing or at least prevent silent regressions.
See #5129.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* fix: Fix SeriesKey sort order for special _measurement and _field
* fix: Update expected test output
* fix: Update more tests
* fix: Re-sort tag key when using binary encoding
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* feat: initial implementation for selecting compaction candidates
* feat: two catalog functions to choose the highest-throughput partitions to compact, and the candidate-selection function itself
* test: tests for the 2 new queries
* feat: more tests and metrics for choosing compaction candidates
* chore: Apply self suggestions from self review
* chore: cleanup
* chore: fix doc comment
* chore: Apply suggestions from code review
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
* refactor: address review comments
* fix: get the right time provider for the tests
* refactor: remove the leftover compaction_
* fix: typos
* fix: make the param name and env name consistent
* refactor: change relevant iSomething to uSomething
* fix: typo
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
There were some instances where we forgot to pass context (and therefore
tracing) information to `InfluxRpcPlanner`. This removes the `Default`
implementation and requires a context to always be passed when creating
`InfluxRpcPlanner`, preventing this type of bug.
Ref #5129.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
This should help a lot once #5032 is implemented. Currently it doesn't
really make a difference.
See #5037, which also proposes a more advanced but more complex system.
The team however agreed to try something simple first.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* refactor: move parquet chunk ID to `ChunkMeta`
* refactor: return `Arc` from `QueryChunk::summary`
This is similar to how we handle other chunk data like schemas. This
allows a chunk to change/refine its "belief" about its own payload while
it is passed around in the query stack.
Helps w/ #5032.
* fix: optimize field columns for all-time predicates
Also fix timestamp range to allow selecting points at MAX_NANO_TIME
* fix: clamp end to MIN_NANO_TIME for safety
* refactor: add contains_all method to TimestampRange
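A sketch of the new method (half-open range assumed, as a simplification):

```rust
/// Simplified stand-in for the real type.
#[derive(Clone, Copy)]
struct TimestampRange {
    start: i64, // inclusive
    end: i64,   // exclusive
}

impl TimestampRange {
    /// True iff every instant in `other` also lies in `self`.
    fn contains_all(&self, other: &Self) -> bool {
        self.start <= other.start && other.end <= self.end
    }
}
```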
Instead of using some hand-rolled timestamp-based logic (or just
"unknown") all over the place, just use logic introduced in #5017.
This requires slightly improved table summaries within the querier that
at least have min/max for the timestamp column. For that, the former
`IngesterChunk`-specific `calculate_summary` method was extended to
`create_basic_summary` to include that data and is now also used by
`QuerierParquetChunk`.
Note: `QuerierRBChunk` already has detailed metrics that are provided
by the read buffer implementation.
Should we ever need even better pruning for `QuerierParquetChunk` (or
`IngesterChunk`) then we _only_ need to add extra data to the table
summaries.
Closes #4976.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* docs: fix comment
* test: add test for delete behaviour
* fix: tag_keys optimization for empty predicate
Also need to eliminate 'true' predicates from the simplified predicate so
is_empty works correctly.
* refactor: use lit instead of spelling out literal true
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Prior to this change background tasks that we feed into `AdapterStream`
can panic but that would just end the stream without any user-visible
error (except for the panic message on stdout/stderr).
This was found while developing #4964. I have proposed another fix in #4966
but found that I actually developed an existing solution a 2nd time:
`watch_task`. But I also see a major issue with the existing API: one
can create `AdapterStream` with ordinary tokio tasks that are not
watched at all, leaving the burden to the implementor to check for that
(and actually we forgot that in `parquet_file`).
So this change takes a slightly different approach:
The `AdapterStream` does NOT accept ordinary join handles any longer but
requires that you pass a "watched task". The newly introduced
`WatchedTask` does the same as we did manually before: wrapping a future
into a tokio task, watch it and wrap the watcher into a task.
It is now way more difficult to do anything stupid (sure you can still
mix up the tasks and the channels, but we need at least some flexibility
here to allow for "split" and potential future fan-in/out constructs).
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
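A sketch of the watched-task idea (assumed shape):

```rust
use tokio::sync::mpsc;
use tokio::task::JoinHandle;

/// The job future is spawned, and a second task watches its
/// `JoinHandle`, so a panic becomes a visible stream error instead of
/// silently ending the stream.
struct WatchedTask {
    _watcher: JoinHandle<()>,
}

fn watch_task<F>(fut: F, tx: mpsc::Sender<Result<(), String>>) -> WatchedTask
where
    F: std::future::Future<Output = ()> + Send + 'static,
{
    let job = tokio::spawn(fut);
    let watcher = tokio::spawn(async move {
        if let Err(e) = job.await {
            // Panics and cancellations surface to the consumer.
            let _ = tx.send(Err(format!("task failed: {e}"))).await;
        }
    });
    WatchedTask { _watcher: watcher }
}
```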
* chore: Update datafusion pin
* fix: Update for api
* fix: Explicitly set coalesce batch size
* fix: Update batch size as well
* fix: update tests for new explain plan, and improved coercion
* refactor: change level 1 to level 2 preparing for next design changes
* fix: make level-2 consistent everywhere
* chore: remove unused comments
* refactor: change all the names level_1 to level_2 to completely replace 1 with 2 to make everything consistent
* chore: add corresponding constants for the compaction levels in the comments
Co-authored-by: Dom <dom@itsallbroken.com>
* refactor: simplify sort key calculation
* refactor: use schema from catalog instead from file
* refactor: do not request parquet file MD in compactor
* test: ensure that `QueryableParquetChunk` works correctly
* feat: split times of compacting results based on the max file size
* feat: consider max file size while computing split time
* test: tests for compute_split_time
* feat: first step to teach the function split_the_stream to know how to split data into n streams using n-1 input PhysicalExprs
* feat: make StreamSplitNode support a list of expression
* docs: explain how StreamSplitNode works
* feat: Teach compute_split_time to split a time range into many contiguous ranges and split the compacted result into multiple non-overlapping files based on the config compaction_max_size_bytes (see the sketch after this entry)
* chore: cleanup
* chore: clean up doc
* chore: address review comments
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
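A sketch of the split-time computation referenced above (assumed arithmetic, not the exact code):

```rust
/// Split `[min_time, max_time]` into roughly equal contiguous ranges so
/// that each output file stays under the configured maximum size.
fn compute_split_time(min_time: i64, max_time: i64, total_size: u64, max_size: u64) -> Vec<i64> {
    // Number of output files needed, at least one.
    let n = (total_size as f64 / max_size as f64).ceil().max(1.0) as i64;
    if n <= 1 || min_time >= max_time {
        return vec![max_time];
    }
    let width = (max_time - min_time) / n;
    // n-1 interior split points, plus the range end.
    (1..n)
        .map(|i| min_time + width * i)
        .chain(std::iter::once(max_time))
        .collect()
}
```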
* chore: TEMP Update DataFusion to pre-release
* chore: update arrow et al to 16.0.0
* chore: Run cargo hakari tasks
* fix: update reader read_dictionary API
* chore: Update to real Datafusion release
* fix: Update parquet API
* fix: update test
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
* feat: Change data type of catalog Postgres partition's sort_key from a string to an array of strings
* test: add column with comma
* fix: use new protobuf field to avoid incompatibility
* fix: ensure sort_key is an empty array rather than NULL
* refactor: address review comments
* refactor: address more comments
* chore: clearer comments
* chore: Update iox_catalog/migrations/20220607102200_change_sort_key_type_to_array.sql
* chore: Update iox_catalog/migrations/20220607102200_change_sort_key_type_to_array.sql
* fix: Rename migration so it will be applied after
Co-authored-by: Marko Mikulicic <mkm@influxdata.com>
* chore: Update datafusion deps
* fix: fix for changes in ScalarValue
* fix: fix for using TableSource rather than TableProvider
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
This reverts commit 00b5c1b296.
This change reverts the StreamSplitExec plan to using bounded, blocking
channels, with the possibility of deadlock added to the docs.
This is now tolerable because of the concurrent consumption of both
output partitions in the compactor.
* fix: ensure that query tokio background tasks are canceled
While I am not entirely sure if this explains some of the memory leaks I
am seeing in prod, not canceling the tasks correctly certainly makes
debugging way harder and also renders certain forms of throttling (e.g.
max. concurrent queries) somewhat ineffective.
Note that parquet file downloads are currently NOT canceled because
tokio's `spawn_blocking` cannot be canceled (see the sketch after this entry).
* refactor: `Vec` -> `Option`
* refactor: `spawn_blocking` creates a join handle, even though it is useless
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
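A sketch of one way to enforce the cancellation (assumed helper, not the PR's exact code):

```rust
use tokio::task::JoinHandle;

/// Abort the background task when its owner (e.g. the query stream) is
/// dropped, so work is cancelled instead of leaking.
struct AbortOnDrop<T>(JoinHandle<T>);

impl<T> Drop for AbortOnDrop<T> {
    fn drop(&mut self) {
        self.0.abort();
    }
}
```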
* ci: fix cargo deny
* chore: downgrade `socket2`, version 0.4.5 was yanked
* chore: rename `query` to `iox_query`
`query` is already taken on crates.io and yanked and I am getting tired
of working around that.