* feat: add analysis to find time predicates
* refactor: propagate time range to gap fill logical node
* refactor: propagate time range to GapFillExec
* refactor: code review feedback
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: IOx learns InfluxQL time-range expression → DF logical Expr
IOx now understands how to evaluate an InfluxQL time-range filter
expression and transform it into a DataFusion logical expression.
* chore: move time range expression to independent functions
There is no need for these to be part of the `InfluxQLToLogicalPlan`
struct, and moving them out makes them easier to test.
* chore: support scalar now on either side of binary expression
* chore: improve error messages
* chore: address clippy concerns
* chore: add tests for time ranges
* chore: add a test where time appears on the right-hand side
Ensure time is correctly identified on the right-hand side of a
conditional expression.
* chore: add tests that specify a timezone
* chore: Run cargo hakari tasks
* chore: fix linting issues
* chore: Remove unnecessary line
* chore: Feedback: Add API to parse a conditional expression
Based on feedback from @alamb, we don't want to hide the error from
parsing a `ConditionalExpression`. To do this, we use the public
`parse_statements` API as a model and provide a new API,
`parse_conditional_expression`, which returns a `Result` with the error
being a `ParseError`. Additionally, `ConditionalExpression` implements
the `FromStr` API using the `parse_conditional_expression` API.
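A minimal sketch of the pattern described above, for illustration only.
The `ParseError` fields, the simplified `ConditionalExpression` struct and
the whitespace-splitting "parser" are hypothetical stand-ins, not the real
parser; only the shape (a public function returning `Result`, with `FromStr`
delegating to it) follows the commit text.

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical stand-in for the parser's error type.
#[derive(Debug, PartialEq)]
pub struct ParseError {
    pub message: String,
}

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "parse error: {}", self.message)
    }
}

impl std::error::Error for ParseError {}

/// Hypothetical, heavily simplified expression of the form `<lhs> <op> <rhs>`.
#[derive(Debug, PartialEq)]
pub struct ConditionalExpression {
    pub lhs: String,
    pub op: String,
    pub rhs: String,
}

/// Public entry point: the error is returned to the caller, not hidden.
pub fn parse_conditional_expression(input: &str) -> Result<ConditionalExpression, ParseError> {
    let mut parts = input.split_whitespace();
    match (parts.next(), parts.next(), parts.next(), parts.next()) {
        // Exactly three tokens: accept.
        (Some(lhs), Some(op), Some(rhs), None) => Ok(ConditionalExpression {
            lhs: lhs.to_string(),
            op: op.to_string(),
            rhs: rhs.to_string(),
        }),
        // Anything else: surface a ParseError.
        _ => Err(ParseError {
            message: format!("expected `<lhs> <op> <rhs>`, got `{input}`"),
        }),
    }
}

/// `FromStr` delegates to the public function, so `str::parse` works too.
impl FromStr for ConditionalExpression {
    type Err = ParseError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        parse_conditional_expression(s)
    }
}
```

With this shape, callers can write `"time > now()".parse::<ConditionalExpression>()`
and handle the `ParseError` themselves.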
* chore: PR feedback reverting this change
I believe my intention was to update all instances in the match, but
never completed the change. Will leave for another day.
* chore: PR feedback add additional comments
* chore: rustfmt
---------
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
* refactor: Move `flightsql` code into its own module
* fix: get schema from LogicalPlan
* refactor: use arrow_flight::sql::Any instead of prost_types::any
* fix: cleanup docs and avoid as_ref
* fix: Use Bytes
* fix: use Any::pack
* fix: doclink
* refactor: Drop Expr::UnaryOp to simplify tree traversal
The UnaryOp doesn't provide any additional value and complicates
walking the AST, as literal values wrapped in a UnaryOp(Minus, ...)
require extra handling when reducing time range expressions, etc.
This change is also true to the InfluxQL Go implementation,
which represents whole number literals as signed integers unless
they exceed i64::MAX.
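A rough sketch of the literal rule described above. The `Literal` enum and
`parse_whole_number` are hypothetical names, and falling back to an unsigned
value above i64::MAX is an assumption made for illustration; the commit text
only states that values up to i64::MAX are signed.

```rust
/// Hypothetical literal representation.
#[derive(Debug, PartialEq)]
pub enum Literal {
    /// Whole numbers that fit in an i64, including negative values.
    Integer(i64),
    /// Assumption: values above i64::MAX fall back to an unsigned literal.
    Unsigned(u64),
}

/// Parse a whole-number literal, preferring the signed representation.
pub fn parse_whole_number(digits: &str) -> Option<Literal> {
    if let Ok(v) = digits.parse::<i64>() {
        return Some(Literal::Integer(v));
    }
    // Only reached when the value does not fit in an i64.
    digits.parse::<u64>().ok().map(Literal::Unsigned)
}
```

Because the sign is part of the literal, a value such as `-5` needs no
UnaryOp(Minus, ...) wrapper when walking the tree.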
* chore: Refactor all usages of format!("{}", ?) to ?.to_string()
Per https://github.com/influxdata/influxdb_iox/pull/6600#discussion_r1072028895
* refactor: remove unused code
* refactor: make fn private
* feat: safely stream data from one tokio runtime to another
Closes #6577.
* refactor: review comments
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
* docs: improve
* test: explain
* test: make tests more tricky
* refactor: improve error message
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: Parse IANA timezone strings to chrono_tz::Tz
* feat: Visitors can customise the return error type
This avoids having to remap errors from `&'static str` to the caller's
error type, and will be used in a future PR for time range expressions.
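One way to express a caller-chosen error type is an associated type on the
visitor trait. The sketch below is illustrative only: `Expr`, `Visitor`,
`walk`, `PlanError` and `SumPositive` are hypothetical names and do not
mirror the real visitor API.

```rust
/// Hypothetical, minimal AST.
pub enum Expr {
    Literal(i64),
    Binary(Box<Expr>, Box<Expr>),
}

/// The visitor picks its own error type through an associated type,
/// so no remapping from `&'static str` is needed at the call site.
pub trait Visitor: Sized {
    type Error;

    fn visit_literal(self, _value: i64) -> Result<Self, Self::Error> {
        Ok(self)
    }
}

/// Depth-first walk that threads the visitor through the tree and
/// stops at the first error.
pub fn walk<V: Visitor>(expr: &Expr, visitor: V) -> Result<V, V::Error> {
    match expr {
        Expr::Literal(v) => visitor.visit_literal(*v),
        Expr::Binary(lhs, rhs) => {
            let visitor = walk(lhs, visitor)?;
            walk(rhs, visitor)
        }
    }
}

/// Hypothetical caller-side error type.
#[derive(Debug, PartialEq)]
pub enum PlanError {
    NegativeLiteral(i64),
}

/// Example visitor: sums literals and rejects negative ones.
#[derive(Debug, Default, PartialEq)]
pub struct SumPositive(pub i64);

impl Visitor for SumPositive {
    type Error = PlanError;

    fn visit_literal(self, value: i64) -> Result<Self, Self::Error> {
        if value < 0 {
            Err(PlanError::NegativeLiteral(value))
        } else {
            Ok(SumPositive(self.0 + value))
        }
    }
}
```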
* chore: Run cargo hakari tasks
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
- name exec driver thread (instead of using the default that `thread::spawn`
gives us)
- provide a number to every worker thread (both for the dedicated executor
and for the main runtime)
- shorten thread names (current naming too long for most debug tools)
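For reference, naming and numbering threads with the standard library looks
roughly like the sketch below. The `spawn_named_workers` helper and the
`"<prefix> <index>"` naming scheme are assumptions for illustration, not the
names used by the executor.

```rust
use std::thread;

/// Spawn `count` worker threads named "<prefix> <index>".
/// Each worker simply reports its own name back to the caller.
pub fn spawn_named_workers(prefix: &str, count: usize) -> Vec<thread::JoinHandle<String>> {
    (0..count)
        .map(|i| {
            thread::Builder::new()
                // Short, numbered name instead of the `thread::spawn` default.
                .name(format!("{prefix} {i}"))
                .spawn(|| thread::current().name().unwrap_or("<unnamed>").to_string())
                .expect("failed to spawn worker thread")
        })
        .collect()
}
```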
* feat: InfluxQL learns how to plan some queries
Also added a means to test the planner and execution
* chore: Update module docs
* chore: Document the planner functions
* chore: Update end_to_end_cases crate
* chore: Clarify why `SLIMIT` and `SOFFSET` return `NotImplemented`
* chore: Address lint issues
* chore: Fix rustdoc link issue
* chore: Remove InfluxQL tests from query_tests crate
Will follow conventions established by @carols10cents when
new query_tests crate is merged.
* chore: `now` field
`now` is a DataFusion built-in scalar function
* chore: remove unused code
* chore: Add additional arithmetic expression tests
* chore: Establish pattern for identifying and tracking InfluxQL issues
* chore: Add tests for case sensitivity issues
* chore: group tests into modules and functions
This avoids mass rewriting of insta snapshots as new
tests are added to each function. When tests are added in the middle,
existing snapshots are renamed (-N+1, -N+2, etc) resulting in
having to review numerous additional snapshots.
The current version is barely readable because the logged schema with all
its metadata is so long.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: Add timestamp data type
* feat: Add with_quiet API to suppress output to STDOUT
* fix: Field name resolution to match InfluxQL
* refactor: Allow TestChunks to be directly accessed
This will be useful when testing the InfluxQL planner.
* fix: Add Timestamp case to var_ref module
* feat: Add InfluxQL compatible column naming
* chore: Add doc comment.
* fix: keywords may be followed by a `!` such as `!=`
* fix: field_name improvements
* No longer clones expressions
* Explicitly handle all Expr enumerated items
* more tests
* fix: collision with explicitly aliased column
Fixes case where column is explicitly aliased to an auto-named variant.
Test case added to validate.
* chore: Move logic to context, in line with DataFusion SQL
* chore: Add ordering for InfluxQL data types
Ordering is used to determine automatic casting operations. If two
field columns are present in an expression, one float and one integer,
the integer should be cast to a float, such that the final expression
will be a float.
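A small sketch of how an ordering can drive the cast decision. The
`DataType` enum here is a hypothetical two-variant stand-in; the only rule
taken from the text above is that float ranks above integer.

```rust
/// Hypothetical subset of field data types. Variants are declared from
/// lowest to highest rank, so the derived `Ord` gives Integer < Float.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum DataType {
    Integer,
    Float,
}

/// The result type of a binary expression is the higher-ranked operand
/// type; the other operand is the one that would be cast.
pub fn result_type(lhs: DataType, rhs: DataType) -> DataType {
    lhs.max(rhs)
}
```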
* chore: Add DerefMut trait to collection types
Will allow these collections to be mutated when traversing the InfluxQL
AST.
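As an illustration of why `DerefMut` helps here: a newtype collection that
implements it can be rewritten in place during traversal. `FieldList` and
`expand_wildcards` are hypothetical names used only for this sketch.

```rust
use std::ops::{Deref, DerefMut};

/// Hypothetical newtype collection wrapping a Vec.
#[derive(Debug, Default, PartialEq)]
pub struct FieldList(Vec<String>);

impl Deref for FieldList {
    type Target = Vec<String>;

    fn deref(&self) -> &Self::Target {
        &self.0
    }
}

impl DerefMut for FieldList {
    fn deref_mut(&mut self) -> &mut Self::Target {
        &mut self.0
    }
}

/// Replace every `*` entry with the given column names, in place.
pub fn expand_wildcards(fields: &mut FieldList, columns: &[&str]) {
    let mut expanded = Vec::new();
    for field in fields.iter() {
        if field == "*" {
            expanded.extend(columns.iter().map(|c| c.to_string()));
        } else {
            expanded.push(field.clone());
        }
    }
    // `clear` and `extend` are Vec methods reached through `DerefMut`.
    fields.clear();
    fields.extend(expanded);
}
```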
* chore: Add influxql module with initial AST normalisation implementation
* chore: Add more unit tests and docs
* chore: Run cargo hakari tasks
* chore: Fix link
* chore: Support regular expression expansion and Call expressions
* chore: Add tests for walk_expr functions
* chore: Add insta snapshot files
* chore: Add docs and make API accessible to the crate
* chore: Move to Arc<dyn SchemaProvider> for use in influxql planner
* chore: Move code back; it is better encapsulated here
* chore: Remove redundant attribute
* chore: Improve regex compatibility with InfluxQL / Go
* chore: Style improvement.
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
This commit changes the behaviour of the persist system to enable
optimal parallelism of persist operations, and improve the accuracy of
the outstanding job bound / back-pressure.
Previously all persist operations for a given partition were
consistently hashed to a single worker task. This serialised persistence
per partition, ensuring all updates to the partition sort key were
serialised. However, this also unnecessarily serialises persist
operations that do not need to update the sort key, reducing the
potential throughput of the system; in the worst case of a single
partition receiving all the writes, only one worker would be persisting,
and the other N-1 workers would be idle.
After this change, the sort key is inspected when enqueuing the persist
operation and if it can be determined that no sort key update is
necessary (the typical case), then the persist task is placed into a
global work queue from which all workers consume. This allows for
maximal parallelisation of these jobs, and removes the per-worker
head-of-line blocking.
In the case that the sort key does need updating, these jobs continue to
be consistently hashed to a single worker, ensuring serialised sort key
updates only where necessary.
To support these changes, the back-pressure system has been changed to
account for all outstanding persist jobs in the system, regardless of
type or assigned worker: a logical, bounded queue is composed of a
semaphore limiting the number of persist tasks overall, and a series of
physical, unbounded queues (one for each worker, plus the global
queue). The overall system remains bounded by the
INFLUXDB_IOX_PERSIST_QUEUE_DEPTH value, and is now simpler to reason
about (it is independent of the number of workers, etc).
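The routing and bounding described above can be sketched as follows. This
is a simplified model, not the persist implementation: `PersistQueue`,
`Route` and the method names are hypothetical, an atomic counter with a
non-blocking try-acquire stands in for the semaphore, and the queues
themselves are omitted so only the routing decision and the global bound
are shown.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Where an accepted persist job is sent.
#[derive(Debug, PartialEq)]
pub enum Route {
    /// No sort key update needed: any worker may take the job.
    Global,
    /// Sort key update needed: always the same worker for a partition.
    Worker(usize),
}

pub struct PersistQueue {
    outstanding: AtomicUsize,
    depth: usize,
    n_workers: usize,
}

impl PersistQueue {
    pub fn new(depth: usize, n_workers: usize) -> Self {
        assert!(n_workers > 0, "at least one worker is required");
        Self { outstanding: AtomicUsize::new(0), depth, n_workers }
    }

    /// Try to accept a job. Returns `None` when `depth` jobs are already
    /// outstanding, regardless of which queue they were routed to.
    pub fn enqueue(&self, partition_id: u64, needs_sort_key_update: bool) -> Option<Route> {
        // Take one permit from the single, system-wide bound.
        self.outstanding
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |n| {
                (n < self.depth).then_some(n + 1)
            })
            .ok()?;

        Some(if needs_sort_key_update {
            // Hash the partition so its sort key updates stay serialised.
            let mut hasher = DefaultHasher::new();
            partition_id.hash(&mut hasher);
            Route::Worker((hasher.finish() % self.n_workers as u64) as usize)
        } else {
            Route::Global
        })
    }

    /// Release the permit once a job has finished.
    pub fn complete(&self) {
        self.outstanding.fetch_sub(1, Ordering::AcqRel);
    }
}
```

Note that the bound depends only on `depth`, not on the number of workers,
which matches the reasoning above about the queue depth being independent
of worker count.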
* fix: account for memory allocations in InfluxRPC group outputs
This should prevent the querier from OOMing.
See https://github.com/influxdata/idpe/issues/16614 .
* docs: improve
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
* refactor: pull out constant
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
* fix: gRPC errors regarding group cols
- missing group col prev. produced an "internal error" but should be
"invalid argument"
- duplicate group cols produced a panic but should also be "invalid
argument"
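The intended behaviour can be sketched as a validation step that returns an
error instead of panicking. `GroupError` and `validate_group_cols` are
hypothetical names, and treating a "missing" group column as one that is
not present in the schema is an interpretation of the text above.

```rust
use std::collections::HashSet;

/// Hypothetical error type; `InvalidArgument` stands in for the gRPC
/// "invalid argument" status.
#[derive(Debug, PartialEq)]
pub enum GroupError {
    InvalidArgument(String),
}

/// Reject unknown and duplicate group columns up front.
pub fn validate_group_cols(group_cols: &[&str], schema_cols: &[&str]) -> Result<(), GroupError> {
    let known: HashSet<&str> = schema_cols.iter().copied().collect();
    let mut seen = HashSet::new();

    for col in group_cols {
        if !known.contains(col) {
            return Err(GroupError::InvalidArgument(format!(
                "unknown group column: {col}"
            )));
        }
        // `insert` returns false if the column was already seen.
        if !seen.insert(*col) {
            return Err(GroupError::InvalidArgument(format!(
                "duplicate group column: {col}"
            )));
        }
    }
    Ok(())
}
```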
* docs: clarify
* refactor: DF-driven on-demand mem limit instead of ahead-of-time heuristics
Closes #6310.
* refactor: rename and tune default exec mem limits
* fix: ingester2 bits after rebase
* fix: check schemas in `pretty_print_batches`
I think most users of this function (and `assert_batches_eq`) assume
that all batches have the same schema. If not, `pretty_print_batches`
may either fail to produce an actual table (some rows may have more or
fewer columns) or silently produce a table that looks "alright".
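The check amounts to comparing every batch's schema against the first one
before printing. The sketch below uses a hypothetical `Batch` type with
column names as its schema, rather than Arrow record batches, purely to
show the idea.

```rust
/// Hypothetical, simplified batch: a schema (column names) plus rows.
pub struct Batch {
    pub schema: Vec<String>,
    pub rows: Vec<Vec<String>>,
}

/// Return an error if any batch's schema differs from the first batch's.
pub fn check_same_schema(batches: &[Batch]) -> Result<(), String> {
    let Some(first) = batches.first() else {
        // No batches: nothing to compare.
        return Ok(());
    };
    for (idx, batch) in batches.iter().enumerate().skip(1) {
        if batch.schema != first.schema {
            return Err(format!(
                "batch {idx} has schema {:?}, expected {:?}",
                batch.schema, first.schema
            ));
        }
    }
    Ok(())
}
```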
* fix: equalize schemas where it is required/desired
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Have a single global test executor w/ reasonable defaults. Also don't
require tests to join/await executor shutdowns (most tests forget this
anyways and will get a runtime warning).
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>