Introduce a new header called `iox-debug` which when set enables certain
debug features. The first one will be the `system.queries` table which
is a process-local, namespace-scoped query log. In most prod setups this
is only useful for debugging and will confuse the user a lot because
when multiple queries are deployed then the K8s routing decides which
pod/process the users hits. This leads to an inconsistent view. However
the log is still useful for debugging.
This also wires the "debug header set" flag through the Flight ticket,
because JDBC proved (integration tests FTW!) that headers are only
passed to `GetFlightInfo` but not to `DoGet` and the ticket must encode
all the relevant information.
Closes#7119.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
This adds a command to `influxdb_iox` that can take a WAL segment file
and regenerate all write operation entries, writing to stdout or namespaced
files within a target directory, using table ID as the measurement name
in the case where there is no catalog access at point of regeneration.
If the test setup calls `Step::Persist` to persist on-demand, that
means it shouldn't be used with `ChunkStage::Parquet`, which tries to
persist as fast as possible. This will fail the test with a hopefully
helpful message to prevent this.
* refactor: Change catalog configuration so it is entirely dsn based / support end to end testing without postgres
Restores code from https://github.com/influxdata/influxdb_iox/pull/7708
Revert "revert: PR #7708"
This reverts commit c9cfe05f8d.
* fix: merge
* fix: Update new test
* test: add dedup test for multiple partitions and ranges
* refactor: remove `RedudantSort` optimizer pass
Similar to #7807 this is now covered by DataFusion, as demonstrated by
the fact that all query tests (incl. explain tests) still pass.
The good thing is: passes that are no longer required don't require any
upstreaming, so this also closes#7411.
* test: reproducer for idpe_17556
* fix: `ParquetSortness` and partial opt
1. correctly handle cases where `ParquetSortness` would optimize one
child branch but not the other
2. handle cases where `ParquetSortness` recusion should stop a bit
clearer (using `TreeNodeRewriter`)
3. rename query tests to be a bit clearer
4. add test case with many (but not too many) duplicate files and an
ingester (basically a prod use case where the compactor is slightly
behind)
---------
Co-authored-by: Marco Neumann <marco@crepererum.net>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* test: add tests for the desired contract for parsing measurements from line protocol
* fix: restrict null chars in measurement
* chore: make an explicit Measurement type
* refactor: have iox lp parser match influxdb contract, for acceptance of eq in measurements
* test: create end_to_end test to confirm same write-then-read behavior with `=` in measurements, is the same as influxdb
* test: add test for gap fill query missing time bounds
* chore: update unit test
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* chore: add expected xdbc type info value to jdbc test
* chore: add query skeleton to kick off plan_get_xdbc_type_info()`
* chore: implement a minimun version query for type info
* chore: rewrite `plan_get_xdbc_type_info` to use a static recrod batch
* chore: construct create_params as a string list
* chore: add create_params column in e2e test result
* chore: re-define create_params list items to be non-nullable
* chore: remove comment
* chore: refactor TYPE_INFO_RECORD_BATCH using XdbcTypeInfo struct and rewrite metadata for character types
chore: lint
chore: lint doc
chore: lint doc use automatic link
* chore: add unimplemented error msg
* chore: add `INTEGER`, `FLOAT`, `TIMESTAMP`, `INTERVAL` and remove `CHAR`, `TEXT`, `STRING`
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* fix: racing JDBC tests
The JDBC tests have been flaky since adding additional tests. Use
the makefile to build the client to avoid the clients racing.
* chore: pre-download JDBC drive in integration test
* fix: remove stray lockfile
* refactor(authz): move extract_header_token into authz
Move the extract_header_token method into the authz package so that
it can be shared by the query path. The method is renamed to reflect
the fact that it can now also extract a token from gRPC metadata.
The extract_token function is now a little more generic to allow
it to be used with HTTP header values and gRPC metadata values.
* feat(service_grpc_flight): JDBC compatible Handshake
While testing some JDBC based clients we found that some, Tableau
in this case, cannot be configured with authoriztion tokens. In
these cases we need to be able to support username/password. The
approach taken is to ignore the username and make the token the
password. This is the same approach being taken throughout the
product.
To facilitate this the Flight RPC Handshake command has been extended
to look for Basic authorization credentials and respond with the
appropriate Bearer authorization header.
While adding end-to-end tests the subprocess commands were causing
a deadlock. These have been changed to using the tonic::process
module.
There are also some small changes to the JDBC test application where
the hardcoded values were clashing with the authorization parameters.
* fix: lint
* chore: apply suggestions from code review
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
* chore: review suggestion
---------
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: WIP support CommandGetXdbcTypeInfo metadata endpoint with tests
* chore: update test case and add jdbc test
* chore: uncomment jdbc getCoumns test
* chore: lint
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* test: add tests on month and year date_bin
* fix: add IOX_COMPARE: uuid to get deterministics name for output parquet_file in the explain
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* chore(flightsql): rename Namespace to Database in error message
* chore(flightsql): rename Namespace to Database in test error msg
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
- No `ON` clause
- No `WHERE` clause
- No time restriction yet
- No `FROM <db>.<retention>`
Ref https://github.com/influxdata/idpe/issues/17360 .
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Returns a `NotImplemented` error when attempting to execute a
selector query, which projects a single selector function and additional
tags or fields until #7533 is implemented.
Introduced `error` module to simplify error handling and ensure
consistency of error messages.
* chore: Update datafusion and arrow/parquet to 37, tonic to 0.9.1
* refactor: Update for FieldRef and other API changes
* fix: Update field size calculation
* fix: Use `NullBuffer` directly
* fix: remove outdated comment
* chore: Update test for tonic
* chore: Run cargo hakari tasks
* chore: cargo update
---------
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: support CommandGetCrossReference metadata endpoint with tests
* chore: create two tables in the test for GetCrossReference endpoint
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* chore: support returning the database name if all the keys refer to the same database
* test: add test cases to check for same, different, and no database in request header
* chore: lint
* chore: more lint
* refactor: replace empty string with None for database_name
* refactor: simplify logic for NoFlightSQLDatabase error
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: support CommandGetImportedKeys metadata endpoint with tests
* chore: remove comments that is no longer valid
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: Specialises test output formatting for each language
* Also fixes an error uncovered in the `write_columnar` when tag
columns are `NULL`
Closes#7145
* chore: Run cargo hakari tasks
* chore: Add sorted output until #7513 is addressed
* chore: clippy 📋
* feat: Add `options` to `write_columnar`
* Added ability to configure border rendering, including removing
borders. This helps avoid variable width issues with EXPLAIN output,
which tends to vary and cause flaky test failures.
* chore: rustfmt 🧹
* chore: update expected output
* chore: clarify what "this" is
---------
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
* feat: support `database`, `bucket`, and `bucket-name` as grpc header names
* chore: lint
* chore: update doc to accept `database`, `bucket`, and `bucket-name` as parameter names
* chore: update doc to only show `database` as the parameter name
* refactor: consolidate header names into a const vec and update comments on database
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
This was "internal". The mapping works like this: we take the
`DataFusionError` and call `find_root` which should traverse the
`External(...)` chain (even through Arrow) to find the last error that
is not within the Arrow/DataFusion land. This is then mapped by us.
`DataFusionError::External(...)` is no further inspected and mapped
straight to "internal". I think this if fine because in the end we're
mostly dealing w/ DataFusion stuff anyways.
I've slightly changed the error mapping in the planner to emit
`DataFusionError::Plan(...)` instead which we map to "invalid argument".
I think this is way better for the user.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Within our query tests and our CLI, we've used to print out empty
query responses as:
```text
++
++
```
This is pretty misleading. Why are there no columns?! The reason is that
while Flight provides us with schema information, we often have zero
record batches (because why would the querier send an empty batch). Now
lets fix this by creating an empty batch on the client side based on the
schema data we've received. This way, people know that there are columns
but no rows:
```text
+-------+--------+------+------+
| count | system | time | town |
+-------+--------+------+------+
+-------+--------+------+------+
```
An alternative fix would be to pass the schema in addition to
`Vec<RecordBatch>` to the formatting code, but that seemed to be more
effort.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: add CommandGetPrimaryKeys metadata endpoint and tests
* chore: update schema for the returned record batch
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* chore: end-to-end tests for authorization
Add tests to validate the behaviour of the authorization machinery
in the write and query paths.
In order to facilitate this an authorizer implentation has been
added to the the test helpers that runs an authorizer gRPC service
for the use of tests. The gRPC service is started in the process
that is running the test and listens on a OS-assigned port number.
The authorization service cannot be shared between tests so a
non-shared cluster must be used when the authorizer is configured.
The influxdb_iox_client has been enhanced so that the user can
configure additional headers in the flight client, which is used
for SQL and InfluxQL queries. This uses the same interface as the
Flight SQL client has for the same job.
* chore: fix lint errors
* chore: review suggestion
Consolate the authorization tests into fewer tests in order to avoid
repeating set-up and tear-down unnecessarily.
* fix: Add sort operator after window aggregate operator
Closes#7460
* fix: Refactor `LIMIT` and `OFFSET` implementation
These changes should allow the `limit` function to be used
generically with any plan following the same conventions.
* chore: No need to reorder this
* chore: Add documentation to the `limit` function
* feat: Support LIMIT and OFFSET with GROUP BY
* fix: Compile error
* chore: Improve function name and comment
* chore: rustfmt
* chore: fix clippy warnings
Allowing the too-many-arguments warning for project_select,
as it will require some refactoring after this PR has already
been reviewed. It may be refactored in the future when subqueries are
implemented
This commit adds a client method to invoke the
UpdateNamespaceServiceProtectionLimits RPC API, providing a user
friendly way to do this through the IOx command line.
- Only redact the actual time value in `FilteExec`, not the entire
expression. This preserves important information about filter
pushdowns.
- Apply similar time filter to `ParquetExec` because with #6098 we will
push down more filters into `ParquetExec`, including retention
policies.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: update gap fill planner rule to use LOCF
* chore: cargo fmt
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* test: Add an e2e test for write replication
* fix: Pass through rpc_write_replicas configuration to RpcWrite handler
---------
Co-authored-by: Dom <dom@itsallbroken.com>
* feat(flightsql): Add support for table_schema in GetTables and complies
feat(fightsql): Add support for table_schema in GetTables
support actual schema
it compiles / does not fail
* chore: resolve merge conflict
* chore: make table_schema optional
* test: update e2e test for `include_schema` = true
* chore: remove info!() and update test `flightsql_schema_matches`
* chore(deps): Bump rustix from 0.36.11 to 0.37.3 (#7308)
* chore(deps): Bump rustix from 0.36.11 to 0.37.3
Bumps [rustix](https://github.com/bytecodealliance/rustix) from 0.36.11 to 0.37.3.
- [Release notes](https://github.com/bytecodealliance/rustix/releases)
- [Commits](https://github.com/bytecodealliance/rustix/compare/v0.36.11...v0.37.3)
---
updated-dependencies:
- dependency-name: rustix
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
* chore: Run cargo hakari tasks
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* chore: make real error for existing columns
* chore: user match instead of unwrap() on column names
* chore: use datafusion::physical_plan::collect() to get record batches
* chore: use `concat_batches` to combine multiple batches into single one and fix db schema test
* chore: add doc comment for GetTables
* chore: remove pretty print
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: Display failed query
Allows a user to immediately identify the failed query.
* feat: API improvements to InfluxQL parser
* feat: Extend `SchemaProvider` trait to query for UDFs
* fix: We don't want the parser to panic on overflows
* fix: ensure `map_type` maps the timestamp data type
* feat: API to map a InfluxQL duration expression to a DataFusion interval
* chore: Copied APIs from DataFusion SQL planner
These APIs are private but useful for InfluxQL planning.
* feat: Initial aggregate query support
* feat: Add an API to fetch a field by name
* chore: Fixes to handling NULLs in aggregates
* chore: Add ability to test expected failures for InfluxQL
* chore: appease rustfmt and clippy 😬
* chore: produce same error as InfluxQL
* chore: appease clippy
* chore: Improve docs
* chore: Simplify aggregate and raw planning
* feat: Add support for GROUP BY TIME(stride, offset)
* chore: Update docs
* chore: remove redundant `is_empty` check
Co-authored-by: Christopher M. Wolff <chris.wolff@influxdata.com>
* chore: PR feedback to clarify purpose of function
* chore: The series_sort can't be empty, as `time` is always added
This was originally intended as an optimisation when executing an
aggregate query that did not group by time or tags, as it will produce
N rows, where N is the number of measurements queried.
* chore: update comment for clarity
---------
Co-authored-by: Christopher M. Wolff <chris.wolff@influxdata.com>
* feat: add optional param to GetTables
* chore: add the third param to query plan
* feat: add table_types param
* chore: clippy
* test: add test cases with filters
* chore: update query to avoid SQL injection
* refactor: update where clause and cleanup
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* test: Add getTables jdbc_client example
* feat: add `CommandGetTables` in FlightSqlClient
* feat: add `CommandGetTables` in flightsql cmd and planner
* test: add e2e test for `CommandGetTables`
* chore: clippy
* chore: comment out the test with filters
* test: update jdbc test expected value for tables
---------
Co-authored-by: Chunchun <14298407+appletreeisyellow@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>