* test: add test for gap fill query missing time bounds
* chore: update unit test
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* chore: add expected xdbc type info value to jdbc test
* chore: add query skeleton to kick off `plan_get_xdbc_type_info()`
* chore: implement a minimal version of the type info query
* chore: rewrite `plan_get_xdbc_type_info` to use a static record batch
* chore: construct create_params as a string list
* chore: add create_params column in e2e test result
* chore: re-define create_params list items to be non-nullable
* chore: remove comment
* chore: refactor TYPE_INFO_RECORD_BATCH using XdbcTypeInfo struct and rewrite metadata for character types
chore: lint
chore: lint doc
chore: lint doc use automatic link
* chore: add unimplemented error msg
* chore: add `INTEGER`, `FLOAT`, `TIMESTAMP`, `INTERVAL` and remove `CHAR`, `TEXT`, `STRING`
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Still insert them into the database and associate them with namespaces,
but don't ever query them back out.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
This PR renames the CLI option and variables throughout the compactor to be `df_concurrency` (and similar) instead of `job_concurrency` (and similar).
From the perspective of the compactor, the most noteworthy consequence of this limit is on concurrency within DataFusion. This is the limit on the number of DataFusion jobs the compactor may start concurrently.
* fix: racing JDBC tests
The JDBC tests have been flaky since adding additional tests. Use
the makefile to build the client to avoid the clients racing.
* chore: pre-download JDBC driver in integration test
* fix: remove stray lockfile
* refactor(authz): move extract_header_token into authz
Move the extract_header_token method into the authz package so that
it can be shared by the query path. The method is renamed to reflect
the fact that it can now also extract a token from gRPC metadata.
The extract_token function is now a little more generic to allow
it to be used with HTTP header values and gRPC metadata values.
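A minimal sketch of what such a generic helper might look like (the exact name, signature, and accepted prefixes here are assumptions, not the real authz API):

```rust
/// Hypothetical sketch: pull a token out of an `Authorization`-style value.
/// Works for anything that can be viewed as bytes, e.g. HTTP header values
/// or tonic gRPC metadata values.
fn extract_token<T: AsRef<[u8]> + ?Sized>(value: Option<&T>) -> Option<Vec<u8>> {
    let value = value?.as_ref();
    // Accept either a "Token " or a "Bearer " prefix.
    let token = value
        .strip_prefix(b"Token ")
        .or_else(|| value.strip_prefix(b"Bearer "))?;
    (!token.is_empty()).then(|| token.to_vec())
}
```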
* feat(service_grpc_flight): JDBC compatible Handshake
While testing some JDBC based clients we found that some, Tableau
in this case, cannot be configured with authorization tokens. In
these cases we need to be able to support username/password. The
approach taken is to ignore the username and make the token the
password. This is the same approach being taken throughout the
product.
To facilitate this, the Flight RPC Handshake command has been extended
to look for Basic authorization credentials and respond with the
appropriate Bearer authorization header.
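Roughly, the handshake now needs to do something like the following (a sketch under the assumption that the base64 crate is used for decoding; names are illustrative):

```rust
use base64::{engine::general_purpose::STANDARD, Engine as _};

/// Hypothetical sketch: convert a `Basic` credential into the Bearer token
/// the rest of the stack expects. The username is ignored; the password is
/// treated as the token.
fn basic_to_bearer(authorization: &str) -> Option<String> {
    let encoded = authorization.strip_prefix("Basic ")?;
    let decoded = String::from_utf8(STANDARD.decode(encoded).ok()?).ok()?;
    // Basic credentials are "<username>:<password>"; only the password matters.
    let (_username, password) = decoded.split_once(':')?;
    Some(format!("Bearer {password}"))
}
```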
While adding end-to-end tests, the subprocess commands were causing
a deadlock. These have been changed to use the tokio::process
module.
There are also some small changes to the JDBC test application where
the hardcoded values were clashing with the authorization parameters.
* fix: lint
* chore: apply suggestions from code review
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
* chore: review suggestion
---------
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: WIP support CommandGetXdbcTypeInfo metadata endpoint with tests
* chore: update test case and add jdbc test
* chore: uncomment jdbc getColumns test
* chore: lint
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* test: add tests on month and year date_bin
* fix: add IOX_COMPARE: uuid to get deterministic names for output parquet_file in the explain
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* refactor: move authz-addr flag into router-specific config
* refactor: move authz-addr flag into querier-specific config
* refactor: remove global AuthzConfig which is now redundant with the pushdown to individual configs. Keep constant the env vars used universally.
* chore: make errors lowercase, and use the required bool for the authz-addr flag
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Tests that use the in-memory catalog create different shards, which
then produce old-style Parquet file paths, but in production everything
uses the transition shard now. To make the tests more like production,
only ever create and use the transition shard, and stop checking for
different shard IDs.
* chore(flightsql): rename Namespace to Database in error message
* chore(flightsql): rename Namespace to Database in test error msg
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Provide a configuration item for the ingester2 that controls the maximum
incoming RPC message size.
Raises the maximum from the default 4MiB to a more reasonable 100MiB.
Provide a configuration item for the router (in RPC mode) that controls
the maximum outgoing RPC message size when communicating with an
Ingester.
Raises the maximum from the default 4MiB to 100MiB. This does not
increase exposure to memory-based DOS, as writes are size-limited by the
HTTP layer to 10MiB, preventing a user from submitting a write this
large (or larger!) across the RPC boundary.
This is helpful for experimenting with changes to our defaults, as well
as for testing.
Required for https://github.com/influxdata/idpe/issues/17474.
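For illustration, the new knobs are plain clap options along these lines (a sketch; the actual flag and env var names in clap_blocks differ):

```rust
use clap::Parser;

/// Hypothetical sketch of the added configuration; flag and env var names
/// here are illustrative, not the real ones.
#[derive(Debug, Parser)]
pub struct RpcMessageSizeConfig {
    /// Maximum size of an RPC message, in bytes.
    /// 100 MiB (104857600), raised from tonic's 4 MiB default.
    #[clap(
        long = "rpc-max-message-size",
        env = "INFLUXDB_IOX_RPC_MAX_MESSAGE_SIZE",
        default_value = "104857600"
    )]
    pub max_message_size: usize,
}
```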
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
- No `ON` clause
- No `WHERE` clause
- No time restriction yet
- No `FROM <db>.<retention>`
Ref https://github.com/influxdata/idpe/issues/17360.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Returns a `NotImplemented` error when attempting to execute a
selector query (one that projects a single selector function plus
additional tags or fields) until #7533 is implemented.
Introduced an `error` module to simplify error handling and ensure
consistency of error messages.
This creates a separate option for the number of minutes *without* a
write that a partition must have before being considered for cold
compaction.
This is a new CLI flag so that it can have a different default from hot
compaction's compaction_partition_minute_threshold.
I didn't add "hot" to compaction_partition_minute_threshold's name so
that k8s-idpe doesn't have to change to continue running hot compaction
as it is today.
Then use the relevant threshold earlier, when creating the
PartitionsSourceConfig, to make it clearer which threshold is used
where.
Right now, this will silently ignore any CLI flag specified that isn't
relevant to the current compaction mode. We might want to change that to
warn or error to save debugging time in the future.
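A sketch of the "pick the threshold early" idea (types and field names are illustrative, not the real compactor config):

```rust
/// Hypothetical sketch: resolve the mode-specific threshold once, when the
/// partitions source config is built, so nothing downstream has to know
/// which compaction mode is running.
enum CompactionType {
    Hot,
    Cold,
}

struct PartitionsSourceConfig {
    /// Minute threshold used to select partitions; its meaning differs
    /// between hot (recent write) and cold (no write) compaction.
    minute_threshold: u64,
}

fn partitions_source_config(
    compaction_type: CompactionType,
    hot_minute_threshold: u64,
    cold_minute_threshold: u64,
) -> PartitionsSourceConfig {
    let minute_threshold = match compaction_type {
        CompactionType::Hot => hot_minute_threshold,
        CompactionType::Cold => cold_minute_threshold,
    };
    PartitionsSourceConfig { minute_threshold }
}
```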
* chore: Update datafusion and arrow/parquet to 37, tonic to 0.9.1
* refactor: Update for FieldRef and other API changes
* fix: Update field size calculation
* fix: Use `NullBuffer` directly
* fix: remove outdated comment
* chore: Update test for tonic
* chore: Run cargo hakari tasks
* chore: cargo update
---------
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: support CommandGetCrossReference metadata endpoint with tests
* chore: create two tables in the test for GetCrossReference endpoint
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* chore: support returning the database name if all the keys refer to the same database
* test: add test cases to check for same, different, and no database in request header
* chore: lint
* chore: more lint
* refactor: replace empty string with None for database_name
* refactor: simplify logic for NoFlightSQLDatabase error
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: support CommandGetImportedKeys metadata endpoint with tests
* chore: remove comments that are no longer valid
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: Specialises test output formatting for each language
* Also fixes an error uncovered in `write_columnar` when tag
columns are `NULL`
Closes #7145
* chore: Run cargo hakari tasks
* chore: Add sorted output until #7513 is addressed
* chore: clippy 📋
* feat: Add `options` to `write_columnar`
* Added ability to configure border rendering, including removing
borders. This helps avoid variable width issues with EXPLAIN output,
which tends to vary and cause flaky test failures.
* chore: rustfmt 🧹
* chore: update expected output
* chore: clarify what "this" is
---------
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
* feat: support `database`, `bucket`, and `bucket-name` as grpc header names
* chore: lint
* chore: update doc to accept `database`, `bucket`, and `bucket-name` as parameter names
* chore: update doc to only show `database` as the parameter name
* refactor: consolidate header names into a const vec and update comments on database
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
This was "internal". The mapping works like this: we take the
`DataFusionError` and call `find_root` which should traverse the
`External(...)` chain (even through Arrow) to find the last error that
is not within the Arrow/DataFusion land. This is then mapped by us.
`DataFusionError::External(...)` is no further inspected and mapped
straight to "internal". I think this if fine because in the end we're
mostly dealing w/ DataFusion stuff anyways.
I've slightly changed the error mapping in the planner to emit
`DataFusionError::Plan(...)` instead which we map to "invalid argument".
I think this is way better for the user.
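The mapping boils down to something like this (a simplified sketch; the real code handles more variants):

```rust
use datafusion::error::DataFusionError;
use tonic::{Code, Status};

/// Sketch of the error mapping described above.
fn map_df_error(e: &DataFusionError) -> Status {
    // `find_root` walks the `External(...)` chain (even through Arrow) to the
    // innermost error that is not just Arrow/DataFusion wrapping.
    match e.find_root() {
        // Planner errors are the user's fault: surface them as invalid argument.
        DataFusionError::Plan(msg) => Status::new(Code::InvalidArgument, msg.clone()),
        // Everything else, including `External(...)`, maps straight to internal.
        other => Status::new(Code::Internal, other.to_string()),
    }
}
```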
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Within our query tests and our CLI, we used to print out empty
query responses as:
```text
++
++
```
This is pretty misleading. Why are there no columns?! The reason is that
while Flight provides us with schema information, we often have zero
record batches (because why would the querier send an empty batch?). Now
let's fix this by creating an empty batch on the client side based on the
schema data we've received. This way, people know that there are columns
but no rows:
```text
+-------+--------+------+------+
| count | system | time | town |
+-------+--------+------+------+
+-------+--------+------+------+
```
An alternative fix would be to pass the schema in addition to
`Vec<RecordBatch>` to the formatting code, but that seemed to be more
effort.
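The client-side fix is essentially the following (a sketch of the idea, not the exact formatting code):

```rust
use std::sync::Arc;

use arrow::{datatypes::Schema, record_batch::RecordBatch};

/// If the server only sent a schema and no data, synthesize a single empty
/// batch so the formatter still prints the column headers.
fn ensure_schema_visible(
    schema: Arc<Schema>,
    mut batches: Vec<RecordBatch>,
) -> Vec<RecordBatch> {
    if batches.is_empty() {
        batches.push(RecordBatch::new_empty(schema));
    }
    batches
}
```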
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: add CommandGetPrimaryKeys metadata endpoint and tests
* chore: update schema for the returned record batch
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* chore: end-to-end tests for authorization
Add tests to validate the behaviour of the authorization machinery
in the write and query paths.
In order to facilitate this an authorizer implementation has been
added to the test helpers that runs an authorizer gRPC service
for the use of tests. The gRPC service is started in the process
that is running the test and listens on an OS-assigned port number.
The authorization service cannot be shared between tests so a
non-shared cluster must be used when the authorizer is configured.
The influxdb_iox_client has been enhanced so that the user can
configure additional headers in the flight client, which is used
for SQL and InfluxQL queries. This uses the same interface as the
Flight SQL client has for the same job.
* chore: fix lint errors
* chore: review suggestion
Consolidate the authorization tests into fewer tests in order to avoid
repeating set-up and tear-down unnecessarily.
Adds a single-tenant mode (CST) to the IOx routers.
Single-tenancy mode differs in two main ways:
* V1 write endpoint is partially supported
* V2 write endpoint ignores "org" parameter
The "normal" mode is "multi tenant" which is the default operational
mode, and all existing behaviour remains unchanged. Single tenant mode
can be enabled by specifying INFLUXDB_IOX_SINGLE_TENANCY=true.
Request parsing is delegated to two implementations of the
WriteParamExtractor trait, one each for CST and MT - the logic of each
"mode" is defined within these files and all other functionality is
common between the two.
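The split is roughly the following shape (a sketch; the real trait signature and types differ):

```rust
/// Hypothetical, simplified extraction result.
struct WriteParams {
    namespace: String,
}

/// One implementation per operating mode (CST and MT); everything downstream
/// of parameter extraction is shared between the two.
trait WriteParamExtractor: Send + Sync {
    /// Derive the target namespace from the request's query parameters.
    fn parse(&self, query: &str) -> Result<WriteParams, String>;
}
```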
This commit also renames some of the error types for clarity
(NoSpecified -> NoOrgBucketSpecified, other NotSpecified ->
NoQueryParams, etc).
Note: single tenant code requires testing
* fix: Add sort operator after window aggregate operator
Closes #7460
* fix: Refactor `LIMIT` and `OFFSET` implementation
These changes should allow the `limit` function to be used
generically with any plan following the same conventions.
* chore: No need to reorder this
* chore: Add documentation to the `limit` function
* feat: Support LIMIT and OFFSET with GROUP BY
* fix: Compile error
* chore: Improve function name and comment
* chore: rustfmt
* chore: fix clippy warnings
Allowing the too-many-arguments warning for project_select,
as fixing it would require refactoring after this PR has already
been reviewed. It may be refactored in the future when subqueries are
implemented.
The namespace commands contained unused redefinitions of `Error`.
This commit removes those and consolidates the error definitions in the
main namespace CLI module.
This commit adds a client method to invoke the
UpdateNamespaceServiceProtectionLimits RPC API, providing a
user-friendly way to do this through the IOx command line.
* fix: default the write cli command to the http default port.
The all-in-one write api is based on influxdb cloud's v2 http api, which
uses the 8080 http default port. This changeset will
allow 'influxdb_iox write' to work against default influxdb_iox
all-in-one without needing to use the --host option to change the port.
It should not change behavior for existing users of `--host`. It adds a
new configuration option called `--http-host` to set the http port
separately from the gRPC one.
* fix: fmt
* feat(service_grpc_flight): optional query authorization
Add support for requiring namespace-level authorization for
arrow flight based query requests. These are the flight SQL commands
as well as the IOx-specific SQL over flight and InfluxQL over flight
protocols.
Supports the optional configuration of an authorization sidecar,
in the same manner as is used in the router. If this is configured
then all arrow flight gRPC requests that are implemented will require
a valid authorization token to be supplied in the request. For a
multi-legged operation such as GetFlightInfo + DoGet, as required for
FlightSQL, a valid authorization token is required for every request.
Ideally this support would be implemented using some sort of
interceptor, however the namespace isn't known until the request
processing has been started. The authorization check is performed
as soon as possible once the desired operation is known.
The legacy "storage" API has no authorization checks. Care should
be taken to ensure this API is never exposed to an untrusted network.
* chore(service_grpc_flight): review suggestions
Implement some suggestions from reviewers. The main change is adding
authorization checks to the handshake command.
* chore(service_grpc_flight): remove authorization of handshake
The Handshake call is used by existing clients to verify the
connection. These clients do not send a namespace header with the
request meaning there is nothing to authorize against. Remove this
authorization for now to avoid breaking existing clients.
* refactor: implement Authorizer trait on Option
Based on a suggestion from Dom implement the Authorizer trait on
Option<T: Authorizer> so that the call sites no longer need to check
if an authorizer is configured. This simplifies the code at the
call sites.
To maximise the utility, the signature has changed so that an optional
token is now used. When no authorizer is configured this will not
be looked at. When a token is required, a new error will be returned
if no token was supplied.
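Conceptually the blanket impl looks like this (a simplified sketch; the real `Authorizer` trait takes permissions and returns richer errors):

```rust
use async_trait::async_trait;

#[derive(Debug)]
enum AuthError {
    NoToken,
    Forbidden,
}

#[async_trait]
trait Authorizer: Send + Sync {
    async fn require_any_permission(&self, token: Option<&[u8]>) -> Result<(), AuthError>;
}

/// Blanket impl on `Option` so call sites never check whether authorization
/// is configured: `None` means "no authorizer, allow everything".
#[async_trait]
impl<T: Authorizer> Authorizer for Option<T> {
    async fn require_any_permission(&self, token: Option<&[u8]>) -> Result<(), AuthError> {
        match self {
            Some(authz) => authz.require_any_permission(token).await,
            None => Ok(()),
        }
    }
}
```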
* fix: suggestions from clippy
- Only redact the actual time value in `FilterExec`, not the entire
expression. This preserves important information about filter
pushdowns.
- Apply similar time filter to `ParquetExec` because with #6098 we will
push down more filters into `ParquetExec`, including retention
policies.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: update gap fill planner rule to use LOCF
* chore: cargo fmt
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* test: Add an e2e test for write replication
* fix: Pass through rpc_write_replicas configuration to RpcWrite handler
---------
Co-authored-by: Dom <dom@itsallbroken.com>
* feat(flightsql): Add support for table_schema in GetTables
Support returning the actual table schema; it compiles and does not fail.
* chore: resolve merge conflict
* chore: make table_schema optional
* test: update e2e test for `include_schema` = true
* chore: remove info!() and update test `flightsql_schema_matches`
* chore(deps): Bump rustix from 0.36.11 to 0.37.3 (#7308)
* chore(deps): Bump rustix from 0.36.11 to 0.37.3
Bumps [rustix](https://github.com/bytecodealliance/rustix) from 0.36.11 to 0.37.3.
- [Release notes](https://github.com/bytecodealliance/rustix/releases)
- [Commits](https://github.com/bytecodealliance/rustix/compare/v0.36.11...v0.37.3)
---
updated-dependencies:
- dependency-name: rustix
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
* chore: Run cargo hakari tasks
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* chore: make real error for existing columns
* chore: use match instead of unwrap() on column names
* chore: use datafusion::physical_plan::collect() to get record batches
* chore: use `concat_batches` to combine multiple batches into single one and fix db schema test
* chore: add doc comment for GetTables
* chore: remove pretty print
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: Display failed query
Allows a user to immediately identify the failed query.
* feat: API improvements to InfluxQL parser
* feat: Extend `SchemaProvider` trait to query for UDFs
* fix: We don't want the parser to panic on overflows
* fix: ensure `map_type` maps the timestamp data type
* feat: API to map an InfluxQL duration expression to a DataFusion interval
* chore: Copied APIs from DataFusion SQL planner
These APIs are private but useful for InfluxQL planning.
* feat: Initial aggregate query support
* feat: Add an API to fetch a field by name
* chore: Fixes to handling NULLs in aggregates
* chore: Add ability to test expected failures for InfluxQL
* chore: appease rustfmt and clippy 😬
* chore: produce same error as InfluxQL
* chore: appease clippy
* chore: Improve docs
* chore: Simplify aggregate and raw planning
* feat: Add support for GROUP BY TIME(stride, offset)
* chore: Update docs
* chore: remove redundant `is_empty` check
Co-authored-by: Christopher M. Wolff <chris.wolff@influxdata.com>
* chore: PR feedback to clarify purpose of function
* chore: The series_sort can't be empty, as `time` is always added
This was originally intended as an optimisation when executing an
aggregate query that did not group by time or tags, as it will produce
N rows, where N is the number of measurements queried.
* chore: update comment for clarity
---------
Co-authored-by: Christopher M. Wolff <chris.wolff@influxdata.com>
* feat: add optional param to GetTables
* chore: add the third param to query plan
* feat: add table_types param
* chore: clippy
* test: add test cases with filters
* chore: update query to avoid SQL injection
* refactor: update where clause and cleanup
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* test: Add getTables jdbc_client example
* feat: add `CommandGetTables` in FlightSqlClient
* feat: add `CommandGetTables` in flightsql cmd and planner
* test: add e2e test for `CommandGetTables`
* chore: clippy
* chore: comment out the test with filters
* test: update jdbc test expected value for tables
---------
Co-authored-by: Chunchun <14298407+appletreeisyellow@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat(authz): add authorization client.
Add a new authz crate to provide the interface for making authorization
checks from within IOx. This includes the default client that uses
the influxdata.iox.authz.v1 gRPC protocol. This feature is not used
by any IOx component yet.
* feat: optional authorization on write path
Support optionally enabling authorization checks on the /api/v2/write
handler. If an authrorizer is configured then the handler will
attempt to retrieve a token from the request's Authorization header.
If no such token exists then a response with a 401 error code is
returned. If the token is not valid, or does not have write permission
for the requested namespace then a response with a 403 error is
returned.
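The handler's decision logic amounts to the following (a sketch with stand-in booleans for the real authz client calls):

```rust
use hyper::StatusCode;

/// Hypothetical helper mirroring the behaviour described above;
/// `token_is_valid` and `has_write_permission` stand in for the outcome of
/// the real authorization check.
fn write_auth_status(
    authorization_header: Option<&str>,
    token_is_valid: bool,
    has_write_permission: bool,
) -> Result<(), StatusCode> {
    // No Authorization header at all -> 401 Unauthorized.
    if authorization_header.is_none() {
        return Err(StatusCode::UNAUTHORIZED);
    }
    // Token present but invalid, or lacking write permission -> 403 Forbidden.
    if !token_is_valid || !has_write_permission {
        return Err(StatusCode::FORBIDDEN);
    }
    Ok(())
}
```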
* chore: add unit test for authz in write handler
Add unit tests that test the correct functioning of the /api/v2/write
handler when an Authorizer is configured.
* chore(authz): use lazy connection
Change the initialization of the authz client to use a lazy connection.
This allows the client to be initialised synchronously.
* chore: Run cargo hakari tasks
* fix(authz): protolint complaints
* fix: authz tests
* fix: benches and lint
* chore: Update clap_blocks/src/authz.rs
Co-authored-by: Marko Mikulicic <mkm@influxdata.com>
* chore: Update authz/src/lib.rs
Co-authored-by: Marko Mikulicic <mkm@influxdata.com>
* chore: Update clap_blocks/src/authz.rs
Co-authored-by: Marko Mikulicic <mkm@influxdata.com>
* chore: review suggestions
* chore: review suggestions
Apply a number of suggestions from review comments. The main
behavioural change is that if the authz service is configured,
applications will perform a probe request to ensure they can
communicate with it before continuing startup.
* chore: Update router/src/server/http.rs
Co-authored-by: Dom <dom@itsallbroken.com>
---------
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: Marko Mikulicic <mkm@influxdata.com>
Co-authored-by: Dom <dom@itsallbroken.com>
* fix: Remove the max_compact_size knob and hardcode a multiple
Rather than panic if the user hasn't set this knob in a particular way,
set the max_compact_size to the minimum value we need by multiplying
max_desired_file_size_bytes by MIN_COMPACT_SIZE_MULTIPLE.
Fixes influxdata/idpe#17259.
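In other words (a sketch; the multiplier value shown is illustrative, the real constant lives in the compactor config):

```rust
/// Illustrative value only, not the actual constant.
const MIN_COMPACT_SIZE_MULTIPLE: usize = 3;

/// Derived instead of user-configured, so it can never be set too small.
fn max_compact_size_bytes(max_desired_file_size_bytes: usize) -> usize {
    max_desired_file_size_bytes * MIN_COMPACT_SIZE_MULTIPLE
}
```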
* refactor: Move computation of max_compact_size_bytes into compactor config
* test: change test setups to reflect the purposes of the tests
---------
Co-authored-by: NGA-TRAN <nga-tran@live.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* chore: Normalise name of Call expression to lowercase
Simplifies matching functions in planner, as they are guaranteed to be
lowercase.
This also ensures compatibility with InfluxQL when generating column
alias names, which are reflected in updated tests.
* chore: Ensure aggregate functions fail gracefully.
* feat: GROUP BY tag support
* feat: Ensure schema-level metadata is propagated
Requires: https://github.com/apache/arrow-rs/issues/3779
* chore: Add some tests to validate GROUP BY output
* chore: Add clarifying comment
* chore: Declare message in flight.proto
The metadata is public API, so best practice is to encode this in a way
that is most compatible for clients in other languages, and will also
document the history of schema changes.
Added tests to validate the metadata is encoded correctly.
* chore: Placate linters
* chore: Use correct column in test cases
* chore: Add `is_projected` to the TagKeyColumn message
`is_projected` is necessary to inform a client whether the tag key is
used exclusively for the group key (false) or is also projected in the
`SELECT` column list (true).
* refactor: Move constants to `schema` crate per PR feedback
* chore: rustfmt 🙄
* chore: Update docs for InfluxQlMetadata
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
---------
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
* refactor: n_threads and n_target_partitions are non-zero
Zero values will just panic. Prevent that earlier.
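A sketch of the idea, assuming the values come in via clap (flag names are illustrative): encoding "non-zero" in the type rejects bad values at argument-parsing time instead of panicking later.

```rust
use std::num::NonZeroUsize;

use clap::Parser;

#[derive(Debug, Parser)]
struct ExecutorConfig {
    /// Number of threads used by the query executor.
    #[clap(long = "num-threads", env = "INFLUXDB_IOX_NUM_THREADS", default_value = "4")]
    n_threads: NonZeroUsize,

    /// Number of DataFusion target partitions.
    #[clap(
        long = "target-partitions",
        env = "INFLUXDB_IOX_TARGET_PARTITIONS",
        default_value = "4"
    )]
    n_target_partitions: NonZeroUsize,
}
```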
* fix: typo
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
---------
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
* chore: Move to inline snapshots
* chore: Container for the DataFusion and IOx schema
* chore: Simplify using logical expression helper functions
* feat: Rewrite conditional expressions using InfluxQL rules
* feat: Add tests to validate conditional expression rewriting
* feat: Rewrite column expressions
* chore: Rewrite expression to use false when possible
This allows the planner to optimise away the entire logical plan to an
empty plan in many cases.
* feat: Complete cast postfix operator support
Added `unsigned` postfix operator, as the feature was mostly complete.
Closes #6895
* chore: Remove redundant attribute
* feat: initial implementation of the split
* feat: split many L0 files in groups and compact them into new and fewer L0 files
* test: remove inappropriate AllAtOnce test
* refactor: move file classification for initial target to its own function
* fix: pop the branch from start to end
* chore: address review comments
* feat: support splitting to many L1 files
* feat: only add an extra round to compact level-n files into same-level files if those files plus the overlapping level-n-plus-1 files exceed the limit
* chore: Apply suggestions from code review
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
* chore: final cleanup and address comments
* chore: run fmt
---------
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* chore: Update datafusion
* chore: update the plans
* fix: update some plans
* chore: Update plans and port some explain plans to use insta snapshots
* fix: another plan
* chore: Run cargo hakari tasks
---------
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
This commit adds initial support for "soft" namespace deletion, where
the actual records & data remain, but are no longer queryable /
writeable.
Soft deletion is eventually consistent - users can expect to continue
writing to and reading from a bucket after issuing a soft delete call,
until the various components either restart, or have their caches
flushed.
The components treat soft-deleted namespaces differently:
* router: ignore soft deleted namespaces
* ingester: accept soft deleted namespaces
* compactor: accept soft deleted namespaces
* querier: ignore soft deleted namespaces
* various gRPC services: ignore soft deleted namespaces
This ensures that the ingester & compactor do not see rows "vanishing"
from the database, and continue to make forward progress.
Writes for the deleted namespace that are buffered in the ingester will
be persisted as normal, allowing us to support "un-delete" operations
where the system is restored to the state at which the delete was
issued (rather than losing the buffered data).
Follow-on work is required to ensure GC drops the orphaned parquet files
after the configured GC time, and optimisations such as not compacting
parquet from soft-deleted namespaces seem like a trivial win.
Fixes #6418.
Makes sure the querier, the router, and the ingest replica CLI all
accept and validate ingester addresses the same way, differing only in
whether at least one value is required.
* feat: `PartitionRepo::list_ids`
* refactor: `CatalogPartitionsSource` => `CatalogToCompactPartitionsSource`
* feat: allow the compactor to process all known partitions
Closes #6648.
* docs: improve
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
---------
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
* chore: Add more tests
* chore: Fix default ordering; implement ORDER BY
* feat: Add EXPLAIN support
* chore: Add additional tests to validate GROUP BY expansion
* chore: More test cases for TZ, and failing log scalar function
- do not wait for a non-empty partition result (this doesn't make sense
if we are not running endlessly)
- modify entry point to allow the compactor to exit on its own (this is
normally not allowed for other server types)
This debugging tool was more useful in previous situations where it was
harder to get real data as input for the compactor.
It's currently causing a flaky test that isn't worth investigating.
Fixes #6190 by making it moot.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Ensure a HTTP error response contains a well-formed JSON structure
containing "code" and "message" fields (for backwards compatibility with
existing InfluxDB versions) and a correct "content-type" header.
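The body is the usual two-field JSON object; roughly (a sketch assuming serde_json produces the body; the field names are the compatibility contract, the struct name is illustrative):

```rust
use serde::Serialize;

#[derive(Debug, Serialize)]
struct ErrorBody<'a> {
    code: &'a str,
    message: &'a str,
}

/// Render the JSON body; it is served with `content-type: application/json`.
fn error_response_body(code: &str, message: &str) -> String {
    serde_json::to_string(&ErrorBody { code, message })
        .expect("serialising a flat struct cannot fail")
}
```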
Instead of looping and polling a fresh set of partitions and
constructing a stream from that, use an endless stream. This
helps with efficiency during roll-overs since we can already start to
process the next set of partitions while the last ones from the previous
round are still in progress.
Closes #6750.
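The endless stream can be built along these lines (a sketch assuming a `fetch` closure that returns the next batch of partition IDs; the real source also handles errors and throttling):

```rust
use futures::{stream, Stream, StreamExt};

/// Repeatedly ask the catalog for a fresh batch of partitions and flatten the
/// batches into one endless stream, so the next round can start while earlier
/// partitions are still being processed.
fn endless_partitions<F, Fut>(fetch: F) -> impl Stream<Item = i64>
where
    F: Fn() -> Fut,
    Fut: std::future::Future<Output = Vec<i64>>,
{
    stream::unfold(fetch, |fetch| async move {
        let batch = fetch().await;
        Some((stream::iter(batch), fetch))
    })
    .flatten()
}
```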
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: partition filters for TargetLevel version and a complete test
* chore: Apply suggestions from code review
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
* chore: run fmt after applying review suggestions in git
---------
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* refactor: rename compact algo versions to reflect their actual work
* chore: Apply suggestions from code review
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
---------
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>