Although the `format` in the request is used, the value coming
through the header is parsed earlier. So, when that lookup in
the header fails an error is returned (`InvalidMimeType`).
In this commit, there are extra checks to allow the default `Accept`
header values that come from the browser by defaulting it to `json`
closes: https://github.com/influxdata/influxdb/issues/25874
Related to https://github.com/influxdata/influxdb_pro/issues/436
This PR updates the filter handling in the `WriteBuffer` so that sets of `Expr`s provided in a query will better prune both chunks from the in-memory buffer, as well as the set of parquet file chunks that are forwarded to DataFusion, for query execution.
### New `BufferFilter` type
This introduces the [`BufferFilter`](bab428f0eb/influxdb3_write/src/lib.rs (L496)) type. This converts a set of `Expr`s from a logical query plan into a filter that can be used to:
* prune chunks based on a provided lower/upper `time` boundary from both the buffer and parquet
* prune chunks from the buffer based on any literal guarantees predicated on tag columns in the query, e.g., `WHERE tag = 'a'` or `WHERE tag IN ['a', 'b']`
This type is exposed such that it will be easy to use from replicated buffers and from the compactor when producing `Arc<dyn QueryChunk>`s in Enterprise.
### Tests
* Tests in the [`table_buffer`](bab428f0eb/influxdb3_write/src/write_buffer/table_buffer.rs) module were updated to use the `WriteValidator`. This allows construction of rows based on line protocol directly, and in cleaning up the tests a bit, allowed me to extend some of the test cases in [this test](bab428f0eb/influxdb3_write/src/write_buffer/table_buffer.rs (L979)).
* I added [a test](bab428f0eb/influxdb3_write/src/write_buffer/table_buffer.rs (L1243)) that checks the buffer chunk index filtering for expressions against multiple tag columns.
* Added [a test](bab428f0eb/influxdb3_write/src/write_buffer/table_buffer.rs (L1153)) that checks time pruning
* Added [a test](bab428f0eb/influxdb3_write/src/write_buffer/persisted_files.rs (L279)) that checks time pruning in `PersistedFiles`
* I renamed several tests to start with `test_`.
* chore: add out of order tests
- assertions for what remains in the queryable buffer when out of order
timestamps are encountered. This could be true for back filling, and
in that case back filled data takes over the queryable buffer and
moving all the recent data into parquet files (as part of snapshotting)
- assertions to check last cache still retains the most recent values
when out of order data is encountered
* chore: update comment
Co-authored-by: Trevor Hilton <thilton@influxdata.com>
---------
Co-authored-by: Trevor Hilton <thilton@influxdata.com>
* feat(processing_engine): Add cron plugins and triggers to the processing engine.
* feat(processing_engine): switch from 'cron plugin' to 'schedule plugin', use TimeProvider.
* feat(processing_engine): add test for test scheduled plugin.
* feat: improve plugin logging interface
Updates the plugin log functions so they can take any number of Python objects which will be converted into a single log line string.
Closes#25847
* refactor: udpate on PR feedback
* feat: return better plugin execution errors
This sets up the framework for fleshing out more useful plugin execution errors that get returned to the user during testing. We'll also want to capture these for logging in system tables.
Also fixes a test that was broken in previous commit on time limits. Didn't show up because of the feature flag.
* fix: compile errors without system-py feature
* refactor: update tests for wal file removal
- update the last wal file seen first so that removal doesn't
wait for one more cycle
- added the worked out example test
- minor tidy ups (introduce inner so that block scopes are delegated)
* refactor: address PR feedback
This updates the v1 /query API hanlder to handle InfluxDB v1's unique
query response structure when GROUP BY clauses are provided.
The distinction is in the addition of a "tags" field to the emitted series
data that contains a map of the GROUP BY tags along with their distinct
values associated with the data in the "values" field.
This required splitting the QueryExecutor into two query paths for InfluxQL
and SQL, as this allowed for handling InfluxQL query parsing in advance
of query planning.
A set of snapshot tests were added to check that it all works.
Creates AmazonS3 object using all environmental variables for additional cases where command line parameters are not appropriate.
This makes step 4 here possible - https://docs.aws.amazon.com/sdk-for-rust/latest/dg/credproviders.html
Alternative approach may have been to add a similar command line option for AWS_CONTAINER_CREDENTIALS_RELATIVE_URI but this makes no sense to provide on a CLI given it is only to be set automatically on Fargate and EKS containers.
As from_env() looks up all relevant environmental variables, it no longer makes sense to look for them as part of the CLI option parsing, so those relevant env options are removed.
Fixes#25828
This commit sets InfluxDB 3 Core to have a 72 hour limit for queries and
writes. What this means is that writes that contain historical data
older than 72 hours will be rejected and queries will filter out data
older than 72 hours. Core is intended to be a recent timeseries database
and performance over data older than 72 hours will degrade without a
garbage collector, a core feature of InfluxDB 3 Enterprise. InfluxDB 3
Enterprise does not have this write or query limit in place.
Note that this does *not* mean older data is deleted. Older data is
still accessible in object storage as Parquet files that can still be
used in other services and analyzed with dataframe libraries like pandas
and polars.
This commit does a few things:
- Uses timestamps in the year 2065 for tests as these should not break
for longer than many of us will be working in our lifetimes. This is
only needed for the integration tests as other tests use the
MockProvider for time.
- Filters the buffer and persisted files to only show data newer than
3 days ago
- Fixes the integration tests to work with the fact that writes older
than 3 days are rejected
This changes the CLI arg `host-id` to `writer-id` to more accurately
indicate meaning.
This changes also goes through the codebase and changes struct fields,
methods, and variables to use the term `writer_id` or `writer_identifier_prefix`
instead of `host_id` etc., to make the meaning clear in the code.
This also changes the catalog serialization to use the field `writer_id`
instead of `host_id`, which is breaking change.
This updates the create plugin API and CLI so that it doesn't take the pugin code, but instead takes a file name of a file that must be in the plugin-dir of the server. It returns an error if the plugin-dir is not configured or if the file isn't there.
Also updates the WAL and catalog so that it doesn't store the plugin code directly. The code is read from disk one time when the plugin runs.
Closes#25797
* feat: introduce num wal files to keep
This commit allows a configurable number of wal files to be left behind
in OS. This is necessary as enterprise replicas rely on these files.
closes: https://github.com/influxdata/influxdb/issues/25788
* refactor: address PR feedback
* refactor: address PR comment
This allows the user to specify arguments that will be passed to each execution of a wal plugin trigger. The CLI test was updated to check this end to end.
Closes#25655
This updates the WAL so that it can have new file notifiers added that will get updated when the wal flushes. The processing engine now implements the WALNotifier trait.
I've updated the CLI test for creating a trigger to run and end-to-end test that defines a plugin, creates a trigger, writes data into the database, triggering the plugin, which writes summary statistics back into the database in a different table. The test queries the destination table to confirm that the plugin worked.
* feat: Update WAL plugin for new structure
This ended up being a very large change set. In order to get around circular dependencies, the processing engine had to be moved into its own crate, which I think is ultimately much cleaner.
Unfortunately, this required changing a ton of things. There's more testing and things to add on to this, but I think it's important to get this through and build on it.
Importantly, the processing engine no longer resides inside the write buffer. Instead, it is attached to the HTTP server. It is now able to take a query executor, write buffer, and WAL so that the full range of functionality of the server can be exposed to the plugin API.
There are a bunch of system-py feature flags littered everywhere, which I'm hoping we can remove soon.
* refactor: PR feedback
This ended up being a couple things rolled into one. In order to add a query API to the Python plugin, I had to pull the QueryExecutor trait out of server into a place so that the python crate could use it.
This implements the query API, but also fixes up the WAL plugin test CLI a bit. I've added a test in the CLI section so that it shows end-to-end operation of the WAL plugin test API and exercise of the entire Plugin API.
Closes#25757