This commit restructures our tests to look like Enterprise in their
layout. We break cli.rs into it's own module, combine the server tests
and cli tests under one lib.rs file and handle the changes for
visibility and import paths needed to make things work. the packages
tests have been cfged out as a module so that it would not need to be
added on a per test basis. Note that those tests fail locally for me
currently, but it seems like we weren't testing these in CI at the
moment.
There is no issue for this.
This refactors plugins and triggers so that plugins no longer need to be "created". Since plugins exist in either the configured local directory or on the Github repo, a user now only needs to create a trigger and reference the plugin filename.
Closes#25876
This commit sets InfluxDB 3 Core to have a 72 hour limit for queries and
writes. What this means is that writes that contain historical data
older than 72 hours will be rejected and queries will filter out data
older than 72 hours. Core is intended to be a recent timeseries database
and performance over data older than 72 hours will degrade without a
garbage collector, a core feature of InfluxDB 3 Enterprise. InfluxDB 3
Enterprise does not have this write or query limit in place.
Note that this does *not* mean older data is deleted. Older data is
still accessible in object storage as Parquet files that can still be
used in other services and analyzed with dataframe libraries like pandas
and polars.
This commit does a few things:
- Uses timestamps in the year 2065 for tests as these should not break
for longer than many of us will be working in our lifetimes. This is
only needed for the integration tests as other tests use the
MockProvider for time.
- Filters the buffer and persisted files to only show data newer than
3 days ago
- Fixes the integration tests to work with the fact that writes older
than 3 days are rejected
This adds a new system table "meta_caches" that allows users to view the
state of their metadata caches on a per-db basis
An integration test was added to verify that it works.
- instrumented code to get read and write measurement
- introduced EventsBucket for collection of reads/writes
- sampler now samples every minute for all metrics (including
reads/writes)
- other tidy ups
closes: https://github.com/influxdata/influxdb/issues/25372
This extends the system tables available with a new `parquet_files` table
which will list the parquet files associated with a given table in a
database.
Queries to system.parquet_files must provide a table_name predicate to
specify the table name of interest.
The files are accessed through the QueryableBuffer.
In addition, a test was added to check success and failure modes of the
new system table query.
Finally, the Persister trait had its associated error type removed. This
was somewhat of a consequence of how I initially implemented this change,
but I felt cleaned the code up a bit, so I kept it in the commit.
Added a new system table, system.last_caches, to enable queries that display information about last caches in a database.
You can query the table like so:
SELECT * FROM system.last_caches
Since queries are scoped to a database, this will only show last caches configured for the database being queried.
Results look like so:
+-------+----------------+----------------+---------------+-------+-----+
| table | name | key_columns | value_columns | count | ttl |
+-------+----------------+----------------+---------------+-------+-----+
| mem | mem_last_cache | [host, region] | [time, usage] | 1 | 60 |
+-------+----------------+----------------+---------------+-------+-----+
An end-to-end test was added to verify queries to the system.last_caches table.
Introduce the experimental series key feature to monolith, along with the new `/api/v3/write` API which accepts the new line protocol to write to tables containing a series key.
Series key
* The series key is supported in the `schema::Schema` type by the addition of a metadata entry that stores the series key members in their correct order. Writes that are received to `v3` tables must have the same series key for every single write.
Series key columns are `NOT NULL`
* Nullability of columns is enforced in the core `schema` crate based on a column's membership in the series key. So, when building a `schema::Schema` using `schema::SchemaBuilder`, the arrow `Field`s that are injected into the schema will have `nullable` set to false for columns that are part of the series key, as well as the `time` column.
* The `NOT NULL` _constraint_, if you can call it that, is enforced in the buffer (see [here](https://github.com/influxdata/influxdb/pull/25066/files#diff-d70ef3dece149f3742ff6e164af17f6601c5a7818e31b0e3b27c3f83dcd7f199R102-R119)) by ensuring there are no gaps in data buffered for series key columns.
Series key columns are still tags
* Columns in the series key are annotated as tags in the arrow schema, which for now means that they are stored as Dictionaries. This was done to avoid having to support a new column type for series key columns.
New write API
* This PR introduces the new write API, `/api/v3/write`, which accepts the new `v3` line protocol. Currently, the only part of the new line protocol proposed in https://github.com/influxdata/influxdb/issues/24979 that is supported is the series key. New data types are not yet supported for fields.
Split write paths
* To support the existing write path alongside the new write path, a new module was set up to perform validation in the `influxdb3_write` crate (`write_buffer/validator.rs`). This re-uses the existing write validation logic, and replicates it with needed changes for the new API. I refactored the validation code to use a state machine over a series of nested function calls to help distinguish the fallible validation/update steps from the infallible conversion steps.
* The code in that module could potentially be refactored to reduce code duplication.
feat: add _series_id to tables on write
New _series_id column is added to tables; this stores a 32 byte SHA256 hash of the tag set of a line of Line Protocol. The tag set is checked for sort order, then sorted if not already, before producing the hash.
Unit tests were added to check hashing and sorting functions work.
Tests that performed queries needed to be modified to account for the new _series_id column; in general, SELECT * queries were altered to use a select clause with specific column names.
The Column limit was increased to 501 internally, to account for the new _series_id column, but the user-facing limit is still 500
This commit is the final piece for the write_lp endpoint. It adds limits
to Edge such that:
- There can only be 5 Databases
- There can only be 500 Columns per Table
- There can only be 2000 Tables across all Databases
We do this by modifying the catalog code to error out whenever one of
these limits would be exceeded before permanently modifying the schema.
These are hard coded limits and cannot be configured by the user.
Closes#24554
feat: add query_influxql api
This PR adds support for the /api/v3/query_influxql API. This re-uses code from the existing query_sql API, but some refactoring was done to allow for code re-use between the two.
The main change to the original code from the existing query_sql API was that the format is determined up front, in the event that the user provides some incorrect Accept header, so that the 400 BAD REQUEST is returned before performing the query.
Support of several InfluxQL queries that previously required a bridge to be executed in 3.0 was added:
SHOW MEASUREMENTS
SHOW TAG KEYS
SHOW TAG VALUES
SHOW FIELD KEYS
SHOW DATABASES
Handling of qualified measurement names in SELECT queries (see below)
This is accomplished with the newly added iox_query_influxql_rewrite crate, which provides the means to re-write an InfluxQL statement to strip out a database name and retention policy, if provided. Doing so allows the query_influxql API to have the database parameter optional, as it may be provided in the query string.
Handling qualified measurement names in SELECT
The implementation in this PR will inspect all measurements provided in a FROM clause and extract the database (DB) name and retention policy (RP) name (if not the default). If multiple DB/RP's are provided, an error is thrown.
Testing
E2E tests were added for performing basic queries against a running server on both the query_sql and query_influxql APIs. In addition, the test for query_influxql includes some of the InfluxQL-specific queries, e.g., SHOW MEASUREMENTS.
Other Changes
The influxdb3_client now has the api_v3_query_influxql method (and a basic test was added for this)