28 Commits (docker_plugin_improvements)

Author SHA1 Message Date
Michael Gattozzi b7d2475ea6
chore: move separate cli and server tests (#25934)
This commit restructures our tests to look like Enterprise in their
layout. We break cli.rs into its own module, combine the server tests
and cli tests under one lib.rs file, and handle the changes for
visibility and import paths needed to make things work. The packages
tests have been cfged out as a module so that the cfg would not need to
be added on a per-test basis. Note that those tests fail locally for me
currently, but it seems like we weren't testing these in CI at the
moment.

There is no issue for this.
2025-01-31 11:41:44 -05:00
praveen-influx d39a4a2f21
feat: rename json_lines to jsonl for cli (#25888)
closes: https://github.com/influxdata/influxdb/issues/25873
2025-01-21 17:32:55 +00:00
praveen-influx 447f66d9a7
fix: allow default browser header (#25885)
Although the `format` in the request is used, the value coming
through the header is parsed earlier. So, when that lookup in
the header fails an error is returned (`InvalidMimeType`).

In this commit, there are extra checks to allow the default `Accept`
header values that come from the browser by defaulting it to `json`
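
A minimal sketch of the fallback described above, not the code from this commit. The `Format` enum, the `format_from_accept` function, the MIME strings, and the choice to treat a missing header as JSON are all assumptions made for illustration.

```rust
/// Hypothetical subset of output formats, for illustration only.
#[derive(Debug, PartialEq)]
enum Format {
    Json,
    Csv,
}

/// Resolve the output format from an optional `Accept` header value.
/// Assumption: a missing header, or a browser-style list that only
/// matches through the `*/*` wildcard, falls back to JSON.
fn format_from_accept(accept: Option<&str>) -> Result<Format, String> {
    let value = match accept {
        Some(v) => v,
        None => return Ok(Format::Json),
    };
    let mut saw_wildcard = false;
    for part in value.split(',') {
        // Drop parameters such as `;q=0.8` before comparing.
        let mime = part.split(';').next().unwrap_or("").trim();
        match mime {
            "application/json" => return Ok(Format::Json),
            "text/csv" => return Ok(Format::Csv),
            "*/*" => saw_wildcard = true,
            _ => {}
        }
    }
    if saw_wildcard {
        Ok(Format::Json)
    } else {
        Err(format!("invalid MIME type: {value}"))
    }
}
```

With this sketch, a typical browser header such as `text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8` resolves to JSON instead of producing an error.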

closes: https://github.com/influxdata/influxdb/issues/25874
2025-01-21 13:43:07 +00:00
Trevor Hilton b8a94488b5
feat: support v1 query API GROUP BY semantics (#25845)
This updates the v1 /query API handler to handle InfluxDB v1's unique
query response structure when GROUP BY clauses are provided.

The distinction is in the addition of a "tags" field to the emitted series
data that contains a map of the GROUP BY tags along with their distinct
values associated with the data in the "values" field.
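
The grouping described above can be sketched roughly as follows. The `Series` struct and `group_by_tags` function are hypothetical and only illustrate the shape of the output (one series per distinct combination of GROUP BY tag values); they are not the types used in the handler.

```rust
use std::collections::BTreeMap;

/// Hypothetical shape of one emitted series: a measurement name, the
/// GROUP BY tags with their distinct values, and the rows for that group.
#[derive(Debug, PartialEq)]
struct Series {
    name: String,
    tags: BTreeMap<String, String>,
    values: Vec<Vec<String>>,
}

/// Group rows by their GROUP BY tag values.
/// Each input row is a pair of (tag map, row values).
fn group_by_tags(
    measurement: &str,
    rows: Vec<(BTreeMap<String, String>, Vec<String>)>,
) -> Vec<Series> {
    let mut grouped: BTreeMap<BTreeMap<String, String>, Vec<Vec<String>>> = BTreeMap::new();
    for (tags, values) in rows {
        grouped.entry(tags).or_default().push(values);
    }
    grouped
        .into_iter()
        .map(|(tags, values)| Series {
            name: measurement.to_string(),
            tags,
            values,
        })
        .collect()
}
```

Rows that share the same tag values end up in the same series, and each series carries its tag map alongside its values.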

This required splitting the QueryExecutor into two query paths for InfluxQL
and SQL, as this allowed for handling InfluxQL query parsing in advance
of query planning.

A set of snapshot tests were added to check that it all works.
2025-01-16 11:59:01 -05:00
Michael Gattozzi aa8a8c560d
feat: Set 72 hour query/write limit for Core (#25810)
This commit sets InfluxDB 3 Core to have a 72 hour limit for queries and
writes. What this means is that writes that contain historical data
older than 72 hours will be rejected and queries will filter out data
older than 72 hours. Core is intended to be a recent timeseries database
and performance over data older than 72 hours will degrade without a
garbage collector, a core feature of InfluxDB 3 Enterprise. InfluxDB 3
Enterprise does not have this write or query limit in place.

Note that this does *not* mean older data is deleted. Older data is
still accessible in object storage as Parquet files that can still be
used in other services and analyzed with dataframe libraries like pandas
and polars.

This commit does a few things:
- Uses timestamps in the year 2065 for tests as these should not break
  for longer than many of us will be working in our lifetimes. This is
  only needed for the integration tests as other tests use the
  MockProvider for time.
- Filters the buffer and persisted files to only show data newer than
  3 days ago
- Fixes the integration tests to work with the fact that writes older
  than 3 days are rejected
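
As a rough illustration of the cutoff, assuming nanosecond timestamps and an inclusive boundary (neither is stated in this message), the check could look like the sketch below. `LIMIT_NS`, `within_limit`, and `filter_recent` are hypothetical names.

```rust
/// 72 hours expressed in nanoseconds.
const LIMIT_NS: i64 = 72 * 60 * 60 * 1_000_000_000;

/// True if `ts_ns` is no older than 72 hours relative to `now_ns`.
/// Assumption: a timestamp exactly 72 hours old is still accepted.
fn within_limit(now_ns: i64, ts_ns: i64) -> bool {
    ts_ns >= now_ns - LIMIT_NS
}

/// Keep only the timestamps that fall inside the 72 hour window,
/// which is the filtering a query would apply in this sketch.
fn filter_recent(now_ns: i64, timestamps: &[i64]) -> Vec<i64> {
    timestamps
        .iter()
        .copied()
        .filter(|&ts| within_limit(now_ns, ts))
        .collect()
}
```

The same predicate serves both paths in this sketch: a write is rejected when `within_limit` is false, and a query drops rows for which it is false.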
2025-01-12 13:08:01 -05:00
Trevor Hilton c71dafc313
refactor: rename metadata cache to distinct value cache (#25775) 2025-01-10 08:48:51 -05:00
Trevor Hilton 23866ef3d1
fix: /query API emits empty result as JSON (#25769) 2025-01-09 09:26:30 -05:00
Trevor Hilton dfc853d903
feat: handle params in request body for /query API (#25762)
Closes #25749

This changes the `/query` API handler so that the parameters can be passed in either the request URI or in the request body for either a `GET` or `POST` request.

Parameters can be specified in the URI, the body, or both; if they are specified in both places, those in the body will take precedence.
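
The precedence rule can be sketched as a simple map merge. `merge_params` is a hypothetical helper, and plain string maps stand in for whatever parameter types the handler actually uses.

```rust
use std::collections::HashMap;

/// Merge parameters from the URI and the body.
/// Body values overwrite URI values for the same key.
fn merge_params(
    uri: &HashMap<String, String>,
    body: &HashMap<String, String>,
) -> HashMap<String, String> {
    let mut merged = uri.clone();
    // `extend` replaces existing keys, so the body wins on conflicts.
    merged.extend(body.iter().map(|(k, v)| (k.clone(), v.clone())));
    merged
}
```

Keys that appear only in the URI are kept unchanged; only conflicting keys are overridden by the body.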

Error variants in the HTTP server code related to missing request parameters were updated to return `400` status.
2025-01-07 19:53:29 -05:00
Trevor Hilton 6524f383ba
feat: show databases CLI/API (#25748)
_Follows #25737 (keeping in draft until that merges)_

Closes #25745 

This PR provides both a CLI and underlying API for listing databases in the InfluxDB 3 Core server. Details are below.

There was already a method to list databases for the query executor for InfluxQL; this works by exposing that via the `HttpApi` in `influxdb3_server`.

However, one thing that we may address is that the query result for that uses `iox::database` as the column name. If we are removing references to `iox`, then we may want to just have it as `database`. I left it as is, for now, because I wanted to keep code churn down and wasn't sure why we use that prefix in the first place for the `SHOW DATABASES` and `SHOW RETENTION POLICIES` InfluxQL queries.

## Details

### CLI

This PR provides the `influxdb3 show` CLI:
```
influxdb3 show -h
List resources on the InfluxDB 3 Core server

Usage: influxdb3 show <COMMAND>

Commands:
  databases  List databases
  help       Print this message or the help of the given subcommand(s)

Options:
  -h, --help  Print help information
```
with the ability to list databases:
```
influxdb3 show databases -h
List databases

Usage: influxdb3 show databases [OPTIONS]

Options:
  -H, --host <HOST_URL>         The host URL of the running InfluxDB 3 Core server [env: INFLUXDB3_HOST_URL=] [default: http://127.0.0.1:8181]
      --token <AUTH_TOKEN>      The token for authentication with the InfluxDB 3 Core server [env: INFLUXDB3_AUTH_TOKEN=]
      --show-deleted            Include databases that were marked as deleted in the output
      --format <OUTPUT_FORMAT>  The format in which to output the list of databases [default: pretty] [possible values: pretty, json, json_lines, csv]
  -h, --help                    Print help information
```
Since this uses the query executor, we can pass a `--format` argument to get the output as JSON, CSV, or JSONL, but by default, it uses the `pretty` format:
```
influxdb3 show databases
+---------------+
| iox::database |
+---------------+
| bar           |
+---------------+
```
The `--show-deleted` flag will have the `deleted` column displayed as well as any databases that have been marked as deleted:
```
influxdb3 show databases --show-deleted
+---------------------+---------+
| iox::database       | deleted |
+---------------------+---------+
| bar                 | false   |
| foo-20250105T202949 | true    |
+---------------------+---------+
```

### API

The API to list databases can be invoked via:
```
GET /api/v3/configure/database
```
with optional parameters:
* `format`: `pretty`, `json`, `csv`, `parquet`, or `jsonl`
* `show_deleted`: `bool`, defaults to `false`
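
For illustration, a request URL for this endpoint could be assembled as below. `list_databases_url` is a hypothetical helper, not part of `influxdb3_client`, and it does no URL encoding because both parameters take values from a small fixed set.

```rust
/// Build the URL for listing databases from a host URL and the two
/// optional parameters described above.
fn list_databases_url(host: &str, format: &str, show_deleted: bool) -> String {
    format!(
        "{}/api/v3/configure/database?format={}&show_deleted={}",
        host.trim_end_matches('/'),
        format,
        show_deleted
    )
}
```

Calling it with the default host from the CLI help, `http://127.0.0.1:8181`, and `format` set to `json` yields a URL that can be passed to any HTTP client in a `GET` request.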

Note that `database` is singular in the API endpoint, to be consistent with the other database related create/delete API endpoints. We could change it to be plural `databases` if that is the convention we want to go with.
2025-01-06 21:08:12 -05:00
Michael Gattozzi c764d37636
feat: Add json lines support to query output (#25698) 2024-12-20 14:57:19 -05:00
Michael Gattozzi 2a132f1edf
fix: change failed query with missing db to 404 (#25693) 2024-12-20 11:53:13 -05:00
Paul Dix 0eab724bee
fix: Field not in queryable buffer (#25691) 2024-12-19 19:22:47 -05:00
Paul Dix 56576402cc
fix: Ensure tags are never null (#25680)
* fix: Ensure tags are never null

This injects empty strings into tags for any rows in the buffer where the tag value is null. This is required because the tags are what make up the series key, which must have all non-null values.
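
In spirit, the change amounts to something like the following sketch, where `fill_null_tags` is a hypothetical function and `Option<String>` stands in for a nullable tag value in the buffer.

```rust
/// Replace null tag values with empty strings so that every part of the
/// series key is non-null.
fn fill_null_tags(tags: Vec<Option<String>>) -> Vec<String> {
    tags.into_iter().map(|t| t.unwrap_or_default()).collect()
}
```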

There is an ongoing discussion about what the real behavior should be here, but for now this will unblock users whose workloads break without this behavior. Discussion is in #25674.

Fixes #25648

* fix: clippy failures
2024-12-18 17:09:23 -05:00
Trevor Hilton 3a66fe0ec3
fix: flaky metadata cache JSON test (#25638) 2024-12-10 09:47:00 -08:00
Trevor Hilton ef3599d7ce
test: metadata cache query using JSON format (#25626) 2024-12-06 15:36:49 -05:00
Trevor Hilton 234d37329a
feat: metacache REST APIs to create and delete (#25587) 2024-11-27 08:41:46 -05:00
Paul Dix 43877beb15
fix: query bugs with buffer (#25213)
* fix: query bugs with buffer

This fixes three different bugs with the buffer. First was that aggregations would fail because projection was pushed down to the in-buffer data that de-duplication needs to be called on. The test in influxdb3/tests/server/query.rs catches that.

I also added a test in write_buffer/mod.rs to ensure that data is correctly queryable when combining with different states: only data in buffer, only data in parquet files, and data across both. This showed two bugs. The first was that the parquet data was being doubled up (parquet chunks were being created in write buffer mod and in queryable buffer). The second was that the timestamp min max on table buffer would panic if the buffer was empty.

* refactor: PR feedback

* fix: fix wal replay and buffer snapshot

Fixes two problems uncovered by adding to the write_buffer/mod.rs test. Ensures we can replay wal data and that snapshots work properly with replayed data.

* fix: run cargo update to fix audit
2024-08-07 16:00:17 -04:00
Jean Arhancet 1fd355ed83
refactor: v1 recordbatch to json (#25085)
* refactor: refactor serde json to use recordbatch

* fix: cargo audit with cargo update

* fix: add timestamp datatype

* fix: add timestamp datatype

* fix: apply feedbacks

* fix: cargo audit with cargo update

* fix: add timestamp datatype

* fix: apply feedbacks

* refactor: test data conversion
2024-07-05 09:21:40 -04:00
Jean Arhancet b6718e59e3
feat: add csv influx v1 (#25030)
* feat: add csv influx v1

* fix: clippy error

* fix: cargo.lock

* fix: apply feedbacks

* test: add csv integration test

* fix: cargo audit
2024-06-25 08:45:55 -04:00
Jean Arhancet 62d1c67b14
refactor: remove arrow_batchtes_to_json (#25046)
* refactor: remove arrow_batchtes_to_json

* test: query v3 json format
2024-06-10 15:25:12 -04:00
Trevor Hilton 9354c22f2c
chore: remove _series_id (#24969)
Removed the _series_id column that stored a SHA256 hash of the tag set
for each write.

Updated all test assertions that made reference to it.

Corrected the limits on columns to no longer account for the additional
_series_id column.
2024-05-08 12:28:49 -04:00
Michael Gattozzi 2291ebeae7
feat: sort and dedupe on persist (#24870)
When persisting parquet files we now will sort and dedupe on persist using the
COMPACT operation implemented in IOx Query. Note that right now we don't choose
any column to sort on and default to no column. This means that we dedupe and
sort on whatever the default behavior is for the COMPACT operation. Future
changes can figure out what columns to sort by when compacting the data.
2024-04-03 15:13:36 -04:00
Trevor Hilton c79821b246
feat: add `_series_id` to tables on write (#24842)
feat: add _series_id to tables on write

New _series_id column is added to tables; this stores a 32 byte SHA256 hash of the tag set of a line of Line Protocol. The tag set is checked for sort order, then sorted if not already, before producing the hash.
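
The check-then-sort step can be sketched with the standard library alone. The SHA256 hashing itself is left out here because it needs an external crate; `sorted_tag_set` and `series_key` are hypothetical names, and the `key=value` joined string is only an assumed stand-in for the bytes that would be fed to the hash.

```rust
/// Return the tag set sorted by tag key, sorting only if it is not
/// already in order.
fn sorted_tag_set<'a>(mut tags: Vec<(&'a str, &'a str)>) -> Vec<(&'a str, &'a str)> {
    let already_sorted = tags.windows(2).all(|w| w[0].0 <= w[1].0);
    if !already_sorted {
        tags.sort_by_key(|t| t.0);
    }
    tags
}

/// Build a canonical string for the tag set. In the real change this
/// canonical form would be hashed with SHA256; that step is omitted.
fn series_key(tags: Vec<(&str, &str)>) -> String {
    sorted_tag_set(tags)
        .iter()
        .map(|(k, v)| format!("{k}={v}"))
        .collect::<Vec<_>>()
        .join(",")
}
```

Sorting first means two lines with the same tags in a different order produce the same key, and therefore the same hash.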

Unit tests were added to check hashing and sorting functions work.

Tests that performed queries needed to be modified to account for the new _series_id column; in general, SELECT * queries were altered to use a select clause with specific column names.

The Column limit was increased to 501 internally, to account for the new _series_id column, but the user-facing limit is still 500
2024-03-26 15:22:19 -04:00
Trevor Hilton 2febaff24b
feat: support query parameters (#24804)
feat: support query parameters

This adds support for parameters in the /api/v3/query_sql
and /api/v3/query_influxql APIs.

The new parameter `params` is supported in the URL query string
of a GET request, or in the JSON body of a POST request.

Two new E2E tests were added to check successful GET/POST as well
as error scenario when params are not provided for a query string
that would expect them.
2024-03-23 10:41:00 -04:00
Trevor Hilton 1fe414c14b
feat: support v1 query API (#24746)
feat: support the v1 query API

This PR adds support for the `/api/v1/query` API, which is meant to
serve the original InfluxDB v1 query API, to serve single statement
`SELECT` and `SHOW` queries. The response, which is returned as JSON,
can be chunked via the `chunked` and optional `chunk_size` parameters.
An optional `epoch` parameter can be supplied to have `time` column
timestamps converted to a UNIX epoch with the given precision.

## Buffering

The response is buffered by default: if the `chunked` parameter
is not supplied, or is passed as `false`, then the entire query
result will be buffered into memory before being returned in the
response. This is how the original API behaves, so we are replicating
that here.

When `chunked` is passed as `true`, then the response will be a
stream of chunks, where each chunk is a self-contained response,
with the same structure as that of the non-chunked response. Chunks
are split up by the provided `chunk_size`, or by series, i.e.,
measurement, whichever comes first. The default chunk size is 10,000
rows.
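
The splitting rule can be sketched independently of `QueryResponseStream` and `ChunkBuffer`. In the hypothetical `chunk_rows` below, each row is a `(series, value)` pair, and a new chunk starts whenever the current one is full or the series changes.

```rust
/// Split rows into chunks by `chunk_size` or by series, whichever
/// comes first. Rows are assumed to arrive ordered by series.
fn chunk_rows(rows: &[(&str, i64)], chunk_size: usize) -> Vec<Vec<(String, i64)>> {
    let mut chunks: Vec<Vec<(String, i64)>> = Vec::new();
    for (series, value) in rows {
        let start_new = match chunks.last() {
            None => true,
            // Chunks are never left empty, so indexing the first row is safe.
            Some(chunk) => chunk.len() >= chunk_size || chunk[0].0 != *series,
        };
        if start_new {
            chunks.push(Vec::new());
        }
        chunks
            .last_mut()
            .expect("a chunk was just pushed or already existed")
            .push((series.to_string(), *value));
    }
    chunks
}
```

With a chunk size of 2, three `cpu` rows followed by one `mem` row produce three chunks: two `cpu` rows, one `cpu` row, and one `mem` row.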

Buffering is implemented with the `QueryResponseStream` and
`ChunkBuffer` types, the former implements the `Stream` trait,
which allows it to be streamed in the HTTP response directly with
`hyper`'s `Body::wrap_stream`. The `QueryResponseStream` is a wrapper
around the inner arrow `RecordBatchStream`, which buffers the
streamed `RecordBatch`es according to the requested chunking parameters.

## Testing

Two new E2E tests were added to test basic query functionality and
chunking behaviour, respectively. In addition, some manual testing
was done to verify that the InfluxDB Grafana plugin works with this
API.
2024-03-15 13:38:15 -04:00
Trevor Hilton fb4f09d675
feat: support `SHOW RETENTION POLICIES` (#24729)
feat: support SHOW RETENTION POLICIES

Added support through the influxdb3 Query Executor to perform
SHOW RETENTION POLICIES queries, both on a specific database as well
as across all databases.

Test cases were added to check this functionality.
2024-03-05 15:40:58 -05:00
Michael Gattozzi a5082ec432
feat: Add limits for InfluxDB Edge (#24703)
This commit is the final piece for the write_lp endpoint. It adds limits
to Edge such that:

- There can only be 5 Databases
- There can only be 500 Columns per Table
- There can only be 2000 Tables across all Databases

We do this by modifying the catalog code to error out whenever one of
these limits would be exceeded before permanently modifying the schema.
These are hard coded limits and cannot be configured by the user.
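
A minimal sketch of such a check, run against the counts the catalog would have after the change and before anything is modified. The constants mirror the limits listed above; `ProposedCounts`, `LimitError`, and `check_limits` are hypothetical names, not the catalog's actual types.

```rust
const MAX_DATABASES: usize = 5;
const MAX_TABLES: usize = 2000;
const MAX_COLUMNS_PER_TABLE: usize = 500;

#[derive(Debug, PartialEq)]
enum LimitError {
    TooManyDatabases,
    TooManyTables,
    TooManyColumns,
}

/// Counts the catalog would hold if the pending schema change were applied.
struct ProposedCounts {
    databases: usize,
    total_tables: usize,
    columns_in_table: usize,
}

/// Return an error if applying the change would exceed any limit.
fn check_limits(p: &ProposedCounts) -> Result<(), LimitError> {
    if p.databases > MAX_DATABASES {
        return Err(LimitError::TooManyDatabases);
    }
    if p.total_tables > MAX_TABLES {
        return Err(LimitError::TooManyTables);
    }
    if p.columns_in_table > MAX_COLUMNS_PER_TABLE {
        return Err(LimitError::TooManyColumns);
    }
    Ok(())
}
```

Validating the proposed counts first means a rejected write leaves the schema untouched.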

Closes #24554
2024-03-04 10:24:33 -05:00
Trevor Hilton f7892ebee5
feat: add the `api/v3/query_influxql` API (#24696)
feat: add query_influxql api

This PR adds support for the /api/v3/query_influxql API. This re-uses code from the existing query_sql API, but some refactoring was done to allow for code re-use between the two.

The main change to the original code from the existing query_sql API was that the format is determined up front, in the event that the user provides some incorrect Accept header, so that the 400 BAD REQUEST is returned before performing the query.

Support of several InfluxQL queries that previously required a bridge to be executed in 3.0 was added:

SHOW MEASUREMENTS
SHOW TAG KEYS
SHOW TAG VALUES
SHOW FIELD KEYS
SHOW DATABASES

Handling of qualified measurement names in SELECT queries (see below)

This is accomplished with the newly added iox_query_influxql_rewrite crate, which provides the means to re-write an InfluxQL statement to strip out a database name and retention policy, if provided. Doing so allows the query_influxql API to have the database parameter optional, as it may be provided in the query string.

Handling qualified measurement names in SELECT

The implementation in this PR will inspect all measurements provided in a FROM clause and extract the database (DB) name and retention policy (RP) name (if not the default). If multiple DB/RP's are provided, an error is thrown.
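
The consistency check can be sketched as below, assuming the DB and RP names have already been extracted from each measurement in the FROM clause. `Qualifier` and `single_db_rp` are hypothetical names, and skipping unqualified measurements is an assumption of this sketch rather than something the PR states.

```rust
/// (database, retention policy) extracted from one measurement name.
type Qualifier<'a> = (Option<&'a str>, Option<&'a str>);

/// Return the single DB/RP pair used across all measurements, or an
/// error if two different pairs are present.
fn single_db_rp<'a>(qualifiers: &[Qualifier<'a>]) -> Result<Qualifier<'a>, String> {
    let mut found: Qualifier<'a> = (None, None);
    for &q in qualifiers {
        if q == (None, None) {
            // Unqualified measurement: nothing to compare.
            continue;
        }
        if found == (None, None) {
            found = q;
        } else if found != q {
            return Err(format!("multiple DB/RP pairs provided: {found:?} and {q:?}"));
        }
    }
    Ok(found)
}
```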

Testing

E2E tests were added for performing basic queries against a running server on both the query_sql and query_influxql APIs. In addition, the test for query_influxql includes some of the InfluxQL-specific queries, e.g., SHOW MEASUREMENTS.

Other Changes

The influxdb3_client now has the api_v3_query_influxql method (and a basic test was added for this)
2024-03-01 12:27:38 -05:00