[package]
name = "influxdb3_server"
version.workspace = true
authors.workspace = true
edition.workspace = true
license.workspace = true

[dependencies]
# Core Crates
authz.workspace = true
data_types.workspace = true
datafusion_util.workspace = true
influxdb-line-protocol.workspace = true
iox_catalog.workspace = true
iox_http.workspace = true
iox_query.workspace = true
iox_query_influxql.workspace = true
iox_query_params.workspace = true
iox_system_tables.workspace = true
iox_time.workspace = true
metric.workspace = true
metric_exporters.workspace = true
observability_deps.workspace = true
schema.workspace = true
service_common.workspace = true
service_grpc_flight.workspace = true
trace.workspace = true
trace_exporters.workspace = true
trace_http.workspace = true
tracker.workspace = true

# Local Deps
influxdb3_cache = { path = "../influxdb3_cache" }
influxdb3_catalog = { path = "../influxdb3_catalog" }
influxdb3_client = { path = "../influxdb3_client" }
influxdb3_id = { path = "../influxdb3_id" }
influxdb3_internal_api = { path = "../influxdb3_internal_api" }
influxdb3_process = { path = "../influxdb3_process", default-features = false }
influxdb3_sys_events = { path = "../influxdb3_sys_events" }
influxdb3_telemetry = { path = "../influxdb3_telemetry" }
influxdb3_wal = { path = "../influxdb3_wal" }
influxdb3_write = { path = "../influxdb3_write" }
iox_query_influxql_rewrite = { path = "../iox_query_influxql_rewrite" }

# crates.io Dependencies
anyhow.workspace = true
arrow.workspace = true
arrow-array.workspace = true
arrow-csv.workspace = true
arrow-flight.workspace = true
arrow-json.workspace = true
arrow-schema.workspace = true
async-trait.workspace = true
base64.workspace = true
bytes.workspace = true
chrono.workspace = true
csv.workspace = true
datafusion.workspace = true
flate2.workspace = true
futures.workspace = true
hex.workspace = true
humantime.workspace = true
hyper.workspace = true
mime.workspace = true
object_store.workspace = true
parking_lot.workspace = true
pin-project-lite.workspace = true
secrecy.workspace = true
serde.workspace = true
serde_json.workspace = true
serde_urlencoded.workspace = true
sha2.workspace = true
thiserror.workspace = true
tokio.workspace = true
tokio-util.workspace = true
tonic.workspace = true
tower.workspace = true
unicode-segmentation.workspace = true

[dependencies.pyo3]
version = "0.23.3"
# this is necessary to automatically initialize the Python interpreter
features = ["auto-initialize"]
optional = true

[features]
system-py = ["pyo3"]
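# Note: building with this feature requires a Python interpreter that pyo3
# can find and link against at build time (a consequence of pyo3's
# `auto-initialize` feature above); for example:
#   cargo build --features system-py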

[dev-dependencies]
# Core Crates
parquet.workspace = true
parquet_file.workspace = true
test_helpers.workspace = true

# crates.io crates
http.workspace = true
hyper.workspace = true
pretty_assertions.workspace = true
test-log.workspace = true
urlencoding.workspace = true