influxdb/influxdb3_server/Cargo.toml

[package]
name = "influxdb3_server"
version.workspace = true
authors.workspace = true
edition.workspace = true
license.workspace = true

[dependencies]
# Core Crates
authz.workspace = true
data_types.workspace = true
datafusion_util.workspace = true
influxdb-line-protocol.workspace = true
influxdb_influxql_parser.workspace = true
iox_catalog.workspace = true
iox_http.workspace = true
iox_query.workspace = true
iox_query_params.workspace = true
iox_query_influxql.workspace = true
iox_system_tables.workspace = true
iox_time.workspace = true
metric.workspace = true
metric_exporters.workspace = true
observability_deps.workspace = true
schema.workspace = true
service_common.workspace = true
service_grpc_flight.workspace = true
trace.workspace = true
trace_exporters.workspace = true
trace_http.workspace = true
tracker.workspace = true
# Local Deps
influxdb3_cache = { path = "../influxdb3_cache" }
influxdb3_catalog = { path = "../influxdb3_catalog" }
influxdb3_client = { path = "../influxdb3_client" }
influxdb3_id = { path = "../influxdb3_id" }
influxdb3_internal_api = { path = "../influxdb3_internal_api" }
influxdb3_process = { path = "../influxdb3_process", default-features = false }
influxdb3_processing_engine = { path = "../influxdb3_processing_engine" }
influxdb3_wal = { path = "../influxdb3_wal" }
influxdb3_write = { path = "../influxdb3_write" }
iox_query_influxql_rewrite = { path = "../iox_query_influxql_rewrite" }
influxdb3_sys_events = { path = "../influxdb3_sys_events" }
influxdb3_telemetry = { path = "../influxdb3_telemetry" }
# crates.io Dependencies
anyhow.workspace = true
arrow.workspace = true
arrow-array.workspace = true
arrow-csv.workspace = true
arrow-flight.workspace = true
arrow-json.workspace = true
arrow-schema.workspace = true
async-trait.workspace = true
base64.workspace = true
bytes.workspace = true
chrono.workspace = true
csv.workspace = true
datafusion.workspace = true
flate2.workspace = true
futures.workspace = true
hashbrown.workspace = true
hex.workspace = true
hyper.workspace = true
humantime.workspace = true
mime.workspace = true
object_store.workspace = true
parking_lot.workspace = true
pin-project-lite.workspace = true
regex.workspace = true
secrecy.workspace = true
serde.workspace = true
serde_json.workspace = true
serde_urlencoded.workspace = true
sha2.workspace = true
thiserror.workspace = true
tokio.workspace = true
tokio-util.workspace = true
tonic.workspace = true
tower.workspace = true
unicode-segmentation.workspace = true

[dependencies.pyo3]
version = "0.23.3"
# this is necessary to automatically initialize the Python interpreter
features = ["auto-initialize"]
optional = true

[features]
system-py = ["pyo3", "influxdb3_processing_engine/system-py"]
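# Illustrative sketch (not part of the upstream manifest): the optional Python
# processing engine is gated behind the `system-py` feature above, which pulls in
# `pyo3` and forwards the feature to `influxdb3_processing_engine`. Assuming the
# standard Cargo feature mechanism, it would typically be enabled at build time:
#   cargo build -p influxdb3_server --features system-py
# or from a hypothetical dependent crate's manifest:
#   influxdb3_server = { path = "../influxdb3_server", features = ["system-py"] }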

[dev-dependencies]
# Core Crates
parquet.workspace = true
parquet_file.workspace = true
test_helpers.workspace = true
# crates.io crates
http.workspace = true
hyper.workspace = true
test-log.workspace = true
urlencoding.workspace = true
pretty_assertions.workspace = true