For releases we need Docker images and prebuilt binaries available for users
to actually run influxdb3. These CI changes build the binaries and the Docker
image on a release tag, then test, sign, and publish them so they are
available for download.
Co-Authored-By: Brandon Pfeifer <bpfeifer@influxdata.com>
* feat: report system stats in load generator
Added the mechanism to report system stats during load generation. The
following stats are saved in a CSV file:
- cpu_usage
- disk_written_bytes
- disk_read_bytes
- memory
- virtual_memory
This only works when running the load generator against a local instance
of influxdb3, i.e., one that is running on your machine.
Generating system stats is done by passing the --system-stats flag to the
load generator.
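For reference, a minimal sketch of how such per-process sampling could be done, assuming a recent version of the `sysinfo` crate for collection and plain CSV output; the actual columns and sampling interval in the load generator may differ:

```rust
use std::{fs::File, io::Write, time::Duration};
use sysinfo::{Pid, System};

// Hypothetical sampler: polls a local influxdb3 process by PID once per second
// and appends one CSV row per sample with the stats listed above.
fn sample_system_stats(pid: u32, out: &mut File) -> std::io::Result<()> {
    let mut sys = System::new_all();
    writeln!(out, "cpu_usage,disk_written_bytes,disk_read_bytes,memory,virtual_memory")?;
    loop {
        sys.refresh_all();
        // Stop sampling once the target process goes away.
        let Some(p) = sys.process(Pid::from_u32(pid)) else { break };
        let disk = p.disk_usage();
        writeln!(
            out,
            "{},{},{},{},{}",
            p.cpu_usage(),
            disk.total_written_bytes,
            disk.total_read_bytes,
            p.memory(),
            p.virtual_memory(),
        )?;
        std::thread::sleep(Duration::from_secs(1));
    }
    Ok(())
}
```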
* feat: /ping API to serve version
The /ping API was added, served on both the GET and
POST methods. The API responds with a JSON body
containing the version and revision of the build.
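As a sketch of the payload shape this describes (the exact field names are assumed from the description above):

```rust
use serde_json::json;

// Builds a /ping response body carrying the build's version and revision.
fn ping_body(version: &str, revision: &str) -> serde_json::Value {
    json!({ "version": version, "revision": revision })
}
```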
A new crate was added, influxdb3_process, which
takes the process_info.rs module from the influxdb3
crate, and puts it in a separate crate so that other
crates (influxdb3_server) can depend on it. This was
needed in order to have access to the version and
revision values, which are generated at build time,
in the HTTP API code of influxdb3_server.
An E2E test was added to check that /ping works.
The E2E TestServer can now emit logs via the
TEST_LOG environment variable.
* feat: initial load generator implementation
This adds a load generator as a new crate. Initially it only generates write load, but the scaffolding is there to add a query load generator to complement the write load tool.
This could have been added as a subcommand to the influxdb3 program, but I thought it best to have it separate for now.
It's fairly light on tests and error handling given that it's an internal tooling CLI. I've added only something very basic to test the line protocol generation, and ran the actual write command by hand.
I included pretty detailed instructions and some runnable examples.
* refactor: address PR feedback
feat: support the v1 query API
This PR adds support for the `/api/v1/query` API, which is meant to
serve the original InfluxDB v1 query API, to serve single statement
`SELECT` and `SHOW` queries. The response, which is returned as JSON,
can be chunked via the `chunked` and optional `chunk_size` parameters.
An optional `epoch` parameter can be supplied to have `time` column
timestamps converted to a UNIX epoch with the given precision.
## Buffering
The response is buffered by default: if the `chunked` parameter
is not supplied, or is passed as `false`, then the entire query
result is buffered into memory before being returned in the
response. This is how the original API behaves, so we replicate
that here.
When `chunked` is passed as `true`, then the response will be a
stream of chunks, where each chunk is a self-contained response,
with the same structure as that of the non-chunked response. Chunks
are split up by the provided `chunk_size`, or by series, i.e.,
measurement, whichever comes first. The default chunk size is 10,000
rows.
Buffering is implemented with the `QueryResponseStream` and
`ChunkBuffer` types, the former implements the `Stream` trait,
which allows it to be streamed in the HTTP response directly with
`hyper`'s `Body::wrap_stream`. The `QueryResponseStream` is a wrapper
around the inner arrow `RecordBatchStream`, which buffers the
streamed `RecordBatch`es according to the requested chunking parameters.
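As a rough illustration of the splitting rules (this is not the actual `ChunkBuffer` implementation in the PR, just a self-contained sketch of the same idea):

```rust
/// Illustrative only: groups incoming rows into chunks of at most
/// `chunk_size` rows, never mixing two series in one chunk.
struct ChunkBuffer<Row> {
    chunk_size: usize,
    current_series: Option<String>,
    rows: Vec<Row>,
}

impl<Row> ChunkBuffer<Row> {
    fn new(chunk_size: usize) -> Self {
        Self { chunk_size, current_series: None, rows: Vec::new() }
    }

    /// Push one row; returns a completed chunk whenever the series changes or
    /// the size limit is reached, mirroring the splitting rules above. Rows
    /// still buffered when the query stream ends are flushed by the caller.
    fn push(&mut self, series: &str, row: Row) -> Option<(String, Vec<Row>)> {
        let series_changed =
            self.current_series.as_deref().is_some_and(|s| s != series);
        let flushed = if series_changed || self.rows.len() >= self.chunk_size {
            let prev = self.current_series.take().unwrap_or_default();
            Some((prev, std::mem::take(&mut self.rows)))
        } else {
            None
        };
        self.current_series = Some(series.to_string());
        self.rows.push(row);
        flushed
    }
}
```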
## Testing
Two new E2E tests were added to test basic query functionality and
chunking behaviour, respectively. In addition, some manual testing
was done to verify that the InfluxDB Grafana plugin works with this
API.
This changes the `influxdb3 create token` command so that it automatically
generates a completely random, base64-encoded token prepended with `apiv3_`,
which is then fed into a SHA-512 algorithm instead of SHA-256. The user can no
longer pass in a token to be turned into the proper output.
This also changes the server code to handle the switch to SHA-512.
Closes #24704
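For illustration, the token scheme described above could look roughly like this (the token length, base64 variant, and hex encoding of the stored digest are assumptions, not the exact choices made in the PR):

```rust
use base64::{engine::general_purpose::URL_SAFE_NO_PAD, Engine};
use rand::RngCore;
use sha2::{Digest, Sha512};

// Sketch: returns the user-facing token and the digest kept server side.
fn generate_token() -> (String, String) {
    // Completely random bytes, base64 encoded and prefixed with "apiv3_";
    // this is the value shown to the user exactly once.
    let mut raw = [0u8; 64];
    rand::thread_rng().fill_bytes(&mut raw);
    let token = format!("apiv3_{}", URL_SAFE_NO_PAD.encode(raw));
    // Only the SHA-512 digest of the token is persisted and compared on the server.
    let hashed = hex::encode(Sha512::digest(token.as_bytes()));
    (token, hashed)
}
```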
feat: support SHOW RETENTION POLICIES
Added support through the influxdb3 Query Executor to perform
SHOW RETENTION POLICIES queries, both on a specific database as well
as across all databases.
Test cases were added to check this functionality.
feat: add query_influxql api
This PR adds support for the /api/v3/query_influxql API. This re-uses code from the existing query_sql API, but some refactoring was done to allow for code re-use between the two.
The main change to the code taken from the existing query_sql API is that the response format is determined up front, so that a 400 BAD REQUEST is returned before the query is performed if the user provides an invalid Accept header.
Support was added for several InfluxQL queries that previously required a bridge to be executed in 3.0:
- SHOW MEASUREMENTS
- SHOW TAG KEYS
- SHOW TAG VALUES
- SHOW FIELD KEYS
- SHOW DATABASES
- Handling of qualified measurement names in SELECT queries (see below)
This is accomplished with the newly added iox_query_influxql_rewrite crate, which provides the means to re-write an InfluxQL statement to strip out a database name and retention policy, if provided. Doing so allows the query_influxql API to have the database parameter optional, as it may be provided in the query string.
## Handling qualified measurement names in SELECT
The implementation in this PR inspects all measurements provided in a FROM clause and extracts the database (DB) name and retention policy (RP) name (if not the default). If multiple DB/RP combinations are provided, an error is thrown.
## Testing
E2E tests were added for performing basic queries against a running server on both the query_sql and query_influxql APIs. In addition, the test for query_influxql includes some of the InfluxQL-specific queries, e.g., SHOW MEASUREMENTS.
## Other Changes
The influxdb3_client crate now has the api_v3_query_influxql method (and a basic test was added for this).
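As a usage sketch of the new endpoint over plain HTTP with `reqwest` (the `db` and `q` query-parameter names are assumed from the description of the query-string handling above):

```rust
// Illustrative only; error handling elided.
async fn show_measurements(host: &str) -> Result<String, reqwest::Error> {
    reqwest::Client::new()
        .get(format!("{host}/api/v3/query_influxql"))
        .query(&[("db", "mydb"), ("q", "SHOW MEASUREMENTS")])
        .send()
        .await?
        .text()
        .await
}
```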
This commit is a major refactor for the code base. It mainly does four
things:
1. Splits code shared between the internal IOx repository and this one
into its own repo over at https://github.com/influxdata/influxdb3_core
2. Removes any docs or anything else that did not relate to this project
3. Reorganizes the Cargo.toml files to use the top level Cargo.toml to
declare dependencies and versions to keep all crates in sync and sets
all others to use `<dep>.workspace = true` unless it's an optional
dependency
4. Sets the top level Cargo.toml to point to the core crates as git
dependencies
With this, any changes specific to Edge will be contained here, updating
deps will be a PR over in `influxdata/influxdb3_core`, and we can prove
out the viability of this model for IOx to use as well.
A new crate, influxdb3_client, was added, which provides the Client
struct. This gives programmatic access to the influxdb3 HTTP API.
Two primary methods are provided:
- `api_v3_write_lp`
- `api_v3_query_sql`
Each API uses a builder approach to composing the request to be sent.
Response handling was kept somewhat naive, in `write_lp` case not returning
anything, and in `query_sql`, returning raw `Bytes`. We may improve this in
future once the respective APIs have their responses more finalized.
Both methods, as well as all associated types are documented with rustdocs.
The general approach to these methods was to use a builder style API so that
the user of the client can build their requests functionally before sending them
to the server.
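A hypothetical usage sketch of that builder style follows (only `Client`, `api_v3_write_lp`, and `api_v3_query_sql` are named in this change; the other method names below are illustrative, not the crate's confirmed API):

```rust
// Hypothetical sketch; consult the crate's rustdocs for the real builders.
async fn example() -> Result<(), Box<dyn std::error::Error>> {
    let client = influxdb3_client::Client::new("http://localhost:8181")?;

    // Compose a write request functionally, then send it.
    client
        .api_v3_write_lp("mydb")
        .body("cpu,host=a usage=0.5 1703030400000000000") // line protocol; `body` is an assumed name
        .send()
        .await?;

    // Compose and send a SQL query; raw `Bytes` come back for now.
    let bytes = client
        .api_v3_query_sql("mydb", "SELECT * FROM cpu")
        .send()
        .await?;
    println!("{}", String::from_utf8_lossy(&bytes));
    Ok(())
}
```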
* chore(deps): Update arrow and datafusion to 49.0.0
This commit copies in our dependency code from influxdb_iox in order for
us to be able to upgrade from a forked version of 46.0.0 to 49.0.0 of
both arrow and datafusion. Most of the important changes were around how
we consumed the crates in influxdb3(_server/_write). Those diffs are
particularly worth looking at, as the rest was a straight copy and we
don't currently touch those crates in our development of influxdb3
edge.
* fix: regenerate workspace hack crate
* fix: Protobuf issues with incompatibility labels
* fix: Broken CI yaml
* fix: buf version
* fix: Only check IOx repo
* fix: Remove protobuf lint
* fix: Comment out call to protobuf-lint
* WIP: basic influxdb3 command and http server
* WIP: write lp, buffer, query out
* WIP: test write & query on influxdb3_server, fix warnings
* WIP: pull write buffer and catalog into separate crate
* WIP: sketch out types used for write: buffer, wal, persister
* WIP: remove a bunch of old IOx stuff and fmt
* chore: Update DataFusion pin again
* chore: update for different type
* fix: statistics
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* chore: Update DataFusion pin
* chore: Update for new API
* fix: fix test
* fix: only check error messages
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* chore: Update DataFusion pin
* chore: Update for new API
* fix: Update for API
* fix: update compactor test
* fix: Update to patched version of arrow 46.0.0
* fix: map `DataFusionError::Configuration` to an internal error
* fix: do not use deprecated API
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Adds a crate that layers compaction-specific gossip types and
abstractions over the underlying gossip transport for a nicer (and
decoupled!) internal API.
Adds basic structure for #8349. This will be filled in using separate
PRs for easier review.
The layer structure was chosen to simplify testing and allow composition
of features (like retries, circuit breaking, metrics, etc.). In contrast
to the V1 client (`querier::ingester`), a client here addresses exactly one
ingester, not multiple (via an `addr` parameter). The tracking around
multiple states in the V1 version is awkward and overly complicated.
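A generic illustration of that layering idea (the trait and type names here are made up for the sketch, not the real types in `querier`):

```rust
use async_trait::async_trait;

// Each layer wraps exactly one inner layer addressing a single ingester, so
// features such as retries, circuit breaking, or metrics compose cleanly.
#[async_trait]
trait IngesterLayer: Send + Sync {
    async fn query(&self, request: String) -> Result<String, String>;
}

struct Retry<L> {
    inner: L,
    attempts: usize,
}

#[async_trait]
impl<L: IngesterLayer> IngesterLayer for Retry<L> {
    async fn query(&self, request: String) -> Result<String, String> {
        let mut last_err = "no attempts made".to_string();
        for _ in 0..self.attempts {
            match self.inner.query(request.clone()).await {
                Ok(resp) => return Ok(resp),
                Err(e) => last_err = e,
            }
        }
        Err(last_err)
    }
}
```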
Adds a reusable "gossip_parquet_file" crate that provides a use-case
specific wrapper over the underlying gossip transport.
This crate deals with the encoding and decoding of parquet gossip
messages, handing them off to the application, and decoupling the latency
of handlers from the gossip reactor.
Adds a new gossip_schema crate that provides a high-level interface to
schema change notifications.
This crate layers schema-specific interfaces over the existing low-level
gossip crate. Users can obtain best-effort schema change notifications
by implementing a SchemaEventHandler delegate given to a SchemaRx, or
efficiently dispatch schema change notifications to listening peers
using a SchemaTx.
Schema notifications are sent over the Topic::SchemaChanges topic
(ID=1), which the caller must register as an interest on the receiving
gossip nodes.
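A sketch of the delegate pattern this describes (the shape of `SchemaEventHandler` and the event type below are assumptions for illustration; the real signatures live in the gossip_schema crate):

```rust
use async_trait::async_trait;

// Assumed stand-in for the crate's schema change event.
#[derive(Debug, Clone)]
struct SchemaChange {
    namespace: String,
    table: String,
}

// Assumed shape of the delegate given to a SchemaRx.
#[async_trait]
trait SchemaEventHandler: Send + Sync {
    async fn handle(&self, event: SchemaChange);
}

struct LoggingHandler;

#[async_trait]
impl SchemaEventHandler for LoggingHandler {
    async fn handle(&self, event: SchemaChange) {
        // Best effort only: gossip gives no ordering or delivery guarantees,
        // so handlers should treat notifications as hints.
        println!("schema change observed: {event:?}");
    }
}
```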
Defines a new proto for the structure that gets sent from router to ingester and persisted in the ingester WAL.
Creates the ingest_structure crate with functions to convert from line protocol to the new proto structure while validating the schema.
Adds a function to convert the new proto structure to a RecordBatch.
* chore: Update datafusion pin
* fix: Update for change in API
* chore: Update plan
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* chore: Update datafusion to get new grouping
* chore: Update for new API
* chore: update tests
* fix: new API
* fix: state type
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* chore: Update datafusion + arrow/arrow-flight/parquet to version `42.0.0`
* chore: Update for new APIs
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* refactor: make compactor_scheduler crate
* refactor: move PartitionsSource into the compactor_scheduler
The compactor currently uses PartitionsSource in two ways:
* for the preparation of PartitionIds prior to the compactor pipeline.
* for the abstraction which utilizes the PartitionIds during the IO pipeline.
This commit is a refactoring to enable us to delineate between these two uses.
The former (preparation) use will now be done in the compactor_scheduler.
Since the compactor is dependent on the compactor_scheduler, it made sense to move the trait to the scheduler.
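The trait being moved is roughly of this shape (the exact signature is an assumption based on the description above):

```rust
use async_trait::async_trait;

// Stand-in for the real data_types::PartitionId.
#[derive(Debug, Clone, Copy)]
struct PartitionId(i64);

// Assumed shape: something that yields the set of partitions the compactor
// should consider; the scheduler now owns this "preparation" step.
#[async_trait]
trait PartitionsSource: Send + Sync {
    async fn fetch(&self) -> Vec<PartitionId>;
}
```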
* chore: Update DataFusion pin
* chore: Update API changes
* chore: Don't use deprecated API
* chore: Run cargo hakari tasks
* chore: Update tests due to changes in logical plan nodes from DF update
* chore: Fix broken links in docs
* chore: Adjust changes to expected output
---------
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>