This adds an in-memory Parquet cache to the WriteBuffer. With this we
now have a cache that Parquet files will be queried from when a query
comes in. Note this change *does not* actually let us persist any
data. This merely adds the cache. Future changes will add the ability
to cache the data as well as the logic around what should be cached.
As this doesn't allow any data to be cached or queried, a test has not
been added at this time, but one will be added in future PRs.
* refactor: make end common to load generation tool
Made the --end argument common to both the query and write load generation
runners.
A panic message was also added in the table buffer where unwraps were
causing panics
* refactor: load gen print statements for consistency
* refactor: query/write load gen arg interface
Refactored the argument interface for the query and write load gen
commands to make them easier to unify in a new `full` command.
In summary:
- remove the query sampling interval
- make short-form of --querier-count 'q' instead of 'Q'
- remove the short-form for --query-format
- remove --spec-path in favour of --querier-spec and --writer-spec
for specifying the spec paths of the `query` and `write` loads, respectively
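A minimal sketch of what this unified argument surface could look like with clap's derive API; the struct and field layout here are illustrative only and do not reflect the actual load generator code.

```rust
// Illustrative sketch only; the real load generator's clap structs differ.
use clap::Parser;

#[derive(Debug, Parser)]
struct QueryLoadConfig {
    /// Path to the querier spec file (replaces the old --spec-path for query loads)
    #[arg(long = "querier-spec")]
    querier_spec: String,

    /// Number of concurrent queriers; note the lower-case short form
    #[arg(short = 'q', long = "querier-count", default_value = "1")]
    querier_count: usize,

    /// Output format for query results (no short form)
    #[arg(long = "query-format")]
    query_format: Option<String>,
}
```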
* feat: produce error on 0s sampling interval
* refactor: split out query/write command configs
Refactored the query and write command clap configurations to make
them composable for the full command
* refactor: expose query and write runner for composability
Refactored the query and write runners so that they can be
composed into the full runner.
* feat: add the full load generator sub-command
Implement a new sub-command for the load generator: full
This runs both the query and write loads simultaneously, and exposes
the unified CLI of the two commands.
* chore: cargo update to fix audit
When persisting parquet files we now will sort and dedupe on persist using the
COMPACT operation implemented in IOx Query. Note that right now we don't choose
any columns to sort on, so we dedupe and sort using whatever the default behavior
of the COMPACT operation is. Future changes can figure out what columns to sort
by when compacting the data.
* feat: report system stats in load generator
Added the mechanism to report system stats during load generation. The
following stats are saved in a CSV file:
- cpu_usage
- disk_written_bytes
- disk_read_bytes
- memory
- virtual_memory
This only works when running the load generator against a local instance
of influxdb3, i.e., one that is running on your machine.
Generating system stats is done by passing the --system-stats flag to the
load generator.
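For illustration, a minimal sketch of the per-sample record and the CSV it is written to; how the values are actually sampled (e.g., via a crate like `sysinfo`) is not shown, and the struct and function are hypothetical.

```rust
// Hypothetical sketch of the per-sample record written to the stats CSV;
// how the values are sampled is not shown here.
use std::{fs::File, io::{BufWriter, Write}};

struct SystemSample {
    cpu_usage: f32,          // percent
    disk_written_bytes: u64,
    disk_read_bytes: u64,
    memory: u64,             // resident memory, bytes
    virtual_memory: u64,     // virtual memory, bytes
}

fn write_samples(path: &str, samples: &[SystemSample]) -> std::io::Result<()> {
    let mut w = BufWriter::new(File::create(path)?);
    writeln!(w, "cpu_usage,disk_written_bytes,disk_read_bytes,memory,virtual_memory")?;
    for s in samples {
        writeln!(
            w,
            "{},{},{},{},{}",
            s.cpu_usage, s.disk_written_bytes, s.disk_read_bytes, s.memory, s.virtual_memory
        )?;
    }
    Ok(())
}
```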
* feat: add new clap args for results gen
Added the results_dir and configuration_name args
to the common load generator config which will be
used in generating the results directory structure.
* feat: load gen results directory structure
Write and query load generation runners will now setup files in a
results directory, using a specific structure. Users of the load tool
can specify a `results_dir` to save these results, or the tool will
pick a `results` folder in the current directory, by default.
Results will be saved in files using the following path convention:
results/<s>/<c>/<write|query|system>_<time>.csv
- <s>: spec name
- <c>: configuration name, specified by user with the `config-name`
arg, or by default, will use the revision SHA of the running server
- <write|query|system>: which kind of results file
- <time>: a timestamp in the form 'YYYY-MM-DD-HH-MM'
The setup code was unified for both write and query commands, in
preparation for the creation of a system stats file, as well as for
the capability to run both query and write at the same time; however,
those remain unimplemented as of this commit.
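A small sketch of building a results file path following the convention above; the function and argument names are illustrative, not the tool's actual code.

```rust
// Illustrative only: builds a results file path following
// results/<spec>/<config>/<kind>_<time>.csv as described above.
use std::path::PathBuf;

fn results_file(results_dir: &str, spec: &str, config: &str, kind: &str, time: &str) -> PathBuf {
    // `kind` is one of "write", "query", or "system"; `time` is 'YYYY-MM-DD-HH-MM'
    PathBuf::from(results_dir)
        .join(spec)
        .join(config)
        .join(format!("{kind}_{time}.csv"))
}

// e.g. results_file("results", "one_mil", "local-dev", "write", "2024-03-15-09-30")
// => results/one_mil/local-dev/write_2024-03-15-09-30.csv
```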
* feat: /ping API support on influxdb3_client::Client
* feat: /ping API to serve version
The /ping API was added, which is served via both the GET and
POST methods. The API responds with a JSON body
containing the version and revision of the build.
A new crate was added, influxdb3_process, which
takes the process_info.rs module from the influxdb3
crate, and puts it in a separate crate so that other
crates (influxdb3_server) can depend on it. This was
needed in order to have access to the version and
revision values, which are generated at build time,
in the HTTP API code of influxdb3_server.
An E2E test was added to check that /ping works.
E2E TestServer can now have logs emitted using the
TEST_LOG environment variable.
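A hedged sketch of consuming the /ping response; the JSON field names are assumed from the description (version and revision) and may not match the server exactly.

```rust
// Sketch of deserializing the /ping response body; the field names are
// assumptions based on the description above.
use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct PingResponse {
    version: String,
    revision: String,
}

fn parse_ping(body: &str) -> serde_json::Result<PingResponse> {
    serde_json::from_str(body)
}
```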
* refactor: Buffer to use Arrow builders
This refactors the TableBuffer to use the Arrow builders for the data. It also removes cloning from the table buffer in favor of yielding record batches. This is part of a test to see if querying the buffer will be faster with this method, since it avoids a number of data copies.
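As a rough illustration (not the actual TableBuffer code), accumulating values with Arrow builders and flushing them into a RecordBatch looks roughly like this; the schema here is made up.

```rust
// Minimal sketch of flushing Arrow builders into a RecordBatch instead of
// cloning buffered data; the columns are illustrative.
use std::sync::Arc;
use arrow::array::{ArrayRef, Int64Builder, StringBuilder};
use arrow::datatypes::{DataType, Field, Schema};
use arrow::record_batch::RecordBatch;

fn flush_to_batch(
    times: &mut Int64Builder,
    hosts: &mut StringBuilder,
) -> arrow::error::Result<RecordBatch> {
    let schema = Arc::new(Schema::new(vec![
        Field::new("time", DataType::Int64, false),
        Field::new("host", DataType::Utf8, true),
    ]));
    // finish() drains the builders and yields the accumulated arrays
    let columns: Vec<ArrayRef> = vec![Arc::new(times.finish()), Arc::new(hosts.finish())];
    RecordBatch::try_new(schema, columns)
}
```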
* fix: adding columns when data is in buffer
This fixes a bug where the Arrow schema in the Catalog wouldn't get updated when columns are added to a table. Also fixes a bug in the buffer where a new column wouldn't have the correct number of rows in it (now fixed by adding nulls for previous rows).
* refactor: PR feedback in buffer_segment
Implement the query load generator. The design follows that of the existing write load generator.
A QuerySpec is defined that will be used by the query command to generate a set of queriers to perform queries against a running server in parallel.
When running influxdb3 we did not have a default log level. As a result we
couldn't even tell whether the program was running. This change provides a
default level unless a user-supplied one is given.
feat: support v1 and v2 write APIs
This adds support for two APIs: /write and /api/v2/write. These implement the v1 and v2 write APIs, respectively. In general, the difference between these and the new /api/v3/write_lp API is in the request parsing. We leverage the WriteRequestUnifier trait from influxdb3_core to handle parsing of v1 and v2 HTTP requests, to keep the error handling at that level consistent with distributed versions of InfluxDB 3.0. Specifically, we use the SingleTenantRequestUnifier implementation of the trait.
Changes:
- Addition of two new routes to the route_request method in influxdb3_server::http to serve /write and /api/v2/write requests.
- Database name validation was updated to handle cases where retention policies may be passed in /write requests, and to also reject empty names. A unit test was added to verify the validate_db_name function.
- HTTP request authorization in the router will extract the full Authorization header value, and store it in the request extensions; this is used in the write request parsing from the core iox_http crate to authorize write requests.
- E2E tests to verify correct HTTP request parsing / response behaviour for both /write and /api/v2/write APIs
- E2E tests to check that data sent in through /write and /api/v2/write can be queried back
feat: add _series_id to tables on write
New _series_id column is added to tables; this stores a 32-byte SHA256 hash of the tag set of a line of Line Protocol. The tag set is checked for sort order, then sorted if it is not already, before producing the hash.
Unit tests were added to check hashing and sorting functions work.
Tests that performed queries needed to be modified to account for the new _series_id column; in general, SELECT * queries were altered to use a select clause with specific column names.
The column limit was increased to 501 internally, to account for the new _series_id column, but the user-facing limit is still 500.
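A minimal sketch of deriving a series ID from a sorted tag set with SHA256; the exact tag-set serialization used by the server is an assumption here.

```rust
// Illustrative sketch: sort the tag set if needed, then hash it with SHA256
// (32 bytes). The serialization format of the tag set is assumed.
use sha2::{Digest, Sha256};

fn series_id(mut tags: Vec<(String, String)>) -> [u8; 32] {
    // ensure a canonical order so the same tag set always hashes identically
    if !tags.windows(2).all(|w| w[0].0 <= w[1].0) {
        tags.sort_by(|a, b| a.0.cmp(&b.0));
    }
    let mut hasher = Sha256::new();
    for (k, v) in &tags {
        hasher.update(k.as_bytes());
        hasher.update(b"=");
        hasher.update(v.as_bytes());
        hasher.update(b",");
    }
    hasher.finalize().into()
}
```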
Fixes a bug where the loader would error out if there was a wal segment file for a previous segment that hadn't been persisted, and a new wal file had to be created for the new open segment. This would show up as an error if you started the server and then stopped and restarted it without writing any data.
When a write comes into the buffer that both updates the catalog and creates a new segment, it would create that segment with a catalog sequence number that matched what happened after the catalog modification. The result is that when the segment is persisted, the catalog won't be persisted because it wasn't viewed as having been updated. This fixes that.
* feat: initial load generator implementation
This adds a load generator as a new crate. Initially it only generates write load, but the scaffolding is there to add a query load generator to complement the write load tool.
This could have been added as a subcommand to the influxdb3 program, but I thought it best to have it separate for now.
It's fairly light on tests and error handling given it's an internal tooling CLI. I've added only something very basic to test the line protocol generation and run the actual write command by hand.
I included pretty detailed instructions and some runnable examples.
* refactor: address PR feedback
feat: add query parameter support to influxdb3 client
This adds the ability to use parameterized queries in the influxdb3_client crate
when calling the /api/v3/query_sql and /api/v3/query_influxql APIs.
The QueryRequestBuilder now has two new methods: with_param and
with_try_param, that allow binding of parameters to a query being made.
Tests were added in influxdb3_client to verify their usage with both sql and
influxql query APIs.
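A hedged usage sketch based only on the description above: `with_param` is named in this text, while the client constructor, query method, placeholder syntax, and send/return shape are assumptions and may not match the real influxdb3_client API.

```rust
// Hedged sketch only; apart from `with_param` (named above), the method names
// and signatures here are assumptions about the influxdb3_client API.
use influxdb3_client::Client;

async fn query_with_param() -> Result<(), Box<dyn std::error::Error>> {
    let client = Client::new("http://localhost:8181")?;
    let body = client
        .api_v3_query_sql("mydb", "SELECT * FROM cpu WHERE host = $host")
        .with_param("host", "server-01")
        .send()
        .await?;
    println!("{}", String::from_utf8_lossy(&body));
    Ok(())
}
```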
feat: support query parameters
This adds support for parameters in the /api/v3/query_sql
and /api/v3/query_influxql APIs
The new parameter `params` is supported in the URL query string
of a GET request, or in the JSON body of a POST request.
Two new E2E tests were added to check successful GET/POST as well
as error scenario when params are not provided for a query string
that would expect them.
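For illustration, a possible POST body with bound parameters; the field names and the `$host` placeholder syntax are assumptions drawn from this description rather than the API reference.

```rust
// Sketch of what a POST body with bound parameters might look like; the
// "db"/"q"/"params" field names and the `$host` placeholder are assumptions.
use serde_json::json;

fn query_body() -> serde_json::Value {
    json!({
        "db": "mydb",
        "q": "SELECT * FROM cpu WHERE host = $host",
        "params": { "host": "server-01" }
    })
}
```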
* chore: Update to Rust 1.77.0
This is a fairly quiet upgrade. The only changes are some lints around
`OpenOptions` that were added to clippy between 1.75 and this version
and they're small changes that either remove unnecessary function calls
or add a needed function call.
* fix: cargo-deny by using the --locked flag
feat: support the v1 query API
This PR adds support for the `/api/v1/query` API, which is meant to
serve the original InfluxDB v1 query API, to serve single statement
`SELECT` and `SHOW` queries. The response, which is returned as JSON,
can be chunked via the `chunked` and optional `chunk_size` parameters.
An optional `epoch` parameter can be supplied to have `time` column
timestamps converted to a UNIX epoch with the given precision.
## Buffering
The response is buffered by default: if the `chunked` parameter
is not supplied, or is passed as `false`, then the entire query
result will be buffered into memory before being returned in the
response. This is how the original API behaves, so we are replicating
that here.
When `chunked` is passed as `true`, then the response will be a
stream of chunks, where each chunk is a self-contained response,
with the same structure as that of the non-chunked response. Chunks
are split up by the provided `chunk_size`, or by series, i.e.,
measurement, whichever comes first. The default chunk size is 10,000
rows.
Buffering is implemented with the `QueryResponseStream` and
`ChunkBuffer` types, the former implements the `Stream` trait,
which allows it to be streamed in the HTTP response directly with
`hyper`'s `Body::wrap_stream`. The `QueryResponseStream` is a wrapper
around the inner arrow `RecordBatchStream`, which buffers the
streamed `RecordBatch`es according to the requested chunking parameters.
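A minimal sketch of the chunking rule described above (not the actual QueryResponseStream/ChunkBuffer code): flush a chunk when the row limit is hit or the series changes.

```rust
// Illustrative chunking rule: emit a chunk when either the configured row
// limit is reached or the series (measurement) changes.
struct ChunkState {
    chunk_size: usize,               // defaults to 10_000 rows
    current_series: Option<String>,
    rows_in_chunk: usize,
}

impl ChunkState {
    /// Returns true if the buffered rows should be flushed as a chunk before
    /// appending a row belonging to `series`.
    fn should_flush(&self, series: &str) -> bool {
        let series_changed = self
            .current_series
            .as_deref()
            .map_or(false, |s| s != series);
        series_changed || self.rows_in_chunk >= self.chunk_size
    }
}
```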
## Testing
Two new E2E tests were added to test basic query functionality and
chunking behaviour, respectively. In addition, some manual testing
was done to verify that the InfluxDB Grafana plugin works with this
API.
This commit re-enables the limits test after making a fix that has it
run in under 1 second on my laptop vs the old behavior of >=30 seconds. It does
so by constructing one single write_lp request to create 1995 tables
rather than 1995 individual requests that each create a table. This is far more
efficient.
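A small sketch of the idea: one line protocol payload that creates many tables in a single write_lp request; the table naming is illustrative.

```rust
// Build one line protocol payload targeting many measurements (tables),
// so a single /api/v3/write_lp request creates all of them.
fn multi_table_lp(table_count: usize) -> String {
    let mut lp = String::new();
    for i in 0..table_count {
        // each line targets a different measurement (table)
        lp.push_str(&format!("table_{i},tag=a value=1i 1\n"));
    }
    lp
}

// e.g. multi_table_lp(1995) yields a single request body creating 1995 tables
```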
feat: support authenticating v1 APIs with p parameter
The p URL query parameter can be used to authenticate requests
to the /api/v1/query and /api/v1/write APIs
A test was added to ensure this works
* feat: wire up query from parquet files
This adds the functionality to query from Parquet files that have been persisted in object storage. Any segments that are loaded on boot will be included (limit of 1k segments at the time of this PR). In a follow-on PR we should add a good end-to-end test that has persistence and query through the main API (might be tricky).
* Move BufferChunk and ParquetChunk into chunk module
* Add object_store_url to Persister
* Register object_store on server startup
* Add loaded persisted_segments to SegmentState
* refactor: PR feedback
This implements automatic segment persistence and cleanup of the WAL files. Every second the write buffer checks for and persists segments that have been open for longer than half the segment duration and that are not in the current or next block of time.
One thing left to do is to deal with blocks of time that have had multiple segments persisted in them. This will be addressed in a follow-on PR.
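As a rough illustration of that persistence rule (not the actual write buffer code):

```rust
// Illustrative decision logic: persist a segment once it has been open longer
// than half the segment duration and it is not the current or next block of time.
use std::time::Duration;

fn should_persist(
    open_for: Duration,
    segment_duration: Duration,
    is_current_or_next: bool,
) -> bool {
    !is_current_or_next && open_for > segment_duration / 2
}
```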
Specific updates:
* Update Persister persist_segment to take borrow
* Move SegmentState into its own module
* Create functions to close open segments and persist them when time
* Add tokio task to check every second to see if segments should be persisted
* Split WriteBuffer into segments
* Add SegmentRange and SegmentDuration
* Update WAL to store SegmentRange and to be able to open up multiple ranges
* Remove Partitioner and PartitionBuffer
* Update SegmentState and loader
* Update SegmentState with current, next and outside
* Update loader and tests to load up current, next and previous outside segments based on the passed in time and desired segment duration
* Update WriteBufferImpl and Flusher
* Update the flusher to flush to multiple segments
* Update WriteBufferImpl to split data into segments getting written to
* Update HTTP and WriteBuffer to use TimeProvider
* Wire up outside segment writes and loading
* Data outside current and next no longer goes to a single segment, but to a segment based on that data's time. This is limited to 100 segments of time that can be written to at any given time.
* Refactor SegmentDuration add config option
* Refactors SegmentDuration to be a new type over duration
* Adds the clap block configuration to pass SegmentDuration, defaulting to 1h
* refactor: SegmentState and loader
* remove the current_segment and next_segment from the loader and segment state, instead having just a collection of segments
* open up only the current_segment by default
* keep current and next segments open if they exist, while others go into persisting or persisted
* fix: cargo audit
* refactor: fixup PR feedback
* feat: add `Authorizer` impls to authz REST and gRPC
This adds two new Authorizer implementations to Edge: Default and
AllOrNothing, which will provide the two auth options for Edge.
Both gRPC requests and HTTP REST request will be authorized by
the same Authorizer implementation.
The SHA512 digest action was moved into the `Authorizer` impl.
* feat: add `ServerBuilder` to construct `Server`
A builder was added to the Server in this commit, as part of an
attempt to get the server creation to be more modular.
* refactor: use test server fixture in auth e2e test
Refactored the `auth` integration test in `influxdb3` to use the
`TestServer` fixture; part of this involved extending the fixture
to be configurable, so that the `TestServer` can be spun up with
an auth token.
* test: add test for authorized gRPC
A new end-to-end test, auth_grpc, was added to check that
authorization is working with the influxdb3 Flight service.
This changes the 'influxdb3 create token' command so that it will just
automatically generate a completely random base64-encoded token prepended with
'apiv3_' that is then fed into a Sha512 algorithm instead of Sha256. The
user can no longer pass in a token to be turned into the proper output.
This also changes the server code to handle the change to Sha512 as well.
Closes #24704
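A hedged sketch of the token scheme as described: a random token is base64 encoded, prefixed with 'apiv3_', and its SHA512 digest is what gets stored; the crate choices and encoding details are assumptions.

```rust
// Hedged sketch; crates (rand, base64, sha2) and exact encoding are assumptions.
use base64::{engine::general_purpose::URL_SAFE_NO_PAD, Engine};
use rand::RngCore;
use sha2::{Digest, Sha512};

fn generate_token() -> (String, Vec<u8>) {
    let mut bytes = [0u8; 64];
    rand::thread_rng().fill_bytes(&mut bytes);
    // the plaintext token is shown to the user once; only the digest is kept
    let token = format!("apiv3_{}", URL_SAFE_NO_PAD.encode(bytes));
    let digest = Sha512::digest(token.as_bytes()).to_vec();
    (token, digest)
}
```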
feat: support SHOW RETENTION POLICIES
Added support through the influxdb3 Query Executor to perform
SHOW RETENTION POLICIES queries, both on a specific database as well
as across all databases.
Test cases were added to check this functionality.
Extended the InfluxQL rewriter to handle SELECT statements with nested
sub-queries, as well as EXPLAIN statements.
Tests were added to check all the rewrite cases for happy path and
failure modes.
In order for Edge to support other object stores besides the local file
system we just needed to turn on the features in clap_blocks which
handles all of the configuration needed to create an `Arc<dyn ObjectStore>`
for us. We were already calling its `make_object_store` function that
did this and so it's a simple switch flip to turn it on.
Closes: #24553
This commit is the final piece for the write_lp endpoint. It adds limits
to Edge such that:
- There can only be 5 Databases
- There can only be 500 Columns per Table
- There can only be 2000 Tables across all Databases
We do this by modifying the catalog code to error out whenever one of
these limits would be exceeded before permanently modifying the schema.
These are hard coded limits and cannot be configured by the user.
Closes #24554
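For illustration, the limit checks could be sketched like this; the constants mirror the limits listed above, while the error type and function shape are made up.

```rust
// Illustrative sketch of the hard-coded limit checks; not the actual catalog code.
const MAX_DATABASES: usize = 5;
const MAX_TABLES: usize = 2000;
const MAX_COLUMNS_PER_TABLE: usize = 500;

#[derive(Debug)]
enum CatalogError {
    TooManyDatabases,
    TooManyTables,
    TooManyColumns,
}

fn check_limits(db_count: usize, table_count: usize, column_count: usize) -> Result<(), CatalogError> {
    // error out before the schema is permanently modified
    if db_count > MAX_DATABASES {
        return Err(CatalogError::TooManyDatabases);
    }
    if table_count > MAX_TABLES {
        return Err(CatalogError::TooManyTables);
    }
    if column_count > MAX_COLUMNS_PER_TABLE {
        return Err(CatalogError::TooManyColumns);
    }
    Ok(())
}
```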
feat: add query_influxql api
This PR adds support for the /api/v3/query_influxql API. This re-uses code from the existing query_sql API, but some refactoring was done to allow for code re-use between the two.
The main change to the original code from the existing query_sql API was that the format is determined up front, in the event that the user provides some incorrect Accept header, so that the 400 BAD REQUEST is returned before performing the query.
Support of several InfluxQL queries that previously required a bridge to be executed in 3.0 was added:
- SHOW MEASUREMENTS
- SHOW TAG KEYS
- SHOW TAG VALUES
- SHOW FIELD KEYS
- SHOW DATABASES
- Handling of qualified measurement names in SELECT queries (see below)
This is accomplished with the newly added iox_query_influxql_rewrite crate, which provides the means to re-write an InfluxQL statement to strip out a database name and retention policy, if provided. Doing so allows the query_influxql API to have the database parameter optional, as it may be provided in the query string.
Handling qualified measurement names in SELECT
The implementation in this PR will inspect all measurements provided in a FROM clause and extract the database (DB) name and retention policy (RP) name (if not the default); for example, `"mydb"."autogen"."cpu"` yields database `mydb` and retention policy `autogen` for the `cpu` measurement. If multiple DB/RPs are provided, an error is thrown.
Testing
E2E tests were added for performing basic queries against a running server on both the query_sql and query_influxql APIs. In addition, the test for query_influxql includes some of the InfluxQL-specific queries, e.g., SHOW MEASUREMENTS.
Other Changes
The influxdb3_client now has the api_v3_query_influxql method (and a basic test was added for this)
This commit is a major refactor for the code base. It mainly does four
things:
1. Splits code shared between the internal IOx repository and this one
into its own repo over at https://github.com/influxdata/influxdb3_core
2. Removes any docs or anything else that did not relate to this project
3. Reorganizes the Cargo.toml files to use the top level Cargo.toml to
declare dependencies and versions to keep all crates in sync and sets
all others to use `<dep>.workspace = true` unless it's an optional
dependency
4. Set the top level Cargo.toml to point to the core crates as git
dependencies
With this any changes specific to Edge will be contained here, updating
deps will be a PR over in `influxdata/influxdb3_core`, and we can prove
out the viability for this model to use for IOx.
* fix: persister loading with no segments
Fixes a bug where the persister would throw an error if attempting to load segments when none had been persisted.
Moved persister tests into tests block.
* feat: implement loader for persisted state
This implements a loader for the write buffer. It loads the catalog and the buffer from the WAL.
Move Persister errors into their own type now that the write buffer load could return errors from the persister.
This doesn't yet rotate segments or trigger persistence of newly closed segments, which will be addressed in a future PR.
* fix: cargo update to fix audit
* refactor: add error type to persister trait
* refactor: use generics instead of dyn
---------
Co-authored-by: Trevor Hilton <thilton@influxdata.com>
* feat: Add partial write and name check to write_lp
This commit adds new behavior to the v3 write_lp http endpoint by
implementing both partial writes and checking the db name for validity.
It also sets the partial write behavior as the default now, whereas
before we would reject the entire request if one line was incorrect.
Users who *do* actually want that behavior can now opt in by putting
'accept_partial=false' into the URL of the request.
We also check that the db name used in the request contains only
numbers, letters, underscores and hyphens and that it must start with
either a number or letter.
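A minimal sketch of that name rule (illustrative, not the server's actual validate_db_name); restricting to ASCII letters and digits is an assumption.

```rust
// Illustrative db name rule: only letters, numbers, underscores and hyphens,
// and the name must start with a letter or number. ASCII-only is assumed.
fn db_name_is_valid(name: &str) -> bool {
    let mut chars = name.chars();
    match chars.next() {
        Some(c) if c.is_ascii_alphanumeric() => {}
        _ => return false, // empty, or does not start with a letter/number
    }
    chars.all(|c| c.is_ascii_alphanumeric() || c == '_' || c == '-')
}
```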
We also introduce a more standardized way to return errors to the user
as JSON that we can expand over time to give actionable error messages
to the user that they can use to fix their requests.
Finally, tests have been included to mock out and test the behavior for
all of the above so that changes to the error messages are reflected in
tests, that both partial and not partial writes work as expected, and
that invalid db names are rejected without writing.
* feat: Add precision to write_lp http endpoint
This commit adds the ability to control the precision of the timestamp
passed in to the endpoint. For example if a user chooses 'second' and
the timestamp 20 that will be 20 seconds past the Unix Epoch. If they
choose 'millisecond' instead it will be 20 milliseconds past the Epoch.
Up to this point we assumed that all data passed in was of nanosecond
precision. The data is still stored in the database as nanoseconds.
Instead, upon receiving the data, we convert it to nanoseconds. If the
precision URL parameter is not specified we default to auto and take a
best effort guess at what the user wanted based on the order of
magnitude of the data passed in.
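A sketch of the precision-to-nanoseconds conversion described here; the set of named precisions beyond those mentioned and the auto-detection thresholds are assumptions, not the server's exact cut-offs.

```rust
// Illustrative conversion of an incoming timestamp to nanoseconds based on
// the requested precision; the `Auto` thresholds are assumed.
enum Precision {
    Auto,
    Second,
    Millisecond,
    Microsecond,
    Nanosecond,
}

fn to_nanos(ts: i64, precision: Precision) -> i64 {
    match precision {
        Precision::Second => ts * 1_000_000_000,
        Precision::Millisecond => ts * 1_000_000,
        Precision::Microsecond => ts * 1_000,
        Precision::Nanosecond => ts,
        Precision::Auto => {
            // best-effort guess from the order of magnitude of the value
            match ts.unsigned_abs() {
                0..=9_999_999_999 => ts * 1_000_000_000,                   // looks like seconds
                10_000_000_000..=9_999_999_999_999 => ts * 1_000_000,      // milliseconds
                10_000_000_000_000..=9_999_999_999_999_999 => ts * 1_000,  // microseconds
                _ => ts,                                                   // nanoseconds
            }
        }
    }
}
```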
This change will allow users finer-grained control over what precision
they want to use for their data, while also trying our best to create a
good user experience: things work as expected, and we avoid the failure
mode whereby a user wanted seconds but got nanoseconds by default.