This commit updates us to rustc 1.80. There are three significant changes
here:
1. LazyLock and LazyCell have been stabilized, meaning we can replace our
usage of Lazy from the once_cell crate with the std lib versions (sketched below)
2. Lints were added to handle unknown cfg directives. `tokio_unstable`
is affected by this, and while we do have the flags in our
.cargo/config.toml, Cargo still outputs a lint for it, so we now suppress
that warning in our workspace Cargo.toml
3. clippy now emits a new warning about priority levels for lints. Quite
frankly, it's a warning that doesn't make sense to me and seems like
something cargo should fix, but here we are.
Besides that, it was a painless upgrade, and now we're on the latest and
greatest.
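As an illustration of change (1), a representative before/after; the static shown here is illustrative, not a specific one from this codebase:

```rust
use std::sync::LazyLock;

// Before, with the once_cell crate:
//   static CONFIG: once_cell::sync::Lazy<String> =
//       once_cell::sync::Lazy::new(|| std::env::var("CONFIG").unwrap_or_default());

// After, with std (stable since rustc 1.80):
static CONFIG: LazyLock<String> =
    LazyLock::new(|| std::env::var("CONFIG").unwrap_or_default());

fn main() {
    // The value is initialized on first dereference, exactly like Lazy.
    println!("{}", *CONFIG);
}
```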
Part of #25067
Changes in this PR:
- Addition of a PROFILING.md file, which briefly outlines how to build the influxdb3 binary in preparation for profiling and explains usage of macOS's Instruments tool
- Addition of a quick-bench profile, which extends the existing quick-release profile with debuginfo turned on
Closes #25096
- Adds a new HTTP API that allows the creation of a last cache, see the issue for details
- An E2E test was added to check success/failure behaviour of the API
- Adds the mime crate, for parsing request MIME types, but this is only used in the code I added - we may adopt it in other APIs / parts of the HTTP server in future PRs
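For reference, a minimal sketch of the kind of MIME-type check the mime crate enables (the exact header handling in the new API differs):

```rust
use mime::Mime;

/// Returns true if the given Content-Type header value parses as JSON.
fn is_json(content_type: &str) -> bool {
    content_type
        .parse::<Mime>()
        .map(|m| m.type_() == mime::APPLICATION && m.subtype() == mime::JSON)
        .unwrap_or(false)
}

fn main() {
    assert!(is_json("application/json; charset=utf-8"));
    assert!(!is_json("text/plain"));
}
```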
* feat: base for last cache implementation
Each last cache holds a ring buffer for each column in an index map, which
preserves the insertion order for faster record batch production.
The ring buffer uses a custom type to handle the different supported
data types that we can have in the system.
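A minimal sketch of the shape described here, with illustrative names (the actual types differ):

```rust
use std::collections::VecDeque;
use indexmap::IndexMap;

// One variant per supported data type; each holds a bounded ring buffer
// of the most recent values for a single column.
enum CacheColumn {
    I64(VecDeque<i64>),
    F64(VecDeque<f64>),
    String(VecDeque<String>),
    Bool(VecDeque<bool>),
}

// Column name -> ring buffer, in insertion order, so record batches can
// be produced without re-sorting columns.
struct LastCacheStore {
    columns: IndexMap<String, CacheColumn>,
    capacity: usize,
}
```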
* feat: implement last cache provider
LastCacheProvider is the API used to create last caches and write
table batches to them. It uses a two-layer RwLock/HashMap: the first layer
keyed by database, and the second by table within the database.
This allows for table-level locks when writing buffered data, while a
database-level lock is only taken when creating a cache (and, in future,
when removing one).
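A minimal sketch of the two-layer locking described above, assuming parking_lot's RwLock (names are illustrative):

```rust
use std::collections::HashMap;
use parking_lot::RwLock;

struct LastCache; // stand-in for the actual cache type

struct LastCacheProvider {
    // Layer 1: database name -> layer 2: table name -> cache.
    cache_map: RwLock<HashMap<String, RwLock<HashMap<String, LastCache>>>>,
}

impl LastCacheProvider {
    // Writing buffered data: shared lock on the database layer, exclusive
    // lock only on the affected database's table map.
    fn write_table_batch(&self, db: &str, table: &str) {
        let dbs = self.cache_map.read();
        if let Some(tables) = dbs.get(db) {
            if let Some(_cache) = tables.write().get_mut(table) {
                // ... push the batch into the matching cache ...
            }
        }
    }

    // Creating a cache: exclusive lock on the database layer.
    fn create_cache(&self, db: String, table: String) {
        self.cache_map
            .write()
            .entry(db)
            .or_default()
            .write()
            .insert(table, LastCache);
    }
}
```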
* test: APIs on write buffer and test for last cache
Added basic APIs on the write buffer to access the last cache, and added a
test to the last_cache module to check that it works with a simple example
* docs: add some doc comments to last_cache
* chore: clippy
* chore: one small comment on IndexMap
* chore: clean up some stale comments
* refactor: part of PR feedback
Addressed three parts of PR feedback:
1. Remove double-lock on cache map
2. Re-order the get when writing to the cache to be outside the loop
3. Move the time check into the cache itself
* refactor: nest cache by key columns
This refactors the last cache to use a nested caching structure, where
the key columns for a given cache are used to create a hierarchy of
nested maps, terminating in the actual store for the values in the cache.
Access to the cache is done via a set of predicates which can optionally
specify the key column values at any level in the cache hierarchy to only
gather record batches from children of that node in the cache.
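A hedged sketch of the nested structure and predicate walk described above (names are illustrative):

```rust
use std::collections::HashMap;

struct LastCacheStore; // stand-in for the ring-buffer value store

// Each key column adds one level: key column value -> next level down,
// terminating in the store that actually holds the cached values.
enum LastCacheState {
    Key(HashMap<String, LastCacheState>),
    Store(LastCacheStore),
}

// A predicate optionally pins one key column to a value; constrained
// levels descend into the matching child only, while unconstrained
// levels fan out to all children.
struct Predicate {
    key_column: String,
    value: String,
}
```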
Some todos:
- Need to handle the TTL
- Need to move the TableProvider impl up to the LastCache type
* refactor: TableProvider impl to LastCache
This re-writes the datafusion TableProvider implementation on the correct
type, i.e., the LastCache, and adds conversion from the filter Expr's to
the Predicate type for the cache.
* feat: support TTL in last cache
Last caches now walk their entries when writes come in and evict any that
have expired.
* refactor: add panic when unexpected predicate used
* refactor: small naming convention change
* refactor: include keys in query results and no null keys
Changed key columns so that they do not accept null values; rows that are
pushed with missing key column values will be ignored.
When producing record batches for a cache, if not all key columns are
used in the predicate, then the non-predicate key columns are now
produced as columns in the output record batches.
A test with a few cases showing this was added.
* fix: last cache key column query output
Ensure key columns in the last cache that are not included in the
predicate are emitted in the RecordBatches as a column.
Cleaned up and added comments to the new test.
* chore: clippy and some un-needed code
* fix: clean up some logic errors in last_cache
* test: add tests for non default cache size and TTL
Added two tests, as per the commit title. Also moved the eviction process
to a separate function so that it is not run on every write to the cache,
which could be expensive, and so that entries are evicted regardless of
whether writes are coming in or not.
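A minimal sketch of the standalone eviction step (the real cache tracks per-entry timestamps; names are illustrative):

```rust
use std::time::{Duration, Instant};

struct Entry {
    last_write: Instant,
    // ... cached values ...
}

struct LastCache {
    entries: Vec<Entry>,
    ttl: Duration,
}

impl LastCache {
    // Run on its own schedule rather than on every write, so entries
    // expire even when no new writes arrive.
    fn remove_expired(&mut self) {
        let now = Instant::now();
        self.entries
            .retain(|e| now.duration_since(e.last_write) < self.ttl);
    }
}
```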
* test: add invalid predicate test cases to last_cache
* test: last_cache with field key columns
* test: last_cache uses series key for default keys
* test: last_cache uses tag set as default keys
* docs: add doc comments to last_cache
* fix: logic error in last cache creation
CacheAlreadyExists errors were based only on the database and table
names, and did not include the cache name, which was incorrect.
* docs: add some comments to last cache create fn
* feat: support null values in last cache
This also adds explicit support for series key columns, to distinguish
them from normal tags in terms of nullability.
A test was added to check that nulls work
* fix: reset last cache last time when ttl evicts all data
Introduce the experimental series key feature to monolith, along with the new `/api/v3/write` API which accepts the new line protocol to write to tables containing a series key.
Series key
* The series key is supported in the `schema::Schema` type by the addition of a metadata entry that stores the series key members in their correct order. Writes received for `v3` tables must have the same series key for every single write.
Series key columns are `NOT NULL`
* Nullability of columns is enforced in the core `schema` crate based on a column's membership in the series key. So, when building a `schema::Schema` using `schema::SchemaBuilder`, the arrow `Field`s that are injected into the schema will have `nullable` set to false for columns that are part of the series key, as well as the `time` column.
* The `NOT NULL` _constraint_, if you can call it that, is enforced in the buffer (see [here](https://github.com/influxdata/influxdb/pull/25066/files#diff-d70ef3dece149f3742ff6e164af17f6601c5a7818e31b0e3b27c3f83dcd7f199R102-R119)) by ensuring there are no gaps in data buffered for series key columns.
Series key columns are still tags
* Columns in the series key are annotated as tags in the arrow schema, which for now means that they are stored as Dictionaries. This was done to avoid having to support a new column type for series key columns.
New write API
* This PR introduces the new write API, `/api/v3/write`, which accepts the new `v3` line protocol. Currently, the only part of the new line protocol proposed in https://github.com/influxdata/influxdb/issues/24979 that is supported is the series key. New data types are not yet supported for fields.
Split write paths
* To support the existing write path alongside the new write path, a new module was set up to perform validation in the `influxdb3_write` crate (`write_buffer/validator.rs`). This re-uses the existing write validation logic, and replicates it with needed changes for the new API. I refactored the validation code to use a state machine over a series of nested function calls to help distinguish the fallible validation/update steps from the infallible conversion steps.
* The code in that module could potentially be refactored to reduce code duplication.
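A hedged sketch of the state-machine shape used in the validator (all names are illustrative, not the actual validator.rs API): each fallible step consumes one state and produces the next, so the infallible conversion at the end is clearly separated from validation.

```rust
#[derive(Debug)]
struct ValidationError(String);

struct Initial { lines: Vec<String> }
struct SchemaValidated { lines: Vec<String> }
struct Validated { batches: usize }

impl Initial {
    // Fallible step: validate the lines against (and update) the schema.
    fn validate_schema(self) -> Result<SchemaValidated, ValidationError> {
        if self.lines.is_empty() {
            return Err(ValidationError("no lines to validate".into()));
        }
        Ok(SchemaValidated { lines: self.lines })
    }
}

impl SchemaValidated {
    // Infallible step: convert validated lines into table batches.
    fn into_batches(self) -> Validated {
        Validated { batches: self.lines.len() }
    }
}

fn validate_v3(lines: Vec<String>) -> Result<Validated, ValidationError> {
    Ok(Initial { lines }.validate_schema()?.into_batches())
}
```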
Remove reliance on data_types::ColumnType
Introduce TableSnapshot for serializing table information in the catalog.
Remove the columns BTree from the TableDefinition and use the schema
directly. BTrees are still used to ensure column ordering when tables are
created, or when columns are added to existing tables.
The custom Deserialize impl on TableDefinition used to block duplicate
column definitions in the serialized data. This preserves that behaviour
using serde_with and extends it to the other types in the catalog, namely
InnerCatalog and DatabaseSchema.
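A hedged sketch of the serde_with approach (field and type names are simplified stand-ins for the catalog types):

```rust
use std::collections::BTreeMap;
use serde::Deserialize;
use serde_with::{serde_as, MapPreventDuplicates};

#[serde_as]
#[derive(Deserialize)]
struct DatabaseSchema {
    name: String,
    // Deserialization now fails if the same table name appears twice,
    // replacing the old custom Deserialize impl on TableDefinition.
    #[serde_as(as = "MapPreventDuplicates<_, _>")]
    tables: BTreeMap<String, TableDefinition>,
}

#[derive(Deserialize)]
struct TableDefinition {
    name: String,
}
```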
The serialization test for the catalog was extended to include multiple
tables in a database and multiple columns spanning the range of available
types in each table.
Snapshot testing was introduced using the insta crate to check the
serialized JSON form of the catalog, and help catch breaking changes
when introducing features to the catalog.
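The snapshot tests follow the usual insta pattern, roughly like this sketch (the catalog type here is a simplified stand-in; the macro requires insta's json feature):

```rust
use serde::Serialize;

#[derive(Serialize)]
struct Catalog {
    databases: Vec<String>, // simplified stand-in for the real catalog
}

#[test]
fn catalog_serialization() {
    let catalog = Catalog { databases: vec!["db1".into(), "db2".into()] };
    // Serializes to JSON and compares against the stored snapshot; any
    // change to the serialized form fails the test until reviewed.
    insta::assert_json_snapshot!("catalog", catalog);
}
```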
Added a test that verifies the no-duplicate-key rules when deserializing
the map components in the Catalog.
Introduction of the `TokioDatafusionConfig` clap block for configuring the DataFusion runtime - this exposes many new `--datafusion-*` options on start, including `--datafusion-num-threads`
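A hedged sketch of what such a clap block looks like (field name, env var, and optionality here are assumptions, not the actual TokioDatafusionConfig definition):

```rust
use clap::Parser;

#[derive(Debug, Parser)]
pub struct TokioDatafusionConfig {
    /// Number of threads for the DataFusion tokio runtime.
    #[clap(long = "datafusion-num-threads", env = "INFLUXDB3_DATAFUSION_NUM_THREADS")]
    pub num_threads: Option<usize>,
}
```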
To accommodate renaming of `QueryNamespaceProvider` to `QueryDatabase` in `influxdb3_core`, I renamed the `QueryDatabase` type to `Database`.
Fixed tests that broke as a result of sync.
For releases we need Docker images and binaries available for users to
actually run influxdb3. These CI changes build the binaries and the Docker
image on a release tag, then test, sign, and publish them, making them
available for download.
Co-Authored-By: Brandon Pfeifer <bpfeifer@influxdata.com>
* feat: report system stats in load generator
Added the mechanism to report system stats during load generation. The
following stats are saved in a CSV file:
- cpu_usage
- disk_written_bytes
- disk_read_bytes
- memory
- virtual_memory
This only works when running the load generator against a local instance
of influxdb3, i.e., one that is running on your machine.
Generating system stats is done by passing the --system-stats flag to the
load generator.
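A hedged sketch of the per-process sampling, assuming the sysinfo crate (whose API varies between versions) as the source of these stats:

```rust
use sysinfo::{Pid, System};

/// Sample the stats listed above for a local influxdb3 process.
fn sample(sys: &mut System, pid: Pid) -> Option<(f32, u64, u64, u64, u64)> {
    sys.refresh_all();
    let process = sys.process(pid)?;
    let disk = process.disk_usage();
    Some((
        process.cpu_usage(),       // cpu_usage
        disk.written_bytes,        // disk_written_bytes since last refresh
        disk.read_bytes,           // disk_read_bytes since last refresh
        process.memory(),          // memory
        process.virtual_memory(),  // virtual_memory
    ))
}
```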
* feat: /ping API to serve version
The /ping API was added, which is served for both GET and
POST methods. The API responds with a JSON body
containing the version and revision of the build.
A new crate was added, influxdb3_process, which
takes the process_info.rs module from the influxdb3
crate, and puts it in a separate crate so that other
crates (influxdb3_server) can depend on it. This was
needed in order to have access to the version and
revision values, which are generated at build time,
in the HTTP API code of influxdb3_server.
An E2E test was added to check that /ping works.
E2E TestServer can now have logs emitted using the
TEST_LOG environment variable.
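A hedged sketch of the handler, assuming hyper 0.14 and serde_json (the real version/revision values come from influxdb3_process, generated at build time):

```rust
use hyper::{Body, Response, StatusCode};

fn ping_response(version: &str, revision: &str) -> Response<Body> {
    let body = serde_json::json!({
        "version": version,
        "revision": revision,
    });
    Response::builder()
        .status(StatusCode::OK)
        .header(hyper::header::CONTENT_TYPE, "application/json")
        .body(Body::from(body.to_string()))
        .expect("valid response")
}
```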
* feat: initial load generator implementation
This adds a load generator as a new crate. Initially it only generates write load, but the scaffolding is there to add a query load generator to complement the write load tool.
This could have been added as a subcommand to the influxdb3 program, but I thought it best to have it separate for now.
It's fairly light on tests and error handling given it's an internal tooling CLI. I've added only something very basic to test the line protocol generation, and ran the actual write command by hand.
I included pretty detailed instructions and some runnable examples.
* refactor: address PR feedback
feat: support the v1 query API
This PR adds support for the `/api/v1/query` API, which replicates the
original InfluxDB v1 query API, serving single-statement
`SELECT` and `SHOW` queries. The response, which is returned as JSON,
can be chunked via the `chunked` and optional `chunk_size` parameters.
An optional `epoch` parameter can be supplied to have `time` column
timestamps converted to a UNIX epoch with the given precision.
## Buffering
The response is buffered by default: if the `chunked` parameter
is not supplied, or is passed as `false`, then the entire query
result is buffered into memory before being returned in the
response. This is how the original API behaves, so we are replicating
that here.
When `chunked` is passed as `true`, then the response will be a
stream of chunks, where each chunk is a self-contained response,
with the same structure as that of the non-chunked response. Chunks
are split up by the provided `chunk_size`, or by series, i.e.,
measurement, whichever comes first. The default chunk size is 10,000
rows.
Buffering is implemented with the `QueryResponseStream` and
`ChunkBuffer` types, the former implements the `Stream` trait,
which allows it to be streamed in the HTTP response directly with
`hyper`'s `Body::wrap_stream`. The `QueryResponseStream` is a wrapper
around the inner arrow `RecordBatchStream`, which buffers the
streamed `RecordBatch`es according to the requested chunking parameters.
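A simplified sketch of that wrapper (epoch conversion and series-boundary splitting omitted; serialization stubbed out):

```rust
use std::pin::Pin;
use std::task::{Context, Poll};

use arrow::error::ArrowError;
use arrow::record_batch::RecordBatch;
use futures::{Stream, StreamExt};

struct QueryResponseStream<S> {
    inner: S,                 // the arrow RecordBatchStream
    buffer: Vec<RecordBatch>, // batches accumulated for the next chunk
    buffered_rows: usize,
    chunk_size: usize,        // default 10,000 rows
}

impl<S> QueryResponseStream<S> {
    // Serialize the buffered batches as one self-contained chunk
    // (JSON serialization elided in this sketch).
    fn flush_chunk(&mut self) -> bytes::Bytes {
        let rows = self.buffered_rows;
        self.buffer.clear();
        self.buffered_rows = 0;
        bytes::Bytes::from(format!("{{\"rows\":{rows}}}"))
    }
}

impl<S> Stream for QueryResponseStream<S>
where
    S: Stream<Item = Result<RecordBatch, ArrowError>> + Unpin,
{
    type Item = Result<bytes::Bytes, ArrowError>;

    fn poll_next(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Self::Item>> {
        let this = self.get_mut();
        loop {
            match this.inner.poll_next_unpin(cx) {
                Poll::Ready(Some(Ok(batch))) => {
                    this.buffered_rows += batch.num_rows();
                    this.buffer.push(batch);
                    // Keep polling until a full chunk is buffered.
                    if this.buffered_rows >= this.chunk_size {
                        return Poll::Ready(Some(Ok(this.flush_chunk())));
                    }
                }
                Poll::Ready(Some(Err(e))) => return Poll::Ready(Some(Err(e))),
                // Inner stream finished: emit any final partial chunk.
                Poll::Ready(None) if !this.buffer.is_empty() => {
                    return Poll::Ready(Some(Ok(this.flush_chunk())));
                }
                Poll::Ready(None) => return Poll::Ready(None),
                Poll::Pending => return Poll::Pending,
            }
        }
    }
}
// Such a stream can be handed straight to hyper via Body::wrap_stream.
```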
## Testing
Two new E2E tests were added to test basic query functionality and
chunking behaviour, respectively. In addition, some manual testing
was done to verify that the InfluxDB Grafana plugin works with this
API.
This changes the 'influxdb3 create token' command so that it just
automatically generates a completely random, base64-encoded token prepended
with 'apiv3_', which is then fed into a Sha512 algorithm instead of Sha256.
The user can no longer pass in a token to be turned into the proper output.
This also changes the server code to handle the change to Sha512.
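A hedged sketch of the scheme (token byte length and base64 alphabet are assumptions, not the exact values used by the command):

```rust
use base64::Engine;
use rand::RngCore;
use sha2::{Digest, Sha512};

fn generate_token() -> (String, Vec<u8>) {
    // Completely random token material; the user no longer supplies it.
    let mut raw = [0u8; 64];
    rand::rngs::OsRng.fill_bytes(&mut raw);
    let token = format!(
        "apiv3_{}",
        base64::engine::general_purpose::URL_SAFE_NO_PAD.encode(raw)
    );
    // The server stores and compares only the Sha512 digest of the token.
    let hashed = Sha512::digest(token.as_bytes()).to_vec();
    (token, hashed)
}
```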
Closes #24704
feat: support SHOW RETENTION POLICIES
Added support through the influxdb3 Query Executor to perform
SHOW RETENTION POLICIES queries, both on a specific database as well
as across all databases.
Test cases were added to check this functionality.
feat: add query_influxql api
This PR adds support for the /api/v3/query_influxql API. This re-uses code from the existing query_sql API, but some refactoring was done to allow for code re-use between the two.
The main change to the code carried over from the existing query_sql API is that the response format is now determined up front, so that if the user provides an invalid Accept header, the 400 BAD REQUEST is returned before the query is performed.
Support for several InfluxQL queries that previously required a bridge to be executed in 3.0 was added:
- SHOW MEASUREMENTS
- SHOW TAG KEYS
- SHOW TAG VALUES
- SHOW FIELD KEYS
- SHOW DATABASES
- Handling of qualified measurement names in SELECT queries (see below)
This is accomplished with the newly added iox_query_influxql_rewrite crate, which provides the means to re-write an InfluxQL statement to strip out a database name and retention policy, if provided. Doing so allows the query_influxql API to have the database parameter optional, as it may be provided in the query string.
Handling qualified measurement names in SELECT
The implementation in this PR will inspect all measurements provided in a FROM clause and extract the database (DB) name and retention policy (RP) name (if not the default). If multiple DB/RPs are provided, an error is thrown.
Testing
E2E tests were added for performing basic queries against a running server on both the query_sql and query_influxql APIs. In addition, the test for query_influxql includes some of the InfluxQL-specific queries, e.g., SHOW MEASUREMENTS.
Other Changes
The influxdb3_client now has the api_v3_query_influxql method (and a basic test was added for this)
This commit is a major refactor for the code base. It mainly does four
things:
1. Splits code shared between the internal IOx repository and this one
into its own repo over at https://github.com/influxdata/influxdb3_core
2. Removes any docs or anything else that did not relate to this project
3. Reorganizes the Cargo.toml files to use the top level Cargo.toml to
declare dependencies and versions to keep all crates in sync and sets
all others to use `<dep>.workspace = true` unless it's an optional
dependency
4. Sets the top level Cargo.toml to point to the core crates as git
dependencies
With this, any changes specific to Edge will be contained here, updating
deps will be a PR over in `influxdata/influxdb3_core`, and we can prove
out the viability of this model for use with IOx.
A new crate, influxdb3_client, was added, which provides the Client
struct. This gives programmatic access to the influxdb3 HTTP API.
Two primary methods are provided:
- `api_v3_write_lp`
- `api_v3_query_sql`
Each API uses a builder approach to composing the request to be sent.
Response handling was kept somewhat naive: in the `write_lp` case nothing
is returned, and in `query_sql`, raw `Bytes` are returned. We may improve this
in future once the respective APIs have their responses more finalized.
Both methods, as well as all associated types are documented with rustdocs.
The general approach to these methods was to use a builder style API so that
the user of the client can build their requests functionally before sending them
to the server.
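A hypothetical usage sketch based on the method names above (constructor, builder parameters, and return handling are assumptions):

```rust
use influxdb3_client::Client;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = Client::new("http://localhost:8181")?;

    // Compose and send a write request functionally:
    client
        .api_v3_write_lp("mydb")
        .body("cpu,host=a usage=0.5")
        .send()
        .await?;

    // Query, getting the raw response bytes back:
    let bytes = client
        .api_v3_query_sql("mydb", "SELECT * FROM cpu")
        .send()
        .await?;
    println!("{}", String::from_utf8_lossy(&bytes));
    Ok(())
}
```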
* chore(deps): Update arrow and datafusion to 49.0.0
This commit copies our dependency code in from influxdb_iox so that we can
upgrade from a forked version of 46.0.0 to 49.0.0 of
both arrow and datafusion. Most of the important changes were around how
we consume the crates in influxdb3(_server/_write). Those diffs are
particularly worth looking at, as the rest was a straight copy and we
don't currently touch those crates in our influxdb3 edge development.
* fix: regenerate workspace hack crate
* fix: Protobuf issues with incompatibility labels
* fix: Broken CI yaml
* fix: buf version
* fix: Only check IOx repo
* fix: Remove protobuf lint
* fix: Comment out call to protobuf-lint
* WIP: basic influxdb3 command and http server
* WIP: write lp, buffer, query out
* WIP: test write & query on influxdb3_server, fix warnings
* WIP: pull write buffer and catalog into separate crate
* WIP: sketch out types used for write: buffer, wal, persister
* WIP: remove a bunch of old IOx stuff and fmt
* chore: Update DataFusion pin again
* chore: update for different type
* fix: statistics
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* chore: Update DataFusion pin
* chore: Update for new API
* fix: fix test
* fix: only check error messages
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* chore: Update DataFusion pin
* chore: Update for new API
* fix: Update for API
* fix: update compactor test
* fix: Update to patched version of arrow 46.0.0
* fix: map `DataFusionError::Configuration` to an internal error
* fix: do not use deprecated API
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Adds a crate that layers compaction-specific gossip types and
abstractions over the underlying gossip transport for a nicer (and
decoupled!) internal API.
Adds basic structure for #8349. This will be filled in using separate
PRs for easier review.
The layer structure was chosen to simplify testing and allow composition
of features (like retries, circuit breaking, metrics, etc.). In contrast
to the V1 client (`querier::ingester`), a client here addresses exactly one
ingester, not multiple (via an `addr` parameter). The tracking of
multiple states in the V1 version is not very nice and is overly
complicated.
Adds a reusable "gossip_parquet_file" crate that provides a use-case
specific wrapper over the underlying gossip transport.
This crate deals with the encoding and decoding of parquet gossip
messages, handing them off to the application, and decoupling handler
latency from the gossip reactor.
Adds a new gossip_schema crate that provides a high-level interface to
schema change notifications.
This crate layers schema-specific interfaces over the existing low-level
gossip crate. Users can obtain best-effort schema change notifications
by implementing a SchemaEventHandler delegate given to a SchemaRx, or
efficiently dispatch schema change notifications to listening peers
using a SchemaTx.
Schema notifications are sent over the Topic::SchemaChanges topic
(ID=1), in which the caller must register an interest on receiving
gossip nodes.
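A hypothetical sketch of the delegate pattern described (signatures are assumptions based on this message, not the actual crate API):

```rust
struct SchemaEvent; // stand-in for a decoded schema change payload

// Implemented by callers that want best-effort change notifications;
// a SchemaRx would invoke this for each decoded gossip message.
trait SchemaEventHandler: Send + Sync {
    fn handle(&self, event: SchemaEvent);
}

struct LoggingHandler;

impl SchemaEventHandler for LoggingHandler {
    fn handle(&self, _event: SchemaEvent) {
        println!("schema change received (best-effort, may be dropped)");
    }
}
```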
Define new proto for the structure that gets sent from router to ingester and persisted in the ingester WAL.
Create ingest_structure crate with functions to convert from line protocol to new proto structure while validating schema.
Add function to convert new proto structure to RecordBatch.
* chore: Update datafusion pin
* fix: Update for change in API
* chore: Update plan
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* chore: Update datafusion to get new grouping
* chore: Update for new API
* chore: update tests
* fix: new API
* fix: state type
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* chore: Update datafusion + arrow/arrow-flight/parquet to version `42.0.0`
* chore: Update for new APIs
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* refactor: make compactor_scheduler crate
* refactor: move PartitionsSource into the compactor_scheduler
The compactor currently uses PartitionsSource in two ways:
* for the preparation of PartitionIds prior to the compactor pipeline.
* for the abstractions which utilize the PartitionIds during the IO pipeline.
This commit is a refactoring to enable us to delineate between these two uses.
The former (preparation) will now be done in the compactor_scheduler.
Since the compactor is dependent on the compactor_scheduler, it made sense to move the trait to the scheduler.
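A hedged sketch of the trait at the center of this split (the actual signature in compactor_scheduler may differ):

```rust
#[derive(Debug, Clone, Copy)]
struct PartitionId(i64);

// One implementation prepares candidate partitions inside the scheduler;
// the compactor pipeline consumes them through the same abstraction.
#[async_trait::async_trait]
trait PartitionsSource: Send + Sync {
    async fn fetch(&self) -> Vec<PartitionId>;
}
```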
* chore: Update DataFusion pin
* chore: Update API changes
* chore: Don't use deprecated API
* chore: Run cargo hakari tasks
* chore: Update tests due to changes in logical plan nodes from DF update
* chore: Fix broken links in docs
* chore: Adjust changes to expected output
---------
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* chore: Update DataFusion pin
* chore: Update cargo
* fix: update for API changes
* fix: Update plans
* chore: Update for new api
* fix: Update plans
* chore: Update for API changes more
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* chore: update DataFusion and arrow/parquet/arrow-flight to 39.0.0
* chore: update DataFusion and arrow/parquet/arrow-flight to 39.0.0 in workspace-hack/Cargo.toml
* chore: Run cargo hakari tasks
* chore: fix CI test and lint
* chore: update csv schema
* refactor: remove type-annotate for `Arc`
---------
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
This adds a new crate with a type capable of converting
decoded WAL Write Op entries to line protocol and writing
the result to a namespaced destination. The wal crate
now exports a type which reads the sequenced wal ops and
decodes them as namespaced table batch writes.
The "server_util" crate exists only to support HTTP authz operations, so
this commit moves it under the authz crate. This helper is gated by a
feature flag allowing callers to opt into this extra HTTP dependency
(disabled by default).
`time` 0.1 suffers from [RUSTSEC-2020-0071] and many upstream crates
have tried to remove it for years. The last remaining dependency chain is:
1. `chrono-english`
2. `chrono` (default features)
3. `chrono` (oldtime)
4. `time` 0.1
`chrono-english` doesn't seem to be super well maintained, but I
couldn't find a nice replacement for it. Luckily the master branch of
`chrono-english` is already fixed, so let's just directly use that.
[RUSTSEC-2020-0071]: https://rustsec.org/advisories/RUSTSEC-2020-0071
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* chore: Update datafusion and arrow/parquet to 37, tonic to 0.9.1
* refactor: Update for FieldRef and other API changes
* fix: Update field size calculation
* fix: Use `NullBuffer` directly
* fix: remove outdated comment
* chore: Update test for tonic
* chore: Run cargo hakari tasks
* chore: cargo update
---------
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>