* feat: support FlightSQL by serving gRPC requests on same port as HTTP
This commit adds support for FlightSQL queries via gRPC to the influxdb3 service. It does so by ensuring the QueryExecutor implements the QueryNamespaceProvider trait, and the underlying QueryDatabase implements QueryNamespace. Satisfying those requirements allows the construction of a FlightServiceServer from the service_grpc_flight crate.
The FlightServiceServer is a gRPC server that can be served via tonic at the API surface; however, enabling this required some tower::Service wrangling. The influxdb3_server/src/server.rs module was introduced to house this code. The objective is to serve both gRPC (via the newly introduced tonic server) and standard REST HTTP requests (via the existing HTTP server) on the same port.
This is accomplished by the HybridService which can handle either gRPC or non-gRPC HTTP requests. The HybridService is wrapped in a HybridMakeService which allows us to serve it via hyper::Server on a single bind address.
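For illustration, the core of the hybrid dispatch looks roughly like the following sketch (simplified: the real HybridService also unifies the two services' differing body and error types, which is most of the tower::Service wrangling mentioned above):

```rust
use std::task::{Context, Poll};

use hyper::{header::CONTENT_TYPE, Body, Request, Response};
use tower::Service;

type BoxFuture<T, E> =
    std::pin::Pin<Box<dyn std::future::Future<Output = Result<T, E>> + Send>>;

/// Simplified sketch: route each request to the gRPC or REST service
/// based on its content-type header.
struct HybridService<Rest, Grpc> {
    rest: Rest,
    grpc: Grpc,
}

impl<Rest, Grpc> Service<Request<Body>> for HybridService<Rest, Grpc>
where
    Rest: Service<Request<Body>, Response = Response<Body>>,
    Rest::Future: Send + 'static,
    Grpc: Service<Request<Body>, Response = Response<Body>, Error = Rest::Error>,
    Grpc::Future: Send + 'static,
{
    type Response = Response<Body>;
    type Error = Rest::Error;
    type Future = BoxFuture<Self::Response, Self::Error>;

    fn poll_ready(&mut self, cx: &mut Context<'_>) -> Poll<Result<(), Self::Error>> {
        // Only report readiness once both inner services are ready.
        match self.rest.poll_ready(cx) {
            Poll::Ready(Ok(())) => self.grpc.poll_ready(cx),
            other => other,
        }
    }

    fn call(&mut self, req: Request<Body>) -> Self::Future {
        // gRPC requests identify themselves via their content-type.
        let is_grpc = req
            .headers()
            .get(CONTENT_TYPE)
            .map(|v| v.as_bytes().starts_with(b"application/grpc"))
            .unwrap_or(false);
        if is_grpc {
            Box::pin(self.grpc.call(req))
        } else {
            Box::pin(self.rest.call(req))
        }
    }
}
```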
End-to-end tests were added in influxdb3/tests/flight.rs. These cover some basic FlightSQL cases. A common.rs module was added that introduces fixtures to aid end-to-end testing in influxdb3.
* feat: add query and write cli for influxdb3
Adds two new sub-commands to the influxdb3 CLI:
- query: perform queries against the running server
- write: perform writes against the running server
Both share a common set of parameters for connecting to the database
which are managed in influxdb3/src/commands/common.rs.
Currently, query supports all underlying output formats, and can
write the output to a file on disk. It only supports SQL as the
query language, but will eventually also support InfluxQL.
Write supports line protocol as input and expects the data to come
from a file.
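As a sketch, the shared connection parameters might look something like this with clap (field names and defaults here are illustrative, not the exact contents of common.rs):

```rust
use clap::{Args, Parser};

/// Connection parameters shared by the query and write sub-commands
/// (illustrative field names and defaults).
#[derive(Debug, Args)]
pub struct InfluxDb3Config {
    /// URL of the running influxdb3 server
    #[clap(long = "host", default_value = "http://127.0.0.1:8181")]
    pub host_url: String,

    /// Database to operate on
    #[clap(long = "dbname")]
    pub database_name: String,
}

#[derive(Debug, Parser)]
pub struct QueryCommand {
    #[clap(flatten)]
    influxdb3_config: InfluxDb3Config,

    /// The SQL query string to run
    query: String,
}
```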
This commit adds basic authorization support to Edge. Up to this point
we didn't have authorization at all, so the server would receive and
accept requests from anyone. This isn't exactly secure or ideal for a
deployment, so we add a basic form of authentication.
The way this works is that a user passes a hex-encoded sha256 hash of
a given token to the '--bearer-token' flag of the serve command. When
started with this flag, the server checks each request for a header of
the form 'Authorization: Bearer <token>', making sure it is well formed
and that the hash of the token matches the value passed on the command
line. The request is denied with a 400 Bad Request if the header is
malformed, or a 401 Unauthorized if the header is missing or the hash
does not match.
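A sketch of that check (function and variable names here are illustrative, not the exact server code), using the sha2 and hex crates:

```rust
use hyper::{header::AUTHORIZATION, Body, Request, StatusCode};
use sha2::{Digest, Sha256};

/// Validate the Authorization header against the hex-encoded sha256
/// hash supplied via '--bearer-token' (illustrative sketch).
fn authorize(req: &Request<Body>, expected_hash_hex: &str) -> Result<(), StatusCode> {
    // Missing header -> 401 Unauthorized.
    let header = req
        .headers()
        .get(AUTHORIZATION)
        .ok_or(StatusCode::UNAUTHORIZED)?;
    // A malformed header -> 400 Bad Request.
    let value = header.to_str().map_err(|_| StatusCode::BAD_REQUEST)?;
    let token = value
        .strip_prefix("Bearer ")
        .ok_or(StatusCode::BAD_REQUEST)?;
    // Hash the presented token and compare to the configured value.
    let hashed = hex::encode(Sha256::digest(token.as_bytes()));
    if hashed == expected_hash_hex {
        Ok(())
    } else {
        Err(StatusCode::UNAUTHORIZED)
    }
}
```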
A new subcommand is provided: 'influxdb3 create token <token>', whose
output contains the command to run the server with and what the header
should look like when making requests.
I can see future work including multiple tokens and rotating between
them or adding new ones to a live service, but for now this shall
suffice.
As part of this commit, end-to-end tests are included that run the
server and make requests against the HTTP API, verifying that requests
are denied when unauthorized, accepted when they carry the right
header, and denied when malformed.
Also as part of this commit a small fix is included for 'Accept: */*'
headers. We were not checking for them, and when this header was
present we denied the request instead of sending back the default
payload format.
A new crate, influxdb3_client, was added, which provides the Client
struct. This gives programmatic access to the influxdb3 HTTP API.
Two primary methods are provided:
- `api_v3_write_lp`
- `api_v3_query_sql`
Each API uses a builder approach to composing the request to be sent.
Response handling was kept somewhat naive: `write_lp` returns nothing,
and `query_sql` returns raw `Bytes`. We may improve this in the future
once the respective APIs' responses are more finalized.
Both methods, as well as all associated types, are documented with rustdocs.
The builder style was chosen so that users of the client can compose
their requests functionally before sending them to the server.
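A usage sketch (the builder setters shown here illustrate the style rather than a guaranteed API surface):

```rust
use influxdb3_client::Client;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = Client::new("http://localhost:8181")?;

    // Write some line protocol to the "mydb" database.
    client
        .api_v3_write_lp("mydb")
        .body("cpu,host=a usage=0.5")
        .send()
        .await?;

    // Query it back; the response is currently returned as raw Bytes.
    let bytes = client
        .api_v3_query_sql("mydb", "SELECT * FROM cpu")
        .send()
        .await?;
    println!("{}", String::from_utf8_lossy(&bytes));
    Ok(())
}
```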
* refactor: move write buffer into its own dir
* feat: implement write buffer segment with wal flushing
This creates the WriteBufferFlusher and OpenBufferSegment. If a wal is passed into the buffer, data written into it will be persisted to the wal for the initialized segment id.
* refactor: use crossbeam in flusher and pr cleanup
This commit adds the ability to choose the output format of a query via
the v3 api, so that a user can choose, whether by Accept header or the
format url param, how the data will be returned to them.
Prior to this commit the default was a pretty-printed text format; the
default is now json.
There are multiple formats one can choose:
1. json
2. csv
3. pretty printed text
4. parquet
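A sketch of the negotiation (the enum and MIME mappings are illustrative; the format url param, when present, would take precedence over the Accept header):

```rust
/// The four supported output formats (illustrative sketch).
#[derive(Debug, Clone, Copy)]
enum QueryFormat {
    Json,
    Csv,
    Pretty,
    Parquet,
}

/// Map an Accept header to an output format, defaulting to JSON.
fn format_from_accept(accept: Option<&str>) -> QueryFormat {
    match accept {
        Some("text/csv") => QueryFormat::Csv,
        Some("text/plain") => QueryFormat::Pretty,
        Some("application/vnd.apache.parquet") => QueryFormat::Parquet,
        // "application/json", "*/*", or no header all yield the default.
        _ => QueryFormat::Json,
    }
}
```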
I've tested each of these out and they work well. In particular the
parquet output is exciting, as users will be able to perform a query
and receive parquet data back that they can then load into, say, a
Python script or another tool to work with. As we extend what data can
be queried, as well as persisting it, what people will be able to do
with Edge will be really cool, and I'm interested to see how users end
up using this functionality in the future.
* fix: add rust-analyzer to toolchain file
Added the rust-analyzer component to the rust-toolchain.toml
file so that the correct version of rust-analyzer is installed
on Apple Silicon.
This will allow the LSP to work on Apple Silicon machines.
* chore: update deps for cargo deny
* chore(deps): Update arrow and datafusion to 49.0.0
This commit copies in our dependency code from influxdb_iox so that we
can upgrade from a forked version of 46.0.0 to 49.0.0 of both arrow and
datafusion. Most of the important changes were around how we consume
the crates in influxdb3(_server/_write). Those diffs are particularly
worth looking at, as the rest was a straight copy and we don't
currently touch those crates in our influxdb3 Edge development.
* fix: regenerate workspace hack crate
* fix: Protobuf issues with incompatibility labels
* fix: Broken CI yaml
* fix: buf version
* fix: Only check IOx repo
* fix: Remove protobuf lint
* fix: Comment out call to protobuf-lint
* feat: Implement Catalog r/w for persister
This commit implements reading and writing the Catalog to the object
store. This functionality was already stubbed out; it just needed an
implementation. Saving to the object store is straightforward: the
Catalog is serialized to JSON and written out. For loading, we find the
most recently added Catalog based on the file name, fetch it from the
object store, and return it to the caller in its deserialized form.
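Condensed, the round trip looks roughly like this (error handling trimmed, a serde_json::Value standing in for the real Catalog type, and assuming a recent object_store where `list` returns a stream directly):

```rust
use futures::TryStreamExt;
use object_store::{path::Path, ObjectStore};

/// Persist the catalog: serialize to JSON and write it at its path
/// (illustrative sketch).
async fn persist_catalog(
    store: &dyn ObjectStore,
    path: Path,
    catalog: &serde_json::Value,
) -> object_store::Result<()> {
    let json = serde_json::to_vec(catalog).expect("catalog serializes to JSON");
    store.put(&path, json.into()).await?;
    Ok(())
}

/// Find the most recently added catalog file, assuming file names sort
/// chronologically (illustrative sketch).
async fn latest_catalog_path(
    store: &dyn ObjectStore,
    prefix: &Path,
) -> object_store::Result<Option<Path>> {
    let metas: Vec<_> = store.list(Some(prefix)).try_collect().await?;
    Ok(metas.into_iter().map(|m| m.location).max())
}
```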
This commit also adds some tests to make sure that the above
functionality works as intended.
* feat: Implement Segment r/w for persister
This commit continues the work on the persister by implementing the
persist_segment and load_segment functions. Much like the Catalog
implementation, a segment is serialized to JSON before being persisted
to the object store in persist_segment, which is straightforward.
Loading is a little more involved: we need the most recent n segment
files, so we list them and return the n most recent. There are
comments in the code to make it easier to grok.
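The "most recent n" selection might be sketched like so (assuming segment file names sort chronologically; the real code is more careful):

```rust
use futures::TryStreamExt;
use object_store::{path::Path, ObjectStore};

/// Return the paths of the n most recent segment files under a prefix
/// (illustrative sketch).
async fn most_recent_n(
    store: &dyn ObjectStore,
    prefix: &Path,
    n: usize,
) -> object_store::Result<Vec<Path>> {
    let mut locations: Vec<Path> = store
        .list(Some(prefix))
        .map_ok(|meta| meta.location)
        .try_collect()
        .await?;
    // Newest names sort last; reverse so the newest come first, then
    // keep only the first n.
    locations.sort();
    locations.reverse();
    locations.truncate(n);
    Ok(locations)
}
```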
We also implement more tests to make sure that this part of the
persister works as expected.
* feat: Implement Parquet r/w to persister
This commit does a few things:
- First we add methods to the persister trait for reading and writing
parquet files as these were not stubbed out in prior commits
- Secondly we add a method to serialize a SendableRecordBatchStream into
Parquet bytes
- With these in place, implementing the trait methods is pretty
straightforward: hand in a path and a stream to get back metadata
about the persisted file, or get the bytes back when loading from
the store
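The stream-to-bytes step can be sketched as follows (it buffers the whole file in memory, in line with the memory caveat noted below):

```rust
use datafusion::physical_plan::SendableRecordBatchStream;
use futures::TryStreamExt;
use parquet::arrow::ArrowWriter;

/// Serialize a record batch stream into parquet bytes (illustrative
/// sketch; buffers everything in memory).
async fn serialize_to_parquet(
    mut stream: SendableRecordBatchStream,
) -> Result<Vec<u8>, Box<dyn std::error::Error>> {
    let mut bytes = Vec::new();
    let mut writer = ArrowWriter::try_new(&mut bytes, stream.schema(), None)?;
    while let Some(batch) = stream.try_next().await? {
        writer.write(&batch)?;
    }
    writer.close()?;
    Ok(bytes)
}
```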
Of course we also add more tests to make sure this all works as
expected. Do note that this does nothing to make sure that we bound how
much memory is used or if this is the most efficient way to write
parquet files. This is mostly to get things working with the
understanding that future refinement on the approach might be needed.
* fix: Update smallvec for crate advisory
* fix: Implement better filename handling
* feat: Handle loading > 1000 Segment Info files
This commit introduces 4 new types in the paths module for the
influxdb3_write crate. They are:
- ParquetFilePath
- CatalogFilePath
- SegmentInfoFilePath
- SegmentWalFilePath
Each of these corresponds to an object store path (or, for the WAL
file, an on-disk path) that we can use to address the needed files in a
consistent way, without duplicating path construction.
These types also Deref/AsRef to the object_store::path::Path type (or
the std::path::Path type for the WAL) so that they can be used anywhere
those types are expected, such as the various object_store/std::fs
APIs, and so that we can use the underlying type's methods without
reimplementing them; each is just a thin wrapper around the inner type.
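The wrapper pattern is roughly the following (ParquetFilePath shown; the path layout here is illustrative):

```rust
use object_store::path::Path as ObjPath;

/// Thin wrapper over an object store path (illustrative layout).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ParquetFilePath(ObjPath);

impl ParquetFilePath {
    /// Centralize construction so the layout is defined in one place.
    pub fn new(db_name: &str, table_name: &str, file_name: &str) -> Self {
        Self(ObjPath::from(format!(
            "dbs/{db_name}/{table_name}/{file_name}.parquet"
        )))
    }
}

impl std::ops::Deref for ParquetFilePath {
    type Target = ObjPath;

    fn deref(&self) -> &Self::Target {
        &self.0
    }
}

impl AsRef<ObjPath> for ParquetFilePath {
    fn as_ref(&self) -> &ObjPath {
        &self.0
    }
}
```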
This commit adds some tests to make sure that the path construction
works as intended and also updates the `wal.rs` file to use the new
`SegmentWalFilePath` instead of just a `PathBuf`.
Closes: #24578
* feat: add basic wal implementation for Edge
This WAL implementation uses some of the code from the wal crate, but departs pretty significantly from it in many ways. For now it uses simple JSON encoding for the serialized ops, but we may want to switch that to Protobuf at some point in the future. This version of the wal doesn't have its own buffering. That will be implemented higher up in the BufferImpl, which will use the wal and SegmentWriter to make data in the buffer durable.
The write flow will be that writes come into the buffer and are validated against (and update) an in-memory Catalog. Once validated, writes get buffered up in memory and then flushed into the WAL periodically (likely every 10-20ms). After being flushed to the WAL, the entire batch of writes is put into the in-memory queryable buffer, and only then are responses sent back to the clients. This should reduce the write lock pressure on the in-memory buffer considerably.
In this PR:
- Update the Wal, WalSegmentWriter, and WalSegmentReader traits to line up with new design/understanding
- Implement wal (mainly just a way to identify segment files in a directory)
- Implement WalSegmentWriter (write header, op batch with crc, and track sequence number in segment, re-open existing file)
- Implement WalSegmentReader
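A sketch of the op-batch framing (the real header contents, field order, and endianness may differ; this just shows the length + checksum + JSON-encoded-ops layout described above):

```rust
use std::io::Write;

use byteorder::{BigEndian, WriteBytesExt};
use crc32fast::Hasher;

/// Frame one batch of JSON-encoded ops into the segment file
/// (illustrative sketch).
fn write_op_batch<W: Write>(out: &mut W, ops_json: &[u8]) -> std::io::Result<()> {
    let mut hasher = Hasher::new();
    hasher.update(ops_json);
    let checksum = hasher.finalize();

    out.write_u32::<BigEndian>(ops_json.len() as u32)?; // batch length
    out.write_u32::<BigEndian>(checksum)?; // crc32 of the encoded ops
    out.write_all(ops_json)?; // the JSON-encoded op batch itself
    Ok(())
}
```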
* refactor: make Wal return impl reader/writer
* refactor: clean up wal segment open
* fix: WriteBuffer and Wal usage
Turn wal and write buffer references into a concrete type, rather than dyn.
* fix: have wal loading ignore invalid files
* fix: build, upgrade rustc, and deps
This commit upgrades Rust to 1.75.0, the latest release. We also
upgraded our dependencies to stay up to date and to clear out any
unneeded deps from the lockfile. In order to make sure everything
works, this also fixes the build by upgrading the workspace-hack crate
using cargo hakari and removing the `workspace.lint` entry in
influxdb3_write that didn't need to be there, probably left over from
a merge.
With this we can build influxdb3 as our default on main, but this alone
is not enough to fix CI and will be addressed in future commits.
* fix: warnings for influxdb3 build
This commit fixes the warnings emitted by `cargo build` when compiling
influxdb3. Mainly it adds needed lifetimes and removes unnecessary
imports and function calls.
* fix: all of the clippy lints
This for the most part just applies fixes suggested by clippy, with a
few exceptions:
- Generated type crates had additional allows added, since we can't
control the generated code
- Things that couldn't be fixed automatically were fixed manually, in
particular adding a Send bound for traits that create a Future that
should be Send
We also had to fix a build issue by adding a feature for tokio-compat
due to the upgrade of deps. The workspace crate was updated accordingly.
* fix: failing test due to rust panic message change
Between rustc 1.72 and rustc 1.75 the way error messages are displayed
when panicking changed. One of our tests depended on that output, and
this commit updates the expected error message to the new form so that
tests will pass.
* fix: broken cargo doc link
* fix: cargo formatting run
* fix: add workspace-hack to influxdb3 crates
This was the last change needed to make sure that the workspace-hack
crate CI lint would pass.
* fix: remove tests that cannot run anymore
We removed IOx code from this codebase, and as a result some tests can
no longer run; this commit removes them so that we can get a green
build.
* WIP: basic influxdb3 command and http server
* WIP: write lp, buffer, query out
* WIP: test write & query on influxdb3_server, fix warnings
* WIP: pull write buffer and catalog into separate crate
* WIP: sketch out types used for write: buffer, wal, persister
* WIP: remove a bunch of old IOx stuff and fmt
Adds initialisation code to the routers to instantiate an
AntiEntropyActor, pre-populate the Merkle Search Tree during schema
warmup, and maintain it at runtime.
* chore: Update DataFusion pin again
* chore: update for different type
* fix: statistics
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Use output of #8725 within the column ranges of the querier. Currently
this won't have any effect since the column ranges are only used to
prune parquet files and parquet files come with their own, more precise
time range (and that information has priority). However for #8705 we
want to use it to prune partitions before needing to deal with the
parquet files.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: impl `PartialEq + Eq` for `TestError`
* feat: i->q V2 circuit breaker
This is a straight port from V1, it even uses the same test. The code is
copied though (instead of reusing the old one) because the interface in
the V2 client is so different and the new testing infra is also nicer
(IMHO).
For #8349.
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Changes the default ingester configuration to assign half the logical
cores to datafusion for persist execution. Prior to this commit,
datafusion always used 4 threads by default.
In situations where the ingesters are configured with 4 logical cores
or fewer, the periodic persist can start enough persist jobs to keep
the 4 threads assigned to datafusion busy. Because there are then
enough threads to saturate all CPU cores, these CPU-heavy persist
threads can impact write latency by stealing CPU time from the tokio
runtime threads.
This change assigns exactly half the threads to DF by default, ensuring
there are always N/2 cores to service I/O-heavy API requests.
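The default boils down to something like this (illustrative, not the exact config code):

```rust
use std::num::NonZeroUsize;
use std::thread::available_parallelism;

/// Assign half the logical cores to datafusion, with a floor of one
/// (illustrative sketch).
fn default_datafusion_threads() -> usize {
    let cores = available_parallelism().map(NonZeroUsize::get).unwrap_or(1);
    std::cmp::max(1, cores / 2)
}
```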
* chore: Update DataFusion pin
* chore: Update for new API
* fix: fix test
* fix: only check error messages
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* chore: Update DataFusion pin
* chore: Update for new API
* fix: Update for API
* fix: update compactor test
* fix: Update to patched version of arrow 46.0.0
* fix: map `DataFusionError::Configuration` to an internal error
* fix: do not use deprecated API
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
The layer that serializes our requests. This also contains the logic to
leave out non-serializable filters, like the V1 version (same tests,
just arranged slightly differently).
For #8349.
* feat: more `TestResponse` constructors
* feat: "logging" layer for i->q V2 client
Logging layer for #8349. This mostly logs in debug mode but emits errors
to the log. Simple implementation that can be extended later.
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Optionally initialise the gossip subsystem in the compactor.
This will cause the compactor to perform PEX and join the cluster, but
as it registers no topic interests, it will not receive any
application-level payloads.
No messages are currently sent (in fact, gossip shuts down immediately).
Adds a crate that layers compaction-specific gossip types and
abstractions over the underlying gossip transport for a nicer (and
decoupled!) internal API.
Adds basic structure for #8349. This will be filled in using separate
PRs for easier review.
The layer structure was chosen to simplify testing and allow composition
of features (like retries, circuit breaking, metrics, etc.). In
contrast to the V1 client (`querier::ingester`), a client here
addresses exactly one ingester (via an `addr` parameter), not multiple.
The tracking of multiple states in the V1 version is not very nice and
is overly complicated.
* chore(deps): Bump chrono from 0.4.26 to 0.4.27
Bumps [chrono](https://github.com/chronotope/chrono) from 0.4.26 to 0.4.27.
- [Release notes](https://github.com/chronotope/chrono/releases)
- [Changelog](https://github.com/chronotope/chrono/blob/main/CHANGELOG.md)
- [Commits](https://github.com/chronotope/chrono/compare/v0.4.26...v0.4.27)
---
updated-dependencies:
- dependency-name: chrono
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
* chore: Run cargo hakari tasks
* fix: Update deprecated chrono methods to their now-recommended versions
`chrono::DateTime::<Tz>::from_utc` has been deprecated and it's now
recommended to use `chrono::DateTime::from_naive_utc_and_offset`
instead.
<https://github.com/chronotope/chrono/pull/1175>
Note that the `Timestamp` type in `influxdb_influxql_parser` is an alias
for `chrono::DateTime<FixedOffset>`.
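The migration looks like this:

```rust
use chrono::{DateTime, FixedOffset, NaiveDateTime, Utc};

fn migrate(naive: NaiveDateTime) {
    // Deprecated:
    // let ts: DateTime<Utc> = DateTime::from_utc(naive, Utc);
    // Recommended replacement:
    let ts: DateTime<Utc> = DateTime::from_naive_utc_and_offset(naive, Utc);

    // For `Timestamp` (an alias for `DateTime<FixedOffset>`), pass the
    // desired fixed offset instead.
    let offset = FixedOffset::east_opt(0).expect("valid offset");
    let ts_fixed: DateTime<FixedOffset> =
        DateTime::from_naive_utc_and_offset(naive, offset);
    let _ = (ts, ts_fixed);
}
```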
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: Carol (Nichols || Goulding) <carol.nichols@gmail.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Provide a new broadcast_subset() method to gossip users, enabling a
message to be sent to a random subset of peers that have a registered
interest in the message being dispatched.
Adds a (currently unused) NamespaceCache decorator that observes the
post-merge content of the cache to maintain a content hash.
This makes use of a Merkle Search Tree
(https://inria.hal.science/hal-02303490) to track the CRDT content of
the cache in a deterministic structure that allows for synchronisation
between peers (itself a CRDT).
The hash of two routers' NamespaceCache will be equal iff their cache
contents are equal - this can be used to (very cheaply) identify
out-of-sync routers, and trigger convergence. The MST structure used
here provides functionality to compare two compact MST representations,
and identify subsets of the cache that are out-of-sync, allowing for
cheap convergence.
Note this content hash only covers the tables, their set of columns, and
those column schemas - this reflects the fact that only these values may
currently be converged by gossip. Future work will enable full
convergence of all fields.
* refactor: `is_terminal` no longer requires an external crate
This is now part of the stdlib.
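For example, with the stdlib trait (stable since Rust 1.70):

```rust
use std::io::{stdout, IsTerminal};

fn main() {
    // No external crate needed anymore.
    if stdout().is_terminal() {
        println!("stdout is an interactive terminal");
    }
}
```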
* chore: Run cargo hakari tasks
---------
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Adds a reusable "gossip_parquet_file" crate that provides a use-case
specific wrapper over the underlying gossip transport.
This crate deals with encoding and decoding parquet gossip messages,
handing them off to the application, and decoupling handler latency
from the gossip reactor.
Adds a new gossip_schema crate that provides a high-level interface to
schema change notifications.
This crate layers schema-specific interfaces over the existing low-level
gossip crate. Users can obtain best-effort schema change notifications
by implementing a SchemaEventHandler delegate given to a SchemaRx, or
efficiently dispatch schema change notifications to listening peers
using a SchemaTx.
Schema notifications are sent over the Topic::SchemaChanges topic
(ID=1), which the caller must register as an interest on receiving
gossip nodes.
Define new proto for the structure that gets sent from router to ingester and persisted in the ingester WAL.
Create ingest_structure crate with functions to convert from line protocol to new proto structure while validating schema.
Add function to convert new proto structure to RecordBatch.
* feat: gRPC interface for ingester->querier v2
Note that the actual "to/from bytes" conversion for the schema and
batches will be implemented in #8347.
Closes #8346.
* fix: typo
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
---------
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>