49315 Commits (054ac7e8a305defd6765bba7cd9489ec803bdf0f)

Author SHA1 Message Date
Paul Dix 12636ca759
fix: loader error with single wal file (#24814)
Fixes a bug where the loader would error out if there was a wal segment file for a previous segment that hadn't been persisted, and a new wal file had to be created for the new open segment. This would show up as an error if you started the server and then stopped and restarted it without writing any data.
2024-03-25 15:40:21 -04:00
Jamie Strandboge 58d4369e66
chore: tweak wording and don't reference gpg key in SECURITY.md (#24838) 2024-03-25 14:34:36 -05:00
Paul Dix 04b9cf6cc3
fix: catalog persist with new segment (#24813)
When a write comes into the buffer that both updates the catalog and creates a new segment, it would create that segment with a catalog sequence number that matched what happened after the catalog modification. The result is that when the segment is persisted, the catalog won't be persisted because it wasn't being viewed as being updated. This fixes that.
2024-03-25 15:18:43 -04:00
Jamie Strandboge f4cfae37d8
chore: add SECURITY.md (#24820) 2024-03-25 11:30:04 -05:00
Paul Dix 1827866d00
feat: initial load generator implementation (#24808)
* feat: initial load generator implementation

This adds a load generator as a new crate. Initially it only generates write load, but the scaffolding is there to add a query load generator to complement the write load tool.

This could have been added as a subcommand to the influxdb3 program, but I thought it best to have it separate for now.

It's fairly light on tests and error handling given it's an internal tooling CLI. I've added only something very basic to test the line protocol generation and run the actual write command by hand.

I included pretty detailed instructions and some runnable examples.

* refactor: address PR feedback
2024-03-25 08:26:24 -04:00
Trevor Hilton 4f3288b4c4
feat: support query parameters in the `influxdb3_client` (#24806)
feat: add query parameter support to influxdb3 client

This adds the ability to use parameterized queries in the influxdb3_client crate
when calling the /api/v3/query_sql and /api/v3/query_influxql APIs.

The QueryRequestBuilder now has two new methods: with_param and
with_try_param, that allow binding of parameters to a query being made.

Tests were added in influxdb3_client to verify their usage with both sql and
influxql query APIs.
2024-03-23 11:06:08 -04:00
Trevor Hilton 2febaff24b
feat: support query parameters (#24804)
feat: support query parameters

This adds support for parameters in the /api/v3/query_sql
and /api/v3/query_influxql API

The new parameter `params` is supported in the URL query string
of a GET request, or in the JSON body of a POST request.

Two new E2E tests were added to check successful GET/POST as well
as error scenario when params are not provided for a query string
that would expect them.
2024-03-23 10:41:00 -04:00
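A minimal sketch of how a client might attach `params` to each request style described in #24804. Only the `params` name and the endpoint path come from the commit message; the `db` and `q` field names, the `$host` placeholder syntax in the usage note, and JSON-encoding `params` inside the query string are assumptions made for illustration.

```python
import json
from urllib.parse import urlencode


def build_get_url(base_url, db, query, params):
    """Build a GET URL carrying `params` in the query string.

    Assumption: `params` is sent as a JSON-encoded object.
    """
    qs = urlencode({"db": db, "q": query, "params": json.dumps(params)})
    return f"{base_url}/api/v3/query_sql?{qs}"


def build_post_body(db, query, params):
    """Build the JSON body of a POST request carrying the same `params`."""
    return json.dumps({"db": db, "q": query, "params": params})
```

For example, `build_get_url(base, "mydb", "SELECT * FROM cpu WHERE host = $host", {"host": "a"})` would yield a URL whose `params` value decodes back to `{"host": "a"}`.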
BiKangNing 67cce99df7
chore: fix some typos (#24803)
Signed-off-by: depthlending <bikangning@outlook.com>
2024-03-22 09:32:37 -04:00
Michael Gattozzi a2984cdc17
chore: Update to Rust 1.77.0 (#24800)
* chore: Update to Rust 1.77.0

This is a fairly quiet upgrade. The only changes are some lints around
`OpenOptions` that were added to clippy between 1.75 and this version
and they're small changes that either remove unnecessary function calls
or add a needed function call.

* fix: cargo-deny by using the --locked flag
2024-03-21 13:00:15 -04:00
Trevor Hilton caae9ca9f2
chore: `influxdb3_core` update (#24798)
chore: sync in latest core changes
2024-03-21 10:29:56 -04:00
Trevor Hilton 84b85a9b1c
refactor: use `/query` for v1 query API endpoint (#24790)
feat: handle v1 query API at /query and update tests
2024-03-20 08:26:28 -04:00
Trevor Hilton 1fe414c14b
feat: support v1 query API (#24746)
feat: support the v1 query API

This PR adds support for the `/api/v1/query` API, which is meant to
serve the original InfluxDB v1 query API, to serve single statement
`SELECT` and `SHOW` queries. The response, which is returned as JSON,
can be chunked via the `chunked` and optional `chunk_size` parameters.
An optional `epoch` parameter can be supplied to have `time` column
timestamps converted to a UNIX epoch with the given precision.

## Buffering

The response is buffered by default: if the `chunked` parameter
is not supplied, or is passed as `false`, then the entire query
result will be buffered into memory before being returned in the
response. This is how the original API behaves, so we are replicating
that here.

When `chunked` is passed as `true`, then the response will be a
stream of chunks, where each chunk is a self-contained response,
with the same structure as that of the non-chunked response. Chunks
are split up by the provided `chunk_size`, or by series, i.e.,
measurement, whichever comes first. The default chunk size is 10,000
rows.

Buffering is implemented with the `QueryResponseStream` and
`ChunkBuffer` types, the former implements the `Stream` trait,
which allows it to be streamed in the HTTP response directly with
`hyper`'s `Body::wrap_stream`. The `QueryResponseStream` is a wrapper
around the inner arrow `RecordBatchStream`, which buffers the
streamed `RecordBatch`es according to the requested chunking parameters.

## Testing

Two new E2E tests were added to test basic query functionality and
chunking behaviour, respectively. In addition, some manual testing
was done to verify that the InfluxDB Grafana plugin works with this
API.
2024-03-15 13:38:15 -04:00
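The chunking rule from #24746 (split by `chunk_size` or by series, whichever comes first) can be sketched roughly as follows. This is an illustration in Python, not the `QueryResponseStream` or `ChunkBuffer` code; the `(series, row)` input shape and the function name are assumptions.

```python
def chunk_rows(rows, chunk_size=10_000):
    """Yield (series, rows) chunks, closing a chunk at a series boundary
    or once it holds `chunk_size` rows, whichever comes first.

    `rows` is an iterable of (series_name, row) pairs, assumed to arrive
    grouped by series.
    """
    current_series, buffer = None, []
    for series, row in rows:
        # A new series or a full buffer closes the current chunk.
        if buffer and (series != current_series or len(buffer) >= chunk_size):
            yield current_series, buffer
            buffer = []
        current_series = series
        buffer.append(row)
    if buffer:
        yield current_series, buffer
```

Because it is a generator, each chunk can be serialized and sent as soon as it is complete, which mirrors the streamed response described above.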
Michael Gattozzi 1f8e079579
fix: slow limits test (#24761)
This commit re-enables the limits test after making a fix that has it
run <1 second on my laptop vs the old behavior of >=30 seconds. It does
so by constructing one single write_lp request to create 1995 tables
rather than 1995 individual requests that make a table. This is far more
efficient.
2024-03-12 15:51:23 -04:00
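As a rough illustration of the approach in #24761, a single line-protocol body can create many tables at once when every line names a different table. The table naming scheme and the field used here are hypothetical.

```python
def build_many_tables_lp(table_count, timestamp_ns=0):
    """Return one line-protocol body whose lines each target a distinct table."""
    lines = [f"table_{i} value=1i {timestamp_ns}" for i in range(table_count)]
    return "\n".join(lines)
```

Sending the result of `build_many_tables_lp(1995)` in one write request replaces 1995 separate round trips.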
Trevor Hilton 6849576ce0
feat: support authenticating v1 APIs with p parameter (#24760)
feat: support authenticating v1 APIs with p parameter

The p URL query parameter can be used to authenticate requests
to the /api/v1/query and /api/v1/write APIs

A test was added to ensure this works
2024-03-12 11:42:59 -04:00
Paul Dix 01d33f69b5
feat: wire up query from parquet files (#24749)
* feat: wire up query from parquet files

This adds the functionality to query from Parquet files that have been persisted in object storage. Any segments that are loaded up on boot up will be included (limit of 1k segments at the time of this PR). In a follow on PR we should add a good end-to-end test that has persistence and query through the main API (might be tricky).

* Move BufferChunk and ParquetChunk into chunk module
* Add object_store_url to Persister
* Register object_store on server startup
* Add loaded persisted_segments to SegmentState

* refactor: PR feedback
2024-03-12 09:47:32 -04:00
Paul Dix db77ed0a19
feat: Implement automatic segment persistence (#24747)
This implements automatic segment persistence and cleanup of the WAL files. Every second the write buffer checks for and persists segments that have been open for longer than half the segment duration and that are not in the current or next block of time.

One thing left to do is to deal with blocks of time that have had multiple segments persisted in them. This will be addressed in a follow on PR.

Specific updates:
* Update Persister persist_segment to take borrow
* Move SegmentState into its own module
* Create functions to close open segments and persist them when time
* Add tokio task to check every second to see if segments should be persisted
2024-03-11 15:10:18 -04:00
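The persistence condition from #24747 could be expressed along these lines. This is a sketch under assumed names and units (seconds since the Unix epoch); the actual check lives in the Rust write buffer and may differ in detail.

```python
def should_persist(segment_start, opened_at, now, segment_duration):
    """Decide whether a segment is due for persistence.

    All arguments are in seconds; the argument names are hypothetical.
    """
    current_block_start = now - (now % segment_duration)
    next_block_start = current_block_start + segment_duration
    in_current_or_next = segment_start in (current_block_start, next_block_start)
    open_long_enough = (now - opened_at) > segment_duration / 2
    return open_long_enough and not in_current_or_next
```

A periodic task, once per second in the commit, would apply a check like this to every open segment.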
Paul Dix bf931970d3
feat: Segment the write buffer on time (#24745)
* Split WriteBuffer into segments

* Add SegmentRange and SegmentDuration
* Update WAL to store SegmentRange and to be able to open up multiple ranges
* Remove Partitioner and PartitionBuffer

* Update SegmentState and loader

* Update SegmentState with current, next and outside
* Update loader and tests to load up current, next and previous outside segments based on the passed in time and desired segment duration

* Update WriteBufferImpl and Flusher

* Update the flusher to flush to multiple segments
* Update WriteBufferImpl to split data into segments getting written to
* Update HTTP and WriteBuffer to use TimeProvider

* Wire up outside segment writes and loading

* Data outside current and next no longer go to a single segment, but to a segment based on that data's time. Limits to 100 segments of time that can be written to at any given time.

* Refactor SegmentDuration add config option

* Refactors SegmentDuration to be a new type over duration
* Adds the clap block configuration to pass SegmentDuration, defaulting to 1h

* refactor: SegmentState and loader

* remove the current_segment and next_segment from the loader and segment state, instead having just a collection of segments
* open up only the current_segment by default
* keep current and next segments open if they exist, while others go into persisting or persisted

* fix: cargo audit

* refactor: fixup PR feedback
2024-03-11 13:54:09 -04:00
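To make the time-based segmenting in #24745 concrete, here is a small sketch of mapping a timestamp to its segment. `SegmentRange` is named in the commit, but its fields, the half-open interval, and the helper function are assumptions; the 1h default matches the commit message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SegmentRange:
    start: int  # inclusive, seconds since the Unix epoch (assumed unit)
    end: int    # exclusive


def segment_range_for(timestamp, duration_seconds=3600):
    """Return the segment range containing `timestamp`."""
    start = timestamp - (timestamp % duration_seconds)
    return SegmentRange(start, start + duration_seconds)
```

Writes whose timestamps fall outside the current and next ranges would be routed to whichever range this mapping produces for them.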
Trevor Hilton c4d651fbd1
feat: implement `Authorizer` to authorize all HTTP requests (#24738)
* feat: add `Authorizer` impls to authz REST and gRPC

This adds two new Authorizer implementations to Edge: Default and
AllOrNothing, which will provide the two auth options for Edge.

Both gRPC requests and HTTP REST requests will be authorized by
the same Authorizer implementation.

The SHA512 digest action was moved into the `Authorizer` impl.

* feat: add `ServerBuilder` to construct `Server`

A builder was added to the Server in this commit, as part of an
attempt to get the server creation to be more modular.

* refactor: use test server fixture in auth e2e test

Refactored the `auth` integration test in `influxdb3` to use the
`TestServer` fixture; part of this involved extending the fixture
to be configurable, so that the `TestServer` can be spun up with
an auth token.

* test: add test for authorized gRPC

A new end-to-end test, auth_grpc, was added to check that
authorization is working with the influxdb3 Flight service.
2024-03-08 14:18:17 -05:00
Trevor Hilton fad681c06c
chore: gate `limits` test behind a feature flag (#24737)
chore: gate limits test behind a feature flag
2024-03-06 14:37:38 -05:00
Trevor Hilton 969bec2788
docs: add docs to hybrid service code (#24727)
docs: add docs to hybrid service code
2024-03-06 13:26:54 -05:00
Michael Gattozzi ce8c158956
feat: Change Bearer Auth Token to use random bits (#24733)
This changes the 'influxdb3 create token' command so that it will just
automatically generate a completely random base64 encoded token prepended with
'apiv3_' that is then fed into a Sha512 algorithm instead of Sha256. The
user can no longer pass in a token to be turned into the proper output.

This also changes the server code to handle the change to Sha512 as well.

Closes #24704
2024-03-06 12:43:00 -05:00
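The token scheme from #24733 can be approximated as follows. The `apiv3_` prefix, base64 encoding, and SHA-512 come from the commit message; the number of random bytes, the URL-safe base64 variant, hashing the full prefixed string, and the hex output are assumptions of this sketch.

```python
import base64
import hashlib
import secrets


def create_token(num_bytes=32):
    """Return (token, digest): a random `apiv3_` token and its SHA-512 hex digest."""
    raw = secrets.token_bytes(num_bytes)  # byte count is an assumption
    token = "apiv3_" + base64.urlsafe_b64encode(raw).decode("ascii")
    digest = hashlib.sha512(token.encode("utf-8")).hexdigest()
    return token, digest
```

The client would keep `token` for its requests, while only `digest` needs to be given to the server.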
Trevor Hilton 971676b498
test: add tests to check InfluxQL over Flight (#24732)
test: add tests to check InfluxQL over Flight
2024-03-05 15:41:30 -05:00
Trevor Hilton fb4f09d675
feat: support `SHOW RETENTION POLICIES` (#24729)
feat: support SHOW RETENTION POLICIES

Added support through the influxdb3 Query Executor to perform
SHOW RETENTION POLICIES queries, both on a specific database as well
as across all databases.

Test cases were added to check this functionality.
2024-03-05 15:40:58 -05:00
Trevor Hilton 423308dcd4
feat: extend InfluxQL rewriter for SELECT and EXPLAIN (#24726)
Extended the InfluxQL rewriter to handle SELECT statements with nested
sub-queries, as well as EXPLAIN statements.

Tests were added to check all the rewrite cases for happy path and
failure modes.
2024-03-05 15:40:16 -05:00
Michael Gattozzi 160ac34edd
fix: CircleCI image deprecations (#24725) 2024-03-04 11:59:22 -05:00
Michael Gattozzi 573c21c61a
feat: Enable Obj Storage Support for Azure/GCP/S3 (#24724)
In order for Edge to support other object stores besides the local file
system we just needed to turn on the features in clap_blocks which
handles all of the configuration needed to create an `Arc<dyn ObjectStore>`
for us. We already were calling its `make_object_store` function that
did this and so it's a simple switch flip to turn it on.

Closes: #24553
2024-03-04 10:51:57 -05:00
Michael Gattozzi a5082ec432
feat: Add limits for InfluxDB Edge (#24703)
This commit is the final piece for the write_lp endpoint. It adds limits
to Edge such that:

- There can only be 5 Databases
- There can only be 500 Columns per Table
- There can only be 2000 Tables across all Databases

We do this by modifying the catalog code to error out whenever one of
these limits would be exceeded before permanently modifying the schema.
These are hard coded limits and cannot be configured by the user.

Closes #24554
2024-03-04 10:24:33 -05:00
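The check-before-modify pattern described in #24703 might look roughly like this. The three limit values are from the commit; the catalog shape (a nested dict), the exception type, and the function name are hypothetical.

```python
MAX_DATABASES = 5
MAX_TABLES = 2000
MAX_COLUMNS_PER_TABLE = 500


class LimitExceeded(Exception):
    pass


def check_limits(catalog, db, table, columns):
    """Raise LimitExceeded if adding `columns` to `db`.`table` would break a limit.

    `catalog` maps database name -> table name -> set of column names. It is
    not modified here, so a rejected write leaves the schema untouched.
    """
    if db not in catalog and len(catalog) + 1 > MAX_DATABASES:
        raise LimitExceeded("too many databases")
    tables = catalog.get(db, {})
    total_tables = sum(len(t) for t in catalog.values())
    if table not in tables and total_tables + 1 > MAX_TABLES:
        raise LimitExceeded("too many tables")
    merged = set(tables.get(table, set())) | set(columns)
    if len(merged) > MAX_COLUMNS_PER_TABLE:
        raise LimitExceeded("too many columns")
```

Only after this check passes would the caller go on to apply the schema change.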
Trevor Hilton f7892ebee5
feat: add the `api/v3/query_influxql` API (#24696)
feat: add query_influxql api

This PR adds support for the /api/v3/query_influxql API. This re-uses code from the existing query_sql API, but some refactoring was done to allow for code re-use between the two.

The main change to the original code from the existing query_sql API was that the format is determined up front, in the event that the user provides some incorrect Accept header, so that the 400 BAD REQUEST is returned before performing the query.

Support of several InfluxQL queries that previously required a bridge to be executed in 3.0 was added:

SHOW MEASUREMENTS
SHOW TAG KEYS
SHOW TAG VALUES
SHOW FIELD KEYS
SHOW DATABASES

Handling of qualified measurement names in SELECT queries (see below)

This is accomplished with the newly added iox_query_influxql_rewrite crate, which provides the means to re-write an InfluxQL statement to strip out a database name and retention policy, if provided. Doing so allows the query_influxql API to have the database parameter optional, as it may be provided in the query string.

Handling qualified measurement names in SELECT

The implementation in this PR will inspect all measurements provided in a FROM clause and extract the database (DB) name and retention policy (RP) name (if not the default). If multiple DB/RP's are provided, an error is thrown.

Testing

E2E tests were added for performing basic queries against a running server on both the query_sql and query_influxql APIs. In addition, the test for query_influxql includes some of the InfluxQL-specific queries, e.g., SHOW MEASUREMENTS.

Other Changes

The influxdb3_client now has the api_v3_query_influxql method (and a basic test was added for this)
2024-03-01 12:27:38 -05:00
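To illustrate the qualified-name handling in #24696, the sketch below strips a database and retention policy from measurement references and rejects conflicting ones. It is a simplification written for this log, not the `iox_query_influxql_rewrite` implementation: it works on plain strings, supports only bare and three-part `db.rp.measurement` forms, and ignores quoting.

```python
def strip_qualifiers(measurements):
    """Split `db.rp.measurement` references into ((db, rp), [names]).

    An empty retention policy (as in `db..measurement`) is returned as None.
    """
    targets, names = set(), []
    for ref in measurements:
        parts = ref.split(".")
        if len(parts) == 3:
            db, rp, name = parts
            targets.add((db, rp or None))
        elif len(parts) == 1:
            name = parts[0]
        else:
            raise ValueError(f"unsupported measurement reference: {ref}")
        names.append(name)
    if len(targets) > 1:
        raise ValueError("multiple databases or retention policies in FROM clause")
    return (targets.pop() if targets else None), names
```

When a target is found this way, the request's database parameter can be left out, since the query itself supplies it.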
Michael Gattozzi 3c9e6ed836
fix: Add docker folder back for CI (#24720) 2024-02-29 16:47:41 -05:00
Michael Gattozzi 59d8e23d49
fix: Readd the Dockerfile for the main branch (#24719) 2024-02-29 16:33:36 -05:00
Michael Gattozzi 73e261c021
feat: Split out shared core crates from Edge (#24714)
This commit is a major refactor for the code base. It mainly does four
things:

1. Splits code shared between the internal IOx repository and this one
   into its own repo over at https://github.com/influxdata/influxdb3_core
2. Removes any docs or anything else that did not relate to this project
3. Reorganizes the Cargo.toml files to use the top level Cargo.toml to
   declare dependencies and versions to keep all crates in sync and sets
   all others to use `<dep>.workspace = true` unless it's an optional
   dependency
4. Set the top level Cargo.toml to point to the core crates as git
   dependencies

With this any changes specific to Edge will be contained here, updating
deps will be a PR over in `influxdata/influxdb3_core`, and we can prove
out the viability for this model to use for IOx.
2024-02-29 16:21:41 -05:00
Paul Dix 2da5803bfd
feat: implement loader for persisted state (#24705)
* fix: persister loading with no segments

Fixes a bug where the persister would throw an error if attempting to load segments when none had been persisted.

Moved persister tests into tests block.

* feat: implement loader for persisted state

This implements a loader for the write buffer. It loads the catalog and the buffer from the WAL.

Move Persister errors into their own type now that the write buffer load could return errors from the persister.

This doesn't yet rotate segments or trigger persistence of newly closed segments, which will be addressed in a future PR.

* fix: cargo update to fix audit

* refactor: add error type to persister trait

* refactor: use generics instead of dyn

---------

Co-authored-by: Trevor Hilton <thilton@influxdata.com>
2024-02-29 15:58:19 -05:00
Brandon Pfeifer 3dcf2778d6
chore: remove unused CircleCI scripts (#24701) 2024-02-28 09:48:57 -05:00
Michael Gattozzi 8fec1d636e
feat: Add write_lp partial write, name check, and precision (#24677)
* feat: Add partial write and name check to write_lp

This commit adds new behavior to the v3 write_lp http endpoint by
implementing both partial writes and checking the db name for validity.
It also sets the partial write behavior as the default now, whereas
before we would reject the entire request if one line was incorrect.
Users who *do* actually want that behavior can now opt in by putting
'accept_partial=false' into the url of the request.

We also check that the db name used in the request contains only
numbers, letters, underscores and hyphens and that it must start with
either a number or letter.

We also introduce a more standardized way to return errors to the user
as JSON that we can expand over time to give actionable error messages
to the user that they can use to fix their requests.

Finally tests have been included to mock out and test the behavior for
all of the above so that changes to the error messages are reflected in
tests, that both partial and not partial writes work as expected, and
that invalid db names are rejected without writing.

* feat: Add precision to write_lp http endpoint

This commit adds the ability to control the precision of the time stamp
passed in to the endpoint. For example if a user chooses 'second' and
the timestamp 20 that will be 20 seconds past the Unix Epoch. If they
choose 'millisecond' instead it will be 20 milliseconds past the Epoch.

Up to this point we assumed that all data passed in was of nanosecond
precision. The data is still stored in the database as nanoseconds.
Instead upon receiving the data we convert it to nanoseconds. If the
precision URL parameter is not specified we default to auto and take a
best effort guess at what the user wanted based on the order of
magnitude of the data passed in.

This change will allow users finer grained control over what precision
they want to use for their data as well as trying our best to make a
good user experience and having things work as expected and not creating
a failure mode whereby a user wanted seconds and instead put in
nanoseconds by default.
2024-02-27 11:57:10 -05:00
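Two of the rules from #24677, the database name check and the conversion of explicit precisions to nanoseconds, can be sketched as below. The 'second' and 'millisecond' values appear in the commit; the other precision names, the restriction to ASCII letters, and the function names are assumptions. The auto-detection heuristic is left out because the commit does not describe its thresholds.

```python
import re

# Letters, numbers, underscores and hyphens; must start with a letter or number.
DB_NAME_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9_-]*")

NANOS_PER_UNIT = {
    "second": 1_000_000_000,
    "millisecond": 1_000_000,
    "microsecond": 1_000,   # name assumed
    "nanosecond": 1,        # name assumed
}


def is_valid_db_name(name):
    """Return True if `name` satisfies the database naming rule."""
    return DB_NAME_RE.fullmatch(name) is not None


def to_nanoseconds(timestamp, precision):
    """Convert a timestamp in the given precision to nanoseconds."""
    return timestamp * NANOS_PER_UNIT[precision]
```

With these definitions, a timestamp of 20 at 'second' precision is stored as 20,000,000,000 nanoseconds past the Unix epoch.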
Trevor Hilton 298055e9fb
feat: support FlightSQL in 3.0 (#24678)
* feat: support FlightSQL by serving gRPC requests on same port as HTTP

This commit adds support for FlightSQL queries via gRPC to the influxdb3 service. It does so by ensuring the QueryExecutor implements the QueryNamespaceProvider trait, and the underlying QueryDatabase implements QueryNamespace. Satisfying those requirements allows the construction of a FlightServiceServer from the service_grpc_flight crate.

The FlightServiceServer is a gRPC server that can be served via tonic at the API surface; however, enabling this required some tower::Service wrangling. The influxdb3_server/src/server.rs module was introduced to house this code. The objective is to serve both gRPC (via the newly introduced tonic server) and standard REST HTTP requests (via the existing HTTP server) on the same port.

This is accomplished by the HybridService which can handle either gRPC or non-gRPC HTTP requests. The HybridService is wrapped in a HybridMakeService which allows us to serve it via hyper::Server on a single bind address.

End-to-end tests were added in influxdb3/tests/flight.rs. These cover some basic FlightSQL cases. A common.rs module was added that introduces some fixtures to aid in end-to-end tests in influxdb3.
2024-02-26 15:07:48 -05:00
Michael Gattozzi 75afbbd20e
chore: Remove dependabot for our repo (#24693) 2024-02-26 13:38:20 -05:00
dependabot[bot] ada6561f4a
chore(deps): Bump serde_json from 1.0.113 to 1.0.114 (#24687) 2024-02-25 14:34:37 +00:00
dependabot[bot] fca7b702f0
chore(deps): Bump ring from 0.17.7 to 0.17.8 (#24684) 2024-02-25 14:32:26 +00:00
dependabot[bot] f67968c159
chore(deps): Bump insta from 1.34.0 to 1.35.1 (#24688) 2024-02-25 14:27:40 +00:00
dependabot[bot] 278ecbeb56
chore(deps): Bump serde from 1.0.196 to 1.0.197 (#24689) 2024-02-25 14:26:15 +00:00
dependabot[bot] bc1e8fc15e
chore(deps): Bump unicode-normalization from 0.1.22 to 0.1.23 (#24690) 2024-02-25 14:24:47 +00:00
dependabot[bot] f817d63cf7
chore(deps): Bump ahash from 0.8.8 to 0.8.9 (#24692) 2024-02-25 14:22:32 +00:00
dependabot[bot] 4b6f630387
chore(deps): Bump clap from 4.5.0 to 4.5.1 (#24691) 2024-02-25 14:22:09 +00:00
Trevor Hilton 6ce3165aac
feat: add write and query CLI sub-commands (#24671)
* feat: add query and write cli for influxdb3

Adds two new sub-commands to the influxdb3 CLI:

- query: perform queries against the running server
- write: perform writes against the running server

Both share a common set of parameters for connecting to the database
which are managed in influxdb3/src/commands/common.rs.

Currently, query supports all underlying output formats, and can
write the output to a file on disk. It only supports SQL as the
query language, but will eventually also support InfluxQL.

Write supports line protocol for input and expects the source of
data to be from a file.
2024-02-20 16:14:19 -05:00
Michael Gattozzi de102bc927
feat: Add All or Nothing Bearer token auth support (#24666)
This commit adds basic authorization support to Edge. Up to this point
we didn't need to have authorization at all and so the server would
receive and accept requests from anyone. This isn't exactly secure or
ideal for a deployment and so we add a basic form of authentication.

The way this works is that a user passes in a hex encoded sha256 hash of
a given token to the '--bearer-token' flag of the serve command. When
the server starts with this flag it will now check a header of the form
'Authorization: Bearer <token>' by making sure it is valid in the sense
that it is not malformed and that when the token is hashed it matches the
value passed in on the command line. The request is denied with either a
400 Bad Request if the header is malformed or a 401 Unauthorized if the
hash does not match or the header is missing.

The user is provided a new subcommand of the form: 'influxdb3 create
token <token>' where the output contains the command to run the server
with and what the header should look like to make requests.

I can see future work including multiple tokens and rotating between
them or adding new ones to a live service, but for now this shall
suffice.

As part of the commit end-to-end tests are included to run the server
and make requests against the HTTP API and to make sure that requests
are denied for being unauthorized, accepted for having the right header,
or denied for being malformed.

Also as part of this commit a small fix is included for 'Accept: */*'
headers. We were not checking for them and if this header was included
we were denying it instead of sending back the default payload return
value.
2024-02-20 15:34:39 -05:00
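The request check described in #24666 could be sketched like this. The status codes and the hex-encoded SHA-256 comparison follow the commit message; the function name, the plain-dict header lookup, and the constant-time comparison are choices made for this illustration.

```python
import hashlib
import hmac


def authorize(headers, expected_sha256_hex):
    """Return an HTTP status code for the request's Authorization header."""
    value = headers.get("Authorization")
    if value is None:
        return 401  # header missing
    scheme, _, token = value.partition(" ")
    if scheme != "Bearer" or not token:
        return 400  # header malformed
    actual = hashlib.sha256(token.encode("utf-8")).hexdigest()
    # Constant-time comparison is a choice of this sketch, not of the commit.
    if not hmac.compare_digest(actual, expected_sha256_hex):
        return 401  # hash does not match
    return 200
```

Here `expected_sha256_hex` plays the role of the value passed to the '--bearer-token' flag.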
Trevor Hilton 80505d2b42
feat: add the `influxdb3_client` crate (#24665)
A new crate, influxdb3_client, was added, which provides the Client
struct. This gives programmatic access to the influxdb3 HTTP API.

Two primary methods are provided:
- `api_v3_write_lp`
- `api_v3_query_sql`

Each API uses a builder approach to composing the request to be sent.
Response handling was kept somewhat naive, in the `write_lp` case not returning
anything, and in `query_sql`, returning raw `Bytes`. We may improve this in the
future once the respective APIs have their responses more finalized.

Both methods, as well as all associated types are documented with rustdocs.

The general approach to these methods was to use a builder style API so that
the user of the client can build their requests functionally before sending them
to the server.
2024-02-16 15:02:16 -05:00
Paul Dix 3c5e5bf241
feat: Add segment persist of closed buffer segment (#24659)
* feat: add catalog sequence tracking to OpenBufferSegment

* feat: Add segment persist of closed buffer

* refactor: pr review updates

* refactor: PR updates
2024-02-14 10:55:09 -05:00
Paul Dix 4d9095e58d
feat: add segmenting and wal persistence to WriteBuffer (#24624)
* refactor: move write buffer into its own dir

* feat: implement write buffer segment with wal flushing

This creates the WriteBufferFlusher and OpenBufferSegment. If a wal is passed into the buffer, data written into it will be persisted to the wal for the initialized segment id.

* refactor: use crossbeam in flusher and pr cleanup
2024-02-12 12:36:10 -05:00
Michael Gattozzi b555ddf18b
feat: Add different output support to queries (#24616)
This commit adds the ability to choose the output format of a query via
the v3 api so that a user can choose, whether by Accept headers or the
format url param, how the data will be returned to them.

Prior to this commit the default was a pretty printed text format, but
that instead has been changed to json as the default.

There are multiple formats one can choose:

1. json
2. csv
3. pretty printed text
4. parquet

I've tested each of these out and it works well. In particular the
parquet output is exciting as users will be able to perform a query and
receive back parquet data that they can then load into say a Python
script or something else to work on and operate on it. As we extend what
data can be queried, as well as persisting it, what people will be able
to do with Edge will be really cool and I'm interested to see how users
will end up using this functionality in the future.
2024-02-12 12:04:05 -05:00
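Format selection as described in #24616 might be sketched as follows. The four format names and the json default come from the commit; the MIME types, the `pretty` label, and the rule that the `format` parameter takes precedence over the Accept header are assumptions. The `*/*` fallback reflects the fix mentioned in #24666.

```python
# MIME types below are assumptions; the commit only names the four formats.
ACCEPT_TO_FORMAT = {
    "application/json": "json",
    "text/csv": "csv",
    "text/plain": "pretty",
    "application/vnd.apache.parquet": "parquet",
}
DEFAULT_FORMAT = "json"


def choose_format(accept=None, format_param=None):
    """Pick an output format; raise ValueError for an unsupported request."""
    if format_param is not None:
        # Assumed precedence: the URL parameter wins over the header.
        if format_param not in ACCEPT_TO_FORMAT.values():
            raise ValueError(f"unknown format: {format_param}")
        return format_param
    if accept is None or accept.strip() == "*/*":
        return DEFAULT_FORMAT
    try:
        return ACCEPT_TO_FORMAT[accept.strip()]
    except KeyError:
        raise ValueError(f"unsupported Accept header: {accept}") from None
```

Resolving the format before running the query lets a server reject a bad request early, in line with the later change noted in #24696.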
Michael Gattozzi 8a68ae3f11
fix: Remove nightly CI build from Circle CI runs (#24637)
Prior to this change we've had CI fail nightly because we can't push the
image to CI due to permissions issues. The problem is that
influxdata/influxdb_iox is the one that actually has access to push that
data to quay.

This commit removes the nightly build and references to it as this image
is built nightly by the IOx team. If things break we have access to fix
it, but I don't think it'll be an issue.
2024-02-12 10:21:15 -05:00