influxdb

Commit Graph

Author	SHA1	Message	Date
Paul Dix	e8422a240a	feat: Wire up arguments to wal plugin trigger (#25783 ) This allows the user to specify arguments that will be passed to each execution of a wal plugin trigger. The CLI test was updated to check this end to end. Closes #25655	2025-01-10 16:58:18 -05:00
Jackson Newhouse	954416d4d9	fix(processing_engine): enable downstream system-py feature (#25784 )	2025-01-10 13:44:24 -08:00
Paul Dix	7230148b58	feat: Update WAL plugin for new structure (#25777 ) * feat: Update WAL plugin for new structure This ended up being a very large change set. In order to get around circular dependencies, the processing engine had to be moved into its own crate, which I think is ultimately much cleaner. Unfortunately, this required changing a ton of things. There's more testing and things to add on to this, but I think it's important to get this through and build on it. Importantly, the processing engine no longer resides inside the write buffer. Instead, it is attached to the HTTP server. It is now able to take a query executor, write buffer, and WAL so that the full range of functionality of the server can be exposed to the plugin API. There are a bunch of system-py feature flags littered everywhere, which I'm hoping we can remove soon. * refactor: PR feedback	2025-01-10 05:52:33 -05:00
Paul Dix	2d18a61949	feat: Add query API to Python plugins (#25766 ) This ended up being a couple things rolled into one. In order to add a query API to the Python plugin, I had to pull the QueryExecutor trait out of server into a place so that the python crate could use it. This implements the query API, but also fixes up the WAL plugin test CLI a bit. I've added a test in the CLI section so that it shows end-to-end operation of the WAL plugin test API and exercise of the entire Plugin API. Closes #25757	2025-01-09 20:13:20 -05:00
Paul Dix	1ce6a24c3f	feat: Implement WAL plugin test API (#25704 ) * feat: Implement WAL plugin test API This implements the WAL plugin test API. It also introduces a new API for the Python plugins to be called, get their data, and call back into the database server. There are some things that I'll want to address in follow-on work: * CLI tests, but will wait on #25737 to land for a refactor of the CLI here * Would be better to hook the Python logging to call back into the plugin return state like here: https://pyo3.rs/v0.23.3/ecosystem/logging.html#the-python-to-rust-direction * We should only load the LineBuilder interface once in a module, rather than on every execution of a WAL plugin * More tests all around But I want to get this in so that the actual plugin and trigger system can get updated to build around this model. * refactor: PR feedback	2025-01-06 17:32:17 -05:00
praveen-influx	43755c2d9c	feat: sys events store added (#25603 ) This commit introduces basic store for sys events and the backing ring buffer. Since the buffer needs to hold arbitrary data, it uses `Box<dyn Any>` closes: https://github.com/influxdata/influxdb/issues/25581	2024-12-02 10:55:37 +00:00
Trevor Hilton	234d37329a	feat: metacache REST APIs to create and delete (#25587 )	2024-11-27 08:41:46 -05:00
praveen-influx	72dcd1866f	feat(telemetry): adds reads and writes (#25409 ) - instrumented code to get read and write measurement - introduced EventsBucket for collection of reads/writes - sampler now samples every minute for all metrics (including reads/writes) - other tidy ups closes: https://github.com/influxdata/influxdb/issues/25372	2024-10-01 18:34:00 +01:00
Michael Gattozzi	54d209d0bf	feat: Add u32 ID for Databases (#25302 ) * feat: Remove lock for FileId tests Since we now are using cargo-nextest in CI we can remove the locks used in the FileId tests to make sure that we have no race conditions * feat: Add u32 ID for Databases This commit adds a new DbId for databases. It also updates paths to use that id as part of the name. When starting up the WriteBuffer we apply the DbId from the persisted snapshot much like we do for ParquetFileId's This introduces the influxdb3_id crate to avoid circular deps with ids. The ParquetFileId should also be moved into this crate, but it's outside the scope of this change. Closes #25301	2024-09-18 11:44:04 -04:00
Trevor Hilton	7474c0b3b4	feat: add `system.parquet_files` table (#25225 ) This extends the system tables available with a new `parquet_files` table which will list the parquet files associated with a given table in a database. Queries to system.parquet_files must provide a table_name predicate to specify the table name of interest. The files are accessed through the QueryableBuffer. In addition, a test was added to check success and failure modes of the new system table query. Finally, the Persister trait had its associated error type removed. This was somewhat of a consequence of how I initially implemented this change, but I felt it cleaned the code up a bit, so I kept it in the commit.	2024-08-08 08:46:26 -04:00
Paul Dix	2b8fc7b44e	refactor: Move Catalog into influxdb3_catalog crate (#25210 ) * refactor: Move Catalog into influxdb3_catalog crate This moves the catalog and its serialization logic into its own crate. This is a precursor to recording more catalog modifications into the WAL. Fixes #25204 * fix: cargo update * fix: add version = 2 to deny.toml * fix: update deny.toml * fix: add CCO to deny.toml	2024-08-02 16:04:12 -04:00
Paul Dix	3265960010	refactor: implement new wal and refactor write buffer (#25196 ) * feat: refactor WAL and WriteBuffer There is a ton going on here, but here are the high level things. This implements a new WAL, which is backed entirely by object store. It then updates the WriteBuffer to be able to work with how the new WAL works, which also required an update to how the Catalog is modified and persisted. The concept of Segments has been removed. Previously there was a separate WAL per segment of time. Instead, there is now a single WAL that all writes and updates flow into. Data within the write buffer is organized by Chunk(s) within tables, which is based on the timestamp of the row data. These are known as the Level0 files, which will be persisted as Parquet into object store. The default chunk duration for level 0 files is 10 minutes. The WAL is written as single files that get created at the configured WAL flush interval (1s by default). After a certain number of files have been created, the server will attempt to snapshot the WAL (default is to snapshot the first 600 files of the WAL after we have 900 total, i.e. snapshot 10 minutes of WAL data). The design goal with this is to persist 10 minute chunks of data that are no longer receiving writes, while clearing out old WAL files. This works if data is getting written in around "now" with no more than 5 minutes of delay. If we continue to have delayed writes, a snapshot of all data will be forced in order to clear out the WAL and free up memory in the buffer. Overall, this structure of a single wal, with flushes and snapshots and chunks in the queryable buffer led to a simpler setup for the write buffer overall. I was able to clear out quite a bit of code related to the old segment organization. Fixes #25142 and fixes #25173 * refactor: address PR feedback * refactor: wal to replay and background flush on new * chore: remove stray println	2024-08-01 15:04:15 -04:00
Trevor Hilton	56488592db	feat: API to create last caches (#25147 ) Closes #25096 - Adds a new HTTP API that allows the creation of a last cache, see the issue for details - An E2E test was added to check success/failure behaviour of the API - Adds the mime crate, for parsing request MIME types, but this is only used in the code I added - we may adopt it in other APIs / parts of the HTTP server in future PRs	2024-07-16 10:32:26 -04:00
wayne	cd6734a7c4	chore: remove unused dependencies ioxd_common and test_helpers_end_to_end (#25134 ) Co-authored-by: Trevor Hilton <thilton@influxdata.com>	2024-07-10 09:17:54 -06:00
Jean Arhancet	1fd355ed83	refactor: v1 recordbatch to json (#25085 ) * refactor: refactor serde json to use recordbatch * fix: cargo audit with cargo update * fix: add timestamp datatype * fix: add timestamp datatype * fix: apply feedbacks * fix: cargo audit with cargo update * fix: add timestamp datatype * fix: apply feedbacks * refactor: test data conversion	2024-07-05 09:21:40 -04:00
Jean Arhancet	b6718e59e3	feat: add csv influx v1 (#25030 ) * feat: add csv influx v1 * fix: clippy error * fix: cargo.lock * fix: apply feedbacks * test: add csv integration test * fix: cargo audit	2024-06-25 08:45:55 -04:00
Trevor Hilton	0201febd52	feat: add the `system.queries` table (#24992 ) The system.queries table is now accessible, when queries are initiated in debug mode, which is not currently enabled via the HTTP API, therefore this is not yet accessible unless via the gRPC interface. The system.queries table lists all queries in the QueryLog on the QueryExecutorImpl.	2024-05-17 12:04:25 -04:00
Trevor Hilton	8f72bf06e1	chore: use latest `influxdb3_core` changes (#24982 ) Introduction of the `TokioDatafusionConfig` clap block for configuring the DataFusion runtime - this exposes many new `--datafusion-*` options on start, including `--datafusion-num-threads` To accommodate renaming of `QueryNamespaceProvider` to `QueryDatabase` in `influxdb3_core`, I renamed the `QueryDatabase` type to `Database`. Fixed tests that broke as a result of sync.	2024-05-13 12:33:50 -04:00
Trevor Hilton	e0465843be	feat: `/ping` API to serve version and revision (#24864 ) * feat: /ping API to serve version The /ping API was added, which is served at GET and POST methods. The API responds with a JSON body containing the version and revision of the build. A new crate was added, influxdb3_process, which takes the process_info.rs module from the influxdb3 crate, and puts it in a separate crate so that other crates (influxdb3_server) can depend on it. This was needed in order to have access to the version and revision values, which are generated at build time, in the HTTP API code of influxdb3_server. An E2E test was added to check that /ping works. E2E TestServer can now have logs emitted using the TEST_LOG environment variable.	2024-04-01 16:57:10 -04:00
Trevor Hilton	7784749bca	feat: support v1 and v2 write APIs (#24793 ) feat: support v1 and v2 write APIs This adds support for two APIs: /write and /api/v2/write. These implement the v1 and v2 write APIs, respectively. In general, the difference between these and the new /api/v3/write_lp API is in the request parsing. We leverage the WriteRequestUnifier trait from influxdb3_core to handle parsing of v1 and v2 HTTP requests, to keep the error handling at that level consistent with distributed versions of InfluxDB 3.0. Specifically, we use the SingleTenantRequestUnifier implementation of the trait. Changes: - Addition of two new routes to the route_request method in influxdb3_server::http to serve /write and /api/v2/write requests. - Database name validation was updated to handle cases where retention policies may be passed in /write requests, and to also reject empty names. A unit test was added to verify the validate_db_name function. - HTTP request authorization in the router will extract the full Authorization header value, and store it in the request extensions; this is used in the write request parsing from the core iox_http crate to authorize write requests. - E2E tests to verify correct HTTP request parsing / response behaviour for both /write and /api/v2/write APIs - E2E tests to check that data sent in through /write and /api/v2/write can be queried back	2024-03-28 13:33:17 -04:00
Trevor Hilton	1fe414c14b	feat: support v1 query API (#24746 ) feat: support the v1 query API This PR adds support for the `/api/v1/query` API, which is meant to serve the original InfluxDB v1 query API, to serve single statement `SELECT` and `SHOW` queries. The response, which is returned as JSON, can be chunked via the `chunked` and optional `chunk_size` parameters. An optional `epoch` parameter can be supplied to have `time` column timestamps converted to a UNIX epoch with the given precision. ## Buffering The response is buffered by default: if the `chunked` parameter is not supplied, or is passed as `false`, then the entire query result will be buffered into memory before being returned in the response. This is how the original API behaves, so we are replicating that here. When `chunked` is passed as `true`, then the response will be a stream of chunks, where each chunk is a self-contained response, with the same structure as that of the non-chunked response. Chunks are split up by the provided `chunk_size`, or by series, i.e., measurement, whichever comes first. The default chunk size is 10,000 rows. Buffering is implemented with the `QueryResponseStream` and `ChunkBuffer` types, the former implements the `Stream` trait, which allows it to be streamed in the HTTP response directly with `hyper`'s `Body::wrap_stream`. The `QueryResponseStream` is a wrapper around the inner arrow `RecordBatchStream`, which buffers the streamed `RecordBatch`es according to the requested chunking parameters. ## Testing Two new E2E tests were added to test basic query functionality and chunking behaviour, respectively. In addition, some manual testing was done to verify that the InfluxDB Grafana plugin works with this API.	2024-03-15 13:38:15 -04:00
Michael Gattozzi	ce8c158956	feat: Change Bearer Auth Token to use random bits (#24733 ) This changes the 'influxdb3 create token' command so that it will just automatically generate a completely random base64 encoded token prepended with 'apiv3_' that is then fed into a Sha512 algorithm instead of Sha256. The user can no longer pass in a token to be turned into the proper output. This also changes the server code to handle the change to Sha512 as well. Closes #24704	2024-03-06 12:43:00 -05:00
Trevor Hilton	fb4f09d675	feat: support `SHOW RETENTION POLICIES` (#24729 ) feat: support SHOW RETENTION POLICIES Added support through the influxdb3 Query Executor to perform SHOW RETENTION POLICIES queries, both on a specific database as well as across all databases. Test cases were added to check this functionality.	2024-03-05 15:40:58 -05:00
Trevor Hilton	f7892ebee5	feat: add the `api/v3/query_influxql` API (#24696 ) feat: add query_influxql api This PR adds support for the /api/v3/query_influxql API. This re-uses code from the existing query_sql API, but some refactoring was done to allow for code re-use between the two. The main change to the original code from the existing query_sql API was that the format is determined up front, in the event that the user provides some incorrect Accept header, so that the 400 BAD REQUEST is returned before performing the query. Support of several InfluxQL queries that previously required a bridge to be executed in 3.0 was added: SHOW MEASUREMENTS SHOW TAG KEYS SHOW TAG VALUES SHOW FIELD KEYS SHOW DATABASES Handling of qualified measurement names in SELECT queries (see below) This is accomplished with the newly added iox_query_influxql_rewrite crate, which provides the means to re-write an InfluxQL statement to strip out a database name and retention policy, if provided. Doing so allows the query_influxql API to have the database parameter optional, as it may be provided in the query string. Handling qualified measurement names in SELECT The implementation in this PR will inspect all measurements provided in a FROM clause and extract the database (DB) name and retention policy (RP) name (if not the default). If multiple DB/RP's are provided, an error is thrown. Testing E2E tests were added for performing basic queries against a running server on both the query_sql and query_influxql APIs. In addition, the test for query_influxql includes some of the InfluxQL-specific queries, e.g., SHOW MEASUREMENTS. Other Changes The influxdb3_client now has the api_v3_query_influxql method (and a basic test was added for this)	2024-03-01 12:27:38 -05:00
Michael Gattozzi	73e261c021	feat: Split out shared core crates from Edge (#24714 ) This commit is a major refactor for the code base. It mainly does four things: 1. Splits code shared between the internal IOx repository and this one into its own repo over at https://github.com/influxdata/influxdb3_core 2. Removes any docs or anything else that did not relate to this project 3. Reorganizes the Cargo.toml files to use the top level Cargo.toml to declare dependencies and versions to keep all crates in sync and sets all others to use `<dep>.workspace = true` unless it's an optional dependency 4. Set the top level Cargo.toml to point to the core crates as git dependencies With this any changes specific to Edge will be contained here, updating deps will be a PR over in `influxdata/influxdb3_core`, and we can prove out the viability for this model to use for IOx.	2024-02-29 16:21:41 -05:00
Michael Gattozzi	8fec1d636e	feat: Add write_lp partial write, name check, and precision (#24677 ) * feat: Add partial write and name check to write_lp This commit adds new behavior to the v3 write_lp http endpoint by implementing both partial writes and checking the db name for validity. It also sets the partial write behavior as the default now, whereas before we would reject the entire request if one line was incorrect. Users who do actually want that behavior can now opt in by putting 'accept_partial=false' into the url of the request. We also check that the db name used in the request contains only numbers, letters, underscores and hyphens and that it must start with either a number or letter. We also introduce a more standardized way to return errors to the user as JSON that we can expand over time to give actionable error messages to the user that they can use to fix their requests. Finally tests have been included to mock out and test the behavior for all of the above so that changes to the error messages are reflected in tests, that both partial and not partial writes work as expected, and that invalid db names are rejected without writing. * feat: Add precision to write_lp http endpoint This commit adds the ability to control the precision of the time stamp passed in to the endpoint. For example if a user chooses 'second' and the timestamp 20 that will be 20 seconds past the Unix Epoch. If they choose 'millisecond' instead it will be 20 milliseconds past the Epoch. Up to this point we assumed that all data passed in was of nanosecond precision. The data is still stored in the database as nanoseconds. Instead upon receiving the data we convert it to nanoseconds. If the precision URL parameter is not specified we default to auto and take a best effort guess at what the user wanted based on the order of magnitude of the data passed in. This change will allow users finer grained control over what precision they want to use for their data as well as trying our best to make a good user experience and having things work as expected and not creating a failure mode whereby a user wanted seconds and instead put in nanoseconds by default.	2024-02-27 11:57:10 -05:00
Trevor Hilton	298055e9fb	feat: support FlightSQL in 3.0 (#24678 ) * feat: support FlightSQL by serving gRPC requests on same port as HTTP This commit adds support for FlightSQL queries via gRPC to the influxdb3 service. It does so by ensuring the QueryExecutor implements the QueryNamespaceProvider trait, and the underlying QueryDatabase implements QueryNamespace. Satisfying those requirements allows the construction of a FlightServiceServer from the service_grpc_flight crate. The FlightServiceServer is a gRPC server that can be served via tonic at the API surface; however, enabling this required some tower::Service wrangling. The influxdb3_server/src/server.rs module was introduced to house this code. The objective is to serve both gRPC (via the newly introduced tonic server) and standard REST HTTP requests (via the existing HTTP server) on the same port. This is accomplished by the HybridService which can handle either gRPC or non-gRPC HTTP requests. The HybridService is wrapped in a HybridMakeService which allows us to serve it via hyper::Server on a single bind address. End-to-end tests were added in influxdb3/tests/flight.rs. These cover some basic FlightSQL cases. A common.rs module was added that introduces some fixtures to aid in end-to-end tests in influxdb3.	2024-02-26 15:07:48 -05:00
dependabot[bot]	ada6561f4a	chore(deps): Bump serde_json from 1.0.113 to 1.0.114 (#24687 )	2024-02-25 14:34:37 +00:00
dependabot[bot]	278ecbeb56	chore(deps): Bump serde from 1.0.196 to 1.0.197 (#24689 )	2024-02-25 14:26:15 +00:00
Michael Gattozzi	de102bc927	feat: Add All or Nothing Bearer token auth support (#24666 ) This commit adds basic authorization support to Edge. Up to this point we didn't need to have authorization at all and so the server would receive and accept requests from anyone. This isn't exactly secure or ideal for a deployment and so we add a basic form of authentication. The way this works is that a user passes in a hex encoded sha256 hash of a given token to the '--bearer-token' flag of the serve command. When the server starts with this flag it will now check a header of the form 'Authorization: Bearer <token>' by making sure it is valid in the sense that it is not malformed and that when the token is hashed it matches the value passed in on the command line. The request is denied with either a 400 Bad Request if the header is malformed or a 401 Unauthorized if the hash does not match or the header is missing. The user is provided a new subcommand of the form: 'influxdb3 create token <token>' where the output contains the command to run the server with and what the header should look like to make requests. I can see future work including multiple tokens and rotating between them or adding new ones to a live service, but for now this shall suffice. As part of the commit end-to-end tests are included to run the server and make requests against the HTTP API and to make sure that requests are denied for being unauthorized, accepted for having the right header, or denied for being malformed. Also as part of this commit a small fix is included for 'Accept: */*' headers. We were not checking for them and if this header was included we were denying it instead of sending back the default payload return value.	2024-02-20 15:34:39 -05:00
Michael Gattozzi	b555ddf18b	feat: Add different output support to queries (#24616 ) This commit adds the ability to choose the output format of a query via the v3 api so that a user can choose, whether by Accept headers or the format url param, how the data will be returned to them. Prior to this commit the default was a pretty printed text format, but that instead has been changed to json as the default. There are multiple formats one can choose: 1. json 2. csv 3. pretty printed text 4. parquet I've tested each of these out and it works well. In particular the parquet output is exciting as users will be able to perform a query and receive back parquet data that they can then load into say a Python script or something else to work on and operate it. As we extend what data can be queried, as well as persisting it, what people will be able to do with Edge will be really cool and I'm interested to see how users will end up using this functionality in the future.	2024-02-12 12:04:05 -05:00
Michael Gattozzi	ff567cd33f	chore(deps): Update arrow and datafusion to 49.0.0 (#24605 ) * chore(deps): Update arrow and datafusion to 49.0.0 This commit copies in our dependency code from influxdb_iox in order for us to be able to upgrade from a forked version of 46.0.0 to 49.0.0 of both arrow and datafusion. Most of the important changes were around how we consumed the crates in influxdb3(_server/_write). Those diffs are particularly worth looking at as the rest was a straight copy and we don't touch those crates in our development currently for influxdb3 edge. * fix: regenerate workspace hack crate * fix: Protobuf issues with incompatibility labels * fix: Broken CI yaml * fix: buf version * fix: Only check IOx repo * fix: Remove protobuf lint * fix: Comment out call to protobuf-lint	2024-01-31 19:18:51 -05:00
Michael Gattozzi	8ee13bca48	fix: Failing CI on main (#24562 ) * fix: build, upgrade rustc, and deps This commit upgrades Rust to 1.75.0, the latest release. We also upgraded our dependencies to stay up to date and to clear out any unneeded deps from the lockfile. In order to make sure everything works this also fixes the build by upgrading the workspace-hack crate using cargo hakari and removing the `workspace.lint` that was in influxdb3_write that didn't need to be there, probably from a merge issue. With this we can build influxdb3 as our default on main, but this alone is not enough to fix CI and will be addressed in future commits. * fix: warnings for influxdb3 build This commit fixes the warnings emitted by `cargo build` when compiling influxdb3. Mainly it adds needed lifetimes and removes unnecessary imports and function calls. * fix: all of the clippy lints This for the most part just applies suggested fixes by clippy with a few exceptions: - Generated type crates had additional allows added since we can't control what code gets made - Things that couldn't be automatically fixed were done so manually in particular adding a Send bound for traits that created a Future that should be Send We also had to fix a build issue by adding a feature for tokio-compat due to the upgrade of deps. The workspace crate was updated accordingly. * fix: failing test due to rust panic message change In between rustc 1.72 and rustc 1.75 the way that error messages were displayed when panicking changed. One of our tests depended on the output of that behavior and this commit updates the error message to the new form so that tests will pass. * fix: broken cargo doc link * fix: cargo formatting run * fix: add workspace-hack to influxdb3 crates This was the last change needed to make sure that the workspace-hack crate CI lint would pass. * fix: remove tests that can not run anymore We removed iox code from this code base and as a result some tests cannot be run anymore and so this commit removes them from the code base so that we can get a green build.	2024-01-09 15:11:35 -05:00
Paul Dix	5831cf8cee	feat: Add basic Edge server structure (#24552 ) * WIP: basic influxdb3 command and http server * WIP: write lp, buffer, query out * WIP: test write & query on influxdb3_server, fix warnings * WIP: pull write buffer and catalog into separate crate * WIP: sketch out types used for write: buffer, wal, persister * WIP: remove a bunch of old IOx stuff and fmt	2024-01-08 11:50:59 -05:00

34 Commits (praveen/delete-wal-background)
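
The database-name rule described in commit 8fec1d636e (only numbers, letters, underscores and hyphens, starting with a number or letter), together with the rejection of empty names noted in commit 7784749bca, can be sketched in Rust roughly as follows. This is a minimal illustration and not the code from influxdb3_server: the function name `is_valid_db_name` is hypothetical, "letters" and "numbers" are assumed to mean ASCII characters, and the retention-policy handling mentioned for /write requests is not modeled.

```rust
/// Hypothetical helper illustrating the naming rule from the commit message.
/// It is NOT the actual `validate_db_name` function; ASCII-only matching is
/// an assumption made for this sketch.
fn is_valid_db_name(name: &str) -> bool {
    let mut chars = name.chars();

    // The first character must be a letter or a number.
    // An empty name has no first character and is rejected here too.
    match chars.next() {
        Some(first) if first.is_ascii_alphanumeric() => {}
        _ => return false,
    }

    // Every remaining character must be a letter, number, underscore or hyphen.
    chars.all(|c| c.is_ascii_alphanumeric() || c == '_' || c == '-')
}
```

Under these assumptions, a name such as `my_db-1` or `1db` would be accepted, while `_db`, `db/rp` and the empty string would be rejected before any data is written.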