influxdb

Commit Graph

Author	SHA1	Message	Date
Paul Dix	43877beb15	fix: query bugs with buffer (#25213 ) * fix: query bugs with buffer This fixes three different bugs with the buffer. First was that aggregations would fail because projection was pushed down to the in-buffer data that de-duplication needs to be called on. The test in influxdb3/tests/server/query.rs catches that. I also added a test in write_buffer/mod.rs to ensure that data is correctly queryable when combining with different states: only data in buffer, only data in parquet files, and data across both. This showed two bugs, one where the parquet data was being doubled up (parquet chunks were being created in write buffer mod and in queryable buffer. The second was that the timestamp min max on table buffer would panic if the buffer was empty. * refactor: PR feedback * fix: fix wal replay and buffer snapshot Fixes two problems uncovered by adding to the write_buffer/mod.rs test. Ensures we can replay wal data and that snapshots work properly with replayed data. * fix: run cargo update to fix audit	2024-08-07 16:00:17 -04:00
Paul Dix	3265960010	refactor: implement new wal and refactor write buffer (#25196 ) * feat: refactor WAL and WriteBuffer There is a ton going on here, but here are the high level things. This implements a new WAL, which is backed entirely by object store. It then updates the WriteBuffer to be able to work with how the new WAL works, which also required an update to how the Catalog is modified and persisted. The concept of Segments has been removed. Previously there was a separate WAL per segment of time. Instead, there is now a single WAL that all writes and updates flow into. Data within the write buffer is organized by Chunk(s) within tables, which is based on the timestamp of the row data. These are known as the Level0 files, which will be persisted as Parquet into object store. The default chunk duration for level 0 files is 10 minutes. The WAL is written as single files that get created at the configured WAL flush interval (1s by default). After a certain number of files have been created, the server will attempt to snapshot the WAL (default is to snapshot the first 600 files of the WAL after we have 900 total, i.e. snapshot 10 minutes of WAL data). The design goal with this is to persist 10 minute chunks of data that are no longer receiving writes, while clearing out old WAL files. This works if data is being written around "now" with no more than 5 minutes of delay. If we continue to have delayed writes, a snapshot of all data will be forced in order to clear out the WAL and free up memory in the buffer. Overall, this structure of a single wal, with flushes and snapshots and chunks in the queryable buffer led to a simpler setup for the write buffer overall. I was able to clear out quite a bit of code related to the old segment organization. Fixes #25142 and fixes #25173 * refactor: address PR feedback * refactor: wal to replay and background flush on new * chore: remove stray println	2024-08-01 15:04:15 -04:00
Trevor Hilton	10dd22b6de	fix: last cache catalog configuration tracks explicit vs. non-explicit value columns (#25185 ) * fix: catalog support for last caches that accept new fields Last cache definitions in the catalog were augmented to either store an explicit set of column names (including time), or to accept new fields. This will allow these caches to be loaded properly on server restart such that all non-key columns are cached. * refactor: use tagged serialization for last cache values def This also updated the client code to accept the new structure in influxdb3_client. * test: add e2e tests to catch regressions in influxdb3_client * chore: cargo update for audit	2024-07-24 11:00:40 -04:00
Trevor Hilton	7752d03a79	feat: `last_caches` system table (#25166 ) Added a new system table, system.last_caches, to enable queries that display information about last caches in a database. You can query the table like so: SELECT * FROM system.last_caches Since queries are scoped to a database, this will only show last caches configured for the database being queried. Results look like so: +-------+----------------+----------------+---------------+-------+-----+ | table | name | key_columns | value_columns | count | ttl | +-------+----------------+----------------+---------------+-------+-----+ | mem | mem_last_cache | [host, region] | [time, usage] | 1 | 60 | +-------+----------------+----------------+---------------+-------+-----+ An end-to-end test was added to verify queries to the system.last_caches table.	2024-07-17 09:14:51 -04:00
Trevor Hilton	e8d9b02818	feat: `DELETE` last cache API (#25162 ) Adds an API for deleting last caches. - The API allows parameters to be passed in either the request URI query string, or in the body as JSON - Some additional error modes were handled, specifically, for better HTTP status code responses, e.g., invalid content type is now a 415, URL query string parsing errors are now 400 - An end-to-end test was added to check behaviour of the API	2024-07-16 10:57:48 -04:00
Trevor Hilton	56488592db	feat: API to create last caches (#25147 ) Closes #25096 - Adds a new HTTP API that allows the creation of a last cache, see the issue for details - An E2E test was added to check success/failure behaviour of the API - Adds the mime crate, for parsing request MIME types, but this is only used in the code I added - we may adopt it in other APIs / parts of the HTTP server in future PRs	2024-07-16 10:32:26 -04:00
Trevor Hilton	8fd50cefe1	chore: sync latest core (#25138 ) * chore: sync latest core * chore: clippy	2024-07-10 12:25:09 -04:00
Jean Arhancet	1fd355ed83	refactor: v1 recordbatch to json (#25085 ) * refactor: refactor serde json to use recordbatch * fix: cargo audit with cargo update * fix: add timestamp datatype * fix: add timestamp datatype * fix: apply feedbacks * fix: cargo audit with cargo update * fix: add timestamp datatype * fix: apply feedbacks * refactor: test data conversion	2024-07-05 09:21:40 -04:00
Jean Arhancet	b6718e59e3	feat: add csv influx v1 (#25030 ) * feat: add csv influx v1 * fix: clippy error * fix: cargo.lock * fix: apply feedbacks * test: add csv integration test * fix: cargo audit	2024-06-25 08:45:55 -04:00
Trevor Hilton	5cb7874b2c	feat: v3 write API with series key (#25066 ) Introduce the experimental series key feature to monolith, along with the new `/api/v3/write` API which accepts the new line protocol to write to tables containing a series key. Series key * The series key is supported in the `schema::Schema` type by the addition of a metadata entry that stores the series key members in their correct order. Writes that are received to `v3` tables must have the same series key for every single write. Series key columns are `NOT NULL` * Nullability of columns is enforced in the core `schema` crate based on a column's membership in the series key. So, when building a `schema::Schema` using `schema::SchemaBuilder`, the arrow `Field`s that are injected into the schema will have `nullable` set to false for columns that are part of the series key, as well as the `time` column. * The `NOT NULL` _constraint_, if you can call it that, is enforced in the buffer (see [here](https://github.com/influxdata/influxdb/pull/25066/files#diff-d70ef3dece149f3742ff6e164af17f6601c5a7818e31b0e3b27c3f83dcd7f199R102-R119)) by ensuring there are no gaps in data buffered for series key columns. Series key columns are still tags * Columns in the series key are annotated as tags in the arrow schema, which for now means that they are stored as Dictionaries. This was done to avoid having to support a new column type for series key columns. New write API * This PR introduces the new write API, `/api/v3/write`, which accepts the new `v3` line protocol. Currently, the only part of the new line protocol proposed in https://github.com/influxdata/influxdb/issues/24979 that is supported is the series key. New data types are not yet supported for fields. Split write paths * To support the existing write path alongside the new write path, a new module was set up to perform validation in the `influxdb3_write` crate (`write_buffer/validator.rs`). 
This re-uses the existing write validation logic, and replicates it with needed changes for the new API. I refactored the validation code to use a state machine over a series of nested function calls to help distinguish the fallible validation/update steps from the infallible conversion steps. * The code in that module could potentially be refactored to reduce code duplication.	2024-06-17 14:52:06 -04:00
Jean Arhancet	62d1c67b14	refactor: remove arrow_batchtes_to_json (#25046 ) * refactor: remove arrow_batchtes_to_json * test: query v3 json format	2024-06-10 15:25:12 -04:00
Trevor Hilton	faab7a0abc	fix: writes with incorrect schema should fail (#25022 ) * test: add reproducer for #25006 * fix: validate schema of lines in lp and return error for invalid fields	2024-05-29 09:48:50 -04:00
Trevor Hilton	220e1f4ec6	refactor: expose system tables by default in edge/pro (#25000 )	2024-05-17 12:39:08 -04:00
Trevor Hilton	0201febd52	feat: add the `system.queries` table (#24992 ) The system.queries table is now accessible, when queries are initiated in debug mode, which is not currently enabled via the HTTP API, therefore this is not yet accessible unless via the gRPC interface. The system.queries table lists all queries in the QueryLog on the QueryExecutorImpl.	2024-05-17 12:04:25 -04:00
Trevor Hilton	9354c22f2c	chore: remove _series_id (#24969 ) Removed the _series_id column that stored a SHA256 hash of the tag set for each write. Updated all test assertions that made reference to it. Corrected the limits on columns to un-account for the additional _series_id column.	2024-05-08 12:28:49 -04:00
Michael Gattozzi	2291ebeae7	feat: sort and dedupe on persist (#24870 ) When persisting parquet files we now will sort and dedupe on persist using the COMPACT operation implemented in IOx Query. Note that right now we don't choose any column to sort on and default to no column. This means that we dedupe and sort on whatever the default behavior is for the COMPACT operation. Future changes can figure out what columns to sort by when compacting the data.	2024-04-03 15:13:36 -04:00
Trevor Hilton	1982244e65	chore: update to latest core (#24876 ) * chore: update to latest core	2024-04-03 09:36:28 -04:00
Trevor Hilton	e0465843be	feat: `/ping` API to serve version and revision (#24864 ) * feat: /ping API to serve version The /ping API was added, which is served at GET and POST methods. The API responds with a JSON body containing the version and revision of the build. A new crate was added, influxdb3_process, which takes the process_info.rs module from the influxdb3 crate, and puts it in a separate crate so that other crates (influxdb3_server) can depend on it. This was needed in order to have access to the version and revision values, which are generated at build time, in the HTTP API code of influxdb3_server. An E2E test was added to check that /ping works. E2E TestServer can now have logs emitted using the TEST_LOG environment variable.	2024-04-01 16:57:10 -04:00
Trevor Hilton	8d49b5e776	fix: tests broken by recent merge (#24852 )	2024-03-28 15:41:49 -04:00
Trevor Hilton	7784749bca	feat: support v1 and v2 write APIs (#24793 ) feat: support v1 and v2 write APIs This adds support for two APIs: /write and /api/v2/write. These implement the v1 and v2 write APIs, respectively. In general, the difference between these and the new /api/v3/write_lp API is in the request parsing. We leverage the WriteRequestUnifier trait from influxdb3_core to handle parsing of v1 and v2 HTTP requests, to keep the error handling at that level consistent with distributed versions of InfluxDB 3.0. Specifically, we use the SingleTenantRequestUnifier implementation of the trait. Changes: - Addition of two new routes to the route_request method in influxdb3_server::http to serve /write and /api/v2/write requests. - Database name validation was updated to handle cases where retention policies may be passed in /write requests, and to also reject empty names. A unit test was added to verify the validate_db_name function. - HTTP request authorization in the router will extract the full Authorization header value, and store it in the request extensions; this is used in the write request parsing from the core iox_http crate to authorize write requests. - E2E tests to verify correct HTTP request parsing / response behaviour for both /write and /api/v2/write APIs - E2E tests to check that data sent in through /write and /api/v2/write can be queried back	2024-03-28 13:33:17 -04:00
Trevor Hilton	c79821b246	feat: add `_series_id` to tables on write (#24842 ) feat: add _series_id to tables on write New _series_id column is added to tables; this stores a 32 byte SHA256 hash of the tag set of a line of Line Protocol. The tag set is checked for sort order, then sorted if not already, before producing the hash. Unit tests were added to check hashing and sorting functions work. Tests that performed queries needed to be modified to account for the new _series_id column; in general, SELECT * queries were altered to use a select clause with specific column names. The Column limit was increased to 501 internally, to account for the new _series_id column, but the user-facing limit is still 500	2024-03-26 15:22:19 -04:00
Trevor Hilton	2febaff24b	feat: support query parameters (#24804 ) feat: support query parameters This adds support for parameters in the /api/v3/query_sql and /api/v3/query_influxql API The new parameter `params` is supported in the URL query string of a GET request, or in the JSON body of a POST request. Two new E2E tests were added to check successful GET/POST as well as error scenario when params are not provided for a query string that would expect them.	2024-03-23 10:41:00 -04:00
BiKangNing	67cce99df7	chore: fix some typos (#24803 ) Signed-off-by: depthlending <bikangning@outlook.com>	2024-03-22 09:32:37 -04:00
Trevor Hilton	84b85a9b1c	refactor: use `/query` for v1 query API endpoint (#24790 ) feat: handle v1 query API at /query and update tests	2024-03-20 08:26:28 -04:00
Trevor Hilton	1fe414c14b	feat: support v1 query API (#24746 ) feat: support the v1 query API This PR adds support for the `/api/v1/query` API, which is meant to serve the original InfluxDB v1 query API, to serve single statement `SELECT` and `SHOW` queries. The response, which is returned as JSON, can be chunked via the `chunked` and optional `chunk_size` parameters. An optional `epoch` parameter can be supplied to have `time` column timestamps converted to a UNIX epoch with the given precision. ## Buffering The response is buffered by default: if the `chunked` parameter is not supplied, or is passed as `false`, then the entire query result will be buffered into memory before being returned in the response. This is how the original API behaves, so we are replicating that here. When `chunked` is passed as `true`, then the response will be a stream of chunks, where each chunk is a self-contained response, with the same structure as that of the non-chunked response. Chunks are split up by the provided `chunk_size`, or by series, i.e., measurement, whichever comes first. The default chunk size is 10,000 rows. Buffering is implemented with the `QueryResponseStream` and `ChunkBuffer` types, the former implements the `Stream` trait, which allows it to be streamed in the HTTP response directly with `hyper`'s `Body::wrap_stream`. The `QueryResponseStream` is a wrapper around the inner arrow `RecordBatchStream`, which buffers the streamed `RecordBatch`es according to the requested chunking parameters. ## Testing Two new E2E tests were added to test basic query functionality and chunking behaviour, respectively. In addition, some manual testing was done to verify that the InfluxDB Grafana plugin works with this API.	2024-03-15 13:38:15 -04:00
Michael Gattozzi	1f8e079579	fix: slow limits test (#24761 ) This commit re-enables the limits test after making a fix that has it run <1 second on my laptop vs the old behavior of >=30 seconds. It does so by constructing one single write_lp request to create 1995 tables rather than 1995 individual requests that make a table. This is far more efficient.	2024-03-12 15:51:23 -04:00
Trevor Hilton	6849576ce0	feat: support authenticating v1 APIs with p parameter (#24760 ) feat: support authenticating v1 APIs with p parameter The p URL query parameter can be used to authenticate requests to the /api/v1/query and /api/v1/write APIs A test was added to ensure this works	2024-03-12 11:42:59 -04:00
Trevor Hilton	c4d651fbd1	feat: implement `Authorizer` to authorize all HTTP requests (#24738 ) * feat: add `Authorizer` impls to authz REST and gRPC This adds two new Authorizer implementations to Edge: Default and AllOrNothing, which will provide the two auth options for Edge. Both gRPC requests and HTTP REST requests will be authorized by the same Authorizer implementation. The SHA512 digest action was moved into the `Authorizer` impl. * feat: add `ServerBuilder` to construct `Server` A builder was added to the Server in this commit, as part of an attempt to get the server creation to be more modular. * refactor: use test server fixture in auth e2e test Refactored the `auth` integration test in `influxdb3` to use the `TestServer` fixture; part of this involved extending the fixture to be configurable, so that the `TestServer` can be spun up with an auth token. * test: add test for authorized gRPC A new end-to-end test, auth_grpc, was added to check that authorization is working with the influxdb3 Flight service.	2024-03-08 14:18:17 -05:00
Trevor Hilton	fad681c06c	chore: gate `limits` test behind a feature flag (#24737 ) chore: gate limits test behind a feature flag	2024-03-06 14:37:38 -05:00
Michael Gattozzi	ce8c158956	feat: Change Bearer Auth Token to use random bits (#24733 ) This changes the 'influxdb3 create token' command so that it will just automatically generate a completely random base64 encoded token prepended with 'apiv3_' that is then fed into a Sha512 algorithm instead of Sha256. The user can no longer pass in a token to be turned into the proper output. This also changes the server code to handle the change to Sha512 as well. Closes #24704	2024-03-06 12:43:00 -05:00
Trevor Hilton	971676b498	test: add tests to check InfluxQL over Flight (#24732 ) test: add tests to check InfluxQL over Flight	2024-03-05 15:41:30 -05:00
Trevor Hilton	fb4f09d675	feat: support `SHOW RETENTION POLICIES` (#24729 ) feat: support SHOW RETENTION POLICIES Added support through the influxdb3 Query Executor to perform SHOW RETENTION POLICIES queries, both on a specific database as well as across all databases. Test cases were added to check this functionality.	2024-03-05 15:40:58 -05:00
Michael Gattozzi	a5082ec432	feat: Add limits for InfluxDB Edge (#24703 ) This commit is the final piece for the write_lp endpoint. It adds limits to Edge such that: - There can only be 5 Databases - There can only be 500 Columns per Table - There can only be 2000 Tables across all Databases We do this by modifying the catalog code to error out whenever one of these limits would be exceeded before permanently modifying the schema. These are hard coded limits and cannot be configured by the user. Closes #24554	2024-03-04 10:24:33 -05:00
Trevor Hilton	f7892ebee5	feat: add the `api/v3/query_influxql` API (#24696 ) feat: add query_influxql api This PR adds support for the /api/v3/query_influxql API. This re-uses code from the existing query_sql API, but some refactoring was done to allow for code re-use between the two. The main change to the original code from the existing query_sql API was that the format is determined up front, in the event that the user provides some incorrect Accept header, so that the 400 BAD REQUEST is returned before performing the query. Support of several InfluxQL queries that previously required a bridge to be executed in 3.0 was added: SHOW MEASUREMENTS SHOW TAG KEYS SHOW TAG VALUES SHOW FIELD KEYS SHOW DATABASES Handling of qualified measurement names in SELECT queries (see below) This is accomplished with the newly added iox_query_influxql_rewrite crate, which provides the means to re-write an InfluxQL statement to strip out a database name and retention policy, if provided. Doing so allows the query_influxql API to have the database parameter optional, as it may be provided in the query string. Handling qualified measurement names in SELECT The implementation in this PR will inspect all measurements provided in a FROM clause and extract the database (DB) name and retention policy (RP) name (if not the default). If multiple DB/RP's are provided, an error is thrown. Testing E2E tests were added for performing basic queries against a running server on both the query_sql and query_influxql APIs. In addition, the test for query_influxql includes some of the InfluxQL-specific queries, e.g., SHOW MEASUREMENTS. Other Changes The influxdb3_client now has the api_v3_query_influxql method (and a basic test was added for this)	2024-03-01 12:27:38 -05:00
Michael Gattozzi	8fec1d636e	feat: Add write_lp partial write, name check, and precision (#24677 ) * feat: Add partial write and name check to write_lp This commit adds new behavior to the v3 write_lp http endpoint by implementing both partial writes and checking the db name for validity. It also sets the partial write behavior as the default now, whereas before we would reject the entire request if one line was incorrect. Users who do actually want that behavior can now opt in by putting 'accept_partial=false' into the url of the request. We also check that the db name used in the request contains only numbers, letters, underscores and hyphens and that it must start with either a number or letter. We also introduce a more standardized way to return errors to the user as JSON that we can expand over time to give actionable error messages to the user that they can use to fix their requests. Finally tests have been included to mock out and test the behavior for all of the above so that changes to the error messages are reflected in tests, that both partial and not partial writes work as expected, and that invalid db names are rejected without writing. * feat: Add precision to write_lp http endpoint This commit adds the ability to control the precision of the time stamp passed in to the endpoint. For example if a user chooses 'second' and the timestamp 20 that will be 20 seconds past the Unix Epoch. If they choose 'millisecond' instead it will be 20 milliseconds past the Epoch. Up to this point we assumed that all data passed in was of nanosecond precision. The data is still stored in the database as nanoseconds. Instead upon receiving the data we convert it to nanoseconds. If the precision URL parameter is not specified we default to auto and take a best effort guess at what the user wanted based on the order of magnitude of the data passed in. 
This change will allow users finer grained control over what precision they want to use for their data as well as trying our best to make a good user experience and having things work as expected and not creating a failure mode whereby a user wanted seconds and instead put in nanoseconds by default.	2024-02-27 11:57:10 -05:00
Trevor Hilton	298055e9fb	feat: support FlightSQL in 3.0 (#24678 ) * feat: support FlightSQL by serving gRPC requests on same port as HTTP This commit adds support for FlightSQL queries via gRPC to the influxdb3 service. It does so by ensuring the QueryExecutor implements the QueryNamespaceProvider trait, and the underlying QueryDatabase implements QueryNamespace. Satisfying those requirements allows the construction of a FlightServiceServer from the service_grpc_flight crate. The FlightServiceServer is a gRPC server that can be served via tonic at the API surface; however, enabling this required some tower::Service wrangling. The influxdb3_server/src/server.rs module was introduced to house this code. The objective is to serve both gRPC (via the newly introduced tonic server) and standard REST HTTP requests (via the existing HTTP server) on the same port. This is accomplished by the HybridService which can handle either gRPC or non-gRPC HTTP requests. The HybridService is wrapped in a HybridMakeService which allows us to serve it via hyper::Server on a single bind address. End-to-end tests were added in influxdb3/tests/flight.rs. These cover some basic FlightSQL cases. A common.rs module was added that introduces some fixtures to aid in end-to-end tests in influxdb3.	2024-02-26 15:07:48 -05:00
Michael Gattozzi	de102bc927	feat: Add All or Nothing Bearer token auth support (#24666 ) This commit adds basic authorization support to Edge. Up to this point we didn't need to have authorization at all and so the server would receive and accept requests from anyone. This isn't exactly secure or ideal for a deployment and so we add a basic form of authentication. The way this works is that a user passes in a hex encoded sha256 hash of a given token to the '--bearer-token' flag of the serve command. When the server starts with this flag it will now check a header of the form 'Authorization: Bearer <token>' by making sure it is valid in the sense that it is not malformed and that when the token is hashed it matches the value passed in on the command line. The request is denied with either a 400 Bad Request if the header is malformed or a 401 Unauthorized if the hash does not match or the header is missing. The user is provided a new subcommand of the form: 'influxdb3 create token <token>' where the output contains the command to run the server with and what the header should look like to make requests. I can see future work including multiple tokens and rotating between them or adding new ones to a live service, but for now this shall suffice. As part of the commit end-to-end tests are included to run the server and make requests against the HTTP API and to make sure that requests are denied for being unauthorized, accepted for having the right header, or denied for being malformed. Also as part of this commit a small fix is included for 'Accept: */*' headers. We were not checking for them and if this header was included we were denying it instead of sending back the default payload return value.	2024-02-20 15:34:39 -05:00
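
The header check described in de102bc927 can be sketched roughly as follows. This is an illustration in Python, not the server's Rust code: the function name `check_bearer`, the plain `dict` of headers, and the exact definition of "malformed" (anything other than `Bearer` followed by one non-empty token) are assumptions made for the example. Only the status codes and the hex-encoded SHA-256 comparison come from the commit message.

```python
import hashlib
import hmac


def check_bearer(headers: dict, expected_sha256_hex: str) -> int:
    """Return the HTTP status the commit message describes for a request.

    `expected_sha256_hex` stands in for the value given to --bearer-token.
    """
    value = headers.get("Authorization")
    if value is None:
        return 401  # header missing -> 401 Unauthorized

    parts = value.split(" ")
    # Assumption: "malformed" means not exactly "Bearer <non-empty token>".
    if len(parts) != 2 or parts[0] != "Bearer" or not parts[1]:
        return 400  # malformed header -> 400 Bad Request

    digest = hashlib.sha256(parts[1].encode()).hexdigest()
    # Constant-time comparison of the two hex strings.
    if not hmac.compare_digest(digest, expected_sha256_hex.lower()):
        return 401  # hash mismatch -> 401 Unauthorized
    return 200
```

Note that ce8c158956 later switched the hash to SHA-512 and made `influxdb3 create token` generate the token itself, so under that commit the digest call in this sketch would change accordingly.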

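The database name rule and the precision handling from 8fec1d636e lend themselves to a small sketch. The Python below is illustrative only. It assumes "letters" and "numbers" mean ASCII, and the precision names `microsecond` and `nanosecond` are assumed by analogy with the `second` and `millisecond` values the commit message mentions. The `auto` mode is left out because the message does not state the thresholds it uses.

```python
import re

# Only letters, digits, underscores and hyphens; must start with a letter or digit.
# Assumption: ASCII-only character classes.
_DB_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9_-]*")

# Multipliers for converting an incoming timestamp to stored nanoseconds.
# "microsecond" and "nanosecond" are assumed names, not taken from the commit.
_NANOS_PER_UNIT = {
    "second": 1_000_000_000,
    "millisecond": 1_000_000,
    "microsecond": 1_000,
    "nanosecond": 1,
}


def is_valid_db_name(name: str) -> bool:
    """Check a database name against the rule in the commit message."""
    return _DB_NAME.fullmatch(name) is not None


def to_nanos(timestamp: int, precision: str) -> int:
    """Convert a timestamp at the given precision to nanoseconds since the epoch."""
    return timestamp * _NANOS_PER_UNIT[precision]
```

With these definitions, the commit's own example holds: a timestamp of 20 with `second` precision is stored as 20 seconds past the Unix epoch in nanoseconds, and with `millisecond` precision as 20 milliseconds past it.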

137 Commits (pbarnett/update-script-to-dockerhub)
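
The WAL snapshot defaults quoted in 3265960010 (1s flush interval, snapshot the first 600 files once 900 exist) can be checked with a little arithmetic. The Python below is only a worked example. The constant and function names are made up for illustration, and reading "after we have 900 total" as "at least 900" is an assumption.

```python
from datetime import timedelta

# Defaults quoted in the commit message.
FLUSH_INTERVAL = timedelta(seconds=1)  # one WAL file per flush
SNAPSHOT_SIZE = 600                    # files covered by one snapshot
SNAPSHOT_TRIGGER = 900                 # total files before a snapshot is attempted


def wal_span(file_count: int) -> timedelta:
    """Wall-clock time covered by `file_count` WAL files at the default flush interval."""
    return FLUSH_INTERVAL * file_count


def files_to_snapshot(wal_file_count: int) -> int:
    """Number of oldest WAL files to snapshot, or 0 if the trigger is not reached.

    Assumption: the trigger fires at 900 files or more.
    """
    return SNAPSHOT_SIZE if wal_file_count >= SNAPSHOT_TRIGGER else 0
```

Here `wal_span(600)` is 10 minutes, matching the "snapshot 10 minutes of WAL data" remark, and the 300 files left behind cover 5 minutes, which presumably corresponds to the 5 minutes of write delay the message says the design tolerates.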