influxdb

Commit Graph

Author	SHA1	Message	Date
Trevor Hilton	7474c0b3b4	feat: add `system.parquet_files` table (#25225 ) This extends the system tables available with a new `parquet_files` table which will list the parquet files associated with a given table in a database. Queries to system.parquet_files must provide a table_name predicate to specify the table name of interest. The files are accessed through the QueryableBuffer. In addition, a test was added to check success and failure modes of the new system table query. Finally, the Persister trait had its associated error type removed. This was somewhat of a consequence of how I initially implemented this change, but I felt it cleaned the code up a bit, so I kept it in the commit.	2024-08-08 08:46:26 -04:00
Paul Dix	43877beb15	fix: query bugs with buffer (#25213 ) * fix: query bugs with buffer This fixes three different bugs with the buffer. First was that aggregations would fail because projection was pushed down to the in-buffer data that de-duplication needs to be called on. The test in influxdb3/tests/server/query.rs catches that. I also added a test in write_buffer/mod.rs to ensure that data is correctly queryable when combining with different states: only data in buffer, only data in parquet files, and data across both. This showed two bugs. The first was that the parquet data was being doubled up (parquet chunks were being created in write buffer mod and in queryable buffer). The second was that the timestamp min max on table buffer would panic if the buffer was empty. * refactor: PR feedback * fix: fix wal replay and buffer snapshot Fixes two problems uncovered by adding to the write_buffer/mod.rs test. Ensures we can replay wal data and that snapshots work properly with replayed data. * fix: run cargo update to fix audit	2024-08-07 16:00:17 -04:00
Michael Gattozzi	29d3a28a9c	fix: make ParquetChunk fields and mod chunk pub (#25219 ) * fix: make ParquetChunk fields and mod chunk pub This doesn't affect anything in the OSS version, but these changes are needed for Pro as part of our compactor work. * fix: cargo deny failure	2024-08-06 15:07:14 -04:00
Paul Dix	2b8fc7b44e	refactor: Move Catalog into influxdb3_catalog crate (#25210 ) * refactor: Move Catalog into influxdb3_catalog crate This moves the catalog and its serialization logic into its own crate. This is a precursor to recording more catalog modifications into the WAL. Fixes #25204 * fix: cargo update * fix: add version = 2 to deny.toml * fix: update deny.toml * fix: add CCO to deny.toml	2024-08-02 16:04:12 -04:00
Paul Dix	3265960010	refactor: implement new wal and refactor write buffer (#25196 ) * feat: refactor WAL and WriteBuffer There is a ton going on here, but here are the high level things. This implements a new WAL, which is backed entirely by object store. It then updates the WriteBuffer to be able to work with how the new WAL works, which also required an update to how the Catalog is modified and persisted. The concept of Segments has been removed. Previously there was a separate WAL per segment of time. Instead, there is now a single WAL that all writes and updates flow into. Data within the write buffer is organized by Chunk(s) within tables, which is based on the timestamp of the row data. These are known as the Level0 files, which will be persisted as Parquet into object store. The default chunk duration for level 0 files is 10 minutes. The WAL is written as single files that get created at the configured WAL flush interval (1s by default). After a certain number of files have been created, the server will attempt to snapshot the WAL (default is to snapshot the first 600 files of the WAL after we have 900 total, i.e. snapshot 10 minutes of WAL data). The design goal with this is to persist 10 minute chunks of data that are no longer receiving writes, while clearing out old WAL files. This works if data is getting written in around "now" with no more than 5 minutes of delay. If we continue to have delayed writes, a snapshot of all data will be forced in order to clear out the WAL and free up memory in the buffer. Overall, this structure of a single wal, with flushes and snapshots and chunks in the queryable buffer led to a simpler setup for the write buffer overall. I was able to clear out quite a bit of code related to the old segment organization. Fixes #25142 and fixes #25173 * refactor: address PR feedback * refactor: wal to replay and background flush on new * chore: remove stray println	2024-08-01 15:04:15 -04:00
Trevor Hilton	f472d9d241	chore: update object_store to 0.10.2 (#25195 )	2024-07-26 10:14:19 -04:00
Michael Gattozzi	05a8a7da43	chore: Upgrade to rustc 1.80 (#25193 ) This commit updates us to rustc 1.80. There are three significant changes here: 1. LazyLock and LazyCell have been stabilized, meaning we can replace our usage of Lazy from the once_cell crate with the std lib versions 2. Lints were added to handle unknown cfg directives. `tokio_unstable` is affected by this and while we do have the flags in our .cargo/config.toml, Cargo still outputs a lint for it, so we suppress that warning now in our Cargo.toml for the workspace 3. clippy now throws a new warning about priority levels for lints. It's quite frankly a thing that doesn't make sense to me and should be something cargo fixes, but here we are. Besides that it was a painless upgrade and now we're on the latest and greatest.	2024-07-25 11:38:18 -04:00
Trevor Hilton	10dd22b6de	fix: last cache catalog configuration tracks explicit vs. non-explicit value columns (#25185 ) * fix: catalog support for last caches that accept new fields Last cache definitions in the catalog were augmented to either store an explicit set of column names (including time), or to accept new fields. This will allow these caches to be loaded properly on server restart such that all non-key columns are cached. * refactor: use tagged serialization for last cache values def This also updated the client code to accept the new structure in influxdb3_client. * test: add e2e tests to catch regressions in influxdb3_client * chore: cargo update for audit	2024-07-24 11:00:40 -04:00
Trevor Hilton	56488592db	feat: API to create last caches (#25147 ) Closes #25096 - Adds a new HTTP API that allows the creation of a last cache, see the issue for details - An E2E test was added to check success/failure behaviour of the API - Adds the mime crate, for parsing request MIME types, but this is only used in the code I added - we may adopt it in other APIs / parts of the HTTP server in future PRs	2024-07-16 10:32:26 -04:00
Trevor Hilton	0279461738	feat: hook up last cache to query executor using DataFusion traits (#25143 ) * feat: impl datafusion traits on last cache Created a new module for the DataFusion table function implementations. The TableProvider impl for LastCache was moved there, and new code that implements the TableFunctionImpl trait to make the last cache queryable was also written. The LastCacheProvider and LastCache were augmented to make this work: - The provider stores an Arc<LastCache> instead of a LastCache - The LastCache uses interior mutability via an RwLock, to make the above possible. * feat: register last_cache UDTF on query context * refactor: make server accept listener instead of socket addr The server used to accept a socket address and bind it directly, returning error if the bind fails. This commit changes that so the ServerBuilder accepts a TcpListener. The behaviour is essentially the same, but this allows us to bind the address from tests when instantiating the server, so we can easily assign unused ports. Tests in the influxdb3_server were updated to exploit this in order to use port 0 auto assignment and stop flaky test failures. A new, failing, test was also added to that module for the last cache. * refactor: naive implementation of last cache key columns Committing here as the last cache is in a working state, but it is naively implemented as it just stores all key columns again (still with the hierarchy) * refactor: make the last cache work with the query executor * chore: fix my own feedback and appease clippy * refactor: remove lower lock in last cache * chore: cargo update * refactor: rename function * fix: broken doc comment	2024-07-16 10:10:47 -04:00
Trevor Hilton	8fd50cefe1	chore: sync latest core (#25138 ) * chore: sync latest core * chore: clippy	2024-07-10 12:25:09 -04:00
wayne	cd6734a7c4	chore: remove unused dependencies ioxd_common and test_helpers_end_to_end (#25134 ) Co-authored-by: Trevor Hilton <thilton@influxdata.com>	2024-07-10 09:17:54 -06:00
Trevor Hilton	53e5c5f5c5	feat: last cache implementation (#25109 ) * feat: base for last cache implementation Each last cache holds a ring buffer for each column in an index map, which preserves the insertion order for faster record batch production. The ring buffer uses a custom type to handle the different supported data types that we can have in the system. * feat: implement last cache provider LastCacheProvider is the API used to create last caches and write table batches to them. It uses a two-layer RwLock/HashMap: the first for the database, and the second layer for the table within the database. This allows for table-level locks when writing in buffered data, and only gets a database-level lock when creating a cache (and in future, when removing them as well). * test: APIs on write buffer and test for last cache Added basic APIs on the write buffer to access the last cache and then a test to the last_cache module to see that it works with a simple example * docs: add some doc comments to last_cache * chore: clippy * chore: one small comment on IndexMap * chore: clean up some stale comments * refactor: part of PR feedback Addressed three parts of PR feedback: 1. Remove double-lock on cache map 2. Re-order the get when writing to the cache to be outside the loop 3. Move the time check into the cache itself * refactor: nest cache by key columns This refactors the last cache to use a nested caching structure, where the key columns for a given cache are used to create a hierarchy of nested maps, terminating in the actual store for the values in the cache. Access to the cache is done via a set of predicates which can optionally specify the key column values at any level in the cache hierarchy to only gather record batches from children of that node in the cache. 
Some todos: - Need to handle the TTL - Need to move the TableProvider impl up to the LastCache type * refactor: TableProvider impl to LastCache This re-writes the datafusion TableProvider implementation on the correct type, i.e., the LastCache, and adds conversion from the filter Expr's to the Predicate type for the cache. * feat: support TTL in last cache Last caches will have expired entries walked when writes come in. * refactor: add panic when unexpected predicate used * refactor: small naming convention change * refactor: include keys in query results and no null keys Changed key columns so that they do not accept null values, i.e., rows that are pushed that are missing key column values will be ignored. When producing record batches for a cache, if not all key columns are used in the predicate, then this change makes it so that the non-predicate key columns are produced as columns in the outputted record batches. A test with a few cases showing this was added. * fix: last cache key column query output Ensure key columns in the last cache that are not included in the predicate are emitted in the RecordBatches as a column. Cleaned up and added comments to the new test. * chore: clippy and some un-needed code * fix: clean up some logic errors in last_cache * test: add tests for non default cache size and TTL Added two tests, as per commit title. Also moved the eviction process to a separate function so that it was not being done on every write to the cache, which could be expensive, and this ensures that entries are evicted regardless of whether writes are coming in or not. 
* test: add invalid predicate test cases to last_cache * test: last_cache with field key columns * test: last_cache uses series key for default keys * test: last_cache uses tag set as default keys * docs: add doc comments to last_cache * fix: logic error in last cache creation CacheAlreadyExists errors were only being based on the database and table names, and not including the cache names, which was not correct. * docs: add some comments to last cache create fn * feat: support null values in last cache This also adds explicit support for series key columns to distinguish them from normal tags in terms of nullability A test was added to check nulls work * fix: reset last cache last time when ttl evicts all data	2024-07-09 15:22:04 -04:00
Jean Arhancet	1fd355ed83	refactor: v1 recordbatch to json (#25085 ) * refactor: refactor serde json to use recordbatch * fix: cargo audit with cargo update * fix: add timestamp datatype * fix: add timestamp datatype * fix: apply feedbacks * fix: cargo audit with cargo update * fix: add timestamp datatype * fix: apply feedbacks * refactor: test data conversion	2024-07-05 09:21:40 -04:00
Lorrens Pantelis	8b6c2a3b3d	refactor: Replace use of `std::HashMap` with `hashbrown::HashMap` (#25094 ) * refactor: use hashbrown with entry_ref api * refactor: use hashbrown hashmap instead of std hashmap in places that would benefit from the `entry_ref` API * chore: Cargo update to pass CI	2024-06-26 12:43:35 -04:00
Trevor Hilton	7cfaa6aeaf	chore: clean up log statements in query_executor (#25102 ) * chore: clean up log statements in query_executor There were several tracing statements that were making the log output for each query rather verbose. This reduces the amount of info! statements by converting them to debug!, and clarifies some of the logged messages. The type of query is also logged, i.e, "sql" vs. "influxql", which was not being done before. * refactor: switch back important log to info	2024-06-26 11:51:12 -04:00
Jean Arhancet	b6718e59e3	feat: add csv influx v1 (#25030 ) * feat: add csv influx v1 * fix: clippy error * fix: cargo.lock * fix: apply feedbacks * test: add csv integration test * fix: cargo audit	2024-06-25 08:45:55 -04:00
Trevor Hilton	5cb7874b2c	feat: v3 write API with series key (#25066 ) Introduce the experimental series key feature to monolith, along with the new `/api/v3/write` API which accepts the new line protocol to write to tables containing a series key. Series key * The series key is supported in the `schema::Schema` type by the addition of a metadata entry that stores the series key members in their correct order. Writes that are received to `v3` tables must have the same series key for every single write. Series key columns are `NOT NULL` * Nullability of columns is enforced in the core `schema` crate based on a column's membership in the series key. So, when building a `schema::Schema` using `schema::SchemaBuilder`, the arrow `Field`s that are injected into the schema will have `nullable` set to false for columns that are part of the series key, as well as the `time` column. * The `NOT NULL` _constraint_, if you can call it that, is enforced in the buffer (see [here](https://github.com/influxdata/influxdb/pull/25066/files#diff-d70ef3dece149f3742ff6e164af17f6601c5a7818e31b0e3b27c3f83dcd7f199R102-R119)) by ensuring there are no gaps in data buffered for series key columns. Series key columns are still tags * Columns in the series key are annotated as tags in the arrow schema, which for now means that they are stored as Dictionaries. This was done to avoid having to support a new column type for series key columns. New write API * This PR introduces the new write API, `/api/v3/write`, which accepts the new `v3` line protocol. Currently, the only part of the new line protocol proposed in https://github.com/influxdata/influxdb/issues/24979 that is supported is the series key. New data types are not yet supported for fields. Split write paths * To support the existing write path alongside the new write path, a new module was set up to perform validation in the `influxdb3_write` crate (`write_buffer/validator.rs`). 
This re-uses the existing write validation logic, and replicates it with needed changes for the new API. I refactored the validation code to use a state machine over a series of nested function calls to help distinguish the fallible validation/update steps from the infallible conversion steps. * The code in that module could potentially be refactored to reduce code duplication.	2024-06-17 14:52:06 -04:00
Trevor Hilton	039dea2264	refactor: add dedicated type for serializing catalog tables (#25042 ) Remove reliance on data_types::ColumnType. Introduce TableSnapshot for serializing table information in the catalog. Remove the columns BTree from the TableDefinition and use the schema directly. BTrees are still used to ensure column ordering when tables are created, or columns added to existing tables. The custom Deserialize impl on TableDefinition used to block duplicate column definitions in the serialized data. This preserves that behaviour using serde_with and extends it to the other types in the catalog, namely InnerCatalog and DatabaseSchema. The serialization test for the catalog was extended to include multiple tables in a database and multiple columns spanning the range of available types in each table. Snapshot testing was introduced using the insta crate to check the serialized JSON form of the catalog, and help catch breaking changes when introducing features to the catalog. Added a test that verifies the no-duplicate key rules when deserializing the map components in the Catalog.	2024-06-04 11:38:43 -04:00
Trevor Hilton	0201febd52	feat: add the `system.queries` table (#24992 ) The system.queries table is now accessible when queries are initiated in debug mode. Debug mode is not currently enabled via the HTTP API, so for now the table is only accessible via the gRPC interface. The system.queries table lists all queries in the QueryLog on the QueryExecutorImpl.	2024-05-17 12:04:25 -04:00
Trevor Hilton	adeb1a16e3	chore: sync latest core (#25005 )	2024-05-16 09:09:47 -04:00
Trevor Hilton	8f72bf06e1	chore: use latest `influxdb3_core` changes (#24982 ) Introduction of the `TokioDatafusionConfig` clap block for configuring the DataFusion runtime - this exposes many new `--datafusion-*` options on start, including `--datafusion-num-threads`. To accommodate renaming of `QueryNamespaceProvider` to `QueryDatabase` in `influxdb3_core`, I renamed the `QueryDatabase` type to `Database`. Fixed tests that broke as a result of sync.	2024-05-13 12:33:50 -04:00
Trevor Hilton	9354c22f2c	chore: remove _series_id (#24969 ) Removed the _series_id column that stored a SHA256 hash of the tag set for each write. Updated all test assertions that made reference to it. Corrected the limits on columns to un-account for the additional _series_id column.	2024-05-08 12:28:49 -04:00
Trevor Hilton	09fe268419	chore: clean up heappy, pprof, and jemalloc (#24967 ) * chore: clean up heappy, pprof, and jemalloc Setup the use of jemalloc as default allocator using tikv-jemallocator crate instead of tikv-jemalloc-sys. Removed heappy and pprof, and also cleaned up all the mutually exclusive compiler flags for using heappy as the allocator. * chore: remove heappy from ci	2024-05-06 15:21:18 -04:00
Michael Gattozzi	c88cb5f093	feat: build binaries and Docker images in CI (#24751 ) For releases we need to have Docker images and binary images available for the user to actually run influxdb3. These CI changes will build the binaries on a release tag and the Docker image as well, test, sign, and publish them and make them available for download. Co-Authored-By: Brandon Pfeifer <bpfeifer@influxdata.com>	2024-05-03 16:39:42 -04:00
Michael Gattozzi	43368981c7	feat: implement parquet cache persistence (#24907 ) * feat: use concrete type for Persister Up to this point we'd been using a generic `Persister` trait, however, in practice even for tests we only use one singular type, the `PersisterImpl`. In order to share the `MemoryPool` between it and the upcoming `ParquetCache` we need it to be the concrete type. This simplifies the code to grok as well by removing unneeded generic bounds. * fix: new_with_partition_key fn name typo * feat: implement parquet cache persistence * fix: incorporate feedback and don't hold across await	2024-04-29 14:34:32 -04:00
Jure Bajic	db8c8d5cc4	feat: Add `with_params_from` method to clients query request builder (#24927 ) Closes #24812	2024-04-29 13:08:51 -04:00
Trevor Hilton	0d5b591ec9	chore: point at latest core (#24937 ) Minor core update to bring in security updates and cargo optimizations from core.	2024-04-23 12:55:30 -04:00
Michael Gattozzi	2291ebeae7	feat: sort and dedupe on persist (#24870 ) When persisting parquet files we now will sort and dedupe on persist using the COMPACT operation implemented in IOx Query. Note that right now we don't choose any column to sort on and default to no column. This means that we dedupe and sort on whatever the default behavior is for the COMPACT operation. Future changes can figure out what columns to sort by when compacting the data.	2024-04-03 15:13:36 -04:00
Trevor Hilton	1982244e65	chore: update to latest core (#24876 ) * chore: update to latest core	2024-04-03 09:36:28 -04:00
Trevor Hilton	2dde602995	feat: report system stats in load generator (#24871 ) * feat: report system stats in load generator Added the mechanism to report system stats during load generation. The following stats are saved in a CSV file: - cpu_usage - disk_written_bytes - disk_read_bytes - memory - virtual_memory This only works when running the load generator against a local instance of influxdb3, i.e., one that is running on your machine. Generating system stats is done by passing the --system-stats flag to the load generator.	2024-04-02 17:16:17 -04:00
Trevor Hilton	e0465843be	feat: `/ping` API to serve version and revision (#24864 ) * feat: /ping API to serve version The /ping API was added, which is served at GET and POST methods. The API responds with a JSON body containing the version and revision of the build. A new crate was added, influxdb3_process, which takes the process_info.rs module from the influxdb3 crate, and puts it in a separate crate so that other crates (influxdb3_server) can depend on it. This was needed in order to have access to the version and revision values, which are generated at build time, in the HTTP API code of influxdb3_server. An E2E test was added to check that /ping works. E2E TestServer can now have logs emitted using the TEST_LOG environment variable.	2024-04-01 16:57:10 -04:00
Trevor Hilton	b55bfba475	feat: initial query load generator (#24854 ) Implement the query load generator. The design follows that of the existing write load generator. A QuerySpec is defined that will be used by the query command to generate a set of queriers to perform queries against a running server in parallel.	2024-03-29 14:58:03 -04:00
Trevor Hilton	7784749bca	feat: support v1 and v2 write APIs (#24793 ) feat: support v1 and v2 write APIs This adds support for two APIs: /write and /api/v2/write. These implement the v1 and v2 write APIs, respectively. In general, the difference between these and the new /api/v3/write_lp API is in the request parsing. We leverage the WriteRequestUnifier trait from influxdb3_core to handle parsing of v1 and v2 HTTP requests, to keep the error handling at that level consistent with distributed versions of InfluxDB 3.0. Specifically, we use the SingleTenantRequestUnifier implementation of the trait. Changes: - Addition of two new routes to the route_request method in influxdb3_server::http to serve /write and /api/v2/write requests. - Database name validation was updated to handle cases where retention policies may be passed in /write requests, and to also reject empty names. A unit test was added to verify the validate_db_name function. - HTTP request authorization in the router will extract the full Authorization header value, and store it in the request extensions; this is used in the write request parsing from the core iox_http crate to authorize write requests. - E2E tests to verify correct HTTP request parsing / response behaviour for both /write and /api/v2/write APIs - E2E tests to check that data sent in through /write and /api/v2/write can be queried back	2024-03-28 13:33:17 -04:00
Trevor Hilton	c79821b246	feat: add `_series_id` to tables on write (#24842 ) feat: add _series_id to tables on write New _series_id column is added to tables; this stores a 32 byte SHA256 hash of the tag set of a line of Line Protocol. The tag set is checked for sort order, then sorted if not already, before producing the hash. Unit tests were added to check hashing and sorting functions work. Tests that performed queries needed to be modified to account for the new _series_id column; in general, SELECT * queries were altered to use a select clause with specific column names. The Column limit was increased to 501 internally, to account for the new _series_id column, but the user-facing limit is still 500	2024-03-26 15:22:19 -04:00
Paul Dix	1827866d00	feat: initial load generator implementation (#24808 ) * feat: initial load generator implementation This adds a load generator as a new crate. Initially it only generates write load, but the scaffolding is there to add a query load generator to complement the write load tool. This could have been added as a subcommand to the influxdb3 program, but I thought it best to have it separate for now. It's fairly light on tests and error handling given it's an internal tooling CLI. I've added only something very basic to test the line protocol generation and run the actual write command by hand. I included pretty detailed instructions and some runnable examples. * refactor: address PR feedback	2024-03-25 08:26:24 -04:00
Trevor Hilton	4f3288b4c4	feat: support query parameters in the `influxdb3_client` (#24806 ) feat: add query parameter support to influxdb3 client This adds the ability to use parameterized queries in the influxdb3_client crate when calling the /api/v3/query_sql and /api/v3/query_influxql APIs. The QueryRequestBuilder now has two new methods: with_param and with_try_param, that allow binding of parameters to a query being made. Tests were added in influxdb3_client to verify their usage with both sql and influxql query APIs.	2024-03-23 11:06:08 -04:00
Trevor Hilton	caae9ca9f2	chore: `influxdb3_core` update (#24798 ) chore: sync in latest core changes	2024-03-21 10:29:56 -04:00
Trevor Hilton	1fe414c14b	feat: support v1 query API (#24746 ) feat: support the v1 query API This PR adds support for the `/api/v1/query` API, which is meant to serve the original InfluxDB v1 query API, to serve single statement `SELECT` and `SHOW` queries. The response, which is returned as JSON, can be chunked via the `chunked` and optional `chunk_size` parameters. An optional `epoch` parameter can be supplied to have `time` column timestamps converted to a UNIX epoch with the given precision. ## Buffering The response is buffered by default: if the `chunked` parameter is not supplied, or is passed as `false`, then the entire query result will be buffered into memory before being returned in the response. This is how the original API behaves, so we are replicating that here. When `chunked` is passed as `true`, then the response will be a stream of chunks, where each chunk is a self-contained response, with the same structure as that of the non-chunked response. Chunks are split up by the provided `chunk_size`, or by series, i.e., measurement, whichever comes first. The default chunk size is 10,000 rows. Buffering is implemented with the `QueryResponseStream` and `ChunkBuffer` types; the former implements the `Stream` trait, which allows it to be streamed in the HTTP response directly with `hyper`'s `Body::wrap_stream`. The `QueryResponseStream` is a wrapper around the inner arrow `RecordBatchStream`, which buffers the streamed `RecordBatch`es according to the requested chunking parameters. ## Testing Two new E2E tests were added to test basic query functionality and chunking behaviour, respectively. In addition, some manual testing was done to verify that the InfluxDB Grafana plugin works with this API.	2024-03-15 13:38:15 -04:00
Paul Dix	01d33f69b5	feat: wire up query from parquet files (#24749 ) * feat: wire up query from parquet files This adds the functionality to query from Parquet files that have been persisted in object storage. Any segments that are loaded up on boot up will be included (limit of 1k segments at the time of this PR). In a follow on PR we should add a good end-to-end test that has persistence and query through the main API (might be tricky). * Move BufferChunk and ParquetChunk into chunk module * Add object_store_url to Persister * Register object_store on server startup * Add loaded persisted_segments to SegmentState * refactor: PR feedback	2024-03-12 09:47:32 -04:00
Paul Dix	bf931970d3	feat: Segment the write buffer on time (#24745 ) * Split WriteBuffer into segments * Add SegmentRange and SegmentDuration * Update WAL to store SegmentRange and to be able to open up multiple ranges * Remove Partitioner and PartitionBuffer * Update SegmentState and loader * Update SegmentState with current, next and outside * Update loader and tests to load up current, next and previous outside segments based on the passed in time and desired segment duration * Update WriteBufferImpl and Flusher * Update the flusher to flush to multiple segments * Update WriteBufferImpl to split data into segments getting written to * Update HTTP and WriteBuffer to use TimeProvider * Wire up outside segment writes and loading * Data outside current and next no longer go to a single segment, but to a segment based on that data's time. Limits to 100 segments of time that can be written to at any given time. * Refactor SegmentDuration add config option * Refactors SegmentDuration to be a new type over duration * Adds the clap block configuration to pass SegmentDuration, defaulting to 1h * refactor: SegmentState and loader * remove the current_segment and next_segment from the loader and segment state, instead having just a collection of segments * open up only the current_segment by default * keep current and next segments open if they exist, while others go into persisting or persisted * fix: cargo audit * refactor: fixup PR feedback	2024-03-11 13:54:09 -04:00
Trevor Hilton	c4d651fbd1	feat: implement `Authorizer` to authorize all HTTP requests (#24738 ) * feat: add `Authorizer` impls to authz REST and gRPC This adds two new Authorizer implementations to Edge: Default and AllOrNothing, which will provide the two auth options for Edge. Both gRPC requests and HTTP REST requests will be authorized by the same Authorizer implementation. The SHA512 digest action was moved into the `Authorizer` impl. * feat: add `ServerBuilder` to construct `Server` A builder was added to the Server in this commit, as part of an attempt to get the server creation to be more modular. * refactor: use test server fixture in auth e2e test Refactored the `auth` integration test in `influxdb3` to use the `TestServer` fixture; part of this involved extending the fixture to be configurable, so that the `TestServer` can be spun up with an auth token. * test: add test for authorized gRPC A new end-to-end test, auth_grpc, was added to check that authorization is working with the influxdb3 Flight service.	2024-03-08 14:18:17 -05:00
Michael Gattozzi	ce8c158956	feat: Change Bearer Auth Token to use random bits (#24733 ) This changes the 'influxdb3 create token' command so that it will just automatically generate a completely random base64 encoded token prepended with 'apiv3_' that is then fed into a Sha512 algorithm instead of Sha256. The user can no longer pass in a token to be turned into the proper output. This also changes the server code to handle the change to Sha512 as well. Closes #24704	2024-03-06 12:43:00 -05:00
Trevor Hilton	971676b498	test: add tests to check InfluxQL over Flight (#24732 ) test: add tests to check InfluxQL over Flight	2024-03-05 15:41:30 -05:00
Trevor Hilton	fb4f09d675	feat: support `SHOW RETENTION POLICIES` (#24729 ) feat: support SHOW RETENTION POLICIES Added support through the influxdb3 Query Executor to perform SHOW RETENTION POLICIES queries, both on a specific database as well as across all databases. Test cases were added to check this functionality.	2024-03-05 15:40:58 -05:00
Trevor Hilton	423308dcd4	feat: extend InfluxQL rewriter for SELECT and EXPLAIN (#24726 ) Extended the InfluxQL rewriter to handle SELECT statements with nested sub-queries, as well as EXPLAIN statements. Tests were added to check all the rewrite cases for happy path and failure modes.	2024-03-05 15:40:16 -05:00
Trevor Hilton	f7892ebee5	feat: add the `api/v3/query_influxql` API (#24696 ) feat: add query_influxql api This PR adds support for the /api/v3/query_influxql API. This re-uses code from the existing query_sql API, but some refactoring was done to allow for code re-use between the two. The main change to the original code from the existing query_sql API was that the format is determined up front, in the event that the user provides some incorrect Accept header, so that the 400 BAD REQUEST is returned before performing the query. Support of several InfluxQL queries that previously required a bridge to be executed in 3.0 was added: SHOW MEASUREMENTS; SHOW TAG KEYS; SHOW TAG VALUES; SHOW FIELD KEYS; SHOW DATABASES; and handling of qualified measurement names in SELECT queries (see below). This is accomplished with the newly added iox_query_influxql_rewrite crate, which provides the means to re-write an InfluxQL statement to strip out a database name and retention policy, if provided. Doing so allows the query_influxql API to have the database parameter optional, as it may be provided in the query string. Handling qualified measurement names in SELECT: The implementation in this PR will inspect all measurements provided in a FROM clause and extract the database (DB) name and retention policy (RP) name (if not the default). If multiple DB/RP's are provided, an error is thrown. Testing: E2E tests were added for performing basic queries against a running server on both the query_sql and query_influxql APIs. In addition, the test for query_influxql includes some of the InfluxQL-specific queries, e.g., SHOW MEASUREMENTS. Other Changes: The influxdb3_client now has the api_v3_query_influxql method (and a basic test was added for this)	2024-03-01 12:27:38 -05:00
Michael Gattozzi	73e261c021	feat: Split out shared core crates from Edge (#24714 ) This commit is a major refactor for the code base. It mainly does four things: 1. Splits code shared between the internal IOx repository and this one into its own repo over at https://github.com/influxdata/influxdb3_core 2. Removes any docs or anything else that did not relate to this project 3. Reorganizes the Cargo.toml files to use the top level Cargo.toml to declare dependencies and versions to keep all crates in sync and sets all others to use `<dep>.workspace = true` unless it's an optional dependency 4. Set the top level Cargo.toml to point to the core crates as git dependencies With this any changes specific to Edge will be contained here, updating deps will be a PR over in `influxdata/influxdb3_core`, and we can prove out the viability for this model to use for IOx.	2024-02-29 16:21:41 -05:00
Paul Dix	2da5803bfd	feat: implement loader for persisted state (#24705 ) * fix: persister loading with no segments Fixes a bug where the persister would throw an error if attempting to load segments when none had been persisted. Moved persister tests into tests block. * feat: implement loader for persisted state This implements a loader for the write buffer. It loads the catalog and the buffer from the WAL. Move Persister errors into their own type now that the write buffer load could return errors from the persister. This doesn't yet rotate segments or trigger persistence of newly closed segments, which will be addressed in a future PR. * fix: cargo update to fix audit * refactor: add error type to persister trait * refactor: use generics instead of dyn --------- Co-authored-by: Trevor Hilton <thilton@influxdata.com>	2024-02-29 15:58:19 -05:00
Michael Gattozzi	8fec1d636e	feat: Add write_lp partial write, name check, and precision (#24677 ) * feat: Add partial write and name check to write_lp This commit adds new behavior to the v3 write_lp http endpoint by implementing both partial writes and checking the db name for validity. It also sets the partial write behavior as the default now, whereas before we would reject the entire request if one line was incorrect. Users who do actually want that behavior can now opt in by putting 'accept_partial=false' into the url of the request. We also check that the db name used in the request contains only numbers, letters, underscores and hyphens and that it must start with either a number or letter. We also introduce a more standardized way to return errors to the user as JSON that we can expand over time to give actionable error messages to the user that they can use to fix their requests. Finally tests have been included to mock out and test the behavior for all of the above so that changes to the error messages are reflected in tests, that both partial and not partial writes work as expected, and that invalid db names are rejected without writing. * feat: Add precision to write_lp http endpoint This commit adds the ability to control the precision of the time stamp passed in to the endpoint. For example if a user chooses 'second' and the timestamp 20 that will be 20 seconds past the Unix Epoch. If they choose 'millisecond' instead it will be 20 milliseconds past the Epoch. Up to this point we assumed that all data passed in was of nanosecond precision. The data is still stored in the database as nanoseconds. Instead upon receiving the data we convert it to nanoseconds. If the precision URL parameter is not specified we default to auto and take a best effort guess at what the user wanted based on the order of magnitude of the data passed in. 
This change will allow users finer grained control over what precision they want to use for their data as well as trying our best to make a good user experience and having things work as expected and not creating a failure mode whereby a user wanted seconds and instead put in nanoseconds by default.	2024-02-27 11:57:10 -05:00
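The two `write_lp` rules described in 8fec1d636e can be sketched as follows. This is an illustration, not the repository's code: the function names, the `Precision` enum, the restriction to ASCII characters, and above all the magnitude thresholds used for `Auto` are assumptions, since the commit message only says the guess is based on the order of magnitude of the timestamp.

```rust
/// Checks the database name rule from the commit message: only letters,
/// numbers, underscores and hyphens, starting with a letter or number.
/// Restricting "letters" to ASCII is an assumption of this sketch.
pub fn is_valid_db_name(name: &str) -> bool {
    let mut chars = name.chars();
    match chars.next() {
        Some(first) if first.is_ascii_alphanumeric() => {}
        // Empty names and names with a bad first character are rejected.
        _ => return false,
    }
    chars.all(|c| c.is_ascii_alphanumeric() || c == '_' || c == '-')
}

/// Hypothetical enum for the `precision` URL parameter.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Precision {
    Auto,
    Second,
    Millisecond,
    Microsecond,
    Nanosecond,
}

/// Converts a timestamp to nanoseconds, which is how the data is stored.
/// Returns `None` if the multiplication would overflow an `i64`.
pub fn to_nanos(ts: i64, precision: Precision) -> Option<i64> {
    let multiplier: i64 = match precision {
        Precision::Auto => return to_nanos(ts, guess_precision(ts)),
        Precision::Second => 1_000_000_000,
        Precision::Millisecond => 1_000_000,
        Precision::Microsecond => 1_000,
        Precision::Nanosecond => 1,
    };
    ts.checked_mul(multiplier)
}

/// Best-effort guess from the magnitude of the value.
/// The cut-off values below are assumed for illustration only.
fn guess_precision(ts: i64) -> Precision {
    let magnitude = ts.unsigned_abs();
    if magnitude < 5_000_000_000 {
        Precision::Second
    } else if magnitude < 5_000_000_000_000 {
        Precision::Millisecond
    } else if magnitude < 5_000_000_000_000_000 {
        Precision::Microsecond
    } else {
        Precision::Nanosecond
    }
}
```

With the example from the commit message, `to_nanos(20, Precision::Second)` yields 20 seconds past the Unix epoch expressed in nanoseconds, while `Precision::Millisecond` yields 20 milliseconds past it.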
Trevor Hilton	298055e9fb	feat: support FlightSQL in 3.0 (#24678 ) * feat: support FlightSQL by serving gRPC requests on same port as HTTP This commit adds support for FlightSQL queries via gRPC to the influxdb3 service. It does so by ensuring the QueryExecutor implements the QueryNamespaceProvider trait, and the underlying QueryDatabase implements QueryNamespace. Satisfying those requirements allows the construction of a FlightServiceServer from the service_grpc_flight crate. The FlightServiceServer is a gRPC server that can be served via tonic at the API surface; however, enabling this required some tower::Service wrangling. The influxdb3_server/src/server.rs module was introduced to house this code. The objective is to serve both gRPC (via the newly introduced tonic server) and standard REST HTTP requests (via the existing HTTP server) on the same port. This is accomplished by the HybridService which can handle either gRPC or non-gRPC HTTP requests. The HybridService is wrapped in a HybridMakeService which allows us to serve it via hyper::Server on a single bind address. End-to-end tests were added in influxdb3/tests/flight.rs. These cover some basic FlightSQL cases. A common.rs module was added that introduces some fixtures to aid in end-to-end tests in influxdb3.	2024-02-26 15:07:48 -05:00
dependabot[bot]	ada6561f4a	chore(deps): Bump serde_json from 1.0.113 to 1.0.114 (#24687 )	2024-02-25 14:34:37 +00:00
dependabot[bot]	fca7b702f0	chore(deps): Bump ring from 0.17.7 to 0.17.8 (#24684 )	2024-02-25 14:32:26 +00:00
dependabot[bot]	f67968c159	chore(deps): Bump insta from 1.34.0 to 1.35.1 (#24688 )	2024-02-25 14:27:40 +00:00
dependabot[bot]	278ecbeb56	chore(deps): Bump serde from 1.0.196 to 1.0.197 (#24689 )	2024-02-25 14:26:15 +00:00
dependabot[bot]	bc1e8fc15e	chore(deps): Bump unicode-normalization from 0.1.22 to 0.1.23 (#24690 )	2024-02-25 14:24:47 +00:00
dependabot[bot]	f817d63cf7	chore(deps): Bump ahash from 0.8.8 to 0.8.9 (#24692 )	2024-02-25 14:22:32 +00:00
dependabot[bot]	4b6f630387	chore(deps): Bump clap from 4.5.0 to 4.5.1 (#24691 )	2024-02-25 14:22:09 +00:00
Trevor Hilton	6ce3165aac	feat: add write and query CLI sub-commands (#24671 ) * feat: add query and write cli for influxdb3 Adds two new sub-commands to the influxdb3 CLI: - query: perform queries against the running server - write: perform writes against the running server Both share a common set of parameters for connecting to the database which are managed in influxdb3/src/commands/common.rs. Currently, query supports all underlying output formats, and can write the output to a file on disk. It only supports SQL as the query language, but will eventually also support InfluxQL. Write supports line protocol for input and expects the source of data to be from a file.	2024-02-20 16:14:19 -05:00
Michael Gattozzi	de102bc927	feat: Add All or Nothing Bearer token auth support (#24666 ) This commit adds basic authorization support to Edge. Up to this point we didn't need to have authorization at all and so the server would receive and accept requests from anyone. This isn't exactly secure or ideal for a deployment and so we add a basic form of authentication. The way this works is that a user passes in a hex encoded sha256 hash of a given token to the '--bearer-token' flag of the serve command. When the server starts with this flag it will now check a header of the form 'Authorization: Bearer <token>' by making sure it is valid in the sense that it is not malformed and that when the token is hashed it matches the value passed in on the command line. The request is denied with either a 400 Bad Request if the header is malformed or a 401 Unauthorized if the hash does not match or the header is missing. The user is provided a new subcommand of the form: 'influxdb3 create token <token>' where the output contains the command to run the server with and what the header should look like to make requests. I can see future work including multiple tokens and rotating between them or adding new ones to a live service, but for now this shall suffice. As part of the commit end-to-end tests are included to run the server and make requests against the HTTP API and to make sure that requests are denied for being unauthorized, accepted for having the right header, or denied for being malformed. Also as part of this commit a small fix is included for 'Accept: */*' headers. We were not checking for them and if this header was included we were denying it instead of sending back the default payload return value.	2024-02-20 15:34:39 -05:00
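The decision rule in de102bc927 (400 for a malformed header, 401 for a missing header or a hash mismatch) can be sketched as below. `AuthOutcome` and `check_bearer` are hypothetical names, and the hash is passed in as a closure so the sketch stays free of external crates; the commit itself uses SHA-256, later changed to SHA-512 in ce8c158956. What counts as "malformed" here (no `Bearer ` prefix, an empty token, or whitespace inside the token) is also an assumption.

```rust
/// Hypothetical result type: maps to 200-class handling, 400, or 401.
#[derive(Debug, PartialEq, Eq)]
pub enum AuthOutcome {
    Authorized,
    /// Header present but malformed: 400 Bad Request.
    BadRequest,
    /// Header missing or token hash mismatch: 401 Unauthorized.
    Unauthorized,
}

/// Checks the value of an `Authorization` header against the expected hash.
/// `hash` stands in for the real digest function (an assumption of this sketch).
pub fn check_bearer<F>(header: Option<&str>, expected_hash: &[u8], hash: F) -> AuthOutcome
where
    F: Fn(&[u8]) -> Vec<u8>,
{
    // A missing header is treated as unauthorized, per the commit message.
    let Some(value) = header else {
        return AuthOutcome::Unauthorized;
    };
    // Anything that is not of the form `Bearer <token>` is malformed.
    let Some(token) = value.strip_prefix("Bearer ") else {
        return AuthOutcome::BadRequest;
    };
    if token.is_empty() || token.contains(char::is_whitespace) {
        return AuthOutcome::BadRequest;
    }
    // NOTE: a production check should compare in constant time;
    // plain equality is used here only to keep the sketch short.
    if hash(token.as_bytes()).as_slice() == expected_hash {
        AuthOutcome::Authorized
    } else {
        AuthOutcome::Unauthorized
    }
}
```

Keeping the digest behind a closure also means the same check works unchanged when the hashing algorithm is swapped.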
Trevor Hilton	80505d2b42	feat: add the `influxdb3_client` crate (#24665 ) A new crate, influxdb3_client, was added, which provides the Client struct. This gives programmatic access to the influxdb3 HTTP API. Two primary methods are provided: - `api_v3_write_lp` - `api_v3_query_sql` Each API uses a builder approach to composing the request to be sent. Response handling was kept somewhat naive, in the `write_lp` case not returning anything, and in `query_sql`, returning raw `Bytes`. We may improve this in future once the respective APIs have their responses more finalized. Both methods, as well as all associated types are documented with rustdocs. The general approach to these methods was to use a builder style API so that the user of the client can build their requests functionally before sending them to the server.	2024-02-16 15:02:16 -05:00
Paul Dix	3c5e5bf241	feat: Add segment persist of closed buffer segment (#24659 ) * feat: add catalog sequence tracking to OpenBufferSegment * feat: Add segment persist of closed buffer * refactor: pr review updates * refactor: PR updates	2024-02-14 10:55:09 -05:00
Paul Dix	4d9095e58d	feat: add segmenting and wal persistence to WriteBuffer (#24624 ) * refactor: move write buffer into its own dir * feat: implement write buffer segment with wal flushing This creates the WriteBufferFlusher and OpenBufferSegment. If a wal is passed into the buffer, data written into it will be persisted to the wal for the initialized segment id. * refactor: use crossbeam in flusher and pr cleanup	2024-02-12 12:36:10 -05:00
Michael Gattozzi	b555ddf18b	feat: Add different output support to queries (#24616 ) This commit adds the ability to choose the output format of a query via the v3 api so that a user can choose, whether by Accept headers or the format url param, how the data will be returned to them. Prior to this commit the default was a pretty printed text format, but that instead has been changed to json as the default. There are multiple formats one can choose: 1. json 2. csv 3. pretty printed text 4. parquet I've tested each of these out and it works well. In particular the parquet output is exciting as users will be able to perform a query and receive back parquet data that they can then load into say a Python script or something else to work on and operate it. As we extend what data can be queried, as well as persisting it, what people will be able to do with Edge will be really cool and I'm interested to see how users will end up using this functionality in the future.	2024-02-12 12:04:05 -05:00
Trevor Hilton	397ee6e73b	fix: add rust-analyzer to toolchain file (#24636 ) * fix: add rust-analyzer to toolchain file Added the rust-analyzer component to the rust-toolchain.toml file so that the correct version of rust-analyzer is installed on Apple Silicon. This will allow the LSP to work on Apple Silicon machines. * chore: update deps for cargo deny	2024-02-06 16:04:03 -05:00
Michael Gattozzi	ff567cd33f	chore(deps): Update arrow and datafusion to 49.0.0 (#24605 ) * chore(deps): Update arrow and datafusion to 49.0.0 This commit copies in our dependency code from influxdb_iox in order for us to be able to upgrade from a forked version of 46.0.0 to 49.0.0 of both arrow and datafusion. Most of the important changes were around how we consumed the crates in influxdb3(_server/_write). Those diffs are particularly worth looking at as the rest was a straight copy and we don't touch those crates in our development currently for influxdb3 edge. * fix: regenerate workspace hack crate * fix: Protobuf issues with incompatibility labels * fix: Broken CI yaml * fix: buf version * fix: Only check IOx repo * fix: Remove protobuf lint * fix: Comment out call to protobuf-lint	2024-01-31 19:18:51 -05:00
Michael Gattozzi	001a2a6653	feat: Implement Persister for PersisterImpl (#24588 ) * feat: Implement Catalog r/w for persister This commit implements reading and writing the Catalog to the object store. This was already stubbed out functionality, but it just needed an implementation. Saving it to the object store is pretty straightforward as it just serializes it to JSON and writes it to the object store. For loading, it finds the most recently added Catalog based on the file name and returns that from the object store to the caller in its deserialized form. This commit also adds some tests to make sure that the above functionality works as intended. * feat: Implement Segment r/w for persister This commit continues the work on the persister by implementing the persist_segment and load_segment functions for the persister. Much like the Catalog implementation, it's serialized to JSON before being persisted to the object store in persist_segment. This is pretty straightforward. For the loading though we need to find the most recent n segment files and so we need to list them and then return the most recent n. This is a little more complicated to do, but there are comments in the code to make it easier to grok. We also implement more tests to make sure that this part of the persister works as expected. * feat: Implement Parquet r/w to persister This commit does a few things: - First we add methods to the persister trait for reading and writing parquet files as these were not stubbed out in prior commits - Secondly we add a method to serialize a SendableRecordBatchStream into Parquet bytes - With these in place implementing the trait methods is pretty straightforward: hand a path in and a stream and get back some metadata about the file persisted and also get the bytes back if loading from the store Of course we also add more tests to make sure this all works as expected. 
Do note that this does nothing to make sure that we bound how much memory is used or if this is the most efficient way to write parquet files. This is mostly to get things working with the understanding that future refinement on the approach might be needed. * fix: Update smallvec for crate advisory * fix: Implement better filename handling * feat: Handle loading > 1000 Segment Info files	2024-01-25 14:31:57 -05:00
Michael Gattozzi	e13cc476bb	feat: Add paths module to influxdb3_write (#24579 ) This commit introduces 4 new types in the paths module for the influxdb3_write crate. They are: - ParquetFilePath - CatalogFilePath - SegmentInfoFilePath - SegmentWalFilePath Each of these corresponds to an object store path and for the WAL file an on disk path that we can use to address the needed files in a consistent way and not need to have path construction be duplicated to address these files. These types also Deref/AsRef to the object_store::path::Path type (or the std::path::Path type for the Wal) so that they can be used in places that expect the type such as various object_store/std::fs and so that we can use the underlying type's methods without needing to implement them for each type as they are just a thin wrapper around those types. This commit adds some tests to make sure that the path construction works as intended and also updates the `wal.rs` file to use the new `SegmentWalFilePath` instead of just a `PathBuf`. Closes: #24578	2024-01-19 10:57:54 -05:00
Paul Dix	02b4d28637	feat: add basic wal implementation for Edge (#24570 ) * feat: add basic wal implementation for Edge This WAL implementation uses some of the code from the wal crate, but departs pretty significantly from it in many ways. For now it uses simple JSON encoding for the serialized ops, but we may want to switch that to Protobuf at some point in the future. This version of the wal doesn't have its own buffering. That will be implemented higher up in the BufferImpl, which will use the wal and SegmentWriter to make data in the buffer durable. The write flow will be that writes will come into the buffer and validate/update against an in memory Catalog. Once validated, writes will get buffered up in memory and then flushed into the WAL periodically (likely every 10-20ms). After being flushed to the wal, the entire batch of writes will be put into the in memory queryable buffer. After that responses will be sent back to the clients. This should reduce the write lock pressure on the in-memory buffer considerably. In this PR: - Update the Wal, WalSegmentWriter, and WalSegmentReader traits to line up with new design/understanding - Implement wal (mainly just a way to identify segment files in a directory) - Implement WalSegmentWriter (write header, op batch with crc, and track sequence number in segment, re-open existing file) - Implement WalSegmentReader * refactor: make Wal return impl reader/writer * refactor: clean up wal segment open * fix: WriteBuffer and Wal usage Turn wal and write buffer references into a concrete type, rather than dyn. * fix: have wal loading ignore invalid files	2024-01-12 11:52:28 -05:00
Michael Gattozzi	8ee13bca48	fix: Failing CI on main (#24562 ) * fix: build, upgrade rustc, and deps This commit upgrades Rust to 1.75.0, the latest release. We also upgraded our dependencies to stay up to date and to clear out any unneeded deps from the lockfile. In order to make sure everything works this also fixes the build by upgrading the workspace-hack crate using cargo hakari and removing the `workspace.lint` that was in influxdb3_write that didn't need to be there, probably from a merge issue. With this we can build influxdb3 as our default on main, but this alone is not enough to fix CI and will be addressed in future commits. * fix: warnings for influxdb3 build This commit fixes the warnings emitted by `cargo build` when compiling influxdb3. Mainly it adds needed lifetimes and removes unnecessary imports and function calls. * fix: all of the clippy lints This for the most part just applies suggested fixes by clippy with a few exceptions: - Generated type crates had additional allows added since we can't control what code gets made - Things that couldn't be automatically fixed were done so manually in particular adding a Send bound for traits that created a Future that should be Send We also had to fix a build issue by adding a feature for tokio-compat due to the upgrade of deps. The workspace crate was updated accordingly. * fix: failing test due to rust panic message change In between rustc 1.72 and rustc 1.75 the way that error messages were displayed when panicking changed. One of our tests depended on the output of that behavior and this commit updates the error message to the new form so that tests will pass. * fix: broken cargo doc link * fix: cargo formatting run * fix: add workspace-hack to influxdb3 crates This was the last change needed to make sure that the workspace-hack crate CI lint would pass. 
* fix: remove tests that can not run anymore We removed iox code from this code base and as a result some tests cannot be run anymore and so this commit removes them from the code base so that we can get a green build.	2024-01-09 15:11:35 -05:00
Paul Dix	5831cf8cee	feat: Add basic Edge server structure (#24552 ) * WIP: basic influxdb3 command and http server * WIP: write lp, buffer, query out * WIP: test write & query on influxdb3_server, fix warnings * WIP: pull write buffer and catalog into separate crate * WIP: sketch out types used for write: buffer, wal, persister * WIP: remove a bunch of old IOx stuff and fmt	2024-01-08 11:50:59 -05:00
Joshua Powers	acfef87659	chore: Sync and release v1.0.1 of influxdb-line-protocol (#24527 ) * chore: Backport influxdb line protocol changes, release v1.0.1 * chore: Update influxdb_line_protocol to 2.0 --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>	2023-12-22 15:12:41 -05:00
dependabot[bot]	d34fc59217	chore(deps): Bump rustix from 0.38.8 to 0.38.19 (#24421 ) Bumps [rustix](https://github.com/bytecodealliance/rustix) from 0.38.8 to 0.38.19. - [Release notes](https://github.com/bytecodealliance/rustix/releases) - [Commits](https://github.com/bytecodealliance/rustix/compare/v0.38.8...v0.38.19) --- updated-dependencies: - dependency-name: rustix dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-10-19 16:24:15 -05:00
Paul Dix	cafe37bd1f	Merge branch 'pd/influxdb3-oss'	2023-09-21 09:15:41 -04:00
Dom	25f3147dc7	Merge branch 'main' into dependabot/cargo/tokio-util-0.7.9	2023-09-21 13:37:57 +01:00
dependabot[bot]	82382b9b3a	chore(deps): Bump insta from 1.31.0 to 1.32.0 Bumps [insta](https://github.com/mitsuhiko/insta) from 1.31.0 to 1.32.0. - [Changelog](https://github.com/mitsuhiko/insta/blob/master/CHANGELOG.md) - [Commits](https://github.com/mitsuhiko/insta/compare/1.31.0...1.32.0) --- updated-dependencies: - dependency-name: insta dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com>	2023-09-21 10:10:26 +00:00
Dom	bd1b668dbb	Merge branch 'main' into dependabot/cargo/tokio-util-0.7.9	2023-09-21 11:04:42 +01:00
dependabot[bot]	37d37f3626	chore(deps): Bump smallvec from 1.11.0 to 1.11.1 Bumps [smallvec](https://github.com/servo/rust-smallvec) from 1.11.0 to 1.11.1. - [Release notes](https://github.com/servo/rust-smallvec/releases) - [Commits](https://github.com/servo/rust-smallvec/compare/v1.11.0...v1.11.1) --- updated-dependencies: - dependency-name: smallvec dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com>	2023-09-21 02:02:29 +00:00
dependabot[bot]	661acc77f0	chore(deps): Bump tokio-util from 0.7.8 to 0.7.9 Bumps [tokio-util](https://github.com/tokio-rs/tokio) from 0.7.8 to 0.7.9. - [Release notes](https://github.com/tokio-rs/tokio/releases) - [Commits](https://github.com/tokio-rs/tokio/compare/tokio-util-0.7.8...tokio-util-0.7.9) --- updated-dependencies: - dependency-name: tokio-util dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com>	2023-09-21 02:01:19 +00:00
Carol (Nichols || Goulding)	11f916eee1	refactor: Extract test helper functions to improve readability	2023-09-20 10:42:26 -04:00
Dom Dwyer	39768fa989	feat(router): init anti-entropy merkle search tree Adds initialisation code to the routers to instantiate an AntiEntropyActor, pre-populate the Merkle Search Tree during schema warmup, and maintain it at runtime.	2023-09-20 13:47:16 +02:00
Andrew Lamb	65d0ea2055	chore: Update DataFusion (#8765 ) * chore: Update DataFusion pin again * chore: update for different type * fix: statistics --------- Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-09-19 22:26:53 +00:00
Dom Dwyer	33a441fbec	build: pick up latest merkle-search-tree version Pick up the improvements allowing construction of PageRangeSnapshots from owned keys / no cloning.	2023-09-19 14:09:07 +02:00
dependabot[bot]	b135cb8d23	chore(deps): Bump pbjson from 0.5.1 to 0.6.0 (#8755 ) Bumps [pbjson](https://github.com/influxdata/pbjson) from 0.5.1 to 0.6.0. - [Commits](https://github.com/influxdata/pbjson/commits) --- updated-dependencies: - dependency-name: pbjson dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-09-19 10:24:39 +00:00
dependabot[bot]	9123c6126d	chore(deps): Bump predicates from 3.0.3 to 3.0.4 (#8761 ) Bumps [predicates](https://github.com/assert-rs/predicates-rs) from 3.0.3 to 3.0.4. - [Changelog](https://github.com/assert-rs/predicates-rs/blob/master/CHANGELOG.md) - [Commits](https://github.com/assert-rs/predicates-rs/compare/v3.0.3...v3.0.4) --- updated-dependencies: - dependency-name: predicates dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-09-19 10:18:03 +00:00
Dom	500112bd47	Merge branch 'main' into dependabot/cargo/clap-4.4.4	2023-09-19 10:28:35 +01:00
Marco Neumann	949635b324	feat: use time-based column ranges in querier (#8732 ) Use output of #8725 within the column ranges of the querier. Currently this won't have any effect since the column ranges are only used to prune parquet files and parquet files come with their own, more precise time range (and that information has priority). However for #8705 we want to use it to prune partitions before needing to deal with the parquet files. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-09-19 09:13:50 +00:00
dependabot[bot]	38ea9a6cc8	chore(deps): Bump clap from 4.4.3 to 4.4.4 Bumps [clap](https://github.com/clap-rs/clap) from 4.4.3 to 4.4.4. - [Release notes](https://github.com/clap-rs/clap/releases) - [Changelog](https://github.com/clap-rs/clap/blob/master/CHANGELOG.md) - [Commits](https://github.com/clap-rs/clap/compare/v4.4.3...v4.4.4) --- updated-dependencies: - dependency-name: clap dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com>	2023-09-18 18:15:43 +00:00
Andrew Lamb	58d892fcdf	chore: Update DataFusion pin (#8749 ) * chore: Update DataFusion pin and `chrono` * chore: Update for deprecation * chore: Update plans * fix: fix update logic in percentile * chore: update to avoid deprecated from_exprs api * fix: Update arrow pin, fix plan errors * test: for describe --------- Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-09-18 18:11:23 +00:00
dependabot[bot]	1760fe7736	chore(deps): Bump chrono from 0.4.30 to 0.4.31 (#8752 ) * chore(deps): Bump chrono from 0.4.30 to 0.4.31 Bumps [chrono](https://github.com/chronotope/chrono) from 0.4.30 to 0.4.31. - [Release notes](https://github.com/chronotope/chrono/releases) - [Changelog](https://github.com/chronotope/chrono/blob/main/CHANGELOG.md) - [Commits](https://github.com/chronotope/chrono/compare/v0.4.30...v0.4.31) --- updated-dependencies: - dependency-name: chrono dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> * fix: chrono ts -> nanos can fail, fix deprecation warning --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Marco Neumann <marco@crepererum.net> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-09-18 12:57:48 +00:00
Marco Neumann	012df69974	feat: i->q V2 circuit breaker (#8743 ) * feat: impl `PartialEq + Eq` for `TestError` * feat: i->q V2 circuit breaker This is a straight port from V1, it even uses the same test. The code is copied though (instead of reusing the old one) because the interface in the V2 client is so different and the new testing infra is also nicer (IMHO). For #8349. --------- Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-09-18 08:58:56 +00:00
dependabot[bot]	eb80fea517	chore(deps): Bump mockito from 1.1.1 to 1.2.0 (#8751 ) Bumps [mockito](https://github.com/lipanski/mockito) from 1.1.1 to 1.2.0. - [Release notes](https://github.com/lipanski/mockito/releases) - [Commits](https://github.com/lipanski/mockito/compare/1.1.1...1.2.0) --- updated-dependencies: - dependency-name: mockito dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-09-18 07:55:49 +00:00
dependabot[bot]	8fddbc395b	chore(deps): Bump mockito from 1.1.0 to 1.1.1 (#8741 ) Bumps [mockito](https://github.com/lipanski/mockito) from 1.1.0 to 1.1.1. - [Release notes](https://github.com/lipanski/mockito/releases) - [Commits](https://github.com/lipanski/mockito/compare/1.1.0...1.1.1) --- updated-dependencies: - dependency-name: mockito dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-09-15 08:51:03 +00:00
Dom Dwyer	1bb4c08067	perf: use half of logical cores for persist exec Changes the default ingester configuration to assign half the logical cores to datafusion for persist execution. Prior to this commit, datafusion always used 4 threads by default. In situations where the ingesters are configured with 4 logical cores or less, the periodic persist can start enough persist jobs to keep the 4 threads assigned to datafusion busy. Because there are enough threads to saturate all CPU cores, these CPU-heavy persist threads can impact write latency by stealing CPU time from the tokio runtime threads. This change assigns exactly half the threads to DF by default, ensuring there's always N/2 cores to service I/O heavy API requests.	2023-09-14 17:54:33 +02:00
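The default described in 1bb4c08067 amounts to halving the logical core count. A minimal sketch using only the standard library follows; the function names are hypothetical, and the floor of one thread on single-core hosts is an assumption, as the commit message does not say how that case is handled.

```rust
use std::num::NonZeroUsize;
use std::thread;

/// Returns half of `logical_cores`, rounded down, for persist execution.
/// The minimum of one thread is an assumption of this sketch.
pub fn persist_threads_for(logical_cores: usize) -> NonZeroUsize {
    NonZeroUsize::new((logical_cores / 2).max(1)).expect("max(1) is never zero")
}

/// Derives the default from the number of logical cores visible to the process.
/// Falls back to a single core if the count cannot be determined.
pub fn default_persist_threads() -> NonZeroUsize {
    let logical = thread::available_parallelism()
        .map(NonZeroUsize::get)
        .unwrap_or(1);
    persist_threads_for(logical)
}
```

On a 4-core ingester this gives 2 persist threads instead of the previous fixed 4, leaving the other half of the cores for the tokio runtime that serves API requests.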
dependabot[bot]	0d51a1ca6f	chore(deps): Bump serde_json from 1.0.106 to 1.0.107 (#8731 ) Bumps [serde_json](https://github.com/serde-rs/json) from 1.0.106 to 1.0.107. - [Release notes](https://github.com/serde-rs/json/releases) - [Commits](https://github.com/serde-rs/json/compare/v1.0.106...v1.0.107) --- updated-dependencies: - dependency-name: serde_json dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-09-14 09:28:30 +00:00
dependabot[bot]	71315d8ab6	chore(deps): Bump toml from 0.7.8 to 0.8.0 (#8730 ) Bumps [toml](https://github.com/toml-rs/toml) from 0.7.8 to 0.8.0. - [Commits](https://github.com/toml-rs/toml/compare/toml-v0.7.8...toml-v0.8.0) --- updated-dependencies: - dependency-name: toml dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-09-14 09:22:18 +00:00
dependabot[bot]	82c45a798c	chore(deps): Bump libc from 0.2.147 to 0.2.148 (#8729 ) Bumps [libc](https://github.com/rust-lang/libc) from 0.2.147 to 0.2.148. - [Release notes](https://github.com/rust-lang/libc/releases) - [Commits](https://github.com/rust-lang/libc/compare/0.2.147...0.2.148) --- updated-dependencies: - dependency-name: libc dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-09-14 08:30:21 +00:00
dependabot[bot]	2477bdbbee	chore(deps): Bump clap from 4.4.2 to 4.4.3 (#8719 ) Bumps [clap](https://github.com/clap-rs/clap) from 4.4.2 to 4.4.3. - [Release notes](https://github.com/clap-rs/clap/releases) - [Changelog](https://github.com/clap-rs/clap/blob/master/CHANGELOG.md) - [Commits](https://github.com/clap-rs/clap/compare/v4.4.2...v4.4.3) --- updated-dependencies: - dependency-name: clap dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-09-13 09:17:52 +00:00
Andrew Lamb	ed2da2a831	Revert "chore: Update DataFusion pin (#8698 )" (#8714 ) This reverts commit `74c0851fc2`. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-09-11 17:19:04 +00:00
Andrew Lamb	74c0851fc2	chore: Update DataFusion pin (#8698 ) * chore: Update DataFusion pin * chore: Update for new API * fix: fix test * fix: only check error messages --------- Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-09-11 13:54:24 +00:00
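
Note on commit 1bb4c08067 (perf: use half of logical cores for persist exec): the default described there can be sketched as below. This is a minimal illustration only, not the code from the commit. The function name `default_persist_threads` is hypothetical, and clamping the result to at least one thread on single-core hosts is an assumption that the commit message does not state.

```rust
use std::num::NonZeroUsize;

/// Hypothetical helper (not from the influxdb codebase): the number of
/// threads to hand to the persist executor, given the logical core count.
///
/// Follows the rule in the commit message: half of the logical cores go to
/// persist execution, leaving the other half for the I/O-heavy runtime.
/// ASSUMPTION: the result is clamped to 1 so a single-core host still gets
/// one persist thread; the commit message does not describe this case.
fn default_persist_threads(logical_cores: usize) -> NonZeroUsize {
    // Integer division rounds down, so 5 cores yields 2 persist threads.
    let half = (logical_cores / 2).max(1);
    NonZeroUsize::new(half).expect("value is at least 1 after max(1)")
}
```

In practice the core count could be read with `std::thread::available_parallelism()` and passed in. Keeping the count as a parameter makes the rule easy to check for specific machine sizes, such as the 4-core ingesters mentioned in the commit message.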

2747 Commits (8966cfb3d3e436f224b4fa2523f91eb78d427f73)