influxdb

Commit Graph

Author	SHA1	Message	Date
dependabot[bot]	17af5fcbd1	chore(deps): Bump tokio-util from 0.7.0 to 0.7.1 (#4154 ) * chore(deps): Bump tokio-util from 0.7.0 to 0.7.1 Bumps [tokio-util](https://github.com/tokio-rs/tokio) from 0.7.0 to 0.7.1. - [Release notes](https://github.com/tokio-rs/tokio/releases) - [Commits](https://github.com/tokio-rs/tokio/compare/tokio-util-0.7.0...tokio-util-0.7.1) --- updated-dependencies: - dependency-name: tokio-util dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> * chore: Run cargo hakari tasks Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: CircleCI[bot] <circleci@influxdata.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-03-29 08:39:02 +00:00
Marco Neumann	ec88df6cb7	feat: cache namespace schemas (#4142 ) * feat: `TestColumn` for IOx tests * feat: cache namespace schemas In preparation to remove the querier sync loop completely. Ref #4123.	2022-03-29 08:19:21 +00:00
Nga Tran	80b7e9cce1	feat: delete fully processed tombstones & integration tests for find_and_compact (#4116 ) * feat: remove fully processed tombstones * test: first few tests * fix: delete SQL * fix: test how IN (...) works in PG * fix: test how IN (?) works in PG * fix: test how IN (?) works in PG * fix: dynamically add IN (?, ?, ...) * fix: dynamically add IN (?, ?, ...) & its dynamic values * fix: add argument directly in the SQL * test: more tests for catalog read and update functions * chore: move a subfunction to make it easier to read) * test: first test for find_can_compact but disabled due to bug * test: integration tests and a bug fix for find_and_compact * chore: cleanup * refactor: address review comments * fix: put 2 delete processed tombstones and tombstones in a transaction	2022-03-28 18:35:54 +00:00
Marco Neumann	fb186c6733	refactor: `QueryDatabaseProvider::db` should be async (#4143 ) This is required to fetch querier namespaces on demand. Ref #4123.	2022-03-28 11:18:20 +00:00
dependabot[bot]	4f9515ffba	chore(deps): Bump async-trait from 0.1.52 to 0.1.53 (#4141 ) Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.52 to 0.1.53. - [Release notes](https://github.com/dtolnay/async-trait/releases) - [Commits](https://github.com/dtolnay/async-trait/compare/0.1.52...0.1.53) --- updated-dependencies: - dependency-name: async-trait dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-03-28 08:55:24 +00:00
Marco Neumann	9886ff42cc	refactor: clean up querier public interface	2022-03-25 11:54:52 +01:00
Marco Neumann	942c3198f6	refactor: pass through backoff config to querier table	2022-03-25 11:18:45 +01:00
Andrew Lamb	5c69a3f43b	chore: Update deps: datafusion, arrow/arrow-flight/parquet to 11, zstd to 0.11 (#4119 ) * chore: update datafusion * chore(deps): Bump arrow from 10.0.0 to 11.0.0 Bumps [arrow](https://github.com/apache/arrow-rs) from 10.0.0 to 11.0.0. - [Release notes](https://github.com/apache/arrow-rs/releases) - [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG.md) - [Commits](https://github.com/apache/arrow-rs/compare/10.0.0...11.0.0) --- updated-dependencies: - dependency-name: arrow dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> * chore(deps): Bump arrow-flight from 10.0.0 to 11.0.0 Bumps [arrow-flight](https://github.com/apache/arrow-rs) from 10.0.0 to 11.0.0. - [Release notes](https://github.com/apache/arrow-rs/releases) - [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG.md) - [Commits](https://github.com/apache/arrow-rs/compare/10.0.0...11.0.0) --- updated-dependencies: - dependency-name: arrow-flight dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> * chore: update parquet to 11.0.0 * fix: error on create schema, test for same * fix: upgrade zstd * chore: Run cargo hakari tasks * fix: fix logical merge conflict * fix: hakari * fix: hakari * fix: update newly introduced dep Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: CircleCI[bot] <circleci@influxdata.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-03-24 15:27:36 +00:00
Andrew Lamb	29b89aaca7	refactor: extract influxrpc, flight and testing gRPC out of influxdb_ioxd (#4106 ) * refactor: extract grpc service implementations out of influxdb_ioxd * chore: Run cargo hakari tasks * refactor: rename server_common to service_common * refactor: rename server_grpc_influxrpc to service_grpc_influxrpc * refactor: rename server_grpc_flight to service_grpc_flight * refactor: rename server_grpc_testing to service_grpc_testing * fix: Cargo.toml Co-authored-by: CircleCI[bot] <circleci@influxdata.com>	2022-03-23 20:14:45 +00:00
Marco Neumann	51da6dd7fa	feat: store sort key in NG metadata (#4110 ) The sort key is optional and currently only produced by `iox_tests`. Writing it within the ingester/compactor is tracked by #3968. The sort key is read by the querier (and this will be verified by the query tests and is required to merge #4103). Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-03-23 18:24:46 +00:00
Marco Neumann	89206e013c	test: run SOME query tests for querier (#4098 ) This includes some type changes to dispatch between OG and NG and allows some tests to be run against the NG querier. This only contains parquet files though, so it's somewhat a limited scope. For #3934.	2022-03-22 17:39:19 +00:00
Nga Tran	886f9dc8c1	feat: split compacted data into 2 compacted sets (#4088 ) * feat: split compacted data into 2 compacted sets * chore: clean up * refactor: address review comments Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-03-22 13:28:32 +00:00
Andrew Lamb	b83b000590	chore: Update datafusion (#4071 ) * chore: update to datafusion 5936edc2a94d5fb20702a41eab2b80695961b9dc * chore: Update apis to match datafusion changes	2022-03-22 13:17:41 +00:00
Marco Neumann	c9908b260c	refactor: dyn-dispatch database in query subsystem (#4083 ) * refactor: dyn-dispatch database in query subsystem This is similar to #4080 but concerns the database itself. For #3934. * docs: improve wording Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-03-22 09:15:52 +00:00
Marco Neumann	55643945a1	refactor: `querier` w/o `db` (#4063 ) * feat: `TombstoneRepo::list_by_table` * feat: `ParquetFileRepo::list_by_table_not_to_delete` * refactor: `querier` w/o `db` Get the `querier` to work w/o relying on `db`. A few notes: - Testing is kinda shallow, we really need to get `query_tests` working w/ `querier` (see #3934). - We still run a sync loop for namespaces, tables and schemas. This will be a replaced by "update namespace incl. tables and schemas on demand". Note however that we cannot fetch single tables and schemas on demand at the moment, because DataFusion doesn't implement async schema inspection (only `scan` / "give me all the chunks" is async). I think that's OK for now and we can address this later. - There is NO cache for parquet files and tombstones at the moment. For correctness, they need to be fetched in a single transaction (or we need a kinda tricky sequence number / logical clock tracking) and I am not sure yet how this makes sense when we have the ingester data wired up and predicates pushed down to the catalog (see next point). So let's measure first and then decide on a caching strategy for this. - Predicates are currently NOT pushed down to the catalog. I'll need to figure out how to extract time range from generic DataFusion expressions to make that work (it's easier for InfluxRPC queries, but they are not tested at the moment, see first point). Sorry that this commit is kinda huge. I initially planned to only migrate the chunks away from `db` and leave the tables and schemas for a follow-up PR, but the DataFusion trait structure (chunks are bound to their tables) makes this kinda pointless. Closes #3974. * docs: explain what we're doing Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * docs: mention tracking issues * docs: explain what we're doing Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>	2022-03-21 16:58:00 +00:00
Marco Neumann	0779f81b6b	refactor: rework `TableCache (#4054 ) * feat: `TableRepo::get_by_namespace_and_name` * refactor: rework `TableCache` - dual cache that can also map table names to IDs - deal w/ missing tables w/o panics - set proper timeouts to missing data For #3974. * test: extend table cache tests	2022-03-21 13:40:06 +00:00
Marco Neumann	d1df95df87	refactor: dyn-dispatch chunks in query subsystem - this is what DataFusion is doing as well; it's also fast enough because the number of chunks in a query is not THAT massive (it's not like we are doing row-level dyn dispatching) - it simplifies abstracting over different databases - it allows us to drop our enum-based dispatching that we have for `DbChunk` and that we would also need for the querier (e.g. depending on if a chunk is backed by a parquet file or ingester data) - it likely speeds up compile times because the `query` is no longer contains massive amounts of generic code For #3934.	2022-03-21 12:47:54 +01:00
Marco Neumann	ca152e7934	refactor: avoid generics in `QueryDatabase` A step to make this trait object-safe. Ref #3934.	2022-03-21 10:45:05 +01:00
Marco Neumann	0071b85c22	refactor: make `ExecutionContextProvider` object-safe Ref #3934.	2022-03-21 10:40:53 +01:00
Marco Neumann	98c8475e3b	feat: add "dual" cache pattern (#4039 ) * feat: add "dual" cache pattern This will be useful for certain parts that are addressed internally via ID but where the user-facing APIs use names. For #3985. * refactor: rework "dual" cache construct to be backend based Pros: - easiser to reason about the locking and consistency, esp. in concurrent applications Cons: - we are not canceling running queries for the dual cache any longer Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-03-17 13:39:09 +00:00
Marco Neumann	0850a93f20	refactor: make `QueryDatabase::chunks` async (#4047 ) For OG we can determine the chunks w/o any IO, for NG however this might require a few catalog queries. This is likely not the last change of this sort, i.e. the whole schema handling is currently sync as well. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-03-17 12:55:25 +00:00
Marco Neumann	dc67570e1c	feat: `OptionalValueTtlProvider` (#4040 ) Quite a few caches will request data from the catalog w/o knowing if it exists (e.g. a table by name). We should have different TTLs for "exists" and "unknown" w/o writing much boilerplate code. For #3985. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-03-16 11:56:21 +00:00
Marco Neumann	31d6a7e6b3	fix: race condition in `Cache::set` (#4038 ) In theory on a multi-threaded tokio executor, the following could have happened: \| Thread 1 \| Thread 2 \| \| --------------------- \| ----------------------------------- \| \| \| Running query begin \| \| \| ... \| \| \| `loader.await` finished \| \| `Cache::set` begin \| \| \| state locked \| \| \| \| try state lock, blocking \| \| running query removed \| \| \| ... \| \| \| state unlocked \| \| \| `Cache::set` end \| \| \| \| state locked \| \| \| panic because running query is gone \| Another issue that could happen is if we: 1. issue a get request, loader takes a while, this results in task1 2. side-load data into the running query (task1 still running) 3. the underlying cache backend drops the result very quickly (task1 still running) 4. we request the same data again, resulting in yet another query task (task2), task1 is still running at this point In this case the original not-yet-finalized query task (task1) would remove the new query task (task2) from the active query set, even though task2 is actually not done. We fix this by the following measures: - task tagging: tasks are tagged so if two tasks for the same key are running, we can tell them apart - task->backend propagation: let the query task only write to the underlying backend if it is actually sure that it is running - prefer side-loaded results: restructure the query task to strongly prefer side-loaded data over whatever comes from the loader - async `Cache::set`: Let `Cache::set` wait until a running query task completes. This has NO correctness implications, it's probably just nicer for resource management. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-03-16 11:36:20 +00:00
Marco Neumann	1cbd878379	feat: do not attempt to store entries that will immediately expire (#4045 ) Let's keep the TTL cache as clean as possible. For #3985.	2022-03-16 11:16:46 +00:00
Dom Dwyer	5585dd3c21	refactor: switch to using DynObjectStore Changes all consumers of the object store to use the dynamically dispatched DynObjectStore type, instead of using a hardcoded concrete implementation type.	2022-03-15 16:32:52 +00:00
Dom Dwyer	1d5066c421	refactor: rename ObjectStore -> ObjectStoreImpl Frees up the name for so we can use `dyn ObjectStore` throughout the code instead of `ObjectStoreApi`.	2022-03-15 16:29:43 +00:00
Marco Neumann	97d595e4fb	feat: `Cache::set` (#4036 ) * feat: `Cache::set` This will be helpful to fill caches if we got the information from somewhere else. For #3985. * docs: improve Co-authored-by: Edd Robinson <me@edd.io> * docs: explain lock gap * feat: add debug log to `Cache` Co-authored-by: Edd Robinson <me@edd.io>	2022-03-15 12:12:26 +00:00
Marco Neumann	4b5cf6a70e	feat: cache processed tombstones (#4030 ) * test: allow to mock time in `iox_test` * feat: cache processed tombstones For #3974. * refactor: introduce `TTL_NOT_PROCESSED`	2022-03-15 10:28:08 +00:00
Marco Neumann	87e53f30d1	refactor: add TTL cache backend (#4027 ) * feat: `CacheBackend::as_any` * refactor: add TTL cache backend This is based on the new `AddressableHeap`, which simplifies the implementation quite a lot. For #3985. * refactor: `TtlBackend::{update->evict_expired}` * docs: exlain ttl cache eviction Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-03-14 15:35:27 +00:00
Marco Neumann	27efb66237	test: add proptest for `AddressableHeap` (#4025 ) * test: add proptest for `AddressableHeap` For #3985. * refactor: simplify code Co-authored-by: Edd Robinson <me@edd.io> Co-authored-by: Edd Robinson <me@edd.io>	2022-03-14 12:38:30 +00:00
Marco Neumann	632c4953b4	feat: add addressable heap for query cache (#4016 ) * feat: add addressable heap for query cache This will be used as a helper data structure for TTL and LRU. It's probably not the most performant implementation but it's good enough for now. This is for #3985. * fix: test + explain tie breaking in `AddressableHeap`	2022-03-14 09:35:38 +00:00
Marco Neumann	d46de98183	feat: extract "backend" from querier cache (#4015 ) * feat: extract "backend" from querier cache The backend will implement pruning policies like LRU and TTL as well as where/how the data is stored. Having a proper interface for that simplifies the implementation since we don't need to have one massive `Cache` object with a super complex mechanism. This is for #3985. * refactor: `Backend` -> `CacheBackend` Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-03-11 11:49:57 +00:00
Nga Tran	f03ebd79ab	refactor: move querier's test utils to a new crate to get reused by tests in other crates (#4013 ) * refactor: move querier's test utils to a new crate to be able resued by tests in other crates * chore: remove unused import	2022-03-10 18:17:58 +00:00
Paul Dix	27999ff72f	feat: add compaction_level and created_at to parquet_file (#3972 )	2022-03-10 15:56:57 +00:00
Andrew Lamb	2c3d30ca32	chore: Update datafusion, arrow, flight and parquet (#4000 ) * chore: Update datafusion, arrow, flight and parquet * fix: api change * fix: fmt * fix: update test metadata size * fix: Update sizes in parquet test * fix: more metadata size update	2022-03-10 12:24:47 +00:00
Marco Neumann	30d1c77d36	feat: querier test system, ground work (#3991 ) * feat: querier test system, ground work See #3985 for the motivation. This introduces a cache system for the querier which can later be extended to support the remaining features listed in #3985 (e.g. metrics, LRU/TTL). All current caches are wired up to go throw the new cache system. Once we move away from (ab)using `db`, the set of caches will be different but the system will remain. * test: explain it Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com> * refactor: simplify cache result broadcast * refactor: introduce `Loader` crate * fix: docs * docs: explain why we manually drop removed hashmap entries * docs: fix intra-doc link Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>	2022-03-10 11:27:24 +00:00
Nga Tran	c6cab3538f	refactor: move parquet chunk's new and decode to parquet_file crate (#3987 )	2022-03-08 22:04:32 +00:00
Marco Neumann	77f6153f72	refactor: remove `QueryDatabase::chunk_summaries` (#3977 ) - This is not used by the query engine at all. - The query engine should not care about ALL chunks but only about the chunks it gets via `QueryDatabase::chunks` (which includes a table name and a predicate). - All other users of that API are NOT really query-related.	2022-03-08 11:34:26 +00:00
Marco Neumann	5cc1c697fc	refactor: remove `QueryDatabase::partition_addr` (#3976 ) - This was not actually used by the query engine. - The query engine doesn't have a concept of a "partition", it only cares about chunks. - Unbound access to all partitions in the database is quite expensive (esp. on NG).	2022-03-08 11:17:31 +00:00
Marco Neumann	a3e952847a	refactor: split `querier::namespace` into submodules (#3975 ) This makes it easier to see what's required to support the query interface.	2022-03-08 10:49:00 +00:00
Marco Neumann	db3f1e8db7	feat: wire up tombstones into querier (#3962 ) * feat: `TombstoneRepo::list_by_namespace` * test: model sequencer properly * feat: wire up tombstones into querier Closes #3932. * refactor: `override_delete_predicates` => `set_delete_predicates`	2022-03-08 10:06:22 +00:00
Carol (Nichols \|\| Goulding)	9961efd702	feat: Send parquet and tombstone seq nums with ingester query response (#3925 ) Fixes #3867.	2022-03-04 15:22:29 +00:00
Marco Neumann	8d00aaba90	feat: sync chunks in querier (#3911 ) * feat: `ParquetFileRepo::list_by_namespace_not_to_delete` * feat: `ChunkAddr: Clone` * test: ensure that querier keeps same partition objects * test: improve `create_parquet_file` flexibility * feat: sync chunks in querier * test: improve `test_parquet_file`	2022-03-04 08:53:39 +00:00
Raphael Taylor-Davies	e304613546	feat: include trace ID in query log (#3912 ) (#3923 ) * feat: include trace ID in query log (#3912) * chore: fmt * chore: lint	2022-03-03 17:50:49 +00:00
Marco Neumann	bbeba73345	feat: sync partitions in querier (#3900 ) * feat: sync partitions in querier * docs: explain per-table partition grouping	2022-03-03 09:28:44 +00:00
kodiakhq[bot]	caba3e9fd2	Merge branch 'main' into cn/querier-flight-request	2022-03-02 20:30:00 +00:00
Edd Robinson	3d047073b9	feat: add tracing down to the chunk level (#3804 ) * refactor: wire exectution context to Deduplicator * feat: example trace to chunk read_filter * refactor: make execution context required * refactor: expose metadata API * refactor: more span context for chunk read_filter * refactor: fix build * refactor: push context into result stream * refactor: make executor optional	2022-03-02 19:08:22 +00:00
Carol (Nichols \|\| Goulding)	3f2a58b47f	refactor: pub use data_types from data_types2 So it's clearer which parts of data_types the NG design is using, and which types can be cleaned up eventually.	2022-03-02 13:55:31 -05:00
Carol (Nichols \|\| Goulding)	2a90841715	refactor: Move IngesterQueryRequest to data_types2 So that querier doesn't need to depend on ingester.	2022-03-02 13:52:13 -05:00
Carol (Nichols \|\| Goulding)	8f3e44bf76	refactor: Extract a crate for shared data types in the new design	2022-03-02 12:16:15 -05:00

1 2

65 Commits (17af5fcbd182cd95604aa9debea423915dc77562)