influxdb

Commit Graph

Author	SHA1	Message	Date
Marco Neumann	f34f99c5ed	refactor: port LRU cache backend to policy framework (#5406 ) * refactor: port LRU cache backend to policy framework Closes #5320. * test: extend `test_oversized_entries` Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-08-17 14:43:24 +00:00
Andrew Lamb	7f0ae53d6f	chore: Update to (almost) released object_store 0.4.0 (#5419 ) * chore: update object_store * chore: update hakari config * chore: Run cargo hakari tasks Co-authored-by: CircleCI[bot] <circleci@influxdata.com>	2022-08-17 13:44:48 +00:00
Marco Neumann	49ab568ca8	refactor: convert `remove_if` feature to policy framework (#5398 ) * refactor: allow `ChangeRequest` to carry a lifetime Let's not restrict our change functions to `'static` because this would require us to clone loads of data to achieve predicate-based `remove_if`. * refactor: convert `remove_if` feature to policy framework Decided to drop the "shared" functionality. We only use the small `remove_if` bit which is way easier to reason about. For #5320. * refactor: address review comments Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-08-16 08:23:27 +00:00
Marco Neumann	0ccefa0d0c	refactor: port TTL backend to policy framework (#5396 ) * refactor: port TTL backend to policy framework Note that this is "just" a port, it does NOT change how TTL works. This will be done in #5318. Helps with #5320. * fix: ensure inner backend is empty * test: add some smoke test	2022-08-15 16:48:16 +00:00
Carol (Nichols || Goulding)	b982bdaf2f	fix: Derive Eq when we derive PartialEq and members can derive Eq Allow this in generated code that we don't control, though. Recommended by clippy now. https://rust-lang.github.io/rust-clippy/master/index.html#derive_partial_eq_without_eq	2022-08-11 15:04:06 -04:00
Andrew Lamb	b834bc630c	chore: more readability improvements to sort keys (#5366 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-08-10 17:59:25 +00:00
Andrew Lamb	16ddc5efc6	chore: Update datafusion / arrow/parquet/arrow-flight and prost/tonic ecosystem (#5360 ) * chore: Update datafusion and arrow * chore: Update Cargo.lock * chore: update to Decimal128 * chore: Update tonic/prost/pbjson/etc * chore: Run cargo hakari tasks * fix: doctest in generated types Co-authored-by: CircleCI[bot] <circleci@influxdata.com>	2022-08-09 17:30:44 +00:00
Andrew Lamb	172f893368	fix: fix logging typo in querier (#5345 ) * fix: fix logging typo * fix: fix type in typo fix ;(	2022-08-09 06:34:06 +00:00
Marco Neumann	cd0dc42b4a	refactor: use a single chunk filter/pruning step in querier (#5338 ) We already prune all chunks in the query-access layer. There's no need to do that another time (which is actually the first time) in `QuerierTable::chunks`. The time savings we get from feeding less chunks into the state reconciling should be negligible. On the pro-side however we get a more streamlined data flow and actually correct chunk pruning metrics. Also see #5336. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-08-08 12:55:14 +00:00
Marco Neumann	fc1870ff76	fix: chunk pruning stats (#5319 ) - emit a warning if we cannot even attempt to prune chunks due to an error. This is always either a missing feature or a bug (even though it does not impact correctness but _only_ performance). Also see https://github.com/influxdata/conductor/issues/1107 - change metrics to clearly differentiate between "could not prune" and "not pruned" - add new "not pruned" observer hook (this was missing for some reason, the "pruned" hook existed though)	2022-08-05 10:50:31 +00:00
Marco Neumann	0d714878ca	feat: chunk pruning metrics (#5273 ) * refactor: make could-not-prune reason a static string * refactor: introduce `QuerierTableArgs` * feat: chunk pruning metrics Closes #4974. * refactor: address review comments * refactor: use static typing for not-pruned reason * refactor: pass chunk to not-pruned observer and use it for some metrics Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-08-04 15:29:21 +00:00
Nga Tran	34ccc9c7f5	chore: Revert "chore: Revert "refactor: bump batch size (#5251 )" (#5288 )" (#5300 ) This reverts commit `471b8be92f`. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-08-04 13:19:46 +00:00
Marco Neumann	840e4801b8	feat: make querier RAM pool split a proper feature (#5283 ) * feat: make querier RAM pool split a proper feature - use propre pool names - expose sizing via CLI/env Closes https://github.com/influxdata/conductor/issues/1102. * refactor: improve naming and docs Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-08-03 15:27:23 +00:00
Marco Neumann	663a20d743	refactor: remove `--ingster-address` (#5255 ) Closes #5002.	2022-08-03 15:05:01 +00:00
Nga Tran	471b8be92f	chore: Revert "refactor: bump batch size (#5251 )" (#5288 ) This reverts commit `bb172f8fa8`.	2022-08-03 14:23:45 +00:00
Marco Neumann	8e2443d879	feat: use two RAM pools in querier (#5271 ) Quick&Dirty implementation of a RAM-pool split to see if this has any effect. I expect the querier performance to improve due to this because large read buffers can no longer evict precious metadata. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-08-02 15:14:26 +00:00
Marco Neumann	ee491cbbfc	fix: re-enable querier read buffer cache (#5268 ) This reverts commit `82913743f1` / #5252. I misjudged the cache hit ratio for the RB, see https://github.com/influxdata/k8s-infra/pull/4548 So let's bring back the RB cache until we have some form of parquet cache in place.	2022-08-02 08:37:30 +00:00
Marco Neumann	a8f6d579c8	feat: add metric for predicate-based cache entry removal (#5257 )	2022-08-02 07:44:53 +00:00
Marco Neumann	fec6b18d80	feat: add metric for TTL cache expiration (#5256 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-08-02 07:00:30 +00:00
Marco Neumann	82913743f1	refactor: disable querier read buffer cache (#5252 ) Let's try and see how this performs in prod. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-08-01 15:43:22 +00:00
Marco Neumann	bb172f8fa8	refactor: bump batch size (#5251 ) This is what DataFusion uses by default and I don't see a reason why we should use such small batch sizes. The affect is probably only visible in certain filter-aggregate queries that don't focus on a single series (because there we likely end up with 1 or 2 batches only, esp. after #5250) for coarse-grained filters, esp. when the filter key is not the first sort key. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-08-01 13:49:58 +00:00
dependabot[bot]	fbd39844d8	chore(deps): Bump async-trait from 0.1.56 to 0.1.57 (#5247 ) Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.56 to 0.1.57. - [Release notes](https://github.com/dtolnay/async-trait/releases) - [Commits](https://github.com/dtolnay/async-trait/compare/0.1.56...0.1.57) --- updated-dependencies: - dependency-name: async-trait dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-08-01 08:30:33 +00:00
Andrew Lamb	9215a534d0	chore: Update datafusion and `arrow`/`parquet`/`arrow-flight` to `19.0.0` (#5229 ) * chore: Update datafusion and `arrow`/`parquet`/`arrow-flight` to `19.0.0` * chore: Run cargo hakari tasks * fix: Update for API changes * fix: clippy Co-authored-by: CircleCI[bot] <circleci@influxdata.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-28 08:10:47 +00:00
Marco Neumann	9a9a1a4777	feat: limit per-table chunk data for every query (#5223 ) * feat: `QueryChunk::as_any` * feat: allo `ChunkPruner::prune_chunks` to fail * feat: limit per-table chunk data for every query Closes #5211. * fix: address review comments Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>	2022-07-27 13:20:05 +00:00
Marco Neumann	85c186f5b8	feat: cache projected chunk schemas in querier (#5213 ) * feat: cache projected chunk schemas in querier Ref #5202. * refactor: simplify size calculations Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-27 08:23:20 +00:00
Andrew Lamb	495bbe48f2	refactor: Reduce boiler plate calling `SpanRecorder::child` (#5180 ) * refactor: call SpanRecorder::child * refactor: update more locations Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-22 11:11:45 +00:00
Marco Neumann	0f54281d24	feat: trace namespace cache For #5129.	2022-07-21 16:10:06 +02:00
Marco Neumann	9031ed390b	feat: trace parquet_file cache For #5129.	2022-07-21 16:10:06 +02:00
Marco Neumann	4c5227292f	feat: trace partition cache For #5129.	2022-07-21 16:10:06 +02:00
Marco Neumann	ff88702749	feat: wire up cache tracing (1/2) (#5170 ) * feat: trace tombstone cache For #5129. * feat: trace table cache For #5129. * feat: trace read buffer cache For #5129. * feat: trace processed_tombstones cache For #5129. * refactor: improve span name Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-21 13:59:55 +00:00
Nga Tran	69cb3f2b19	refactor: remove min_sequence_number from Compactor and Querier, add `count_by_overlaps_with_level_0` and `count_by_overlaps_with_level_1` to catalog (#5151 ) * refactor: remove min_sequnce_number * fix: typos * fix: remove min_sequencer_number from new files from merging main * fix: add back throwing error if the compactor compacts files persisted by the ingester after the ingester sends max seq_num back to querier * test: add test_compactor_collision back but modify the input to make it work woth new changes Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-21 13:51:54 +00:00
Marco Neumann	b35502ce61	feat: cache tracing (#5164 ) * feat: cache tracing Add tracing to the metrics cache wrapper. The extra arguments for GET and PEEK make this quite simple, because the wrapper can just extend the inner args with the trace information. We currently terminate the span in `querier::cache` (i.e. only pass in `None`, so no tracing will occur) to keep this PR rather small. This will be changed in subsequent PRs. For #5129. * fix: typo Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-21 11:54:22 +00:00
Marco Neumann	0561423475	refactor: enforce proper `IOxSessionContext` (#5158 ) - remove `IOxSessionContext::default()` because untracked contexts should only be created by tests - remove `Option<IOxSessionContext>` because it is a typed workaround for `IOxSessionContext::default` Tests should use `IOxSessionContext::testing` and all _normal_ users should create proper contexts. I suspect this will help tracing or at least prevent silent regressions. See #5129. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-20 16:25:43 +00:00
Marco Neumann	3b8f98c7b8	feat: allow passing for extra arguments to `Cache::peek` (#5161 ) This will be used to pass spans down to `CacheWithMetrics` (or a new wrapper specific to tracing) and will help with #5129. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-20 13:51:21 +00:00
Marco Neumann	8b9119a0c6	feat: trace querier->ingester, stopping at gRPC layer (#5159 ) This adds tracing of querire->ingester request up to the point where we perform the network request, i.e. the trace will only appear on the querier side. We may extend this at some point to carry the tracing information to the ingester as well. Ref #5129. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-20 11:48:52 +00:00
Marco Neumann	b8d9799a26	feat: wire span all the way to `QuerierTable::chunks` (#5134 ) * feat: pass context to `QueryDatabase::chunks` * feat: wire span all the way to `QuerierTable::chunks` This is required for #5129.	2022-07-19 14:12:55 +00:00
Andrew Lamb	e2d871b00b	chore: Update datafusion and arrow/parquet/arrow-flight to `18.0.0` (#5079 ) * chore: Update datafusion to 10.0.0, arrow/parquet/arrow-flight to 18 * chore: Run cargo hakari tasks * fix: update cargo pin Co-authored-by: CircleCI[bot] <circleci@influxdata.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-18 15:01:03 +00:00
Marco Neumann	f0bd278652	feat: add tracing to instrumented semaphores (#5130 ) This will allow us to easily see how much time we spend during query processing waiting for the query semaphore. Ref #5129.	2022-07-15 07:50:28 +00:00
dependabot[bot]	9b67de2f43	chore(deps): Bump tokio from 1.19.2 to 1.20.0 Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.19.2 to 1.20.0. - [Release notes](https://github.com/tokio-rs/tokio/releases) - [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.19.2...tokio-1.20.0) --- updated-dependencies: - dependency-name: tokio dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com>	2022-07-14 01:21:43 +00:00
Carol (Nichols || Goulding)	61c023139b	refactor: Switch compaction levels to an enum with values rather than separate consts Bonuses: - Type checking - Validation - Less casting - Exhaustiveness checking - Less use of the numerical value	2022-07-13 11:30:36 -04:00
Marco Neumann	89c24dfec0	fix: do not force-load chunks into read buffer (#5112 ) I forgot to address a TODO in #5091. Extends to test to actually check the chunk stage and removes the function for manual force-loads. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-13 14:46:24 +00:00
Marco Neumann	b1b2cb5d4a	feat: load read buffer on demand (#5091 ) * refactor: extract `select_schema` * refactor: improve `InternalLostInputField` error message * test: improve SQL runner output * feat: load read buffer on demand Closes #5032. * refactor: move `[Half]OwnedSelection` to `schema` crate`	2022-07-13 08:51:40 +00:00
Nga Tran	bce8924b4c	refactor: use max_sequence_number to sort chunks for deduplication (#5101 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-12 16:23:53 +00:00
Marco Neumann	96da584139	test: do NOT create expensive bloom filters when we do not need them (#5089 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-11 16:29:53 +00:00
Marco Neumann	0a61989df8	refactor: `QuerierParquet` + `QuerierRBChunk` = ❤️ (merge them together) (#5063 ) * refactor: `QuerierParquet` + `QuerierRBChunk` = ❤️ * refactor: address review comments Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-08 08:06:53 +00:00
Marco Neumann	41c8a8428f	feat: `ReadBufferCache::peek` (#5064 ) For #5032. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-08 07:35:24 +00:00
Andrew Lamb	c46e1c6347	chore: Update datafusion + arrow/parquet/arrow-flight to `17.0.0` (#5021 ) * fix: correct nullability declaration of system tables * chore: Update datafusion and arrow/parquet/arrow-flight * chore: Run cargo hakari tasks * fix: Update tests * fix: Update tests * fix: predicate pruning * fix: add some tests * fix: query_functions * fix: fix read_buffer test * fix: fix clippy Co-authored-by: CircleCI[bot] <circleci@influxdata.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-07 19:22:15 +00:00
Marco Neumann	aacdeaca52	refactor: prep work for #5032 (#5060 ) * refactor: remove parquet chunk ID to `ChunkMeta` * refactor: return `Arc` from `QueryChunk::summary` This is similar to how we handle other chunk data like schemas. This allows a chunk to change/refine its "believe" over its own payload while it is passed around in the query stack. Helps w/ #5032.	2022-07-07 13:21:48 +00:00
Marco Neumann	2e5366a62a	refactor: disable TTL (caching) for non-existing namespaces (#5053 ) This is not relevant at the moment for prod since other layers prevent/filter queries for non-existing namespaces. However this messes up the flux integration tests, see https://github.com/influxdata/conductor/issues/997 So let's disable this specific cache case until #4617 is implemented which may be used by the flux tests. Fixes https://github.com/influxdata/conductor/issues/997 Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-06 15:22:58 +00:00
Marco Neumann	16bd3e67c0	refactor: unify `apply_predicate_to_metadata` (#5030 ) Instead of using some hand-rolled timestamp-based logic (or just "unknown") all over the place, just use logic introduced in #5017. This requires slightly improved table summaries within the querier that at least has min/max for the timestamp column. For that, the former `IngesterChunk`-specific `calculate_summary` method was extended to `create_basic_summary` to include that data and is now also used by `QuerierParquetChunk`. Note: `QuerierRBChunk` already has detailled metrics that are provided by the read buffer implementation. Should we ever need even better pruning for `QuerierParquetChunk` (or `IngesterChunk`) then we _only_ need add extra data to the table summaries. Closes #4976. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-05 12:51:59 +00:00
Andrew Lamb	c4c251129e	chore: Update datafusion (#5020 ) * chore: Update datafusion * fix: Update plan * fix: update explain plans Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-01 19:59:41 +00:00
kodiakhq[bot]	84d2573ab6	Merge branch 'main' into cn/move-sharding-logic	2022-07-01 17:46:33 +00:00
Marco Neumann	016dd93d9c	feat: filter chunks before requesting read buffers (#4996 ) Fixes #4976.	2022-07-01 08:59:07 +00:00
Marco Neumann	87a8579742	refactor: `ChunkOrder::new` cannot fail (#5004 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-30 22:26:20 +00:00
Carol (Nichols || Goulding)	380166f4c0	refactor: Move sharding query from namespace to table Better supports future work. Fixes #5003.	2022-06-30 11:32:23 -04:00
Marco Neumann	be53716e4d	refactor: use IDs for `parquet_file.column_set` (#4965 ) * feat: `ColumnRepo::list_by_table_id` * refactor: use IDs for `parquet_file.column_set` Closes #4959. * refactor: introduce `TableSchema::column_id_map`	2022-06-30 15:08:41 +00:00
Carol (Nichols || Goulding)	3049479b78	feat: Implement new querier to ingester config design	2022-06-30 08:26:50 -04:00
Carol (Nichols || Goulding)	59da2dccb8	feat: Assert if no ingester addresses are found Temporarily support `--ingester-addresses` (and always return all ingesters) so that this PR can be deployed during the switchover.	2022-06-30 08:22:47 -04:00
Carol (Nichols || Goulding)	0e450deca8	feat: Support a sequencer being mapped to multiple ingesters	2022-06-30 08:22:47 -04:00
Carol (Nichols || Goulding)	44bce8e3ec	fix: Don't assume one ingester per shard/table	2022-06-30 08:22:47 -04:00
Carol (Nichols || Goulding)	4e91121e29	feat: Allow specification of sequencer to ingester mappings in a JSON file	2022-06-30 08:22:46 -04:00
Carol (Nichols || Goulding)	f37f8013ec	feat: Assign a sequencer id to QuerierTables to know which ingester to query	2022-06-30 08:22:46 -04:00
Carol (Nichols || Goulding)	1824dbdebd	feat: Create IngesterConnection optionally using a map of sequencer IDs to ingester addresses	2022-06-30 08:22:46 -04:00
Raphael Taylor-Davies	835e1c91c7	chore: update object_store to 0.3.0 (#4707 ) * chore: update object_store to 0.3.0 * chore: review feedback Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-29 21:44:03 +00:00
Andrew Lamb	01fb2e132d	chore: Update datafusion pin (#4969 ) * chore: Update datafusion pin * fix: Update for api * fix: Explicitly set coalsce batch size * fix: Update batch size as well * fix: update tests for new explain plan, and improved coercion	2022-06-29 17:52:37 +00:00
Marco Neumann	1eac304305	refactor: fetch RB chunks in parallel (#4952 ) Currently the querier fetches RB in a serial manner, which is probably not good since each cache miss takes between 10ms and 250ms. Let's try to fetch 2 in parallel and if that works well, make this a proper config. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-28 07:54:58 +00:00
Marco Neumann	9b8086df74	fix: size estimates (#4950 ) * fix: `Tombstone::size` must include serialized predicate * fix: `CachedPartition::size` must include `Arc` heap allocation Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-27 15:25:32 +00:00
Marco Neumann	1a74f84494	refactor: remove `ParquetFileWithMetadata` usage outside the catalog (#4948 ) * refactor: remove `DecodedParquetFile` from `iox_tests` * refactor: remove `DecodedParquetFile` from querier Also pull out all the chunk schema and sort key handling into a function so that RB chunks and parquet chunks mostly use the same code path. * refactor: remove `DecodedParquetFile` * refactor: remove `ParquetFileWithMetadata` usage * fix: test data consistency	2022-06-27 15:19:29 +00:00
Marco Neumann	3b78bf1c48	refactor: remove binary parquet file MD from compactor (#4938 ) * refactor: simplify sort key calculation * refactor: use schema from catalog instead from file * refactor: do not request parquet file MD in compactor * test: ensure that `QueryableParquetChunk` works correctly	2022-06-27 15:11:15 +00:00
Marco Neumann	b9cbb3dfca	refactor: do not use in-parquet IOx metadata in compactor () (#4935 ) refactor: avoid feeding sort key from struct into same struct * feat: allow namespace schema query by ID * refactor: do not use binary parquet file MD in compactor tests * refactor: do not use in-parquet IOx metadata * refactor: reduce number of catalog queries	2022-06-27 08:06:11 +00:00
Marco Neumann	bd6c4659af	refactor: slim down parquet chunk (remove Metadata) (#4934 ) * feat: conversion from `ParquetFile` to `ParquetFilePath` * refactor: slim down parquet chunk - ensure it works without binary parquet metadata - timestamp range is no longer optional (ensured by the NG type system) - remove table summary: this is only needed for SOME API users. The compactor can perfectly work without statistics since has the timestamp range which is sufficient for the current overlap check (we don't use any other primary key stats at the moment). The querier currently does NOT use parquet chunks (was replaced by read buffer) but if it will again in some future it will likely need to find a way to fetch and cache the statistics. - the schema is now provided by the API user since it can be reconstructed using the NG catalog only (and "wrong" column orders are tolerated as of #4921) Ref #4124 Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-23 10:55:16 +00:00
Marco Neumann	463d430d43	refactor: do not fetch parquet MD from catalog in querier (#4926 ) Ref #4124	2022-06-23 09:03:19 +00:00
Marco Neumann	4b7d02fad1	feat: do not rely on encoded parquet metadata for RB chunks (#4924 ) * fix: use proper sort key in tests * feat: do not rely on encoded parquet metadata for RB chunks Ref #4124. * refactor: allocate less strings * refactor: use upstream PK calculation * fix: cache expiration w/o a good reason * refactor: make namespace cache safer to use * refactor: make partition cache safer to use	2022-06-23 08:55:52 +00:00
Marco Neumann	0534b80886	fix: `ParquetFile::size` must include column set (#4925 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-22 13:06:02 +00:00
Marco Neumann	9591bed696	refactor: make querier internals private (#4922 ) Queries internals are not meant to be used by other crates. Only a handful selected interfaces should be used by IOxD and the query tests. The compactor only used a very small subset just to read parquet files back into memory. It shall rather use the official `parquet_file` interface instead. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-22 13:00:08 +00:00
Marco Neumann	59accfe862	refactor: assorted fixes and prep work for #4124 (#4912 ) * refactor: `TestPartition::update_sort_key` should return an `Arc` The whole test framework is built around `Arc`s, so let's fix this consistency issue. * fix: actually calculate correct column set in test framework * feat: check expected parquet file schema While working on the querier I made some mistakes regarding schemas and such a check would have greatly improved the debugging experience. * feat: namespace cache expiration * fix: improve parquet schema check * fix: remove clone	2022-06-21 16:08:28 +00:00
Marco Neumann	70337087a8	refactor: do not require parquet metadata for RB cache (#4911 ) * test: add `TestParquetFile::schema` * refactor: do not require parquet metadata for RB cache Ref #4124.	2022-06-21 12:59:23 +00:00
Marco Neumann	db24838221	refactor: remove table name from read buffer (#4910 ) The low-level chunk storage shouldn't care about the table name (this is also true for parquet chunks btw). In fact, the table name is already only a partial information since it misses the namespace. If we need a table name, then the high-level chunk/data management is responsible for that. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-21 11:57:28 +00:00
Marco Neumann	0f63be26c3	refactor: pass path instead of metadata around to load parquet files (#4909 )	2022-06-21 10:57:10 +00:00
Marco Neumann	c3912e34e9	refactor: store per-file column set in catalog (#4908 ) * refactor: store per-file column set in catalog Together with the table-wide schema and the partition-wide sort key, this should be everything we need to read a parquet file directly into memory without peeking any file-level metadata. The querier will use this to directly load parquet files into the read buffer. WARNING: This requires a catalog wipe! Ref #4124. * refactor: use proper `ColumnSet` type	2022-06-21 10:26:12 +00:00
Marco Neumann	730f85a619	refactor(querier): split ingester partitions into chunks (#4893 ) * refactor(querier): split ingester partitions into chunks With the new wire protocol the ingester can now transmit multiple snapshots per partition with different schemas. This changes the querier to reflect this and and splits uses the individual snapshots as chunks for the query engine instead of a single partition. The schema handling was changed so that instead of a table-wide schema enforcement, we now use the snapshot-specific projections. This means we do not need to create all-NULL columns any longer because the batches within the chunks now always have the correct schema. * refactor: "disassembler" -> "decoder"	2022-06-20 08:58:58 +00:00
Nga Tran	72c8cfa6ed	fix: make ChunkOrder i64 data type to accept min sequence number 0 and match with data type of sequence number (#4888 ) * fix: make ChunkOrder u64 data type to accept min sequence number 0 * fix: make ChunkOrder i64 to match with sequence number type Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-17 13:45:17 +00:00
Marco Neumann	0fbff981ec	chore(deps): Bump sqlx to 0.6.0 and uuid to 1 (#4894 ) Closes #4889. Closes #4890. Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-06-17 10:28:28 +00:00
Marco Neumann	c6bffac5d3	refactor: make querier->ingester request metrics per-ingester (#4879 ) The metrics and logs introduced in #4806 will be emitted once for all ingesters instead of per request. The accumulated view makes it pretty hard to judge the actual request-response timings and the number of requests. Instead we now measure the data per request. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-16 15:09:47 +00:00
Marco Neumann	66c7d95312	refactor: use new ingester<>querier wire protocol (#4867 ) * refactor: use new ingester<>querier wire protocol Use and document the new and more flexible ingester<>querier wire protocol. Note that the ingester does NOT stream the response data yet, but the internal data structures would allow that. A follow-up change will adjust the ingester code to stream the data. Ref #4849. * fix: typos Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * refactor: clarify naming and public interface * test: add schema assertion to `ingester_response_to_record_batches` Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>	2022-06-16 08:02:28 +00:00
kodiakhq[bot]	fa9a094068	Merge branch 'main' into cn/talk-to-ingesters-less	2022-06-15 17:42:40 +00:00
Carol (Nichols || Goulding)	8331cb1afe	fix: Add retry to querying of catalog for sequencers in querier startup	2022-06-15 12:09:42 -04:00
Carol (Nichols || Goulding)	03f6f59a9b	fix: Change the sharder to return error instead of panicking for no shards	2022-06-15 11:23:31 -04:00
Marco Neumann	7c60edd38c	refactor: prepare new ingester<>querier protocol on the querier side (#4863 ) * refactor: prepare new ingester<>querier protocol on the querier side This changes the querier internals to work with the new protocol. The wire protocol stays the same (for now). There's a (somewhat hackish) adapter in place on the querier side that converts the old to the new protocol on-the-fly. This is an intermediate step before we actually change the wire protocol (and in a step after that also take advantage of the new possibilites on the ingester side). Ref #4849. * docs: explain adapter	2022-06-15 14:32:24 +00:00
Carol (Nichols || Goulding)	e9cdaffe74	fix: Create querier sharder from catalog sequencer info Panic if there are no sharders in the catalog.	2022-06-15 10:18:54 -04:00
Carol (Nichols || Goulding)	874ef89daa	feat: Make specifying the write buffer, and thus getting a sharder, optional in querier	2022-06-15 10:01:45 -04:00
Marco Neumann	3bd24b67ba	feat: extend flight client to accept multiple (changing) schemas (#4853 ) * feat: extend flight client to accept multiple (changing) schemas See #4849. Originally I intended not to use Flight at all for the new ingester<>querier protocol. However since flight also deals with dictionary batches and multiple batches and the gRPC protocol that I would write would look very similar, I will use Flight with a bit more flexible message types. The rough idea for the protocol is the following stream: - for each partition: 1. "none" message with partition metadata 2. for each chunk (can have different schemas under certain circumstances): 1. "schema" message (resets dictionary state) 2. (optional) dictionary batch messages 3. one or more "record batch" message The nice thing about it is that the same arrow client works also for the existing client<>querier protocol since there we just send: 1. "schema" message (no app metadata) 2. (optional) dictionary batch messages 3. zero, one or more "record batch" message (no app metadata) * refactor: separate high- and low-level flight client It is very unlikely that a user will use the high-level batch-producing functionality and the low-level stuff within the same session. So let's split this into to clients (high-level uses the low-level one internally) to avoid confusion. Also add documentation on our protocol handling. * refactor: enumerate all variants in match statement to better catch errors in the future	2022-06-15 11:38:08 +00:00
Carol (Nichols \|\| Goulding)	e875a92cf8	feat: Log time spent requesting ingester partitions (#4806 ) * feat: Log time spent requesting ingester partitions Fixes #4558. * feat: Record a metric for the duration queriers wait on ingesters * fix: Use DurationHistogram instead of U64 Histogram * test: Add a test for the ingester ms metric * feat: Add back the logging to provide both logging and metrics for ingester duration * refactor: Use sample_count method on metrics * feat: Record ingester duration separately for success or failure * fix: Create a separate test for the ingester metrics Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-14 17:58:19 +00:00
Andrew Lamb	e91d00b10c	chore: Update datafusion + `arrow`/`parquet`/`arrow-flight` to `16.0.0 (#4851 ) * chore: TEMP Update DataFusion to pre-release * chore: update arrow et al to 16.0.0 * chore: Run cargo hakari tasks * fix: update reader read_dictionary API * chore: Update to real Datafusion release * fix: Update parquet API * fix: update test Co-authored-by: CircleCI[bot] <circleci@influxdata.com>	2022-06-14 16:31:40 +00:00
Dom Dwyer	b41ea1d718	refactor: PartitionKey type This commit changes the code base to use a new reference-counted PartitionKey type wrapper, instead of passing a bare String around. This allows the compiler to type check & verify usage of the partition key, instead of passing a bare string around. By reference counting the underlying string, we reduce memory usage for some use cases.	2022-06-14 14:47:56 +01:00
Marco Neumann	2b84e5c087	feat: measure "probably reloaded" cache loads (#4813 ) To roughly gauge how much data we re-load into cached (i.e. data that was already loaded but was later evicted due to LRU pressure or TTL eviction) this change introduces a new metric that estimates if a cache entry that is requested from the loader was already seen before (using a probabilistic filter).	2022-06-13 13:51:45 +00:00
Marco Neumann	66623fe0cd	feat: expose query semaphore metrics (#4836 ) The groundwork for that was already done, just needed a bit of wiring. This might help us to judge timeouts.	2022-06-13 09:36:50 +00:00
Andrew Lamb	ddf61c5e98	refactor: Consolidate `Selection` creation, add tests (#4832 ) * refactor: Consolidate Selection --> DataFusion projection * fix: remove now unused function	2022-06-10 18:30:43 +00:00
kodiakhq[bot]	dd8d44e24f	Merge branch 'main' into cn/duration	2022-06-10 14:23:09 +00:00
Nga Tran	13c57d524a	feat: Change data type of catalog partition's sort_key from a string to an array of string (#4801 ) * feat: Change data type of catalog Postgres partition's sort_key from a string to an array of string * test: add column with comma * fix: use new protonuf field to avoid incompactible * fix: ensure sort_key is an empty array rather than NULL * refactor: address review comments * refactor: address more comments * chore: clearer comments * chore: Update iox_catalog/migrations/20220607102200_change_sort_key_type_to_array.sql * chore: Update iox_catalog/migrations/20220607102200_change_sort_key_type_to_array.sql * fix: Rename migration so it will be applied after Co-authored-by: Marko Mikulicic <mkm@influxdata.com>	2022-06-10 13:31:31 +00:00


347 Commits (7202dddab6d9ede46c74664c0675fe349da2fd13)