influxdb

Commit Graph

Author	SHA1	Message	Date
Marco Neumann	513fdf1e26	feat: split "pruned" metric into "early" and "late" (#5645 ) * feat: split "pruned" metric into "early" and "late" * docs: improve Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * docs: explain `PruningMetrics` * test: try to test pruning Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>	2022-09-15 13:42:00 +00:00
Marco Neumann	f7b6f81fe1	feat: concurrent chunk creation (#5646 ) Create chunks in querier concurrently after we've pre-filtered them. Chunk creation still may require a bit of cached information (e.g. the partition sort key) and we can easily fetch these concurrently instead of in order. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-09-15 12:30:02 +00:00
Nga Tran	7c4c918636	chore: add partition id into panic message (#5641 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-09-15 02:21:13 +00:00
Marco Neumann	2332e5de10	refactor: slightly increase querier namespace cache TTLs (#5635 ) This should lower catalog load and eliminate a few costly cache misses. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-09-14 13:54:51 +00:00
Andrew Lamb	f86d3e31da	chore: Update datafusion + object_store (#5619 ) * chore: Update datafusion pin * chore: update object_store to 0.5.0 * chore: Run cargo hakari tasks Co-authored-by: CircleCI[bot] <circleci@influxdata.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-09-13 12:34:54 +00:00
Andrew Lamb	1fd31ee3bf	chore: Update datafusion / `arrow` / `arrow-flight` / `parquet` to version 22.0.0 (#5591 ) * chore: Update datafusion / `arrow` / `arrow-flight` / `parquet` to version 22.0.0 * fix: enable dynamic comparison flag * chore: derive Eq for clippy * chore: update explain plans * chore: Update sizes for ReadBuffer encoding * chore: update more tests Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-09-12 17:45:03 +00:00
Marco Neumann	8933f47ec1	refactor: make `QueryChunk::partition_id` non-optional (#5614 ) In our data model, a chunk always belongs to a partition[^1], so let's not make this attribute optional. The optional value only leads to -- mostly surprising -- conditional behavior, ranging from "do not equalize the partition sort key" (querier) to "always consider the chunk overlapping" (iox_query when dealing with ingester chunks). [^1]: This is even true when the chunk belongs to a parquet file that is not yet added to the catalog, contrary to what a comment in the ingester stated. The catalog and data model used by the querier are two totally different things.	2022-09-12 13:52:51 +00:00
Marco Neumann	df5ef875b4	revert: disable read buffer usage in querier (#5579 ) (#5603 ) This results in a 2x-3x slow down. It's not horrible, but also not good.	2022-09-09 11:26:09 +00:00
dependabot[bot]	786ce75e26	chore(deps): Bump tokio-util from 0.7.3 to 0.7.4 (#5596 ) Bumps [tokio-util](https://github.com/tokio-rs/tokio) from 0.7.3 to 0.7.4. - [Release notes](https://github.com/tokio-rs/tokio/releases) - [Commits](https://github.com/tokio-rs/tokio/compare/tokio-util-0.7.3...tokio-util-0.7.4) --- updated-dependencies: - dependency-name: tokio-util dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-09-09 07:40:16 +00:00
Marco Neumann	c3b47dfe59	refactor: disable read buffer usage in querier (#5579 ) * refactor: read querier parquet files from cache * refactor: only use parquet files in querier (no RB) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-09-08 13:18:22 +00:00
YIXIAO SHI	52ae60bf2e	chore: fix comment typo (#5551 ) Co-authored-by: Dom <dom@itsallbroken.com>	2022-09-07 08:49:29 +00:00
Luke Bond	a280acb860	Merge branch 'main' into alamb/guilio-python-main	2022-09-06 16:57:00 +01:00
Marco Neumann	adeacf416c	ci: fix (#5569 ) * ci: use same feature set in `build_dev` and `build_release` * ci: also enable unstable tokio for `build_dev` * chore: update tokio to 1.21 (to fix console-subscriber 0.1.8) * fix: "must use"	2022-09-06 14:13:28 +00:00
Marco Neumann	87772a6aec	refactor: debug log improvements (#5553 ) * feat: extend log output for ingester responses * feat: add debug log for parquet `read_filter` calls * feat: add debug log to `get_write_info` * feat: add debug log parquet cache invalidation	2022-09-05 13:54:13 +00:00
Marco Neumann	064f0e9b29	refactor: use DataFusion to read parquet files (#5531 ) Remove our own hand-rolled logic and let DataFusion read the parquet files. As a bonus, this now supports predicate pushdown to the deserialization step, so we can use parquets as an in-mem buffer. Note that this currently uses some "nested" DataFusion hack due to the way the `QueryChunk` interface works. Midterm I'll change the interface so that the `ParquetExec` nodes are directly visible to DataFusion instead of some opaque `SendableRecordBatchStream`.	2022-09-05 09:25:04 +00:00
Marco Neumann	f45cbfb88d	refactor: fine-grained file size mocking (#5541 ) * refactor: do not override parquet file size in querier This is going to be an issue when we actually rely on the size for reading, see #5531. * refactor: use selected file size mocking in compactor Do not blindly override parquet file sizes for all subsystems. This is going to be an issue when we actually rely on the size for reading, see #5531. * refactor: remove ability to override file sizes in catalog Blindly overriding data for all subsystems is dangerous, because some parts of our stack actually rely on the actual file size. See #5531. * docs: explain `size_overrides`	2022-09-05 08:50:04 +00:00
Andrew Lamb	1e1d964fdb	fix: Some other stragglers	2022-09-04 07:59:07 -04:00
Marco Neumann	0a0b3bd95b	feat: querier object store cache (#5527 ) * feat: querier object store cache * docs: improve Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com> Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>	2022-09-02 09:48:53 +00:00
Marco Neumann	5e187ae1c0	refactor: use concrete type in `MetricsLoader` (#5525 ) The API user may still use a `Box<dyn ...>` if they want, but they technically don't have to. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-09-01 12:22:12 +00:00
Marco Neumann	c59dd01742	refactor: use concrete inner type in `CacheWithMetrics` (#5522 ) The API user still CAN use dynamic dispatch but doesn't have to. This also simplifies the generics a bit. This is similar to #5520. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-09-01 06:05:59 +00:00
Marco Neumann	c0dda14cef	refactor: use concrete backend type in `CacheDriver` (#5520 ) This removes some `Box<dyn ...>` indirection when the user doesn't want it (you still can, but don't have to) and makes the whole type handling easier to understand. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-08-31 14:58:25 +00:00
Andrew Lamb	6669d85fb4	chore: Update datafusion + arrow/parquet to `21.0.0` (#5519 ) * chore: Update arrow/arrow-flight/parquet to 21.0.0 * chore: Update datafusion pin * chore: Fix arrow update script * chore: Update Cargo.lock * chore: Update for new API	2022-08-31 13:30:47 +00:00
Marco Neumann	fecbbd9fa1	refactor: improve namespace caching in querier (#5492 ) 1. Cache converted schema instead of catalog schema. This saves a bunch of memcopies during conversion. 2. Simplify creation of new chunks, we now only need a `CachedTable` instead of a namespace and a table schema. In an artificial benchmark, this removed around 10ms from the query (although that was prior to #5467 which moved schema conversion one level up). Still I think it is the cleaner cache design. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-08-30 11:42:21 +00:00
Marco Neumann	430536f05f	refactor: use a single timestamp in policy backend (#5508 ) * refactor: use a single timestamp in policy backend Prior to this PR we had at least 1 `TimeProvider::now` calls per GET request (for caches that only used LRU) and up to 3 calls (caches with LRU + refresh + TTL). Let's instead use a single timestamp that is created by the policy backend itself (instead of the policies). This has the following consequences: - efficiency: `SystemProvider::now` is not free, even though under Linux this doesn't result in a syscall, it uses the stdlib time system which also checks for monotonicity - consistency: All changes for a single trigger (e.g. a GET cache call) now use a single timestamp instead of slightly increasing ones. I argue this is the better semantic, simpler to understand and better to debug. For some (slightly artificial) local performance experiment, this shaves off around 2ms per single-table SQL query. However I expect that there might be more degenerated cases (e.g. multi-table SQL queries or some InfluxRPC requests that hit multiple tables). The majority of this patch is moving the `TimeProvider` from the policies into the policy backend. * docs: explain `now` parameter	2022-08-30 11:23:25 +00:00
Carol (Nichols || Goulding)	1b49ad25f7	refactor: Rename KafkaTopicId to TopicId	2022-08-29 14:27:02 -04:00
Carol (Nichols || Goulding)	58f0b63cdc	refactor: Rename KafkaTopic to Topic or TopicMetadata or topic name as appropriate	2022-08-29 14:27:02 -04:00
Carol (Nichols || Goulding)	cb52683a1a	fix: Redo uses after rebase	2022-08-29 14:08:33 -04:00
Carol (Nichols || Goulding)	74c9529062	fix: Rename KafkaPartition to ShardIndex	2022-08-29 14:07:18 -04:00
Carol (Nichols || Goulding)	6443858870	fix: Rename compactor option from sequencer to shard	2022-08-29 14:06:45 -04:00
Carol (Nichols || Goulding)	95b7529079	fix: Rename more test values to shard	2022-08-29 14:06:45 -04:00
Carol (Nichols || Goulding)	fe9c474620	fix: rustfmt	2022-08-29 14:06:45 -04:00
Carol (Nichols || Goulding)	952a3ea498	fix: Return querier sharding to use sequencer ID	2022-08-29 14:06:44 -04:00
Carol (Nichols || Goulding)	698f1a47ff	refactor: Rename test structures from sequencer to shard where appropriate	2022-08-29 14:06:44 -04:00
Jake Goulding	4abf21c724	refactor: Rename Sequencer (and its entourage) to Shard	2022-08-29 14:06:43 -04:00
Sam Arnold	05657ea068	fix: optimizations for metadata fetch and chunk pruning (#5467 ) * fix: hoist repeated computation out of chunk creation We have hundreds of chunks per table, so it is beneficial to only do common work once. * chore: remove TableCache as it is no longer used * fix: prune chunks both before and after metadata fetch Fetching the metadata for all the chunks in a table is expensive, especially when we have a narrow time range query that only needs a few chunks. * chore: fix clippy * fix: fix up some last tests * fix: review comments Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-08-29 14:59:05 +00:00
Marco Neumann	3a4a17a48e	feat: refresh namespace cache before expiration (#5449 ) Closes #5318.	2022-08-29 11:52:18 +00:00
Dom Dwyer	abf26767c1	refactor: infallible JumpHash initialisation This doesn't really need to be fallible but forces propagation of a ton of error handling - no shards is always a sign of something being very wrong, and can be caught in the caller if it's for some reason an acceptable state / can be recovered from.	2022-08-24 13:18:57 +02:00
Marco Neumann	f34f99c5ed	refactor: port LRU cache backend to policy framework (#5406 ) * refactor: port LRU cache backend to policy framework Closes #5320. * test: extend `test_oversized_entries` Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-08-17 14:43:24 +00:00
Andrew Lamb	7f0ae53d6f	chore: Update to (almost) released object_store 0.4.0 (#5419 ) * chore: update object_store * chore: update hakari config * chore: Run cargo hakari tasks Co-authored-by: CircleCI[bot] <circleci@influxdata.com>	2022-08-17 13:44:48 +00:00
Marco Neumann	49ab568ca8	refactor: convert `remove_if` feature to policy framework (#5398 ) * refactor: allow `ChangeRequest` to carry a lifetime Let's not restrict our change functions to `'static` because this would require us to clone loads of data to achieve predicate-based `remove_if`. * refactor: convert `remove_if` feature to policy framework Decided to drop the "shared" functionality. We only use the small `remove_if` bit which is way easier to reason about. For #5320. * refactor: address review comments Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-08-16 08:23:27 +00:00
Marco Neumann	0ccefa0d0c	refactor: port TTL backend to policy framework (#5396 ) * refactor: port TTL backend to policy framework Note that this is "just" a port, it does NOT change how TTL works. This will be done in #5318. Helps with #5320. * fix: ensure inner backend is empty * test: add some smoke test	2022-08-15 16:48:16 +00:00
Carol (Nichols || Goulding)	b982bdaf2f	fix: Derive Eq when we derive PartialEq and members can derive Eq Allow this in generated code that we don't control, though. Recommended by clippy now. https://rust-lang.github.io/rust-clippy/master/index.html#derive_partial_eq_without_eq	2022-08-11 15:04:06 -04:00
Andrew Lamb	b834bc630c	chore: more readability improvements to sort keys (#5366 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-08-10 17:59:25 +00:00
Andrew Lamb	16ddc5efc6	chore: Update datafusion / arrow/parquet/arrow-flight and prost/tonic ecosystem (#5360 ) * chore: Update datafusion and arrow * chore: Update Cargo.lock * chore: update to Decimal128 * chore: Update tonic/prost/pbjson/etc * chore: Run cargo hakari tasks * fix: doctest in generated types Co-authored-by: CircleCI[bot] <circleci@influxdata.com>	2022-08-09 17:30:44 +00:00
Andrew Lamb	172f893368	fix: fix logging typo in querier (#5345 ) * fix: fix logging typo * fix: fix type in typo fix ;(	2022-08-09 06:34:06 +00:00
Marco Neumann	cd0dc42b4a	refactor: use a single chunk filter/pruning step in querier (#5338 ) We already prune all chunks in the query-access layer. There's no need to do that another time (which is actually the first time) in `QuerierTable::chunks`. The time savings we get from feeding less chunks into the state reconciling should be negligible. On the pro-side however we get a more streamlined data flow and actually correct chunk pruning metrics. Also see #5336. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-08-08 12:55:14 +00:00
Marco Neumann	fc1870ff76	fix: chunk pruning stats (#5319 ) - emit a warning if we cannot even attempt to prune chunks due to an error. This is always either a missing feature or a bug (even though it does not impact correctness but _only_ performance). Also see https://github.com/influxdata/conductor/issues/1107 - change metrics to clearly differentiate between "could not prune" and "not pruned" - add new "not pruned" observer hook (this was missing for some reason, the "pruned" hook existed though)	2022-08-05 10:50:31 +00:00
Marco Neumann	0d714878ca	feat: chunk pruning metrics (#5273 ) * refactor: make could-not-prune reason a static string * refactor: introduce `QuerierTableArgs` * feat: chunk pruning metrics Closes #4974. * refactor: address review comments * refactor: use static typing for not-pruned reason * refactor: pass chunk to not-pruned observer and use it for some metrics Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-08-04 15:29:21 +00:00
Nga Tran	34ccc9c7f5	chore: Revert "chore: Revert "refactor: bump batch size (#5251 )" (#5288 )" (#5300 ) This reverts commit `471b8be92f`. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-08-04 13:19:46 +00:00
Marco Neumann	840e4801b8	feat: make querier RAM pool split a proper feature (#5283 ) * feat: make querier RAM pool split a proper feature - use proper pool names - expose sizing via CLI/env Closes https://github.com/influxdata/conductor/issues/1102. * refactor: improve naming and docs Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-08-03 15:27:23 +00:00


334 Commits (513fdf1e2639da99d97d74aaa4ec7ec1a5afd6e7)