* refactor: make partition key parsing more flexible
* feat: decode time portion of the partition key
Helpful for #8705 because we can prune partitions earlier during
query planning, w/o having to consider their parquet files at all.
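A sketch of the decoding idea, using chrono and the default `%Y-%m-%d` day-partition format; the helper name and shape are illustrative, not the actual IOx parser:

```rust
use chrono::NaiveDate;

/// Hypothetical helper (not the actual IOx code): decode the day encoded
/// in a partition key like "2023-08-28" into the half-open nanosecond
/// timestamp range it covers.
fn decode_time_range(partition_key: &str) -> Option<(i64, i64)> {
    let day = NaiveDate::parse_from_str(partition_key, "%Y-%m-%d").ok()?;
    let start = day.and_hms_opt(0, 0, 0)?.timestamp_nanos_opt()?;
    let end = day.succ_opt()?.and_hms_opt(0, 0, 0)?.timestamp_nanos_opt()?;
    Some((start, end))
}

fn main() {
    // A query constrained to `time >= '2023-09-01'` can now prune this
    // partition without opening any of its parquet files.
    let (start, end) = decode_time_range("2023-08-28").unwrap();
    println!("partition covers [{start}, {end})");
}
```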
* refactor: "projected schema" cache inputs must be normalized
Normalizing under the hood and returning normalized schemas w/o the user
knowing about it is a common source of subtle bugs.
* refactor: do not normalize projected schema by name
Normalizing makes it harder to predict the output and potentially
requires additional string lookups just to work with the schema.
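A minimal sketch of the boundary these two refactors imply (names and the normalization rule are illustrative, not the actual cache API): the cache asserts its input is already normalized instead of quietly rewriting it:

```rust
/// Illustrative rule: a projection is "normalized" if its column indices
/// are strictly increasing (sorted, no duplicates).
fn is_normalized(projection: &[usize]) -> bool {
    projection.windows(2).all(|w| w[0] < w[1])
}

/// Sketch of the cache boundary: fail loudly on unnormalized input
/// instead of rewriting it under the hood, so callers can never rely on
/// an implicit normalization they do not see.
fn projected_schema_cache_get(projection: &[usize]) {
    assert!(
        is_normalized(projection),
        "projection must be normalized: {projection:?}"
    );
    // ...look up / build the projected schema...
}

fn main() {
    projected_schema_cache_get(&[0, 2, 5]); // ok
    // projected_schema_cache_get(&[2, 0, 0]); // would panic
}
```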
* fix: typos
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: Martin Hilton <mhilton@influxdata.com>
---------
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: Martin Hilton <mhilton@influxdata.com>
Even though all subfields of `CachedPartition` are `Arc`ed, the size of
this structure grows and copying more and more fields around for every
cache access gets quite expensive. `Arc` the whole thing and simplify
management a bit.
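Roughly the before/after shape, with illustrative field names:

```rust
use std::sync::Arc;

// Illustrative stand-ins for the real cached data.
struct SortKey;
struct ColumnRanges;

/// Before: a cache hit clones every `Arc` field individually, and the
/// struct keeps growing as fields are added.
#[derive(Clone)]
struct CachedPartitionBefore {
    sort_key: Arc<SortKey>,
    column_ranges: Arc<ColumnRanges>,
    // ...more Arc'ed fields over time...
}

/// After: the whole struct sits behind a single `Arc`, so a cache hit is
/// one pointer clone no matter how many fields are added later.
struct CachedPartition {
    sort_key: Arc<SortKey>,
    column_ranges: Arc<ColumnRanges>,
}

fn cache_get() -> Arc<CachedPartition> {
    Arc::new(CachedPartition {
        sort_key: Arc::new(SortKey),
        column_ranges: Arc::new(ColumnRanges),
    })
}

fn main() {
    let a = cache_get();
    let b = Arc::clone(&a); // the cheap copy the cache now hands out
    assert_eq!(Arc::strong_count(&b), 2);
}
```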
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* chore: Update DataFusion pin
* chore: Update for new API
* fix: Update for API
* fix: update compactor test
* fix: Update to patched version of arrow 46.0.0
* fix: map `DataFusionError::Configuration` to an internal error (see the sketch after this list)
* fix: do not use deprecated API
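A sketch of the `DataFusionError::Configuration` mapping mentioned above, assuming DataFusion's `Configuration(String)` variant; the target `Error` enum is illustrative, not the actual IOx error type:

```rust
use datafusion::error::DataFusionError;

/// Illustrative error type; the real mapping targets the IOx error enum.
#[derive(Debug)]
enum Error {
    Internal(String),
    External(DataFusionError),
}

fn map_error(e: DataFusionError) -> Error {
    match e {
        // A bad session configuration is a bug on our side, not something
        // the user can fix, so surface it as an internal error.
        DataFusionError::Configuration(msg) => Error::Internal(msg),
        other => Error::External(other),
    }
}

fn main() {
    let e = map_error(DataFusionError::Configuration("bad setting".into()));
    println!("{e:?}");
}
```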
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: teach querier to use sort_key_ids
* chore: add an assert to capture bugs
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
`Predicate` is InfluxRPC-specific and contains far more than just
filter expressions.
Ref #8097.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
This should prevent the CPU-bound DataFusion runtime from stalling our
main IO runtime.
This is similar to:
- `iox_query::exec::cross_rt_stream`
- https://github.com/apache/arrow-rs/pull/4015
- https://github.com/apache/arrow-rs/pull/4040
Note: I currently have no concrete evidence that this is an issue, but
worker stalling in tokio is really hard to debug and I would rather be
safe than sorry.
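The underlying pattern, reduced to plain tokio primitives (the real implementation is the stream-based `cross_rt_stream`): run the CPU-heavy work on a dedicated runtime and hand results back over a channel, so the IO runtime only ever awaits:

```rust
use tokio::runtime::Builder;
use tokio::sync::oneshot;

fn main() {
    // Main IO runtime plus a dedicated runtime for CPU-bound work.
    let io_rt = Builder::new_multi_thread().enable_all().build().unwrap();
    let cpu_rt = Builder::new_multi_thread()
        .worker_threads(2)
        .thread_name("cpu-bound")
        .build()
        .unwrap();

    let (tx, rx) = oneshot::channel();

    // Run the expensive computation on the CPU runtime...
    cpu_rt.spawn(async move {
        let result: u64 = (0..10_000_000u64).sum(); // stand-in for query execution
        let _ = tx.send(result);
    });

    // ...while the IO runtime only awaits the channel and stays responsive.
    let result = io_rt.block_on(async { rx.await.unwrap() });
    println!("result = {result}");
}
```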
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Remove LIST operations from `CachedObjectStore` because:
- **not used/desired:** The querier should NEVER use LIST operations
on the object store. All the planning is done using catalog data.
- **misleading interface:** The `CachedObjectStore` -- which stores
  parquet data -- should not implement uncached LIST operations,
  because that is misleading: this operation will never be cached.
Or in other words: less code, fewer potential bugs.
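The shape of the change, reduced to an illustrative mini-trait (the real code implements `object_store::ObjectStore`):

```rust
/// Illustrative reduction of the object store interface.
trait Store {
    fn get(&self, path: &str) -> Result<Vec<u8>, String>;
    fn list(&self, prefix: &str) -> Result<Vec<String>, String>;
}

struct CachedObjectStore<S: Store> {
    inner: S,
    // ...cache state elided...
}

impl<S: Store> Store for CachedObjectStore<S> {
    fn get(&self, path: &str) -> Result<Vec<u8>, String> {
        // GET goes through the cache (lookup elided) and falls back to
        // the inner store on a miss.
        self.inner.get(path)
    }

    fn list(&self, _prefix: &str) -> Result<Vec<String>, String> {
        // Refuse rather than silently pass through uncached: the querier
        // plans from catalog data and must never LIST the object store.
        Err("LIST is not supported by CachedObjectStore".to_owned())
    }
}
```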
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: fill catalog sort_key_ids for partition with incoming data
* test: sort_key_ids has empty array for newly created partition
* test: name of non-existing column
* chore: add comments to ask Andrew about the code
* chore: make comments clearer
* chore: fix a comment to avoid failure in doc
* chore: add comment for the panic if column name of sort key not found
* fix: during file import, the partition has to be created with an empty sort key first; then, after its files are created, the partition will be updated with the sort key
* chore: remove no longer needed comments after the bug in build_catalog test is fixed
* chore: address review comments
* refactor: Use ColumnSet type
* chore: Apply suggestions from code review
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
* chore: fix a clippy warning
---------
Co-authored-by: Carol (Nichols || Goulding) <carol.nichols@gmail.com>
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
* refactor: replace `Predicate` w/ `&[Expr]` in querier internals
First step towards #8097. This replaces most internal usages of
`Predicate` with the more appropriate `&[Expr]` within the querier code.
This is also triggered by #8443 because the new ingester protocol shall
not use `Predicate` anymore.
Note that the querier still uses `Predicate` for a few interfaces. These
will be fixed later:
- the current ingester RPC version
- chunk pruning
- `QuerierNamespace::chunks`
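A sketch of the flavor of this change; the function names are illustrative, not the exact querier traits:

```rust
use datafusion::prelude::{col, lit, Expr};

// Before (illustrative): the InfluxRPC-specific `Predicate` leaks into
// internal interfaces that only ever need the filter expressions:
//
//     fn chunks(&self, predicate: &Predicate) -> Vec<Chunk>;
//
// After: plain DataFusion expressions.
fn chunks(filters: &[Expr]) {
    // ...prune and fetch chunks using `filters`...
    for f in filters {
        println!("filter: {f}");
    }
}

fn main() {
    let filters = vec![col("host").eq(lit("a")), col("time").gt(lit(0i64))];
    chunks(&filters);
}
```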
* fix: docs
* feat: `RemoveIfHandle::remove_if_and_get_with_status`
* fix: avoid tracing flood
Do not create a span for every partition that we get from the cache
system.
Ref https://github.com/influxdata/idpe/issues/17884.
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: Make parquet_file.partition_id optional in the catalog
This will acquire a short lock on the table in postgres, per:
<https://stackoverflow.com/questions/52760971/will-making-column-nullable-lock-the-table-for-reads>
This allows us to persist data for new partitions and associate the
Parquet file catalog records with the partition records using only the
partition hash ID, rather than both IDs as is done now.
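In the catalog's Rust types this amounts to making the field optional; a sketch with illustrative ID shapes:

```rust
/// Illustrative ID shapes.
struct PartitionId(i64);
struct PartitionHashId(Vec<u8>);

/// Sketch of the catalog record: new files may carry only the hash ID,
/// so the old numeric partition ID becomes optional.
struct ParquetFile {
    partition_id: Option<PartitionId>,
    partition_hash_id: Option<PartitionHashId>,
    // ...other columns elided...
}
```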
* fix: Support transition partition ID in the catalog service
* fix: Use transition partition ID in import/export
This commit also removes support for the `--partition-id` flag of the
`influxdb_iox remote store get-table` command, which Andrew approved.
The `--partition-id` filter was getting the results of the catalog gRPC
service's query for Parquet files of a table and then keeping only the
files whose partition IDs matched. The gRPC query is no longer returning
the partition ID from the Parquet file table, and really, this command
should instead be using `GetParquetFilesByPartitionId` to only request
what's needed rather than filtering.
* feat: Support looking up Parquet files by either kind of Partition id
Regardless of which is actually stored on the Parquet file record.
That is, say there's a Partition in the catalog with:

    Partition {
        id: 3,
        hash_id: abcdefg,
    }

and a Parquet file that has:

    ParquetFile {
        partition_hash_id: abcdefg,
    }

calling `list_by_partition_not_to_delete(PartitionId(3))` should still
return this Parquet file because it is associated with the partition
that has ID 3.
This is important for the compactor, which is currently only dealing in
PartitionIds, and I'd like to keep it that way for now to avoid having
to change Even More in this PR.
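A sketch of that lookup rule with illustrative types (the real code goes through the transition partition ID machinery):

```rust
struct Partition {
    id: i64,
    hash_id: Option<String>,
}

struct ParquetFile {
    partition_id: Option<i64>,
    partition_hash_id: Option<String>,
}

/// A file matches a partition if *either* identifier stored on the file
/// agrees with the partition row, so a caller holding only a numeric
/// `PartitionId` still finds files that were written with just a hash ID.
fn file_belongs_to(file: &ParquetFile, partition: &Partition) -> bool {
    file.partition_id == Some(partition.id)
        || (file.partition_hash_id.is_some()
            && file.partition_hash_id == partition.hash_id)
}

fn main() {
    let partition = Partition { id: 3, hash_id: Some("abcdefg".into()) };
    let file = ParquetFile { partition_id: None, partition_hash_id: Some("abcdefg".into()) };
    // Looking up by PartitionId(3) still returns this file.
    assert!(file_belongs_to(&file, &partition));
}
```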
* fix: Use and set new partition ID fields everywhere they want to be
---------
Co-authored-by: Dom <dom@itsallbroken.com>
I've seen at least one case in prod where the UTC clock goes backwards.
The `TimeProvider` and `Time` interfaces even warn about that. However,
there was a `Sub` impl that would panic if that happens, and even though
this was documented, I think we can do better and just not offer a
panicky interface at all.
So this removes the `Sub` impl and replaces all uses with
`checked_duration_since`.
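A sketch of the non-panicking shape, with an illustrative stand-in for the real `Time` type:

```rust
use std::time::Duration;

/// Illustrative stand-in for the IOx `Time` type (nanoseconds since epoch).
#[derive(Clone, Copy, PartialEq, PartialOrd)]
struct Time(i64);

impl Time {
    /// Returns `None` instead of panicking when `earlier` is actually
    /// later, e.g. because the UTC clock stepped backwards.
    fn checked_duration_since(self, earlier: Time) -> Option<Duration> {
        self.0
            .checked_sub(earlier.0)
            .filter(|d| *d >= 0)
            .map(|d| Duration::from_nanos(d as u64))
    }
}

fn main() {
    let t1 = Time(2_000_000_000);
    let t0 = Time(1_000_000_000);
    assert_eq!(t1.checked_duration_since(t0), Some(Duration::from_secs(1)));
    // Clock went backwards: callers get `None` and decide what to do,
    // instead of the old `t0 - t1` panicking.
    assert_eq!(t0.checked_duration_since(t1), None);
}
```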
* feat: batch partition catalog requests in querier
This is mostly wiring that builds on top of the other PRs linked to #8089.
I think we eventually could make the batching code nicer by adding
better wrappers / helpers, but let's do that if we have other batched
caches and this pattern proves to be useful. (A sketch of the batching
idea follows the list below.)
Closes #8089.
* test: extend `test_multi_get`
* test: regression test for #8286
* fix: prevent auto-flush CPU looping
* fix: panic when loading different tables at the same time
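The promised sketch of the batching idea, with illustrative names and all cache bookkeeping elided:

```rust
use std::collections::HashMap;

/// Illustrative stand-ins.
type PartitionId = i64;
#[derive(Clone)]
struct CachedPartition;

/// Sketch of the batching idea: one catalog round trip for N partitions
/// instead of N round trips (deduplication and cache state elided).
fn get_batch(
    ids: &[PartitionId],
    fetch_many: impl Fn(&[PartitionId]) -> HashMap<PartitionId, CachedPartition>,
) -> Vec<Option<CachedPartition>> {
    let fetched = fetch_many(ids); // single batched catalog request
    ids.iter().map(|id| fetched.get(id).cloned()).collect()
}

fn main() {
    let fetched = get_batch(&[1, 2, 3], |ids| {
        ids.iter().map(|&id| (id, CachedPartition)).collect()
    });
    println!("got {} partitions", fetched.len());
}
```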
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
These places currently sort by `PartitionId`, which implements `Copy`,
but are about to be changed to sort by `PartitionHashId`, which does
not implement `Copy`.
The ingester can project arbitrary columns at query time, and has no
special requirement that the "time" column be part of that projection.
Because the timestamp summary generation explicitly requires the time
column to exist, it panics when there is no "time" column in the
projection; this is more of a modelling mismatch than anything else.
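The fix this suggests is to treat the time column as optional in the summary path; a sketch against arrow directly, with an illustrative function name:

```rust
use arrow::record_batch::RecordBatch;

/// Sketch: only compute a timestamp summary when the "time" column is
/// actually part of the projection, instead of panicking on lookup.
fn maybe_timestamp_summary(batch: &RecordBatch) -> Option<()> {
    let idx = batch.schema().index_of("time").ok()?;
    let _time_col = batch.column(idx);
    // ...aggregate min/max timestamps from `_time_col`...
    Some(())
}
```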