* chore: test changes and additions in preparation for functional changes
* feat: move vertical splitting to RoundInfo calculation, align splits to L1 files
* chore: insta test churn
* feat: detect non-linear data distribution in vertical splitting
* chore: add tests for non-linear data distribution
* chore: insta churn
* chore: cleanup & comment additions
* chore: some variable renaming
* feat: fill catalog sort_key_ids for partitions with incoming data
* test: sort_key_ids has empty array for newly created partition
* test: name of non-existing column
* chore: add comments to ask Andrew about the code
* chore: make comments clearer
* chore: fix a comment to avoid failure in doc
* chore: add comment for the panic if column name of sort key not found
* fix: during file import the partition has to be created with an empty sort key first. Then, after its files are created, the partition will be updated with the sort key
* chore: remove no longer needed comments after the bug in build_catalog test is fixed
* chore: address review comments
* refactor: Use ColumnSet type
* chore: Apply suggestions from code review
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
* chore: fix a clippy lint
---------
Co-authored-by: Carol (Nichols || Goulding) <carol.nichols@gmail.com>
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
This commit implements a PartitionProvider decorator that uses a fast,
space-efficient, compressed bloom filter to probabilistically determine
whether a partition is an "old-style" row-addressed partition created
prior to #7963, or a "new-style" hash-addressed partition created after
it.
If a partition is identified as a new-style, hash-addressed partition,
the PartitionData is immediately initialised using the deterministic
hash ID without performing a catalog query at all.
If a partition is identified as an old-style, row-addressed partition,
a catalog query is performed to resolve the row ID, as it would be
without this filter.
A new-style, hash-addressed partition may sometimes be incorrectly
identified as a row-addressed partition, causing a spurious catalog
query, after which it is correctly identified as a hash-addressed
partition.
This is tuned to happen roughly 0.1% to 1% of the time, eliminating 99%
to 99.9% of unnecessary catalog queries.
* feat: Make parquet_file.partition_id optional in the catalog
This will acquire a short lock on the table in postgres, per:
<https://stackoverflow.com/questions/52760971/will-making-column-nullable-lock-the-table-for-reads>
This allows us to persist data for new partitions and associate the
Parquet file catalog records with the partition records using only the
partition hash ID, rather than both IDs as is done now.
* fix: Support transition partition ID in the catalog service
* fix: Use transition partition ID in import/export
This commit also removes support for the `--partition-id` flag of the
`influxdb_iox remote store get-table` command, which Andrew approved.
The `--partition-id` filter was getting the results of the catalog gRPC
service's query for Parquet files of a table and then keeping only the
files whose partition IDs matched. The gRPC query is no longer returning
the partition ID from the Parquet file table, and really, this command
should instead be using `GetParquetFilesByPartitionId` to only request
what's needed rather than filtering.
* feat: Support looking up Parquet files by either kind of Partition id
Regardless of which is actually stored on the Parquet file record.
That is, say there's a Partition in the catalog with:
    Partition {
        id: 3,
        hash_id: abcdefg,
    }
and a Parquet file that has:
    ParquetFile {
        partition_hash_id: abcdefg,
    }
calling `list_by_partition_not_to_delete(PartitionId(3))` should still
return this Parquet file because it is associated with the partition
that has ID 3.
This is important for the compactor, which is currently only dealing in
PartitionIds, and I'd like to keep it that way for now to avoid having
to change Even More in this PR.
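Below is a simplified, in-memory sketch of these lookup semantics; the
struct shapes and the use of a plain string for the hash ID are
illustrative stand-ins for the real catalog types and SQL query.

```rust
#[derive(Clone, PartialEq)]
struct PartitionHashId(String);

struct Partition {
    id: i64,
    hash_id: Option<PartitionHashId>,
}

struct ParquetFile {
    partition_id: Option<i64>,
    partition_hash_id: Option<PartitionHashId>,
    to_delete: bool,
}

/// List files for the partition with row ID `partition_id`, matching on
/// either identifier so that records carrying only the hash ID are still
/// returned.
fn list_by_partition_not_to_delete<'a>(
    partitions: &[Partition],
    files: &'a [ParquetFile],
    partition_id: i64,
) -> Vec<&'a ParquetFile> {
    // Resolve the hash ID of the requested partition, if it has one.
    let hash_id = partitions
        .iter()
        .find(|p| p.id == partition_id)
        .and_then(|p| p.hash_id.clone());

    files
        .iter()
        .filter(|f| !f.to_delete)
        .filter(|f| {
            f.partition_id == Some(partition_id)
                || (hash_id.is_some() && f.partition_hash_id == hash_id)
        })
        .collect()
}
```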
* fix: Use and set new partition ID fields everywhere they want to be
---------
Co-authored-by: Dom <dom@itsallbroken.com>
This adds some computational overhead during the merging of new
namespace schema with what's in the router's local cache, but will allow
gossiping of changes.
Cache the row count & timestamp min/max values within the partition FSM
/ buffer, and make them available through the Queryable trait.
This allows the PartitionData to read the row count of a buffer (either
"hot" for writes, a "snapshot" of immutable RecordBatch, or "persisting"
for in-flight persisting data).
These values will enable early partition pruning.
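A rough sketch of the kind of interface this describes follows; the
trait and type names mirror the text but are simplified, not the actual
ingester definitions.

```rust
/// Statistics exposed by every stage of buffered data so the query path
/// can prune partitions before touching the data itself.
trait Queryable {
    /// Number of rows currently held by this stage.
    fn rows(&self) -> usize;

    /// Inclusive min/max of the time column across those rows, if any.
    fn timestamp_min_max(&self) -> Option<(i64, i64)>;
}

/// Stand-in for one buffered stage (hot buffer, snapshot, or persisting
/// data).
struct Snapshot {
    row_count: usize,
    ts_min: i64,
    ts_max: i64,
}

impl Queryable for Snapshot {
    fn rows(&self) -> usize {
        self.row_count
    }

    fn timestamp_min_max(&self) -> Option<(i64, i64)> {
        (self.row_count > 0).then_some((self.ts_min, self.ts_max))
    }
}
```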
Time has a special meaning and can be partitioned on using the strftime
formatter. It should not be used as a tag value part in a custom
partitioning template.
There are a bunch of dependencies in `Cargo.lock` that are related to
mysql. These are NOT compiled at all, and are also not part of `cargo
tree`. The reason for the inclusion is a bug in cargo:
https://github.com/rust-lang/cargo/issues/10801
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Although callers could manually extend the sequence number set by continually
adding in an iterator loop or a fold expression, this enables other
combinator patterns when dealing with collections of sequence number
sets.
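As a hedged illustration of that combinator pattern, a `FromIterator`
implementation lets a collection of sets be merged with a single
`collect()`; the types below are simplified stand-ins (the real set
wraps a compressed bitmap, not a `BTreeSet`).

```rust
use std::collections::BTreeSet;

#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct SequenceNumber(u64);

#[derive(Default)]
struct SequenceNumberSet(BTreeSet<SequenceNumber>);

impl SequenceNumberSet {
    fn add(&mut self, n: SequenceNumber) {
        self.0.insert(n);
    }
}

/// Merge many sets into one via `collect()` rather than an explicit loop
/// or fold expression.
impl FromIterator<SequenceNumberSet> for SequenceNumberSet {
    fn from_iter<T: IntoIterator<Item = SequenceNumberSet>>(iter: T) -> Self {
        let mut out = Self::default();
        for set in iter {
            out.0.extend(set.0);
        }
        out
    }
}

fn merge(per_partition: Vec<SequenceNumberSet>) -> SequenceNumberSet {
    // e.g. collapsing the sets reported by each partition after persist.
    per_partition.into_iter().collect()
}
```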
Now that sequence numbers are internal to the ingester and the WAL,
there's no need for them to be a signed integer. As noted in
[#7260](https://github.com/influxdata/influxdb_iox/issues/7260), this
was a quirk of the Kafka-based IOx and of the fact that Postgres only
supports signed integers.
This will hold the deterministic ID for partitions.
Until all existing partitions have this value, this is optional/nullable.
The row ID still exists and is used as the main foreign key in the
parquet_file and skipped_compaction tables.
The hash_id has a unique index so that we can look up records based on
it (if it's available).
If the parquet file record has a partition_hash_id value, use that to
generate the object storage path instead of the partition_id.
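A small sketch of that branch follows; the path layout and field names
here are assumptions for illustration, not the exact object store path
format.

```rust
struct ParquetFileRecord {
    partition_id: i64,
    partition_hash_id: Option<String>,
    object_store_id: String,
}

/// Prefer the deterministic hash ID when the record carries one; older
/// records without it keep using the row-addressed partition ID.
fn object_store_path(namespace_id: i64, table_id: i64, f: &ParquetFileRecord) -> String {
    let partition = f
        .partition_hash_id
        .clone()
        .unwrap_or_else(|| f.partition_id.to_string());

    format!(
        "{namespace_id}/{table_id}/{partition}/{}.parquet",
        f.object_store_id
    )
}
```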
* feat(garbage-collector): batch parquet existence checks to catalog
The core feature of this PR is batching the existence checks of parquet
files in object store against the catalog. Before, there was one catalog
query per parquet file in the object store. This can be a lot of
requests.
This PR performs one catalog query for a batch of at most 100 parquet
file uuids. A hundred seems like a decent starting place.
The batch may not reach 100 because there is also a timeout on receiving
object store meta objects from the object store lister thread. That
timeout is set to 100 milliseconds. If more than 100 are received, they
are split into batches of 100 for the catalog.
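A simplified, synchronous sketch of that batching loop follows, using a
std mpsc channel with a receive timeout in place of the real async
lister machinery; the constant names and use of the `uuid` crate are
illustrative.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

use uuid::Uuid;

const MAX_BATCH: usize = 100;
const RECV_TIMEOUT: Duration = Duration::from_millis(100);

/// Drain up to MAX_BATCH parquet file object store IDs from the lister
/// channel, flushing early if nothing arrives for RECV_TIMEOUT, so each
/// catalog existence check covers many files instead of one.
fn next_batch(rx: &Receiver<Uuid>) -> Option<Vec<Uuid>> {
    let mut batch = Vec::with_capacity(MAX_BATCH);
    loop {
        match rx.recv_timeout(RECV_TIMEOUT) {
            Ok(id) => {
                batch.push(id);
                if batch.len() == MAX_BATCH {
                    return Some(batch); // full batch: query the catalog now
                }
            }
            // Timed out waiting: flush whatever has accumulated so far.
            Err(RecvTimeoutError::Timeout) => return Some(batch),
            // Lister thread shut down: emit a final partial batch, then stop.
            Err(RecvTimeoutError::Disconnected) => {
                return (!batch.is_empty()).then_some(batch);
            }
        }
    }
}
```

Each returned batch then feeds a single catalog existence query instead
of one query per file.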
Additionally, this PR includes surrounding code changes to make it more
idiomatic (but not perfect). It follows up some suggested work from
#7652 for watching for shutdown on the threads.
* fixes #7784
* use hashset instead of vec to test for contains
* chore: add test for db failure path
* remove ParquetFileExistsByOSID and other single field structs that are
just for sql deserialization; map to uuid explicitly
* fix the sqlite query by using a blob literal X'<hex>' for uuids
* comment clarifications
* adjust logging from debug to warn for expected rare events
Many thanks to Carol for help implementing this!
The template length should always return a value > 0 because templates
must have at least one part. Before this change, `len` would have
returned 0 if there was no override because of the `unwrap_or_default`.
Instead, use the `parts` method, which takes care of the fallback to the
hardcoded default template, whose len will always be 1.
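A condensed sketch of the before/after behaviour described here, with
simplified types and an assumed single time-format default part:

```rust
#[derive(Clone)]
enum TemplatePart {
    TagValue(String),
    TimeFormat(String),
}

struct TablePartitionTemplateOverride(Option<Vec<TemplatePart>>);

impl TablePartitionTemplateOverride {
    /// Parts of the override, falling back to the hardcoded default
    /// template (assumed here to be a single time-format part) when no
    /// override is set.
    fn parts(&self) -> Vec<TemplatePart> {
        self.0
            .clone()
            .unwrap_or_else(|| vec![TemplatePart::TimeFormat("%Y-%m-%d".to_string())])
    }

    /// Always >= 1. The earlier, buggy version was effectively
    /// `self.0.clone().unwrap_or_default().len()`, which reported 0 when
    /// there was no override.
    #[allow(clippy::len_without_is_empty)] // a template must never be empty
    fn len(&self) -> usize {
        self.parts().len()
    }
}
```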
Partition templates should not contain more than 8 parts, which when
combined with a per-part byte limit, bounds the maximum size of a
partition key.
This commit causes the router to refuse to service a write request that
contains > 8 parts in the template - this causes a panic, as it's a
broken system invariant and should be an unreachable state. Templates
are pre-validated at creation time to contain no more than 8 parts, and
are immutable:
https://github.com/influxdata/influxdb_iox/pull/7930
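A minimal sketch of that invariant check, with hypothetical constant and
function names:

```rust
/// Hypothetical name for the limit discussed above.
const MAX_TEMPLATE_PARTS: usize = 8;

/// Templates are validated at creation time and immutable, so a template
/// with more parts reaching the router's write path is a broken
/// invariant: panic rather than attempt to recover from an unreachable
/// state.
fn assert_template_within_bounds(part_count: usize) {
    assert!(
        part_count <= MAX_TEMPLATE_PARTS,
        "partition template has {part_count} parts, exceeding the maximum of {MAX_TEMPLATE_PARTS}"
    );
}
```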
Allow a table template override to report the number of template parts
within it.
This ignores the lint wanting an "is_empty()" method too, because it's
misleading and redundant - a template MUST never be empty.
This commit ensures all partition key parts are less than or equal to
200 bytes long.
If a string exceeds the 200 byte limit, it is truncated (avoiding
splitting unicode code-points or graphemes) and then a single "#"
sentinel value is appended. When column values are recovered from the
partition key string, truncated values are marked as suitable for
prefix matching only - a property that is encoded into the type system.
This commit takes a conservative approach of not splitting graphemes as
outlined in the module documentation, but this could be relaxed in the
future if needed.
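A rough sketch of code-point-safe truncation under these rules, assuming
the sentinel byte counts against the 200 byte limit; the constant names
are illustrative and, as noted above, the real implementation also
avoids splitting grapheme clusters (which needs a Unicode segmentation
crate, omitted here).

```rust
const MAX_PART_BYTES: usize = 200;
const TRUNCATION_SENTINEL: char = '#';

/// Bound a partition key part to MAX_PART_BYTES bytes. Oversized values
/// are cut back to a char boundary and marked with the sentinel,
/// signalling that the stored value is a prefix of the original
/// (prefix matching only).
fn bound_part(value: &str) -> String {
    if value.len() <= MAX_PART_BYTES {
        return value.to_string();
    }

    // Leave room for the sentinel, then walk back to a valid char
    // boundary so no UTF-8 code point is split.
    let mut cut = MAX_PART_BYTES - TRUNCATION_SENTINEL.len_utf8();
    while !value.is_char_boundary(cut) {
        cut -= 1;
    }

    let mut out = value[..cut].to_string();
    out.push(TRUNCATION_SENTINEL);
    out
}
```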