influxdb

Commit Graph

Author	SHA1	Message	Date
dependabot[bot]	647541fc12	chore(deps): Bump croaring from 0.8.1 to 0.9.0 (#8088 ) Bumps [croaring](https://github.com/saulius/croaring-rs) from 0.8.1 to 0.9.0. - [Release notes](https://github.com/saulius/croaring-rs/releases) - [Commits](https://github.com/saulius/croaring-rs/commits) --- updated-dependencies: - dependency-name: croaring dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-06-27 08:10:39 +00:00
Carol (Nichols || Goulding)	0d9f89ae48	test: Add verification of deterministic and collision-resistant properties of PartitionHashId	2023-06-22 09:01:22 -04:00
Carol (Nichols || Goulding)	5096164efb	docs: Explain importance of the fixture test and what a failure would mean Co-authored-by: Dom <dom@itsallbroken.com>	2023-06-22 09:01:22 -04:00
Carol (Nichols || Goulding)	bffb2f8f9f	fix: Specialize Partition constructors to clarify appropriate usage	2023-06-22 09:01:22 -04:00
Carol (Nichols || Goulding)	d991e12fbb	feat: Send PartitionHashId from ingesters to queriers	2023-06-22 09:01:22 -04:00
Carol (Nichols || Goulding)	62ba18171a	feat: Add a new hash column on the partition and parquet file tables This will hold the deterministic ID for partitions. Until all existing partitions have this value, this is optional/nullable. The row ID still exists and is used as the main foreign key in the parquet_file and skipped_compaction tables. The hash_id has a unique index so that we can look up records based on it (if it's available). If the parquet file record has a partition_hash_id value, use that to generate the object storage path instead of the partition_id.	2023-06-22 09:01:22 -04:00
Carol (Nichols || Goulding)	5411d8b7c8	refactor: Move Partition type and friends to their own file	2023-06-22 08:59:10 -04:00
Marco Neumann	93ecb78ab9	feat: cache decoded partition value ranges (#8002 ) Currently this only works for tags. We may want to decode the time template as well at some point. For #7974.	2023-06-16 09:38:34 +00:00
Marco Neumann	64f573c13f	feat: cache partition template in querier (#7987 ) * feat: impl `Eq` for `TablePartitionTemplateOverride` * feat: `TablePartitionTemplateOverride::size` * feat: cache partition template in querier Required for #7974.	2023-06-15 10:30:56 +00:00
Phil Bracikowski	e34ec77e8d	feat(garbage-collector): batch parquet existence checks to catalog (#7964 ) * feat(garbage-collector): batch parquet existence checks to catalog The core feature of this PR is batching the existence checks of parquet files in object store against the catalog. Before, there was 1 catalog query per each parquet file in object store. This can be a lot of requests. This PR can perform one query of at most 100 parquet file uuids against the catalog in one query. A hundred seems like a decent starting place. The batch may not reach 100 because there is also a timeout on receiving object store meta objects from the object store lister thread. That timeout is set to 100 milliseconds. If more than 100 are received, they are batched into 100 for the catalog. Additionally, this PR includes surrounding code changes to make it more idiomatic (but not perfect). It follows up some suggested work from #7652 for watching for shutdown on the threads. * fixes #7784 * use hashset instead of vec to test for contains * chore: add test for db failure path * remove ParquetFileExistsByOSID and other single field structs that are just for sql deserialization; map to uuid explicitly * fix the sqlite query by using a blob literal X'<hex>' for uuids * comment clarifications * adjust loggings to warn from debug for expected rare events Many thanks to Carol for help implementing this!	2023-06-14 07:59:00 -07:00
Marco Neumann	335d9f7357	chore: minimize proptest features (#7993 )	2023-06-14 12:28:18 +00:00
Carol (Nichols || Goulding)	5761226728	fix: Use the parts method to get the template length The template length should always return a value > 0 because templates must have at least one part. Before this change, `len` would have returned 0 if there was no override because of the `unwrap_or_default`. Instead, use the `parts` method, which takes care of the fallback to the hardcoded default template, whose len will always be 1.	2023-06-12 12:21:14 -04:00
Carol (Nichols || Goulding)	7a99737f16	fix: Only allocate to remove the truncation marker if we need to	2023-06-12 12:05:16 -04:00
Carol (Nichols || Goulding)	5decbae0d5	docs: Clarify some partition template docs	2023-06-12 11:56:24 -04:00
Dom Dwyer	fc49b3ec19	feat: restrict partition template length Partition templates should not contain more than 8 parts, which when combined with a per-part byte limit, bounds the maximum size of a partition key. This commit causes the router to refuse to service a write request that contains > 8 parts in the template - this causes a panic, as it's a broken system invariant and should be an unreachable state. Templates are pre-validated at creation time to contain no more than 8 parts, and are immutable: https://github.com/influxdata/influxdb_iox/pull/7930	2023-06-09 13:44:33 +02:00
Dom Dwyer	39c22a2c29	refactor: expose partition part count Allow a table template override to report the number of template parts within it. This ignores the lint wanting an "is_empty()" method too, because it's misleading and redundant - a template MUST never be empty.	2023-06-09 13:44:32 +02:00
Dom Dwyer	050093df1e	feat: truncate partition key parts at 200 bytes This commit ensures all partition key parts are less than or equal to 200 bytes long. If a string exceeds the 200 byte limit, it is truncated (avoiding splitting unicode code-points or graphemes) and then a single "#" sentinel value is appended. When reversed from the string, these column values are indicated to be suitable for prefix-matching only - a property that is encoded into the type system. This commit takes a conservative approach of not splitting graphemes as outlined in the module documentation, but this could be relaxed in the future if needed.	2023-06-09 13:44:32 +02:00
Carol (Nichols || Goulding)	9524e7e478	docs: Remove TODO comment that's TODONE (#7956 ) * docs: Remove TODO comment that's TODONE * docs: Oops, turns out the TODO comment was this enum's documentation --------- Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-06-08 19:22:22 +00:00
Marko Mikulicic	d26ad8e079	feat: Allow passing service protection limits in create db gRPC call (#7941 ) * feat: Allow passing service protection limits in create db gRPC call * fix: Move the impl into the catalog namespace trait --------- Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-06-08 14:28:32 +00:00
Dom Dwyer	60d3ae403f	fix: panic when using %#z time formatter Props to proptesting for this one - the prop_arbitrary_strftime_format() randomly generated the formatting sequence "%#z" which turns out to be an undocumented way of causing a panic in chrono: `088b69372e/src/format/mod.rs (L673)` In fact, the docs actually list it as a usable sequence!	2023-06-08 14:28:03 +02:00
Phil Bracikowski	92a83270f3	fix(garbage-collector): just test parquet file exists (#7948 ) * fix(garbage-collector): just test parquet file existence The GC, when checking files in object store against the catalog, only cares if the parquet file for the given object store id exists in the catalog. It doesn't need the full parquet file. Let's not transmit it over the wire. This PR uses a SELECT 1 and boolean to test for parquet file existing. * helps #7784 * chore: use struct for from_row * chore: satisfy clippy * chore: fmt	2023-06-07 15:12:48 -07:00
Carol (Nichols || Goulding)	d0db1194e2	feat: Validate custom partition templates on their creation Make sure custom partition templates have: - At least one part - No more than 8 parts - Only nonempty, valid strftime formats	2023-06-07 11:38:12 -04:00
Carol (Nichols || Goulding)	ac26ceef91	feat: Make a place to do partition template validation - Create data_types::partition_template::ValidationError - Make creation of NamespacePartitionTemplateOverride and TablePartitionTemplateOverride fallible - Move SerializationWrapper into a module to make its inner field private to force creation through one fallible constructor; this is where the validation logic will go to be shared among all uses of partition templates	2023-06-07 11:38:12 -04:00
Dom Dwyer	6bb4f20d7c	refactor: remove redundant test test_partition_key was recreated below via a test generator.	2023-06-01 17:44:43 +02:00
Dom	c907916871	docs: fix comment Co-authored-by: Fraser Savage <fsavage@influxdata.com>	2023-05-31 16:04:08 +01:00
Dom Dwyer	27bef292a3	feat: unambiguously reversible partition keys This commit changes the format of partition keys when generated with non-default partition key templates ONLY. A prior fixture test is unchanged by this commit, ensuring the default partition keys remain the same. When a custom partition key template is provided, it may specify one or more parts, with the TagValue template causing values extracted from tag columns to appear in the derived partition key. This commit changes the generated partition key in the following ways: * The delimiter of multi-part partition keys; the character used to delimit partition key parts is changed from "/" to "|" (the pipe character) as it is less likely to occur in user-provided input, reducing the encoding overhead. * The format of the extracted TagValue values (see below). Building on the work of custom partition key overrides, where an immutable partition template is resolved and set at table creation time, the changes in this PR enable the derived partition key to be unambiguously reversed into the set of tag (column_name, column_value) tuples it was generated from for use in query pruning logic. This is implemented by the build_column_values() method in this commit, which requires both the template, and the derived partition key. Prior to this commit, a partition key value extracted from a tag column was in the form "tagname_x" where "x" is the value and "tagname" is the name of the tag column it was extracted from. After this commit, the partition key value is in the form "x"; the column name is removed from the derived string to reduce the catalog storage overhead (a key driver of COGS). In the case of a NULL tag value, the sentinel value "!" is inserted instead of the prior "tagname_" marker. In the case of an empty string tag value (""), the sentinel "^" value is inserted instead of the "tagname_-" marker, ensuring the distinction between an empty value and a not-present tag is preserved. Additionally tag values utilise percent encoding to encode reserved characters (part delimiter, empty sentinel character, % itself) to eliminate deserialisation ambiguity. Examples of how this has changed derived partition keys, for a template of [Time(YYYY-MM-DD), TagValue(region), TagValue(bananas)]: Write: time=1970-01-01,region=west,other=ignored Old: "1970-01-01-region_west-bananas" New: "1970-01-01|west|!" Write: time=1970-01-01,other=ignored Old: "1970-01-01-region-bananas" New: "1970-01-01|!|!"	2023-05-30 15:58:25 +02:00
Dom Dwyer	9e0570f2bf	refactor: explicit submod for partition_template Move the import into the submodule itself, rather than re-exporting it at the crate level. This will make it possible to link to the specific module/logic.	2023-05-30 15:13:20 +02:00
Carol (Nichols || Goulding)	e1a93252c5	feat: Add a new table service crate	2023-05-25 10:44:57 -04:00
Carol (Nichols || Goulding)	e67e336a88	docs: Explain why the partition template types are implemented the way they are	2023-05-24 10:36:52 -04:00
Carol (Nichols || Goulding)	efc817c2a8	fix: Remove From impl, leaving TablePartitionTemplateOverride::new as only creation mechanism This makes it clearer that you do or do not have a custom table override (in the first argument to `new`).	2023-05-24 10:36:52 -04:00
Carol (Nichols || Goulding)	73b09d895f	feat: Store and handle NULL partition_template database values Treat them as the default partition template in the application, but save space and avoid having to backfill the tables by having the database values be NULL when no custom template has been specified.	2023-05-24 10:36:52 -04:00
Carol (Nichols || Goulding)	c8712bbc90	fix: Add a fixture test encoding and documenting default partition template assumptions	2023-05-24 10:36:52 -04:00
Carol (Nichols || Goulding)	42804a20bc	fix: Switch to using Sqlite when encoding so there's no extra 1 in the JSON	2023-05-24 10:34:30 -04:00
Carol (Nichols || Goulding)	d713ba935a	refactor: Reduce duplication of encode/decode implementations This is much less gobbledygook.	2023-05-24 10:34:30 -04:00
Carol (Nichols || Goulding)	c479ed184d	refactor: Rearrange definitions in the partition_template module Move the application types to the top, which puts all the sqlx conversion gobbledygook at the end because it's an internal implementation detail I'm about to refactor. Git probably isn't going to display this in a super obvious way, but this commit is only moving code around, not changing any of it.	2023-05-24 10:34:30 -04:00
Carol (Nichols || Goulding)	9c0faa66f0	feat: Set a table partition template explicitly or from the namespace And use the table partition template when partitioning writes to that table.	2023-05-24 10:34:30 -04:00
Carol (Nichols || Goulding)	afb3838437	feat: Optionally supply the namespace partition template when creating a namespace	2023-05-24 10:10:34 -04:00
dependabot[bot]	24a4f36d24	chore(deps): Bump proptest from 1.1.0 to 1.2.0 (#7857 ) Bumps [proptest](https://github.com/proptest-rs/proptest) from 1.1.0 to 1.2.0. - [Release notes](https://github.com/proptest-rs/proptest/releases) - [Changelog](https://github.com/proptest-rs/proptest/blob/master/CHANGELOG.md) - [Commits](https://github.com/proptest-rs/proptest/compare/v1.1.0...v1.2.0) --- updated-dependencies: - dependency-name: proptest dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Dom <dom@itsallbroken.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-05-24 09:21:32 +00:00
Dom Dwyer	928a4d163e	build: remove unused dependencies from crates This commit fixes loads of crates (47!) that had unused dependencies, or mis-configured dependencies (test deps as normal deps). I added the "unused_crate_dependencies" lint to all crates to help prevent this mess from growing again! https://doc.rust-lang.org/beta/nightly-rustc/rustc_lint_defs/builtin/static.UNUSED_CRATE_DEPENDENCIES.html This has the minor downside of false-positives when specifying dev-dependencies for test/bench binaries - these are files in /test or /benches (not normal tests). This commit includes a workaround, importing them in lib.rs (gated by a feature flag). I think the trade-off of better dependency management is worth it!	2023-05-23 14:55:43 +02:00
Andrew Lamb	6344fe8c3f	chore: Add rationale for `clippy::future_not_send` (#7822 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-05-18 16:58:56 +00:00
Dom	6aa634c1b9	Merge branch 'main' into cn/move-peas	2023-05-15 13:29:42 +01:00
Dom Dwyer	7f52959d29	perf: move column names for Schema construction When converting from a ColumnsByName into a schema::Schema instance, move the column names instead of cloning them.	2023-05-15 12:31:19 +02:00
Dom Dwyer	160628a7f8	refactor: impl IntoIterator for ColumnsByName Allows the ColumnsByName to be converted into an iterator yielding owned column names & schema.	2023-05-15 12:30:11 +02:00
Dom	6257918d4c	Merge branch 'main' into kayagokalp/7764	2023-05-15 11:01:08 +01:00
kayagokalp	cb0fb92d86	refactor: remove borrowed from impl of ColumnsByName for Schema	2023-05-13 15:39:06 +03:00
Carol (Nichols || Goulding)	14007808bd	fix: Move remaining conversions between data types and proto into data_types And have data_types depend on generated_types rather than vice versa.	2023-05-12 13:31:04 -04:00
Carol (Nichols || Goulding)	92e5036943	fix: Size of ColumnSet shouldn't be using ChunkId (#7786 )	2023-05-12 14:58:03 +00:00
Dom Dwyer	01205e9671	refactor: assert Column.table_id matches Include an invariant assert when adding a Column to a TableSchema, ensuring the table IDs match.	2023-05-09 14:55:04 +02:00
Dom Dwyer	18c6d9e306	refactor: remove unnecessary "to_owned()" call This method now takes an owned name, so no need to call to_owned()!	2023-05-09 14:55:03 +02:00
Dom Dwyer	ab666ea5fa	refactor: owned ColumnsByName constructor only Refactors the From<BTreeMap> impl that accepted a &str name for ColumnsByName construction, instead allowing only the owned String, and updating the test that makes use of it appropriately.	2023-05-09 14:55:03 +02:00
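
The partition key format described in commit 27bef292a3 can be illustrated with a small sketch. This is not the code from that commit: `TemplatePart`, `encode_tag_value` and `build_partition_key` are hypothetical names, the time part is passed in already formatted, and the sketch assumes that a literal "!" is percent-encoded along with the reserved characters the message lists ("|", "^", "%") so that it cannot be confused with the NULL sentinel.

```rust
/// Hypothetical stand-in for one resolved part of a partition template.
enum TemplatePart<'a> {
    /// An already-formatted time string, e.g. "1970-01-01".
    Time(&'a str),
    /// A tag column value: `None` means the tag was not present in the write.
    TagValue(Option<&'a str>),
}

const DELIMITER: &str = "|"; // separates key parts
const NULL_SENTINEL: &str = "!"; // tag not present
const EMPTY_SENTINEL: &str = "^"; // tag present but empty

/// Percent-encode the reserved characters in a tag value.
/// Assumption: '!' is treated as reserved here in addition to '|', '^', '%'.
fn encode_tag_value(value: &str) -> String {
    let mut out = String::with_capacity(value.len());
    for c in value.chars() {
        match c {
            '|' | '!' | '^' | '%' => out.push_str(&format!("%{:02X}", c as u32)),
            _ => out.push(c),
        }
    }
    out
}

/// Join the rendered parts with the pipe delimiter.
fn build_partition_key(parts: &[TemplatePart<'_>]) -> String {
    let rendered: Vec<String> = parts
        .iter()
        .map(|part| match part {
            TemplatePart::Time(formatted) => formatted.to_string(),
            TemplatePart::TagValue(None) => NULL_SENTINEL.to_string(),
            TemplatePart::TagValue(Some(v)) if v.is_empty() => EMPTY_SENTINEL.to_string(),
            TemplatePart::TagValue(Some(v)) => encode_tag_value(v),
        })
        .collect();
    rendered.join(DELIMITER)
}
```

With the template [Time(YYYY-MM-DD), TagValue(region), TagValue(bananas)] from the commit message, a write carrying region=west and no bananas tag yields "1970-01-01|west|!" under this sketch, matching the "New" example in the message.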


549 Commits (1acbf4a20d1600ada0c4905f1138930e144d50f4)
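
The 200-byte truncation described in commit 050093df1e can be sketched as follows. This is an illustration under stated assumptions, not the committed code: `truncate_part`, `MAX_PART_BYTES` and `TRUNCATION_MARKER` are hypothetical names, the "#" marker is assumed to count toward the 200-byte limit, and the sketch only avoids splitting code points (the commit also avoids splitting graphemes, which needs a segmentation library).

```rust
/// Assumed limit from the commit message: parts are at most 200 bytes.
const MAX_PART_BYTES: usize = 200;
/// Sentinel appended to a truncated part.
const TRUNCATION_MARKER: char = '#';

/// Truncate `value` so the result, including the marker, fits in
/// MAX_PART_BYTES, cutting only on a UTF-8 code point boundary.
fn truncate_part(value: &str) -> String {
    if value.len() <= MAX_PART_BYTES {
        return value.to_string();
    }
    // Reserve one byte for the marker, then back up to a char boundary.
    let mut end = MAX_PART_BYTES - 1;
    while !value.is_char_boundary(end) {
        end -= 1;
    }
    let mut out = String::with_capacity(end + 1);
    out.push_str(&value[..end]);
    out.push(TRUNCATION_MARKER);
    out
}
```

A value that comes back with the marker appended is only known up to its prefix, which is why the commit message describes such column values as suitable for prefix-matching only.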