influxdb

Commit Graph

Author	SHA1	Message	Date
Marco Neumann	9b48437711	refactor: make influx column type mandatory (#5978) We basically assume everywhere that a column falls into one of the three known categories (time, tag, field), so let's encode this in our type system instead of defining "unknown" as "undefined behavior, may or may not crash". Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-10-26 11:20:29 +00:00
Carol (Nichols || Goulding)	2e83e04eab	feat: Use workspace package metadata to reduce differences and repetition	2022-10-24 13:04:09 -04:00
Carol (Nichols || Goulding)	efb964c390	feat: Enforce table column limits from the schema cache (#5819) * fix: Avoid some allocations by collecting instead of inserting into a vec * refactor: Encode that adding columns is for one table at a time * test: Add another test of column limits * test: Add below/above limit tests for create_or_get_many * fix: Explicitly DO NOT check column limits when inserting many columns * feat: Cache the max_columns_per_table on the NamespaceSchema * feat: Add a function to validate column limits in-memory * fix: Provide more useful information when over column limits * fix: Swap types to remove intermediate allocation * docs: Explain the interactions of the cache and the column limits * test: Actually set up test that showcases column limit race condition * fix: Allow writing to existing columns even if table is over column limit Co-authored-by: Dom <dom@itsallbroken.com>	2022-10-14 11:34:17 +00:00
Dom Dwyer	3e70dc44a0	refactor(catalog): remove partition_info_by_id() This method used to return a subset of partition metadata, and was used exclusively for persistence in the ingester. It is now no longer necessary.	2022-10-13 15:26:36 +02:00
Dom Dwyer	726b1d1d3b	refactor: PartitionData carries parent IDs This commit changes the PartitionData buffer structure to carry the IDs of all its parents - the table, namespace, and shard. Previously only the table & shard were carried.	2022-09-29 15:07:03 +02:00
Dom	e9bd03b77c	Merge branch 'main' into dom/partition-contains-key	2022-09-29 12:32:35 +01:00
Dom Dwyer	f5a7fbf8e2	refactor: PartitionData carries PartitionKey Changes the PartitionData to carry the derived PartitionKey for which it is buffering ops. This is used at persist time.	2022-09-29 13:22:50 +02:00
Dom Dwyer	cd4087e00d	style: add no todo!() or dbg!() lints Some crates had them, some not - let's be consistent and have the compiler spot dbg!() and todo!() macro calls - they should never be in prod code!	2022-09-29 13:10:07 +02:00
Dom Dwyer	2068ff394b	perf(ingester): cache Partition This commit implements a PartitionCache decorator over the PartitionProvider abstraction. When an ingester starts up, the internal data structures are empty and are lazily initialised for each namespace / table / partition as they are observed in the stream of DML ops. This lazy initialisation includes resolving the Partition ID and last persisted sequence number offset value from the catalog for each partition in each table in each namespace for which an op is observed - this occurs in the hot path, while blocking ingest for a shard. Because resolving each partition causes a catalog query, this can cause a spike in queries against the catalog, also resulting in unnecessarily slow ingester recovery - we're effectively lazily warming a cache of PartitionData in the hot path! Instead this cache can be used to pre-warm the N most recently created partitions (which are likely to have ongoing writes) at startup to eliminate the hot-path overhead and associated catalog queries. NOTE: unlike most of the other hot-path queries, partition persist offset resolution cannot be eliminated by changes to the Kafka wire format.	2022-09-27 17:15:57 +02:00
Nga Tran	75ff805ee2	feat: instead of adding num_files and memory budget into the reason text column, let us create different columns for them. We will be able to filter them easily (#5742) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-09-26 20:14:04 +00:00
Carol (Nichols || Goulding)	c0c0349bc5	fix: Use typed Time values rather than ns	2022-09-19 12:59:20 -04:00
Carol (Nichols || Goulding)	e05657e8a4	feat: Make filter_parquet_files more general with regards to compaction level	2022-09-15 14:53:08 -04:00
Dom Dwyer	66bf0ff272	refactor(db): NULLable persisted_sequence_number Makes the partition.persisted_sequence_number column in the catalog DB NULLable. 0 is a valid persisted sequence number.	2022-09-15 18:19:39 +02:00
Dom Dwyer	f4cc9a6984	docs: partition persist visibility invariants Document the invariants (and non-invariants) of Partition.persisted_sequence_number.	2022-09-15 16:10:35 +02:00
Dom Dwyer	d199a83355	feat(catalog): per-partition persist mark API Adds the "persisted_sequence_number" field to the Partition model, and updates the catalog API to read & update it.	2022-09-15 16:10:35 +02:00
Carol (Nichols || Goulding)	8a594621bc	fix: Converting a Column to a ColumnSchema can now be infallible too	2022-09-12 17:35:52 -04:00
Carol (Nichols || Goulding)	20e6d26aa9	refactor: Have sqlx decode ColumnTypes in the catalog	2022-09-12 16:50:25 -04:00
Carol (Nichols || Goulding)	10ba3fef47	feat: Compact cold partitions completely Fixes #5330.	2022-09-12 13:13:26 -04:00
Carol (Nichols || Goulding)	fbe3e360d2	feat: Record skipped compactions in memory Connects to #5458.	2022-09-09 15:31:07 -04:00
dependabot[bot]	29800044fe	chore(deps): Bump percent-encoding from 2.1.0 to 2.2.0 (#5597) Bumps [percent-encoding](https://github.com/servo/rust-url) from 2.1.0 to 2.2.0. - [Release notes](https://github.com/servo/rust-url/releases) - [Commits](https://github.com/servo/rust-url/compare/percent-encoding-v2.1.0...v2.2.0) --- updated-dependencies: - dependency-name: percent-encoding dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-09-09 07:50:49 +00:00
YIXIAO SHI	52ae60bf2e	chore: fix comment typo (#5551) Co-authored-by: Dom <dom@itsallbroken.com>	2022-09-07 08:49:29 +00:00
Carol (Nichols || Goulding)	8a0fa616cf	fix: Rename columns, tables, indexes and constraints in postgres catalog	2022-09-01 10:00:54 -04:00
Nga Tran	cb10a7c6d8	feat: More accurate memory estimate for compaction (#5471) * feat: initial implementation of memory estimation for a compaction * feat: estimate size of files and have the right actions for the needed budget * feat: run candidates in parallel * fix: have the right name for the column field of the output struct * feat: add metrics for estimated budgets * chore: cleanup * chore: Apply suggestions from code review Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com> * fix: fix syntax after applying review's suggestions * refactor: Convert a Vec to VecDeque to go well with pop and push * chore: remove max_concurrent_size_bytes and input_size_threshold_bytes * chore: remove input_file_count_threshold * test: tests for estimate_arrow_bytes_for_file Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-08-30 13:44:44 +00:00
Carol (Nichols || Goulding)	1b49ad25f7	refactor: Rename KafkaTopicId to TopicId	2022-08-29 14:27:02 -04:00
Carol (Nichols || Goulding)	58f0b63cdc	refactor: Rename KafkaTopic to Topic or TopicMetadata or topic name as appropriate	2022-08-29 14:27:02 -04:00
Carol (Nichols || Goulding)	3aa3ae2ba5	docs: Add more comments about why to use ShardIndex or ShardId	2022-08-29 14:07:20 -04:00
Carol (Nichols || Goulding)	74c9529062	fix: Rename KafkaPartition to ShardIndex	2022-08-29 14:07:18 -04:00
Jake Goulding	4abf21c724	refactor: Rename Sequencer (and its entourage) to Shard	2022-08-29 14:06:43 -04:00
Marco Neumann	b2caf54b3a	docs: clarify clamping behavior in `TimestampRange::new` (#5399) Closes #5248.	2022-08-15 12:43:01 +00:00
Carol (Nichols || Goulding)	b982bdaf2f	fix: Derive Eq when we derive PartialEq and members can derive Eq Allow this in generated code that we don't control, though. Recommended by clippy now. https://rust-lang.github.io/rust-clippy/master/index.html#derive_partial_eq_without_eq	2022-08-11 15:04:06 -04:00
Andrew Lamb	ee2013ce52	chore: Update docstrings for Partition::sort_keys (#5347) * chore: Update docstrings for Partition::sort_keys * docs: describe update details	2022-08-10 10:52:24 +00:00
Dom Dwyer	d003fe0047	refactor: const KafkaPartition::new() Allows this fn to be called from const contexts (useful in test setups).	2022-08-08 14:56:03 +02:00
Marco Neumann	b12ebe1109	fix: do not panic on invalid timestamp ranges (#5249) Timestamp ranges come from "untrusted" inputs (via gRPC) and must not lead to panics. The only case where this could happen is at `start > end`. Let's just set `start = end` in this case. Reasoning: - Semantically this is a sound range, since this is only a somewhat degenerated case of "empty". - We already allow `start = end` to represent "empty" ranges. - We already clamp (and therefore modify) `start` to the valid range. Fixes https://github.com/influxdata/conductor/issues/1080. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-08-01 13:35:34 +00:00
Sam Arnold	3fbe860bb9	fix: interpret [MIN_NANO_TIME, MAX_NANO_TIME) range as all time for optimization (#5231) InfluxQL queries can send (technically incorrect) ranges like this, meaning all time but excluding the max nanosecond time. Since this is an important case, we should handle it specially and use the optimized 'all time' handling for meta queries even though this is technically wrong in that it does not filter out column names / measurement names at MAX_NANO_TIME exactly. Closes: https://github.com/influxdata/conductor/issues/1072 Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-28 12:24:26 +00:00
Nga Tran	69cb3f2b19	refactor: remove min_sequence_number from Compactor and Querier, add `count_by_overlaps_with_level_0` and `count_by_overlaps_with_level_1` to catalog (#5151) * refactor: remove min_sequence_number * fix: typos * fix: remove min_sequencer_number from new files from merging main * fix: add back throwing error if the compactor compacts files persisted by the ingester after the ingester sends max seq_num back to querier * test: add test_compactor_collision back but modify the input to make it work with new changes Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-21 13:51:54 +00:00
Nga Tran	c8f4000f04	feat: Select compaction candidates (#5131) * feat: initial implementation for selecting compaction candidates * feat: 2 catalog functions to choose the highest-throughput partitions to compact and the selecting candidate function itself * test: tests for the new 2 queries * feat: more tests and metrics for choosing compaction candidates * chore: Apply self suggestions from self review * chore: cleanup * chore: fix doc comment * chore: Apply suggestions from code review Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com> * refactor: address review comments * fix: get the right time provider for the tests * refactor: remove the left over compaction_ * fix: typos * fix: make the param name and env name consistent * refactor: make relevant iSomething to uSomething * fix: typo Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>	2022-07-18 18:05:13 +00:00
Jake Goulding	635f535e0e	refactor: replace level_2 with level_1	2022-07-16 21:49:45 -04:00
Carol (Nichols || Goulding)	d19c468b9d	fix: Remove unused level 1 compaction; move level 2 to level 1 Fixes #5119.	2022-07-13 15:05:09 -04:00
Carol (Nichols || Goulding)	61c023139b	refactor: Switch compaction levels to an enum with values rather than separate consts Bonuses: - Type checking - Validation - Less casting - Exhaustiveness checking - Less use of the numerical value	2022-07-13 11:30:36 -04:00
Carol (Nichols || Goulding)	80b6c5c82f	fix: Correct typo in constant name so searching for COMPACTION_LEVEL returns all (#5077)	2022-07-08 16:31:52 +00:00
Sam Arnold	e193913ed3	fix: optimize field columns for all-time predicates (#5046) * fix: optimize field columns for all-time predicates Also fix timestamp range to allow selecting points at MAX_NANO_TIME * fix: clamp end to MIN_NANO_TIME for safety * refactor: add contains_all method to TimestampRange	2022-07-06 12:01:28 +00:00
Marco Neumann	6f445ccd94	feat: Prune chunks using table summary (stats) (#5017) * feat: easy tests of table summary against predicate Helps with #4976. Alternative to #4995. * refactor: address review comments * refactor: address review comments * refactor: address review comments	2022-07-04 09:01:34 +00:00
Marco Neumann	87a8579742	refactor: `ChunkOrder::new` cannot fail (#5004) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-30 22:26:20 +00:00
Marco Neumann	f8b1b847cc	fix: docs+assertions for `TimestampRange` (#4994) * docs: fix+clarify timestamp doc comments * feat: assert `TimestampRange` in non-debug mode Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-30 16:45:44 +00:00
Marco Neumann	be53716e4d	refactor: use IDs for `parquet_file.column_set` (#4965) * feat: `ColumnRepo::list_by_table_id` * refactor: use IDs for `parquet_file.column_set` Closes #4959. * refactor: introduce `TableSchema::column_id_map`	2022-06-30 15:08:41 +00:00
Carol (Nichols || Goulding)	3049479b78	feat: Implement new querier to ingester config design	2022-06-30 08:26:50 -04:00
Nga Tran	cfcc4b8426	refactor: change level 1 to level 2 preparing for next design changes (#4954) * refactor: change level 1 to level 2 preparing for next design changes * fix: make level-2 consistent everywhere * chore: remove unused comments * refactor: change all the name level_1 to level_2 to completely replace 1 with 2 to make everything consistent * chore: add corresponding constants for the compaction levels in the comments Co-authored-by: Dom <dom@itsallbroken.com>	2022-06-29 14:08:58 +00:00
Marco Neumann	215f297162	refactor: parquet file metadata from catalog (#4949) * refactor: remove `ParquetFileWithMetadata` * refactor: remove `ParquetFileRepo::parquet_metadata` * refactor: parquet file metadata from catalog Closes #4124.	2022-06-27 15:38:39 +00:00
Marco Neumann	9b8086df74	fix: size estimates (#4950) * fix: `Tombstone::size` must include serialized predicate * fix: `CachedPartition::size` must include `Arc` heap allocation Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-27 15:25:32 +00:00
Marco Neumann	0534b80886	fix: `ParquetFile::size` must include column set (#4925) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-22 13:06:02 +00:00


422 Commits (8697ef49673c29389b76b852f421f7f65021d905)