influxdb

Commit Graph

Author	SHA1	Message	Date
Carol (Nichols \|\| Goulding)	8a0fa616cf	fix: Rename columns, tables, indexes and constraints in postgres catalog	2022-09-01 10:00:54 -04:00
Nga Tran	cb10a7c6d8	feat: More accurate memory estimate for compaction (#5471 ) * feat: initial implementation of memory estimation for a compaction * feat: estimate size of files and have the right actions for the needed budget * feat: run candidates in parallel * fix: have the right name for the column field of the output struct * feat: add metrics for estimated budgets * chore: cleanup * chore: Apply suggestions from code review Co-authored-by: Carol (Nichols \|\| Goulding) <193874+carols10cents@users.noreply.github.com> * fix: fix syntax after applying review's suggestions * refactor: Convert a Vec to VecDeque to go well with pop and push * chore: remove max_concurrent_size_bytes and input_size_threshold_bytes * chore: remove input_file_count_threshold * test: tests for estimate_arrow_bytes_for_file Co-authored-by: Carol (Nichols \|\| Goulding) <193874+carols10cents@users.noreply.github.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-08-30 13:44:44 +00:00
Carol (Nichols \|\| Goulding)	1b49ad25f7	refactor: Rename KafkaTopicId to TopicId	2022-08-29 14:27:02 -04:00
Carol (Nichols \|\| Goulding)	58f0b63cdc	refactor: Rename KafkaTopic to Topic or TopicMetadata or topic name as appropriate	2022-08-29 14:27:02 -04:00
Carol (Nichols \|\| Goulding)	3aa3ae2ba5	docs: Add more comments about why to use ShardIndex or ShardId	2022-08-29 14:07:20 -04:00
Carol (Nichols \|\| Goulding)	74c9529062	fix: Rename KafkaPartition to ShardIndex	2022-08-29 14:07:18 -04:00
Jake Goulding	4abf21c724	refactor: Rename Sequencer (and its entourage) to Shard	2022-08-29 14:06:43 -04:00
Marco Neumann	b2caf54b3a	docs: clarify clamping behavior in `TimestampRange::new` (#5399 ) Closes #5248.	2022-08-15 12:43:01 +00:00
Carol (Nichols \|\| Goulding)	b982bdaf2f	fix: Derive Eq when we derive PartialEq and members can derive Eq Allow this in generated code that we don't control, though. Recommended by clippy now. https://rust-lang.github.io/rust-clippy/master/index.html#derive_partial_eq_without_eq	2022-08-11 15:04:06 -04:00
Andrew Lamb	ee2013ce52	chore: Update docstrings for Partition::sort_keys (#5347 ) * chore: Update docstrings for Partition::sort_keys * docs: describe update details	2022-08-10 10:52:24 +00:00
Dom Dwyer	d003fe0047	refactor: const KafkaPartition::new() Allows this fn to be called from const contexts (useful in test setups).	2022-08-08 14:56:03 +02:00
Marco Neumann	b12ebe1109	fix: do not panic on invalid timestamp ranges (#5249 ) Timestamp ranges come from "untrusted" inputs (via gRPC) and must not lead to panics. The only case where this could happen is at `start > end`. Let's just set `start = end` in this case. Reaonsing: - Semantically this is a sound range, since this is only a somewhat degenerated case of "empty". - We already allow `start = end` to represent "empty" ranges. - We already clamp (and therefore modify) `start` to the valid range. Fixes https://github.com/influxdata/conductor/issues/1080. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-08-01 13:35:34 +00:00
Sam Arnold	3fbe860bb9	fix: interpret [MIN_NANO_TIME, MAX_NANO_TIME) range as all time for optimization (#5231 ) InfluxQL queries can send (technically incorrect) ranges like this, meaning all time but excluding the max nanosecond time. Since this is an important case, we should handle it specially and use the optimized 'all time' handling for meta queries even though this is technically wrong in that it does not filter out column names / measurement names at MAX_NANO_TIME exactly. Closes: https://github.com/influxdata/conductor/issues/1072 Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-28 12:24:26 +00:00
Nga Tran	69cb3f2b19	refactor: remove min_sequence_number from Compactor and Querier, add `count_by_overlaps_with_level_0` and `count_by_overlaps_with_level_1` to catalog (#5151 ) * refactor: remove min_sequnce_number * fix: typos * fix: remove min_sequencer_number from new files from merging main * fix: add back throwing error if the compactor compacts files persisted by the ingester after the ingester sends max seq_num back to querier * test: add test_compactor_collision back but modify the input to make it work woth new changes Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-21 13:51:54 +00:00
Nga Tran	c8f4000f04	feat: Select compaction candidates (#5131 ) * feat: initial implementation for selecting compaction candidates * feat: 2 catalog functions to choose the most thorughput partitions to compact and the selecting candidate function itself * test: tests for the new 2 queries * feat: more tests and metrics for chooing compaction candidates * chore: Apply self suggestions from self review * chore: cleanup * chore: fix doc comment * chore: Apply suggestions from code review Co-authored-by: Carol (Nichols \|\| Goulding) <193874+carols10cents@users.noreply.github.com> * refactor: address review comments * fix: get the right time provider for the tests * refactor: remove the left over compaction_ * fix: typos * fix: make the param name and env name consistent * refactor: make relevant iSomething to uSomething * fix: typo Co-authored-by: Carol (Nichols \|\| Goulding) <193874+carols10cents@users.noreply.github.com>	2022-07-18 18:05:13 +00:00
Jake Goulding	635f535e0e	refactor: replace level_2 with level_1	2022-07-16 21:49:45 -04:00
Carol (Nichols \|\| Goulding)	d19c468b9d	fix: Remove unused level 1 compaction; move level 2 to level 1 Fixes #5119.	2022-07-13 15:05:09 -04:00
Carol (Nichols \|\| Goulding)	61c023139b	refactor: Switch compaction levels to an enum with values rather than separate consts Bonuses: - Type checking - Validation - Less casting - Exhaustiveness checking - Less use of the numerical value	2022-07-13 11:30:36 -04:00
Carol (Nichols \|\| Goulding)	80b6c5c82f	fix: Correct typo in constant name so searching for COMPACTION_LEVEL returns all (#5077 )	2022-07-08 16:31:52 +00:00
Sam Arnold	e193913ed3	fix: optimize field columns for all-time predicates (#5046 ) * fix: optimize field columns for all-time predicates Also fix timestamp range to allow selecting points at MAX_NANO_TIME * fix: clamp end to MIN_NANO_TIME for safety * refactor: add contains_all method to TimestampRange	2022-07-06 12:01:28 +00:00
Marco Neumann	6f445ccd94	feat: Prune chunks using table summary (stats) (#5017 ) * feat: easy tests of table summary against predicate Helps with #4976. Alternative to #4995. * refactor: address review comments * refactor: address review comments * refactor: address review comments	2022-07-04 09:01:34 +00:00
Marco Neumann	87a8579742	refactor: `ChunkOrder::new` cannot fail (#5004 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-30 22:26:20 +00:00
Marco Neumann	f8b1b847cc	fix: docs+assertions for `TimestampRange` (#4994 ) * docs: fix+clarify timestamp doc comments * feat: assert `TimestampRange` in non-debug mode Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-30 16:45:44 +00:00
Marco Neumann	be53716e4d	refactor: use IDs for `parquet_file.column_set` (#4965 ) * feat: `ColumnRepo::list_by_table_id` * refactor: use IDs for `parquet_file.column_set` Closes #4959. * refactor: introduce `TableSchema::column_id_map`	2022-06-30 15:08:41 +00:00
Carol (Nichols \|\| Goulding)	3049479b78	feat: Implement new querier to ingester config design	2022-06-30 08:26:50 -04:00
Nga Tran	cfcc4b8426	refactor: change level 1 to level 2 preparing for next design changes (#4954 ) * refactor: change level 1 to level 2 preparing for next design changes * fix: make level-2 consistent everywhere * chore: remove unused comments * refactor: change all the name level_1 to level_2 to completely replace 1 with 2 to amke everything consistent * chore: add correspinding constants for the comapction levels in the comments Co-authored-by: Dom <dom@itsallbroken.com>	2022-06-29 14:08:58 +00:00
Marco Neumann	215f297162	refactor: parquet file metadata from catalog (#4949 ) * refactor: remove `ParquetFileWithMetadata` * refactor: remove `ParquetFileRepo::parquet_metadata` * refactor: parquet file metadata from catalog Closes #4124.	2022-06-27 15:38:39 +00:00
Marco Neumann	9b8086df74	fix: size estimates (#4950 ) * fix: `Tombstone::size` must include serialized predicate * fix: `CachedPartition::size` must include `Arc` heap allocation Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-27 15:25:32 +00:00
Marco Neumann	0534b80886	fix: `ParquetFile::size` must include column set (#4925 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-22 13:06:02 +00:00
Marco Neumann	c3912e34e9	refactor: store per-file column set in catalog (#4908 ) * refactor: store per-file column set in catalog Together with the table-wide schema and the partition-wide sort key, this should be everything we need to read a parquet file directly into memory without peeking any file-level metadata. The querier will use this to directly load parquet files into the read buffer. WARNING: This requires a catalog wipe! Ref #4124. * refactor: use proper `ColumnSet` type	2022-06-21 10:26:12 +00:00
Nga Tran	72c8cfa6ed	fix: make ChunkOrder i64 data type to accept min sequence number 0 and match with data type of sequence number (#4888 ) * fix: make ChunkOrder u64 data type to accept min sequence number 0 * fix: make ChunkOrder i64 to match with sequence number type Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-17 13:45:17 +00:00
Marco Neumann	0fbff981ec	chore(deps): Bump sqlx to 0.6.0 and uuid to 1 (#4894 ) Closes #4889. Closes #4890. Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-06-17 10:28:28 +00:00
Dom Dwyer	0da8ec87d5	refactor: always generate a partition key Changes the partitioner to always generate a partition key, even if the column being used to partition doesn't exist. This doesn't functionally change the batch partitioning output, but ensures we always have a non-empty string for the partition key.	2022-06-15 15:38:02 +01:00
Andrew Lamb	005610b172	refactor: remove some `&` use in iox_catalog (#4862 ) * refactor: remove some `&` use in iox_catalog * fix: Update data_types/src/lib.rs	2022-06-15 11:31:49 +00:00
Dom Dwyer	b41ea1d718	refactor: PartitionKey type This commit changes the code base to use a new reference-counted PartitionKey type wrapper, instead of passing a bare String around. This allows the compiler to type check & verify usage of the partition key, instead of passing a bare string around. By reference counting the underlying string, we reduce memory usage for some use cases.	2022-06-14 14:47:56 +01:00
Nga Tran	13c57d524a	feat: Change data type of catalog partition's sort_key from a string to an array of string (#4801 ) * feat: Change data type of catalog Postgres partition's sort_key from a string to an array of string * test: add column with comma * fix: use new protonuf field to avoid incompactible * fix: ensure sort_key is an empty array rather than NULL * refactor: address review comments * refactor: address more comments * chore: clearer comments * chore: Update iox_catalog/migrations/20220607102200_change_sort_key_type_to_array.sql * chore: Update iox_catalog/migrations/20220607102200_change_sort_key_type_to_array.sql * fix: Rename migration so it will be applied after Co-authored-by: Marko Mikulicic <mkm@influxdata.com>	2022-06-10 13:31:31 +00:00
Andrew Lamb	50697906b1	refactor: Make `DMLWrite::sequence_number` a `SequenceNumber` (#4817 )	2022-06-09 19:36:37 +00:00
Dom Dwyer	d1436c9f06	refactor(data_types): no Timestamp wraparound This commit changes addition/subtraction of Timestamp values to panic if they would trigger under/overflow rather than silently wrapping around.	2022-06-09 13:23:03 +01:00
Carol (Nichols \|\| Goulding)	b2905650aa	refactor: Extract extract_range to be a method on TableSummary So that other kinds of chunks can use this code too.	2022-05-26 16:52:14 -04:00
Carol (Nichols \|\| Goulding)	788e6eaf69	docs: Fix a comment that was very confused about what means kafka partition	2022-05-25 10:04:40 -04:00
Andrew Lamb	4d8ece5524	feat: Add `Tombstone` to querier cache (#4663 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-05-24 13:21:23 +00:00
Andrew Lamb	e877a64462	feat: Add `ParquetFiles` cache and memory size estimation for ParquetMetadata (#4661 ) * feat: Add `ParquetFiles` cache * fix: Apply suggestions from code review Co-authored-by: Marko Mikulicic <mkm@influxdata.com> * fix: remove commented out debugging println * refactor: Improve size calculation * fix: mark `ParquetFileCache::clear` test only * fix: assert on metric count Co-authored-by: Marko Mikulicic <mkm@influxdata.com>	2022-05-23 17:11:38 +00:00
Marco Neumann	779f0e9cdf	feat: querier RAM pool (#4593 ) * feat: `SortKey::size` * feat: `FunctionEstimator` * feat: querier RAM pool Let's put all the caches into a single RAM pool, so we can at least somewhat control RAM usage. Note that this does NOT limit the peak memory during query execution though, but should at least stop unlimited cache growth. A follow-up PR will add metrics. * refactor: improve some size calculations Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-05-17 13:11:20 +00:00
kodiakhq[bot]	0f8f294319	Merge branch 'main' into cn/remove-chunk-addr	2022-05-13 13:54:44 +00:00
Carol (Nichols \|\| Goulding)	55313d290a	fix: Update or remove comments that mention NG or OG Connects to #4450.	2022-05-12 16:09:08 -04:00
Carol (Nichols \|\| Goulding)	b581a42fde	fix: Rename new_id_for_ng to new_id Connects to #4450.	2022-05-12 16:09:07 -04:00
Carol (Nichols \|\| Goulding)	faba90d992	fix: Remove ChunkAddr	2022-05-12 15:50:41 -04:00
Carol (Nichols \|\| Goulding)	8545bb60c6	refactor: Move KafkaPartitionWriteStatus to data_types to share more	2022-05-11 14:07:06 -04:00
Carol (Nichols \|\| Goulding)	068096e7e1	fix: Rename data_types2 to data_types	2022-05-06 14:45:39 -04:00
Carol (Nichols \|\| Goulding)	e1bef1c218	fix: Remove OG data_types crate	2022-05-06 14:45:39 -04:00

1 2 3 4 5 ...

401 Commits (1e1d964fdb25f808ec25fc1f49c567e944980604)