Commit Graph

401 Commits (1e1d964fdb25f808ec25fc1f49c567e944980604)

Author SHA1 Message Date
Carol (Nichols || Goulding) 8a0fa616cf
fix: Rename columns, tables, indexes and constraints in postgres catalog 2022-09-01 10:00:54 -04:00
Nga Tran cb10a7c6d8
feat: More accurate memory estimate for compaction (#5471)
* feat: initial implementation of memory estimation for a compaction

* feat: estimate size of files and have the right actions for the needed budget

* feat: run candidates in parallel

* fix: have the right name for the column field of the output struct

* feat: add metrics for estimated budgets

* chore: cleanup

* chore: Apply suggestions from code review

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

* fix: fix syntax after applying review's suggestions

* refactor: Convert a Vec to VecDeque to go well with pop and push

* chore: remove max_concurrent_size_bytes and input_size_threshold_bytes

* chore: remove input_file_count_threshold

* test: tests for estimate_arrow_bytes_for_file

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-30 13:44:44 +00:00
Carol (Nichols || Goulding) 1b49ad25f7
refactor: Rename KafkaTopicId to TopicId 2022-08-29 14:27:02 -04:00
Carol (Nichols || Goulding) 58f0b63cdc
refactor: Rename KafkaTopic to Topic or TopicMetadata or topic name as appropriate 2022-08-29 14:27:02 -04:00
Carol (Nichols || Goulding) 3aa3ae2ba5
docs: Add more comments about why to use ShardIndex or ShardId 2022-08-29 14:07:20 -04:00
Carol (Nichols || Goulding) 74c9529062
fix: Rename KafkaPartition to ShardIndex 2022-08-29 14:07:18 -04:00
Jake Goulding 4abf21c724
refactor: Rename Sequencer (and its entourage) to Shard 2022-08-29 14:06:43 -04:00
Marco Neumann b2caf54b3a
docs: clarify clamping behavior in `TimestampRange::new` (#5399)
Closes #5248.
2022-08-15 12:43:01 +00:00
Carol (Nichols || Goulding) b982bdaf2f
fix: Derive Eq when we derive PartialEq and members can derive Eq
Allow this in generated code that we don't control, though.

Recommended by clippy now. https://rust-lang.github.io/rust-clippy/master/index.html#derive_partial_eq_without_eq
2022-08-11 15:04:06 -04:00
Andrew Lamb ee2013ce52
chore: Update docstrings for Partition::sort_keys (#5347)
* chore: Update docstrings for Partition::sort_keys

* docs: describe update details
2022-08-10 10:52:24 +00:00
Dom Dwyer d003fe0047 refactor: const KafkaPartition::new()
Allows this fn to be called from const contexts (useful in test setups).
2022-08-08 14:56:03 +02:00
Marco Neumann b12ebe1109
fix: do not panic on invalid timestamp ranges (#5249)
Timestamp ranges come from "untrusted" inputs (via gRPC) and must not
lead to panics. The only case where this could happen is at `start >
end`. Let's just set `start = end` in this case. Reaonsing:

- Semantically this is a sound range, since this is only a somewhat
  degenerated case of "empty".
- We already allow `start = end` to represent "empty" ranges.
- We already clamp (and therefore modify) `start` to the valid range.

Fixes https://github.com/influxdata/conductor/issues/1080.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-01 13:35:34 +00:00
Sam Arnold 3fbe860bb9
fix: interpret [MIN_NANO_TIME, MAX_NANO_TIME) range as all time for optimization (#5231)
InfluxQL queries can send (technically incorrect) ranges like this, meaning all time
but excluding the max nanosecond time.

Since this is an important case, we should handle it specially and use the optimized
'all time' handling for meta queries even though this is technically wrong in that
it does not filter out column names / measurement names at MAX_NANO_TIME exactly.

Closes: https://github.com/influxdata/conductor/issues/1072

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-28 12:24:26 +00:00
Nga Tran 69cb3f2b19
refactor: remove min_sequence_number from Compactor and Querier, add `count_by_overlaps_with_level_0` and `count_by_overlaps_with_level_1` to catalog (#5151)
* refactor: remove min_sequnce_number

* fix: typos

* fix: remove min_sequencer_number from new files from merging main

* fix: add back throwing error if the compactor compacts files persisted by the ingester after the ingester sends max seq_num back to querier

* test: add test_compactor_collision back but modify the input to make it work woth new changes

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-21 13:51:54 +00:00
Nga Tran c8f4000f04
feat: Select compaction candidates (#5131)
* feat: initial implementation for selecting compaction candidates

* feat: 2 catalog functions to choose the most thorughput partitions to compact and the selecting candidate function itself

* test: tests for the new 2 queries

* feat: more tests and metrics for chooing compaction candidates

* chore: Apply self suggestions from self review

* chore: cleanup

* chore: fix doc comment

* chore: Apply suggestions from code review

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

* refactor: address review comments

* fix: get the right time provider for the tests

* refactor: remove the left over compaction_

* fix: typos

* fix: make the param name and env name consistent

* refactor: make relevant iSomething to uSomething

* fix: typo

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2022-07-18 18:05:13 +00:00
Jake Goulding 635f535e0e refactor: replace level_2 with level_1 2022-07-16 21:49:45 -04:00
Carol (Nichols || Goulding) d19c468b9d
fix: Remove unused level 1 compaction; move level 2 to level 1
Fixes #5119.
2022-07-13 15:05:09 -04:00
Carol (Nichols || Goulding) 61c023139b
refactor: Switch compaction levels to an enum with values rather than separate consts
Bonuses:

- Type checking
- Validation
- Less casting
- Exhaustiveness checking
- Less use of the numerical value
2022-07-13 11:30:36 -04:00
Carol (Nichols || Goulding) 80b6c5c82f
fix: Correct typo in constant name so searching for COMPACTION_LEVEL returns all (#5077) 2022-07-08 16:31:52 +00:00
Sam Arnold e193913ed3
fix: optimize field columns for all-time predicates (#5046)
* fix: optimize field columns for all-time predicates

Also fix timestamp range to allow selecting points at MAX_NANO_TIME

* fix: clamp end to MIN_NANO_TIME for safety

* refactor: add contains_all method to TimestampRange
2022-07-06 12:01:28 +00:00
Marco Neumann 6f445ccd94
feat: Prune chunks using table summary (stats) (#5017)
* feat: easy tests of table summary against predicate

Helps with #4976.

Alternative to #4995.

* refactor: address review comments

* refactor: address review comments

* refactor: address review comments
2022-07-04 09:01:34 +00:00
Marco Neumann 87a8579742
refactor: `ChunkOrder::new` cannot fail (#5004)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-30 22:26:20 +00:00
Marco Neumann f8b1b847cc
fix: docs+assertions for `TimestampRange` (#4994)
* docs: fix+clarify timestamp doc comments

* feat: assert `TimestampRange` in non-debug mode

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-30 16:45:44 +00:00
Marco Neumann be53716e4d
refactor: use IDs for `parquet_file.column_set` (#4965)
* feat: `ColumnRepo::list_by_table_id`

* refactor: use IDs for `parquet_file.column_set`

Closes #4959.

* refactor: introduce `TableSchema::column_id_map`
2022-06-30 15:08:41 +00:00
Carol (Nichols || Goulding) 3049479b78
feat: Implement new querier to ingester config design 2022-06-30 08:26:50 -04:00
Nga Tran cfcc4b8426
refactor: change level 1 to level 2 preparing for next design changes (#4954)
* refactor: change level 1 to level 2 preparing for next design changes

* fix: make level-2 consistent everywhere

* chore: remove unused comments

* refactor: change all the name level_1 to level_2 to completely replace 1 with 2 to amke everything consistent

* chore: add correspinding constants for the comapction levels in the comments

Co-authored-by: Dom <dom@itsallbroken.com>
2022-06-29 14:08:58 +00:00
Marco Neumann 215f297162
refactor: parquet file metadata from catalog (#4949)
* refactor: remove `ParquetFileWithMetadata`

* refactor: remove `ParquetFileRepo::parquet_metadata`

* refactor: parquet file metadata from catalog

Closes #4124.
2022-06-27 15:38:39 +00:00
Marco Neumann 9b8086df74
fix: size estimates (#4950)
* fix: `Tombstone::size` must include serialized predicate

* fix: `CachedPartition::size` must include `Arc` heap allocation

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-27 15:25:32 +00:00
Marco Neumann 0534b80886
fix: `ParquetFile::size` must include column set (#4925)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-22 13:06:02 +00:00
Marco Neumann c3912e34e9
refactor: store per-file column set in catalog (#4908)
* refactor: store per-file column set in catalog

Together with the table-wide schema and the partition-wide sort key, this should
be everything we need to read a parquet file directly into memory
without peeking any file-level metadata.

The querier will use this to directly load parquet files into the read
buffer.

**WARNING: This requires a catalog wipe!**

Ref #4124.

* refactor: use proper `ColumnSet` type
2022-06-21 10:26:12 +00:00
Nga Tran 72c8cfa6ed
fix: make ChunkOrder i64 data type to accept min sequence number 0 and match with data type of sequence number (#4888)
* fix: make ChunkOrder u64 data type to accept min sequence number 0

* fix: make ChunkOrder i64 to match with sequence number type

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-17 13:45:17 +00:00
Marco Neumann 0fbff981ec
chore(deps): Bump sqlx to 0.6.0 and uuid to 1 (#4894)
Closes #4889.
Closes #4890.

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-06-17 10:28:28 +00:00
Dom Dwyer 0da8ec87d5 refactor: always generate a partition key
Changes the partitioner to always generate a partition key, even if the
column being used to partition doesn't exist. This doesn't functionally
change the batch partitioning output, but ensures we always have a
non-empty string for the partition key.
2022-06-15 15:38:02 +01:00
Andrew Lamb 005610b172
refactor: remove some `&` use in iox_catalog (#4862)
* refactor: remove some `&` use in iox_catalog

* fix: Update data_types/src/lib.rs
2022-06-15 11:31:49 +00:00
Dom Dwyer b41ea1d718 refactor: PartitionKey type
This commit changes the code base to use a new reference-counted
PartitionKey type wrapper, instead of passing a bare String around.

This allows the compiler to type check & verify usage of the partition
key, instead of passing a bare string around. By reference counting the
underlying string, we reduce memory usage for some use cases.
2022-06-14 14:47:56 +01:00
Nga Tran 13c57d524a
feat: Change data type of catalog partition's sort_key from a string to an array of string (#4801)
* feat: Change data type of catalog Postgres partition's sort_key from a string to an array of string

* test: add column with comma

* fix: use new protonuf field to avoid incompactible

* fix: ensure sort_key is an empty array rather than NULL

* refactor: address review comments

* refactor: address more comments

* chore: clearer comments

* chore: Update iox_catalog/migrations/20220607102200_change_sort_key_type_to_array.sql

* chore: Update iox_catalog/migrations/20220607102200_change_sort_key_type_to_array.sql

* fix: Rename migration so it will be applied after

Co-authored-by: Marko Mikulicic <mkm@influxdata.com>
2022-06-10 13:31:31 +00:00
Andrew Lamb 50697906b1
refactor: Make `DMLWrite::sequence_number` a `SequenceNumber` (#4817) 2022-06-09 19:36:37 +00:00
Dom Dwyer d1436c9f06 refactor(data_types): no Timestamp wraparound
This commit changes addition/subtraction of Timestamp values to panic if
they would trigger under/overflow rather than silently wrapping around.
2022-06-09 13:23:03 +01:00
Carol (Nichols || Goulding) b2905650aa
refactor: Extract extract_range to be a method on TableSummary
So that other kinds of chunks can use this code too.
2022-05-26 16:52:14 -04:00
Carol (Nichols || Goulding) 788e6eaf69
docs: Fix a comment that was very confused about what means kafka partition 2022-05-25 10:04:40 -04:00
Andrew Lamb 4d8ece5524
feat: Add `Tombstone` to querier cache (#4663)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-05-24 13:21:23 +00:00
Andrew Lamb e877a64462
feat: Add `ParquetFiles` cache and memory size estimation for ParquetMetadata (#4661)
* feat: Add `ParquetFiles` cache

* fix: Apply suggestions from code review

Co-authored-by: Marko Mikulicic <mkm@influxdata.com>

* fix: remove commented out debugging println

* refactor: Improve size calculation

* fix: mark `ParquetFileCache::clear` test only

* fix: assert on metric count

Co-authored-by: Marko Mikulicic <mkm@influxdata.com>
2022-05-23 17:11:38 +00:00
Marco Neumann 779f0e9cdf
feat: querier RAM pool (#4593)
* feat: `SortKey::size`

* feat: `FunctionEstimator`

* feat: querier RAM pool

Let's put all the caches into a single RAM pool, so we can at least
somewhat control RAM usage. Note that this does NOT limit the peak
memory during query execution though, but should at least stop unlimited
cache growth. A follow-up PR will add metrics.

* refactor: improve some size calculations

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-05-17 13:11:20 +00:00
kodiakhq[bot] 0f8f294319
Merge branch 'main' into cn/remove-chunk-addr 2022-05-13 13:54:44 +00:00
Carol (Nichols || Goulding) 55313d290a
fix: Update or remove comments that mention NG or OG
Connects to #4450.
2022-05-12 16:09:08 -04:00
Carol (Nichols || Goulding) b581a42fde
fix: Rename new_id_for_ng to new_id
Connects to #4450.
2022-05-12 16:09:07 -04:00
Carol (Nichols || Goulding) faba90d992
fix: Remove ChunkAddr 2022-05-12 15:50:41 -04:00
Carol (Nichols || Goulding) 8545bb60c6
refactor: Move KafkaPartitionWriteStatus to data_types to share more 2022-05-11 14:07:06 -04:00
Carol (Nichols || Goulding) 068096e7e1
fix: Rename data_types2 to data_types 2022-05-06 14:45:39 -04:00
Carol (Nichols || Goulding) e1bef1c218
fix: Remove OG data_types crate 2022-05-06 14:45:39 -04:00