* revert: "revert: rdkafka/rskafka swapping (#5800)"
This reverts commit b77c3540e1.
* test: Verify write buffer connection_config is parsed as expected
* test: Failing test reproducing the error seen when deploying rdkafka
* fix: Translate k8s-idpe configs to rdkafka configs
* feat: add minimum row_count per file in estiumating compacting memory budget and limit number files per compaction
* chore: cleanup
* chore: Apply suggestions from code review
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
* test: add test per review comments
* chore: Apply suggestions from code review
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* test: add one more test that has limit num files larger than total input files
* fix: make the L1 files in tests not overlapped
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
During initialisation, the ingester connects to the Kafka brokers - this
involves per-partition leadership discovery & connection establishment.
These connections are then retained for the lifetime of the process.
Prior to this commit, the ingester would establish a connection to all
partition leaders for a given topic. After this commit, the ingester
connects to only the partition leaders it is going to consume from
(for those shards that it is assigned.)
This limit restricts a single partition to containing at most N rows
before it is marked for persistence (note: being marked for persistence
does not currently prevent further ingest for that partition.)
* feat: initial implementation of memory estimation for a compaction
* feat: estimate size of files and have the right actions for the needed budget
* feat: run candidates in parallel
* fix: have the right name for the column field of the output struct
* feat: add metrics for estimated budgets
* chore: cleanup
* chore: Apply suggestions from code review
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
* fix: fix syntax after applying review's suggestions
* refactor: Convert a Vec to VecDeque to go well with pop and push
* chore: remove max_concurrent_size_bytes and input_size_threshold_bytes
* chore: remove input_file_count_threshold
* test: tests for estimate_arrow_bytes_for_file
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: file file_count_threshold for comapcting cold partitions to make it consistent with the hot case and help set up to avoid oom easier
* chore: remove unecessary commments
Esp. this fixes "unused import" warnings when not all features are
enabled, so developer IDEs don't shout.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: make querier RAM pool split a proper feature
- use propre pool names
- expose sizing via CLI/env
Closes https://github.com/influxdata/conductor/issues/1102.
* refactor: improve naming and docs
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: `QueryChunk::as_any`
* feat: allo `ChunkPruner::prune_chunks` to fail
* feat: limit per-table chunk data for every query
Closes#5211.
* fix: address review comments
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>