* fix: do not panic if measurement name is not the first tag
* feat: add "read group" support to storage CLI
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* refactor: read querier parquet files from cache
* refactor: only use parquet files in querier (no RB)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: naive `AddressableHeap::update_order`
* refactor: use `update_order` within LRU policy
* test: add benchmark for `AddressableHeap::update_order`
* refactor: avoid double-hash when updating addressable heap orders
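A minimal sketch of the double-hash avoidance, assuming a simplified heap that pairs a `HashMap` with a sorted `Vec` (the real `AddressableHeap` differs; names here are illustrative): the point is the single `get_mut` lookup where a naive version would hash the key twice via `remove` + `insert`.
```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for `AddressableHeap::update_order`.
fn update_order<K, O>(existing: &mut HashMap<K, O>, queue: &mut Vec<(O, K)>, k: &K, o: O)
where
    K: std::hash::Hash + Eq + Ord + Clone,
    O: Ord + Clone,
{
    // One hash-map lookup instead of `remove` (hash #1) + `insert` (hash #2).
    if let Some(slot) = existing.get_mut(k) {
        let old = std::mem::replace(slot, o.clone());

        // Re-sort the entry within the queue.
        if let Ok(pos) = queue.binary_search(&(old, k.clone())) {
            queue.remove(pos);
        }
        let pos = match queue.binary_search(&(o.clone(), k.clone())) {
            Ok(pos) | Err(pos) => pos,
        };
        queue.insert(pos, (o, k.clone()));
    }
}
```
Benchmark results for `update_order`: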
```text
update_order_existing_after_n_elements/1
time: [25.483 ns 25.513 ns 25.547 ns]
change: [-42.490% -42.365% -42.247%] (p = 0.00 < 0.05)
Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high severe
update_order_existing_after_n_elements/10
time: [68.158 ns 68.211 ns 68.266 ns]
change: [-19.391% -19.131% -18.952%] (p = 0.00 < 0.05)
Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
update_order_existing_after_n_elements/100
time: [128.10 ns 128.43 ns 128.83 ns]
change: [-17.732% -17.531% -17.255%] (p = 0.00 < 0.05)
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
3 (3.00%) high mild
4 (4.00%) high severe
update_order_existing_after_n_elements/1000
time: [223.08 ns 224.06 ns 225.30 ns]
change: [-9.0635% -8.5828% -7.9794%] (p = 0.00 < 0.05)
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
2 (2.00%) high mild
5 (5.00%) high severe
update_order_existing_after_n_elements/10000
time: [1.0032 µs 1.0216 µs 1.0402 µs]
change: [-6.0920% -3.7038% -1.0826%] (p = 0.01 < 0.05)
Performance has improved.
update_order_new_after_n_elements/0
time: [35.898 ns 35.919 ns 35.943 ns]
change: [+183.39% +183.77% +184.12%] (p = 0.00 < 0.05)
Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
3 (3.00%) high mild
1 (1.00%) high severe
update_order_new_after_n_elements/1
time: [13.273 ns 13.299 ns 13.344 ns]
change: [-6.6980% -5.9798% -5.2633%] (p = 0.00 < 0.05)
Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
2 (2.00%) high mild
4 (4.00%) high severe
update_order_new_after_n_elements/10
time: [14.010 ns 14.084 ns 14.183 ns]
change: [-13.579% -13.117% -12.553%] (p = 0.00 < 0.05)
Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
2 (2.00%) high mild
9 (9.00%) high severe
update_order_new_after_n_elements/100
time: [23.846 ns 23.883 ns 23.921 ns]
change: [-4.7412% -4.3738% -4.0715%] (p = 0.00 < 0.05)
Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high mild
update_order_new_after_n_elements/1000
time: [28.590 ns 28.646 ns 28.705 ns]
change: [-4.1597% -3.6132% -3.0701%] (p = 0.00 < 0.05)
Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild
update_order_new_after_n_elements/10000
time: [31.459 ns 31.975 ns 32.601 ns]
change: [-32.153% -20.689% -11.961%] (p = 0.00 < 0.05)
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
5 (5.00%) high mild
2 (2.00%) high severe
```
Improvements might be even bigger for more expensive hash functions
(e.g. for `K = Arc<str>`).
Note that there is one outlier: `update_order_new_after_n_elements/0`. I
suspect this is due to slightly different compiler decisions (there is
no technical difference for "update a key of an empty heap"). Since this
case is also pretty uncommon in practice (only ~once when the process
boots up), I deem this acceptable.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* test: use dedicated ports for CLI tests
* chore: update `tracing-subscriber`
* fix: work around tracing-subscriber weirdness
It seems that trogging with tracing-subscriber >= 0.3.14 does not
produce any output at all. I suspect we are hitting
<https://github.com/tokio-rs/tracing/issues/2265>. Let's change the
construction to use a single dyn-dispatch layer instead of multiple
optional layers. Logging shouldn't have such high throughput that this
makes any difference, especially because the dyn-dispatch happens AFTER
the filter.
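A sketch of the workaround, assuming a format toggle similar to trogging's (the `json` flag and the filter setup are illustrative, not the actual trogging code); the key change is one always-present boxed layer instead of several optional ones:
```rust
use tracing_subscriber::{
    fmt, layer::SubscriberExt, util::SubscriberInitExt, EnvFilter, Layer, Registry,
};

fn init_logging(json: bool) {
    // One always-present `Box<dyn Layer<_>>` instead of multiple optional
    // layers (`.with(Some(a)).with(None::<B>)`), which seems to trigger
    // the issue.
    let fmt_layer: Box<dyn Layer<Registry> + Send + Sync> = if json {
        fmt::layer().json().boxed() // requires the `json` feature
    } else {
        fmt::layer().boxed()
    };

    tracing_subscriber::registry()
        .with(fmt_layer)
        .with(EnvFilter::from_default_env())
        .init();
}
```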
I was spending way too long fighting the wrong number of arguments to
CompactorConfig::new, with little help from the compiler. If these
struct fields are pub, they can be set directly, destructured, etc.,
which the compiler gives far more help with. This also reduces
duplication and boilerplate that has to be updated whenever the config
fields change.
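For illustration, with hypothetical field names (not the actual `CompactorConfig` fields):
```rust
/// Hypothetical subset of the compactor config; field names are illustrative.
#[derive(Debug, Clone)]
pub struct CompactorConfig {
    pub max_desired_file_size_bytes: u64,
    pub percentage_max_file_size: u16,
    pub split_percentage: u16,
}

fn example() {
    // A struct literal names every field, so the compiler points at exactly
    // the field that is missing, extra, or has the wrong type:
    let config = CompactorConfig {
        max_desired_file_size_bytes: 100 * 1024 * 1024,
        percentage_max_file_size: 30,
        split_percentage: 80,
    };

    // Destructuring gets the same compiler support:
    let CompactorConfig {
        max_desired_file_size_bytes,
        ..
    } = config;
    assert!(max_desired_file_size_bytes > 0);
}
```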
Instead of a naive "remove + insert", use a proper insertion routine
that touches the hash map only once.
Note that in the case of an override (i.e. an entry with this key
already existed) we still need to touch the heap twice, because the
sort order likely changed (we don't optimize the "identical order" case
here, because it is unlikely to happen in practice).
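A minimal sketch of that routine via the `entry` API, using the same simplified heap as above (a `HashMap` plus a sorted `Vec`); this is illustrative, not the actual implementation:
```rust
use std::collections::{hash_map::Entry, HashMap};

/// Hypothetical, simplified stand-in for `AddressableHeap::insert`.
fn insert<K, O>(existing: &mut HashMap<K, O>, queue: &mut Vec<(O, K)>, k: K, o: O)
where
    K: std::hash::Hash + Eq + Ord + Clone,
    O: Ord + Clone,
{
    // The entry API touches the hash map only once, instead of
    // `remove` (hash #1) followed by `insert` (hash #2).
    match existing.entry(k.clone()) {
        // Override: the heap is touched twice because the sort order
        // likely changed.
        Entry::Occupied(mut entry) => {
            let old = entry.insert(o.clone());
            if let Ok(pos) = queue.binary_search(&(old, k.clone())) {
                queue.remove(pos);
            }
        }
        // New key: single map access, single heap insertion.
        Entry::Vacant(entry) => {
            entry.insert(o.clone());
        }
    }
    let pos = match queue.binary_search(&(o.clone(), k.clone())) {
        Ok(pos) | Err(pos) => pos,
    };
    queue.insert(pos, (o, k));
}
```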
**Perf results:**
```text
insert_n_elements/0 time: [16.489 ns 16.497 ns 16.506 ns]
change: [-8.1154% -7.9967% -7.8990%] (p = 0.00 < 0.05)
Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
1 (1.00%) low mild
1 (1.00%) high mild
2 (2.00%) high severe
insert_n_elements/1 time: [59.806 ns 59.839 ns 59.875 ns]
change: [-14.241% -14.160% -14.086%] (p = 0.00 < 0.05)
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
7 (7.00%) high mild
1 (1.00%) high severe
insert_n_elements/10 time: [601.58 ns 602.26 ns 603.09 ns]
change: [-20.870% -20.714% -20.565%] (p = 0.00 < 0.05)
Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
4 (4.00%) high mild
5 (5.00%) high severe
insert_n_elements/100 time: [6.9096 µs 6.9161 µs 6.9246 µs]
change: [-18.759% -18.667% -18.553%] (p = 0.00 < 0.05)
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
4 (4.00%) high mild
3 (3.00%) high severe
insert_n_elements/1000 time: [107.71 µs 107.76 µs 107.82 µs]
change: [-14.564% -14.427% -14.295%] (p = 0.00 < 0.05)
Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
3 (3.00%) high mild
1 (1.00%) high severe
insert_n_elements/10000 time: [2.8642 ms 2.8700 ms 2.8765 ms]
change: [-11.079% -10.860% -10.605%] (p = 0.00 < 0.05)
Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
15 (15.00%) high severe
```
Note that the results are even better for keys with more expensive hash
functions (we have a few in the querier).
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
During initialisation, the ingester connects to the Kafka brokers - this
involves per-partition leadership discovery & connection establishment.
These connections are then retained for the lifetime of the process.
Prior to this commit, the ingester would establish a connection to all
partition leaders for a given topic. After this commit, the ingester
connects only to the partition leaders it is going to consume from
(for the shards it is assigned).
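A sketch of the before/after, assuming a hypothetical per-partition Kafka client (the `KafkaClient`/`PartitionClient` names and methods are stand-ins, not the actual ingester code):
```rust
struct KafkaClient;
struct PartitionClient;

impl KafkaClient {
    /// Performs leadership discovery for the partition and connects to
    /// that partition's leader.
    fn partition_client(&self, _topic: &str, _partition: i32) -> PartitionClient {
        PartitionClient
    }
}

/// Before: one connection per partition of the topic.
fn connect_all(client: &KafkaClient, topic: &str, n_partitions: i32) -> Vec<PartitionClient> {
    (0..n_partitions)
        .map(|p| client.partition_client(topic, p))
        .collect()
}

/// After: connect only to the leaders of the assigned shards.
fn connect_assigned(client: &KafkaClient, topic: &str, assigned: &[i32]) -> Vec<PartitionClient> {
    assigned
        .iter()
        .map(|&p| client.partition_client(topic, p))
        .collect()
}
```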