* refactor: improve addressable heap benchmarks
- don't use the full range for key and order so it is easier to generate
new entries at the top/bottom
- differentiate between "update order to a new random one" and "update
order to the last one", since the latter one is the de facto standard
case for the LRU cache
* refactor: use `BTreeSet` in addressable heap
Our cache system mostly serves reads and only a few writes (i.e. the
change rate is low; in other words, we have a high cache HIT rate). It
makes sense to optimize the data structures accordingly. While
investigating https://github.com/influxdata/EAR/issues/4505 it turned
out that the bookkeeping for read operations is quite expensive.
This PR improves
that by optimizing the "addressable heap". That data structure is
basically a map K->V but every entry also has an order O which can be
used to "pop" the first element. This is used to implement the LRU cache
where K is the cache key and O is the "least recently used" time (V is
unused, but that's irrelevant for the problem described here).
The former queue implementation was very expensive for `update_order`,
because it essentially copies large parts of the `VecDeque` twice (once
for the remove, once for the insert). B-trees handle updates of
individual keys better, even if they are less efficient in the "tiny
case" (i.e. just a few keys).
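To make that concrete, here is a minimal sketch of the data structure
(illustrative names and bounds, not the actual `cache_system`
implementation):

```rust
use std::collections::{BTreeSet, HashMap};
use std::hash::Hash;

/// Illustrative shape of an addressable heap: a map K->V where every
/// entry also carries an order O, and the entry with the smallest O
/// can be popped.
struct AddressableHeap<K, O, V>
where
    K: Clone + Eq + Hash + Ord,
    O: Clone + Ord,
{
    entries: HashMap<K, (V, O)>,
    /// (order, key) pairs kept sorted; the first element of the set is
    /// the next one to pop.
    queue: BTreeSet<(O, K)>,
}

impl<K, O, V> AddressableHeap<K, O, V>
where
    K: Clone + Eq + Hash + Ord,
    O: Clone + Ord,
{
    /// Pop the entry with the smallest order.
    fn pop(&mut self) -> Option<(K, V, O)> {
        let (o, k) = self.queue.pop_first()?;
        let (v, _o) = self.entries.remove(&k).expect("maps kept in sync");
        Some((k, v, o))
    }

    /// Move an existing key to a new order: two O(log n) tree edits, no
    /// bulk data movement. With a sorted `VecDeque`, the remove and the
    /// re-insert would each shift O(n) elements instead.
    fn update_order(&mut self, k: &K, new_order: O) {
        if let Some((_v, o)) = self.entries.get_mut(k) {
            let old = std::mem::replace(o, new_order.clone());
            self.queue.remove(&(old, k.clone()));
            self.queue.insert((new_order, k.clone()));
        }
    }
}
```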
Performance Results
===================
The summary is:
- `insert_n_elements`: improved, not surprising because inserting into
the middle of the vector was pretty slow
- `peek_after_n_elements`: quite noisy, maybe regressed slightly (<+30%)
which is OK. Peeking the `VecDeque` is pretty cheap compared to the
B-tree.
- `get_existing_after_n_elements`: noisy, mostly identical or slightly
slower (<+30%)
- `get_new_after_n_elements`: same, there shouldn't be a difference
because the `VecDeque` / B-tree isn't touched for elements that are
not found, so if there is a difference this is probably LLVM codegen
stuff
- `pop_n_elements`: regressed by up to +200%, which makes sense given
how cheap removing elements in order from the `VecDeque` is compared
to the B-tree. However this operation is only used under memory
pressure when new data is added to the cache system, which is a rare
event and quite expensive to begin with, so this is unlikely to make a
noticeable difference in practice
- `remove_existing_after_n_elements`: improved up to -70%, which makes
sense because removing from the `VecDeque` is very expensive due to
the data copy/move
- `remove_new_after_n_elements`: mostly noise, should logically be the
same because the `VecDeque`/B-tree isn't touched if the element
doesn't exist
- `replace_after_n_elements`: this is "remove"+"insert", improved
accordingly
- `update_order_existing_to_random_after_n_elements`: optimized version
of "replace" that touches the internal `HashMap` less often, improved
accordingly
- `update_order_existing_to_first_after_n_elements`: **THE prime case
that we were aiming for**, because it is used to update the "least
recently used" time on every read (see the sketch below). Improved by
up to -70%, same argument as "replace" and "remove".
- `update_order_new_after_n_elements`: mostly noise / same,
`VecDeque`/B-tree isn't touched in this case
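For context on why that case dominates: an LRU read conceptually bumps
the entry's order to the current time on every hit. A hypothetical read
path, reusing the `AddressableHeap` sketch above with a `u64` timestamp
as the order:

```rust
/// Hypothetical LRU read path: every cache hit calls `update_order`
/// once to record the new "last used" time, so its cost dominates a
/// read-heavy workload.
fn lru_get<K, V>(heap: &mut AddressableHeap<K, u64, V>, k: &K, now: u64) -> bool
where
    K: Clone + Eq + std::hash::Hash + Ord,
{
    if heap.entries.contains_key(k) {
        heap.update_order(k, now); // the benchmarked hot path
        true
    } else {
        false
    }
}
```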
The detailed results are:
<details>
```
❯ cargo bench -p cache_system --bench addressable_heap -- --baseline btree-pre
Compiling cache_system v0.1.0 (/home/mneumann/src/influxdb_iox/cache_system)
Finished bench [optimized + debuginfo] target(s) in 8.48s
Running benches/addressable_heap.rs (target/release/deps/addressable_heap-e6a3281d83b52007)
insert_n_elements/0 time: [12.621 ns 12.629 ns 12.640 ns]
change: [-3.6323% -3.5654% -3.4848%] (p = 0.00 < 0.05)
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
3 (3.00%) high mild
2 (2.00%) high severe
insert_n_elements/1 time: [77.540 ns 77.665 ns 77.824 ns]
change: [-6.7121% -6.5483% -6.3759%] (p = 0.00 < 0.05)
Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
4 (4.00%) high mild
6 (6.00%) high severe
insert_n_elements/10 time: [565.61 ns 565.89 ns 566.22 ns]
change: [-20.328% -20.232% -20.156%] (p = 0.00 < 0.05)
Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
3 (3.00%) high mild
1 (1.00%) high severe
insert_n_elements/100 time: [7.3376 µs 7.3438 µs 7.3499 µs]
change: [-2.7056% -2.6147% -2.5146%] (p = 0.00 < 0.05)
Performance has improved.
insert_n_elements/1000 time: [97.249 µs 97.335 µs 97.435 µs]
change: [-13.880% -13.804% -13.717%] (p = 0.00 < 0.05)
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
2 (2.00%) high mild
3 (3.00%) high severe
insert_n_elements/10000 time: [1.1371 ms 1.1386 ms 1.1403 ms]
change: [-61.699% -61.650% -61.594%] (p = 0.00 < 0.05)
Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
1 (1.00%) low mild
2 (2.00%) high mild
6 (6.00%) high severe
peek_after_n_elements/0 time: [7.2367 ns 7.2398 ns 7.2430 ns]
change: [-33.159% -33.109% -33.064%] (p = 0.00 < 0.05)
Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild
peek_after_n_elements/1 time: [27.329 ns 27.344 ns 27.361 ns]
change: [-15.114% -15.047% -14.968%] (p = 0.00 < 0.05)
Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) low mild
1 (1.00%) high mild
1 (1.00%) high severe
peek_after_n_elements/10
time: [29.520 ns 29.534 ns 29.548 ns]
change: [-6.4913% -6.3567% -6.2216%] (p = 0.00 < 0.05)
Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild
peek_after_n_elements/100
time: [36.132 ns 36.192 ns 36.264 ns]
change: [-2.5660% -2.3400% -2.1099%] (p = 0.00 < 0.05)
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
5 (5.00%) high mild
2 (2.00%) high severe
peek_after_n_elements/1000
time: [40.246 ns 40.312 ns 40.380 ns]
change: [+19.265% +19.583% +19.892%] (p = 0.00 < 0.05)
Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
7 (7.00%) high mild
peek_after_n_elements/10000
time: [53.840 ns 54.581 ns 55.585 ns]
change: [+21.970% +23.892% +26.167%] (p = 0.00 < 0.05)
Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severe
get_existing_after_n_elements/1
time: [29.431 ns 29.451 ns 29.474 ns]
change: [-9.8425% -9.7508% -9.6523%] (p = 0.00 < 0.05)
Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
2 (2.00%) low mild
6 (6.00%) high mild
3 (3.00%) high severe
get_existing_after_n_elements/10
time: [28.923 ns 28.950 ns 28.981 ns]
change: [-14.267% -14.160% -14.042%] (p = 0.00 < 0.05)
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
5 (5.00%) high mild
3 (3.00%) high severe
get_existing_after_n_elements/100
time: [36.280 ns 36.328 ns 36.384 ns]
change: [+4.3315% +4.5087% +4.6797%] (p = 0.00 < 0.05)
Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
8 (8.00%) high mild
4 (4.00%) high severe
get_existing_after_n_elements/1000
time: [33.089 ns 33.166 ns 33.284 ns]
change: [+6.6989% +7.5179% +8.1580%] (p = 0.00 < 0.05)
Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
6 (6.00%) high mild
2 (2.00%) high severe
get_existing_after_n_elements/10000
time: [36.041 ns 36.145 ns 36.252 ns]
change: [-6.3166% -4.6691% -3.3693%] (p = 0.00 < 0.05)
Performance has improved.
get_new_after_n_elements/0
time: [9.3569 ns 9.3597 ns 9.3626 ns]
change: [-38.034% -37.914% -37.820%] (p = 0.00 < 0.05)
Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
get_new_after_n_elements/1
time: [25.029 ns 25.054 ns 25.095 ns]
change: [-13.193% -13.094% -12.946%] (p = 0.00 < 0.05)
Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) high mild
2 (2.00%) high severe
get_new_after_n_elements/10
time: [26.160 ns 26.176 ns 26.191 ns]
change: [-13.138% -13.072% -12.999%] (p = 0.00 < 0.05)
Performance has improved.
get_new_after_n_elements/100
time: [35.086 ns 35.135 ns 35.187 ns]
change: [-1.9261% -1.7566% -1.5940%] (p = 0.00 < 0.05)
Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild
get_new_after_n_elements/1000
time: [28.929 ns 29.010 ns 29.103 ns]
change: [-0.1152% +0.2217% +0.5492%] (p = 0.20 > 0.05)
No change in performance detected.
Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) high mild
1 (1.00%) high severe
get_new_after_n_elements/10000
time: [43.733 ns 44.226 ns 44.830 ns]
change: [+23.061% +24.885% +27.016%] (p = 0.00 < 0.05)
Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
4 (4.00%) high mild
2 (2.00%) high severe
pop_n_elements/0 time: [7.6553 ns 7.6586 ns 7.6619 ns]
change: [-27.593% -27.545% -27.501%] (p = 0.00 < 0.05)
Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
pop_n_elements/1 time: [43.055 ns 43.089 ns 43.136 ns]
change: [+10.568% +10.680% +10.798%] (p = 0.00 < 0.05)
Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severe
pop_n_elements/10 time: [347.19 ns 347.42 ns 347.68 ns]
change: [+71.785% +71.998% +72.180%] (p = 0.00 < 0.05)
Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severe
pop_n_elements/100 time: [4.6767 µs 4.6793 µs 4.6819 µs]
change: [+104.38% +104.56% +104.74%] (p = 0.00 < 0.05)
Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
pop_n_elements/1000 time: [44.635 µs 44.724 µs 44.872 µs]
change: [+186.13% +186.84% +187.92%] (p = 0.00 < 0.05)
Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) low mild
1 (1.00%) high severe
pop_n_elements/10000 time: [479.51 µs 479.92 µs 480.34 µs]
change: [+179.63% +180.01% +180.38%] (p = 0.00 < 0.05)
Performance has regressed.
remove_existing_after_n_elements/1
time: [52.939 ns 52.996 ns 53.077 ns]
change: [+16.171% +16.313% +16.529%] (p = 0.00 < 0.05)
Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
2 (2.00%) high mild
3 (3.00%) high severe
remove_existing_after_n_elements/10
time: [62.905 ns 62.968 ns 63.038 ns]
change: [+0.5731% +0.6847% +0.8053%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
2 (2.00%) high mild
3 (3.00%) high severe
remove_existing_after_n_elements/100
time: [97.206 ns 97.310 ns 97.416 ns]
change: [+0.3006% +0.5036% +0.6846%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
2 (2.00%) low mild
3 (3.00%) high mild
remove_existing_after_n_elements/1000
time: [119.75 ns 120.18 ns 120.63 ns]
change: [-10.255% -9.8892% -9.5151%] (p = 0.00 < 0.05)
Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high mild
remove_existing_after_n_elements/10000
time: [160.66 ns 162.29 ns 164.06 ns]
change: [-71.326% -70.589% -69.801%] (p = 0.00 < 0.05)
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
4 (4.00%) high mild
1 (1.00%) high severe
remove_new_after_n_elements/0
time: [21.056 ns 21.067 ns 21.080 ns]
change: [-13.898% -13.844% -13.783%] (p = 0.00 < 0.05)
Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) high mild
1 (1.00%) high severe
remove_new_after_n_elements/1
time: [24.313 ns 24.322 ns 24.332 ns]
change: [-13.770% -13.691% -13.623%] (p = 0.00 < 0.05)
Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
remove_new_after_n_elements/10
time: [25.672 ns 25.686 ns 25.700 ns]
change: [-12.717% -12.650% -12.585%] (p = 0.00 < 0.05)
Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high mild
remove_new_after_n_elements/100
time: [35.031 ns 35.060 ns 35.091 ns]
change: [-1.9734% -1.8181% -1.6578%] (p = 0.00 < 0.05)
Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high mild
remove_new_after_n_elements/1000
time: [29.874 ns 29.978 ns 30.109 ns]
change: [-5.3132% -4.4315% -3.6840%] (p = 0.00 < 0.05)
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
5 (5.00%) high severe
remove_new_after_n_elements/10000
time: [36.628 ns 36.872 ns 37.164 ns]
change: [-17.430% -16.074% -14.738%] (p = 0.00 < 0.05)
Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) high mild
1 (1.00%) high severe
replace_after_n_elements/1
time: [55.542 ns 55.609 ns 55.693 ns]
change: [+17.578% +17.732% +17.940%] (p = 0.00 < 0.05)
Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
3 (3.00%) high mild
3 (3.00%) high severe
replace_after_n_elements/10
time: [77.437 ns 77.529 ns 77.642 ns]
change: [-13.348% -13.240% -13.114%] (p = 0.00 < 0.05)
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
2 (2.00%) high mild
3 (3.00%) high severe
replace_after_n_elements/100
time: [155.57 ns 155.74 ns 155.92 ns]
change: [+3.9468% +4.2718% +4.5275%] (p = 0.00 < 0.05)
Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild
replace_after_n_elements/1000
time: [192.74 ns 193.47 ns 194.29 ns]
change: [-19.267% -18.905% -18.445%] (p = 0.00 < 0.05)
Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe
replace_after_n_elements/10000
time: [271.85 ns 275.39 ns 279.82 ns]
change: [-75.027% -74.447% -73.816%] (p = 0.00 < 0.05)
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
3 (3.00%) high mild
2 (2.00%) high severe
update_order_existing_to_random_after_n_elements/1
time: [56.459 ns 56.484 ns 56.515 ns]
change: [+15.045% +15.128% +15.219%] (p = 0.00 < 0.05)
Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) high mild
1 (1.00%) high severe
update_order_existing_to_random_after_n_elements/10
time: [79.163 ns 79.229 ns 79.327 ns]
change: [-14.176% -14.090% -13.963%] (p = 0.00 < 0.05)
Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high severe
update_order_existing_to_random_after_n_elements/100
time: [138.88 ns 139.28 ns 139.90 ns]
change: [-9.5444% -9.2390% -8.7987%] (p = 0.00 < 0.05)
Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
1 (1.00%) low mild
2 (2.00%) high mild
1 (1.00%) high severe
update_order_existing_to_random_after_n_elements/1000
time: [187.01 ns 188.09 ns 189.60 ns]
change: [-21.513% -20.638% -19.817%] (p = 0.00 < 0.05)
Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
2 (2.00%) high mild
2 (2.00%) high severe
update_order_existing_to_random_after_n_elements/10000
time: [271.94 ns 278.12 ns 285.56 ns]
change: [-75.123% -74.395% -73.590%] (p = 0.00 < 0.05)
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
2 (2.00%) high mild
6 (6.00%) high severe
update_order_existing_to_first_after_n_elements/1
time: [54.234 ns 54.270 ns 54.310 ns]
change: [+6.9327% +7.0343% +7.1302%] (p = 0.00 < 0.05)
Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
3 (3.00%) high mild
2 (2.00%) high severe
update_order_existing_to_first_after_n_elements/10
time: [68.528 ns 68.598 ns 68.678 ns]
change: [-10.464% -10.352% -10.242%] (p = 0.00 < 0.05)
Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
4 (4.00%) high mild
2 (2.00%) high severe
update_order_existing_to_first_after_n_elements/100
time: [114.73 ns 114.85 ns 114.98 ns]
change: [+1.2309% +1.3810% +1.5303%] (p = 0.00 < 0.05)
Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
update_order_existing_to_first_after_n_elements/1000
time: [148.20 ns 149.64 ns 151.63 ns]
change: [-18.039% -17.103% -15.865%] (p = 0.00 < 0.05)
Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
1 (1.00%) low mild
1 (1.00%) high mild
2 (2.00%) high severe
update_order_existing_to_first_after_n_elements/10000
time: [195.62 ns 198.87 ns 203.28 ns]
change: [-71.482% -70.401% -69.197%] (p = 0.00 < 0.05)
Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) high mild
1 (1.00%) high severe
update_order_new_after_n_elements/0
time: [8.0889 ns 8.0925 ns 8.0961 ns]
change: [-84.003% -83.966% -83.943%] (p = 0.00 < 0.05)
Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild
update_order_new_after_n_elements/1
time: [24.156 ns 24.165 ns 24.174 ns]
change: [-16.500% -16.448% -16.400%] (p = 0.00 < 0.05)
Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) low mild
1 (1.00%) high mild
update_order_new_after_n_elements/10
time: [26.888 ns 26.906 ns 26.925 ns]
change: [-11.976% -11.882% -11.791%] (p = 0.00 < 0.05)
Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severe
update_order_new_after_n_elements/100
time: [34.980 ns 35.024 ns 35.070 ns]
change: [-6.7389% -6.5648% -6.3944%] (p = 0.00 < 0.05)
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
6 (6.00%) high mild
1 (1.00%) high severe
update_order_new_after_n_elements/1000
time: [29.538 ns 29.588 ns 29.643 ns]
change: [+0.3276% +0.5565% +0.7915%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
3 (3.00%) high mild
1 (1.00%) high severe
update_order_new_after_n_elements/10000
time: [35.260 ns 35.536 ns 35.873 ns]
change: [-2.7360% -1.5665% -0.2156%] (p = 0.01 < 0.05)
Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
3 (3.00%) high mild
1 (1.00%) high severe
```
</details>
* feat: `RemoveIfHandle::remove_if_and_get_with_status`
* fix: avoid tracing flood
Do not create a span for every partition that we get from the cache
system.
Ref https://github.com/influxdata/idpe/issues/17884.
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* test: regression test for #8378
* fix: avoid recursive locking during LRU shutdown
Fixes the following construct during shutdown:
1. `clean_up_loop` holds `members` lock
2. calls `member.remove_keys`
3. `CallbackHandle::execute_requests` upgrades the weak ref and gets the lock
4. other thread drops last external reference to pool member, the
upgraded weak ref from (3) is now the last strong ref
5. `CallbackHandle::execute_requests` finishes, drops pool member
6. dropping that pool member calls `ResourcePool::unregister_member`,
which takes the same lock that we acquired in (1) => deadlock
We now avoid modifying `members` during shutdown and just hold a weak
ref there. As a side effect, the `last_used` addressable heap moves
around a bit and is no longer `Arc`ed (see the updated diagram).
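A condensed sketch of the new scheme (types heavily simplified, not the
real `cache_system` code): the lock is held only while upgrading the
weak refs, and the strong refs are dropped outside of it.

```rust
use std::sync::{Arc, Mutex, Weak};

struct Member;

struct ResourcePool {
    /// Only weak refs: dropping the last strong ref elsewhere never
    /// re-enters this mutex via `Drop`/`unregister_member`.
    members: Mutex<Vec<Weak<Member>>>,
}

impl ResourcePool {
    fn clean_up_loop(&self) {
        // Hold the lock only long enough to upgrade the weak refs.
        let strong: Vec<Arc<Member>> = self
            .members
            .lock()
            .unwrap()
            .iter()
            .filter_map(Weak::upgrade)
            .collect();
        // The lock is released here; even if one of these `Arc`s is the
        // last strong ref, dropping it cannot deadlock against `members`.
        for _member in strong {
            // ... `member.remove_keys(...)` would run here ...
        }
    }
}
```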
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
I've seen at least one case in prod where the UTC clock went backwards.
The `TimeProvider` and `Time` interfaces even warn about that. However,
there was a `Sub` impl that would panic if that happens, and even though
this was documented, I think we can do better and just not offer a
panicky interface at all.
So this removes the `Sub` impl and replaces all uses with
`checked_duration_since`.
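The pattern, sketched here with `std::time::Instant` purely for
illustration:

```rust
use std::time::{Duration, Instant};

/// Elapsed time since `start`, treating a backwards-jumping clock as
/// zero elapsed time instead of panicking like `now - start` would.
fn elapsed_or_zero(now: Instant, start: Instant) -> Duration {
    now.checked_duration_since(start).unwrap_or(Duration::ZERO)
}
```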
* feat: batch partition catalog requests in querier
This is mostly wiring that builds on top of the other PRs linked to #8089.
I think we could eventually make the batching code nicer by adding
better wrappers / helpers, but let's do that once we have other batched
caches and this pattern proves to be useful.
Closes #8089.
* test: extend `test_multi_get`
* test: regression test for #8286
* fix: prevent auto-flush CPU looping
* fix: panic when loading different tables at the same time
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* refactor: pull out type
* feat: impl `Loader` for `Arc`
* refactor: have a single `TestLoader`
* refactor: simplify code
* refactor: encapsulate `CancellationSafeFutureReceiver`
Avoid the querier accessing files that were flagged for deletion a long
time ago. This would happen if the following conditions hold:
- we have very long-running querier pods (e.g. over holidays)
- the table doesn't receive any writes (or the partition if we ever
change the cache granularity), hence the querier is never informed
that its state is out-of-date
- a compactor runs a cold compaction, and by doing so flags a file for
deletion
- the GC finally wants to delete it
This is mostly a safety measure to prevent weird internal server errors
that should nearly never happen. On the other hand, I do not want to
hunt Heisenbugs.
This commit fixes loads of crates (47!) that had unused dependencies or
mis-configured dependencies (test deps as normal deps).
I added the `unused_crate_dependencies` lint to all crates to help
prevent this mess from growing again!
https://doc.rust-lang.org/beta/nightly-rustc/rustc_lint_defs/builtin/static.UNUSED_CRATE_DEPENDENCIES.html
This has the minor downside of false positives when specifying
dev-dependencies for test/bench binaries - these are files in /tests or
/benches (not normal tests). This commit includes a workaround,
importing them in lib.rs (gated by a feature flag). I think the
trade-off of better dependency management is worth it!
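The workaround looks roughly like this (lint level, feature name, and
`criterion` are placeholders, not necessarily what this commit uses):

```rust
// lib.rs
#![warn(unused_crate_dependencies)]

// Benches under /benches are separate compilation targets, so the lint
// cannot see that they use `criterion`; reference the dependency here,
// gated behind a feature that only the bench build enables.
#[cfg(feature = "benches")]
use criterion as _;
```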
* refactor: avoid `tokio::spawn` for cache requests
During the happy path, we don't need any tokio task to drive a cache
loader request, because the future issuing the request just acts as the
driver. Only if that future is cancelled do we place the cache request
in an extra task. This avoids latencies due to task overhead and (task)
context switches for most requests. It may only shave a millisecond or
two off the latency, but it also makes the whole thing easier to
analyze/profile because we don't spawn a truckload of tasks.
This trick was borrowed from rskafka.
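A stripped-down sketch of the idea (not the actual implementation): the
awaiting caller's `poll` drives the request, and only a drop before
completion hands the rest of the work to a task.

```rust
use std::{
    future::Future,
    pin::Pin,
    task::{Context, Poll},
};

/// Wraps a loader request so the awaiting caller drives it; if the
/// caller is cancelled (dropped) mid-flight, the future is moved into a
/// detached tokio task so the load still completes. Must be dropped
/// inside a tokio runtime.
struct CancellationSafe<F>
where
    F: Future<Output = ()> + Send + 'static,
{
    inner: Option<Pin<Box<F>>>,
}

impl<F> Future for CancellationSafe<F>
where
    F: Future<Output = ()> + Send + 'static,
{
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        let fut = self.inner.as_mut().expect("polled after completion");
        match fut.as_mut().poll(cx) {
            Poll::Ready(()) => {
                // Completed normally: nothing left to rescue on drop.
                self.inner = None;
                Poll::Ready(())
            }
            Poll::Pending => Poll::Pending,
        }
    }
}

impl<F> Drop for CancellationSafe<F>
where
    F: Future<Output = ()> + Send + 'static,
{
    fn drop(&mut self) {
        if let Some(fut) = self.inner.take() {
            // Cancelled before completion: only now pay for a task.
            tokio::spawn(fut);
        }
    }
}
```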
* refactor: split up code
* feat: rework cache refresh logic
Instead of issuing a single refresh when a GET request for a cached key
comes in, start a background job (using some efficient logic to not
overload tokio) per key that refreshes the key using some exponential
backoff. The timer is reset when a new GET request comes in (see the
sketch below). This has the following advantages:
- our backoff logic decorrelates the requests
- the longer a key goes unused, the less often it will be refreshed
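A sketch of such a per-key job (all names hypothetical; the real code
additionally takes care not to overload tokio):

```rust
use std::time::Duration;
use tokio::{select, sync::mpsc, time::sleep};

/// Hypothetical per-key background job: refresh the key with an
/// exponential backoff, and reset the backoff whenever a GET arrives.
async fn refresh_loop(mut gets: mpsc::Receiver<()>, refresh: impl Fn()) {
    let mut backoff = Duration::from_secs(1);
    loop {
        select! {
            _ = sleep(backoff) => {
                refresh();
                // Unused keys are refreshed less and less often.
                backoff = (backoff * 2).min(Duration::from_secs(3600));
            }
            msg = gets.recv() => match msg {
                // A read came in: restart the schedule.
                Some(()) => backoff = Duration::from_secs(1),
                // Sender dropped, i.e. the key was evicted: stop.
                None => return,
            },
        }
    }
}
```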
All tests (esp. integration tests) are adjusted accordingly, mostly to
account for the fact that no extra GET is required to start the refresh
timer.
Closes #5720.
* docs: improve
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
* refactor: simplify rng overwrite
Co-authored-by: Andrew Lamb <alamb@influxdata.com>