49211 Commits (1827866d00f0c77c05967552ff1c55fccd7a896c)

Author SHA1 Message Date
Dom c0f62ce3e1
Merge pull request #8709 from influxdata/dom/rpcwrite-health-livelock
fix(router): health probe livelock / missing probes
2023-09-11 18:07:34 +01:00
Dom ffa3c39dbc
Merge branch 'main' into dom/rpcwrite-health-livelock 2023-09-11 17:37:07 +01:00
Marco Neumann a17c9a2bf8
refactor: use `BTreeSet` in addressable heap (#8710)
* refactor: improve addressable heap benchmarks

- don't use the full range for key and order so it is easier to generate
  new entries at the top/bottom
- differentiate between "update order to a new random one" and "update
  order to the last one", since the latter one is the de facto standard
  case for the LRU cache

* refactor: use `BTreeSet` in addressable heap

Our cache system serves mostly reads and only a few writes (the change
rate is low; in other words, we have a high cache HIT rate). It makes
sense to optimize the data structures accordingly. While investigating
https://github.com/influxdata/EAR/issues/4505 it turned out that the
bookkeeping for read operations is quite expensive.

This PR improves that by optimizing the "addressable heap". That data
structure is basically a map K->V but every entry also has an order O
which can be used to "pop" the first element. This is used to implement
the LRU cache where K is the cache key and O is the "least recently
used" time (V is unused, but that's irrelevant for the problem described
here).

The former queue impl. is very expensive for `update_order`, because you
essentially copy large parts of the `VecDeque` twice (once for remove, once
for insert). B-trees handle updates of individual keys better, even if
they are less efficient in the "tiny case" (i.e. just a few keys).
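
To make the trade-off concrete, here is a minimal sketch of an addressable heap backed by a `HashMap` and a `BTreeSet`. It is an illustration only, not the `cache_system` code: the type name, method names, and field layout are assumptions, and it needs Rust 1.66 or later for `BTreeSet::first` and `BTreeSet::pop_first`.

```rust
use std::collections::{BTreeSet, HashMap};
use std::hash::Hash;

/// Map K -> (V, O) plus an ordered index over (O, K).
/// Hypothetical sketch; names and layout are assumptions.
pub struct AddressableHeap<K, V, O> {
    entries: HashMap<K, (V, O)>,
    queue: BTreeSet<(O, K)>,
}

impl<K, V, O> AddressableHeap<K, V, O>
where
    K: Clone + Eq + Hash + Ord,
    O: Clone + Ord,
{
    pub fn new() -> Self {
        Self {
            entries: HashMap::new(),
            queue: BTreeSet::new(),
        }
    }

    /// Insert or replace an entry, returning the previous value and order.
    pub fn insert(&mut self, k: K, v: V, o: O) -> Option<(V, O)> {
        let old = self.entries.insert(k.clone(), (v, o.clone()));
        if let Some((_, old_o)) = &old {
            // Drop the stale index entry for the replaced key.
            self.queue.remove(&(old_o.clone(), k.clone()));
        }
        self.queue.insert((o, k));
        old
    }

    /// Look at the entry with the smallest order without removing it.
    pub fn peek(&self) -> Option<(&K, &V, &O)> {
        let (o, k) = self.queue.first()?;
        let (v, _) = self.entries.get(k)?;
        Some((k, v, o))
    }

    /// Remove and return the entry with the smallest order.
    pub fn pop(&mut self) -> Option<(K, V, O)> {
        let (o, k) = self.queue.pop_first()?;
        let (v, _) = self.entries.remove(&k)?;
        Some((k, v, o))
    }

    /// Change the order of an existing key; returns the old order.
    /// This is one tree removal plus one tree insertion, with no bulk copy.
    pub fn update_order(&mut self, k: &K, o: O) -> Option<O> {
        let (_, slot) = self.entries.get_mut(k)?;
        let old = std::mem::replace(slot, o.clone());
        self.queue.remove(&(old.clone(), k.clone()));
        self.queue.insert((o, k.clone()));
        Some(old)
    }
}
```

In this sketch `update_order` touches a logarithmic number of tree nodes, whereas removing from and re-inserting into the middle of a `VecDeque` shifts elements on both operations.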

Performance Results
===================

The summary is:

- `insert_n_elements`: improved, not surprising because inserting into
  the middle of the vector was pretty slow
- `peek_after_n_elements`: quite noisy, maybe regressed slightly (<+30%)
  which is OK. Peeking the `VecDeque` is pretty cheap compared to the
  B-tree.
- `get_existing_after_n_elements`: noisy, mostly identical or slightly
  slower (<+30%)
- `get_new_after_n_elements`: same, there shouldn't be a difference
  because the `VecDeque` / B-tree isn't touched for elements that are
  not found, so if there is a difference this is probably LLVM codegen
  stuff
- `pop_n_elements`: regressed by up to +200%, which makes sense given
  how cheap removing elements in order from the `VecDeque` is compared
  to the B-tree. However this operation is only used under memory
  pressure when new data is added to the cache system, which is a rare
  event and quite expensive to begin with, so this is unlikely to make a
  noticeable difference in practice
- `remove_existing_after_n_elements`: improved up to -70%, which makes
  sense because removing from the `VecDeque` is very expensive due to
  the data copy/move
- `remove_new_after_n_elements`: mostly noise, should logically be the
  same because the `VecDeque`/B-tree isn't touched if the element
  doesn't exist
- `replace_after_n_elements`: this is "remove"+"insert", improved
  accordingly
- `update_order_existing_to_random_after_n_elements`: optimized version
  of "replace" that touches the internal `HashMap` less often, improved
  accordingly
- `update_order_existing_to_first_after_n_elements`: **THE prime case that
  we were aiming for** because it is used to update the "least recently
  used" times and is used for every read. Improved by at least -70%,
  same argument as "replace" and "remove".
- `update_order_new_after_n_elements`: mostly noise / same,
  `VecDeque`/B-tree isn't touched in this case

The detailed results are:

<details>

```
❯ cargo bench -p cache_system --bench addressable_heap -- --baseline btree-pre
   Compiling cache_system v0.1.0 (/home/mneumann/src/influxdb_iox/cache_system)
    Finished bench [optimized + debuginfo] target(s) in 8.48s
     Running benches/addressable_heap.rs (target/release/deps/addressable_heap-e6a3281d83b52007)
insert_n_elements/0     time:   [12.621 ns 12.629 ns 12.640 ns]
                        change: [-3.6323% -3.5654% -3.4848%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe
insert_n_elements/1     time:   [77.540 ns 77.665 ns 77.824 ns]
                        change: [-6.7121% -6.5483% -6.3759%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  4 (4.00%) high mild
  6 (6.00%) high severe
insert_n_elements/10    time:   [565.61 ns 565.89 ns 566.22 ns]
                        change: [-20.328% -20.232% -20.156%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe
insert_n_elements/100   time:   [7.3376 µs 7.3438 µs 7.3499 µs]
                        change: [-2.7056% -2.6147% -2.5146%] (p = 0.00 < 0.05)
                        Performance has improved.
insert_n_elements/1000  time:   [97.249 µs 97.335 µs 97.435 µs]
                        change: [-13.880% -13.804% -13.717%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe
insert_n_elements/10000 time:   [1.1371 ms 1.1386 ms 1.1403 ms]
                        change: [-61.699% -61.650% -61.594%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  6 (6.00%) high severe

peek_after_n_elements/0 time:   [7.2367 ns 7.2398 ns 7.2430 ns]
                        change: [-33.159% -33.109% -33.064%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
peek_after_n_elements/1 time:   [27.329 ns 27.344 ns 27.361 ns]
                        change: [-15.114% -15.047% -14.968%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  1 (1.00%) high severe
peek_after_n_elements/10
                        time:   [29.520 ns 29.534 ns 29.548 ns]
                        change: [-6.4913% -6.3567% -6.2216%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
peek_after_n_elements/100
                        time:   [36.132 ns 36.192 ns 36.264 ns]
                        change: [-2.5660% -2.3400% -2.1099%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  5 (5.00%) high mild
  2 (2.00%) high severe
peek_after_n_elements/1000
                        time:   [40.246 ns 40.312 ns 40.380 ns]
                        change: [+19.265% +19.583% +19.892%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  7 (7.00%) high mild
peek_after_n_elements/10000
                        time:   [53.840 ns 54.581 ns 55.585 ns]
                        change: [+21.970% +23.892% +26.167%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

get_existing_after_n_elements/1
                        time:   [29.431 ns 29.451 ns 29.474 ns]
                        change: [-9.8425% -9.7508% -9.6523%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  2 (2.00%) low mild
  6 (6.00%) high mild
  3 (3.00%) high severe
get_existing_after_n_elements/10
                        time:   [28.923 ns 28.950 ns 28.981 ns]
                        change: [-14.267% -14.160% -14.042%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  5 (5.00%) high mild
  3 (3.00%) high severe
get_existing_after_n_elements/100
                        time:   [36.280 ns 36.328 ns 36.384 ns]
                        change: [+4.3315% +4.5087% +4.6797%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
  8 (8.00%) high mild
  4 (4.00%) high severe
get_existing_after_n_elements/1000
                        time:   [33.089 ns 33.166 ns 33.284 ns]
                        change: [+6.6989% +7.5179% +8.1580%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
  6 (6.00%) high mild
  2 (2.00%) high severe
get_existing_after_n_elements/10000
                        time:   [36.041 ns 36.145 ns 36.252 ns]
                        change: [-6.3166% -4.6691% -3.3693%] (p = 0.00 < 0.05)
                        Performance has improved.

get_new_after_n_elements/0
                        time:   [9.3569 ns 9.3597 ns 9.3626 ns]
                        change: [-38.034% -37.914% -37.820%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
get_new_after_n_elements/1
                        time:   [25.029 ns 25.054 ns 25.095 ns]
                        change: [-13.193% -13.094% -12.946%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe
get_new_after_n_elements/10
                        time:   [26.160 ns 26.176 ns 26.191 ns]
                        change: [-13.138% -13.072% -12.999%] (p = 0.00 < 0.05)
                        Performance has improved.
get_new_after_n_elements/100
                        time:   [35.086 ns 35.135 ns 35.187 ns]
                        change: [-1.9261% -1.7566% -1.5940%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
get_new_after_n_elements/1000
                        time:   [28.929 ns 29.010 ns 29.103 ns]
                        change: [-0.1152% +0.2217% +0.5492%] (p = 0.20 > 0.05)
                        No change in performance detected.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe
get_new_after_n_elements/10000
                        time:   [43.733 ns 44.226 ns 44.830 ns]
                        change: [+23.061% +24.885% +27.016%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) high mild
  2 (2.00%) high severe

pop_n_elements/0        time:   [7.6553 ns 7.6586 ns 7.6619 ns]
                        change: [-27.593% -27.545% -27.501%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
pop_n_elements/1        time:   [43.055 ns 43.089 ns 43.136 ns]
                        change: [+10.568% +10.680% +10.798%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe
pop_n_elements/10       time:   [347.19 ns 347.42 ns 347.68 ns]
                        change: [+71.785% +71.998% +72.180%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe
pop_n_elements/100      time:   [4.6767 µs 4.6793 µs 4.6819 µs]
                        change: [+104.38% +104.56% +104.74%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
pop_n_elements/1000     time:   [44.635 µs 44.724 µs 44.872 µs]
                        change: [+186.13% +186.84% +187.92%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high severe
pop_n_elements/10000    time:   [479.51 µs 479.92 µs 480.34 µs]
                        change: [+179.63% +180.01% +180.38%] (p = 0.00 < 0.05)
                        Performance has regressed.

remove_existing_after_n_elements/1
                        time:   [52.939 ns 52.996 ns 53.077 ns]
                        change: [+16.171% +16.313% +16.529%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe
remove_existing_after_n_elements/10
                        time:   [62.905 ns 62.968 ns 63.038 ns]
                        change: [+0.5731% +0.6847% +0.8053%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe
remove_existing_after_n_elements/100
                        time:   [97.206 ns 97.310 ns 97.416 ns]
                        change: [+0.3006% +0.5036% +0.6846%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) low mild
  3 (3.00%) high mild
remove_existing_after_n_elements/1000
                        time:   [119.75 ns 120.18 ns 120.63 ns]
                        change: [-10.255% -9.8892% -9.5151%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild
remove_existing_after_n_elements/10000
                        time:   [160.66 ns 162.29 ns 164.06 ns]
                        change: [-71.326% -70.589% -69.801%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe

remove_new_after_n_elements/0
                        time:   [21.056 ns 21.067 ns 21.080 ns]
                        change: [-13.898% -13.844% -13.783%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe
remove_new_after_n_elements/1
                        time:   [24.313 ns 24.322 ns 24.332 ns]
                        change: [-13.770% -13.691% -13.623%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
remove_new_after_n_elements/10
                        time:   [25.672 ns 25.686 ns 25.700 ns]
                        change: [-12.717% -12.650% -12.585%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild
remove_new_after_n_elements/100
                        time:   [35.031 ns 35.060 ns 35.091 ns]
                        change: [-1.9734% -1.8181% -1.6578%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild
remove_new_after_n_elements/1000
                        time:   [29.874 ns 29.978 ns 30.109 ns]
                        change: [-5.3132% -4.4315% -3.6840%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) high severe
remove_new_after_n_elements/10000
                        time:   [36.628 ns 36.872 ns 37.164 ns]
                        change: [-17.430% -16.074% -14.738%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

replace_after_n_elements/1
                        time:   [55.542 ns 55.609 ns 55.693 ns]
                        change: [+17.578% +17.732% +17.940%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe
replace_after_n_elements/10
                        time:   [77.437 ns 77.529 ns 77.642 ns]
                        change: [-13.348% -13.240% -13.114%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe
replace_after_n_elements/100
                        time:   [155.57 ns 155.74 ns 155.92 ns]
                        change: [+3.9468% +4.2718% +4.5275%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
replace_after_n_elements/1000
                        time:   [192.74 ns 193.47 ns 194.29 ns]
                        change: [-19.267% -18.905% -18.445%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
replace_after_n_elements/10000
                        time:   [271.85 ns 275.39 ns 279.82 ns]
                        change: [-75.027% -74.447% -73.816%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

update_order_existing_to_random_after_n_elements/1
                        time:   [56.459 ns 56.484 ns 56.515 ns]
                        change: [+15.045% +15.128% +15.219%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe
update_order_existing_to_random_after_n_elements/10
                        time:   [79.163 ns 79.229 ns 79.327 ns]
                        change: [-14.176% -14.090% -13.963%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high severe
update_order_existing_to_random_after_n_elements/100
                        time:   [138.88 ns 139.28 ns 139.90 ns]
                        change: [-9.5444% -9.2390% -8.7987%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe
update_order_existing_to_random_after_n_elements/1000
                        time:   [187.01 ns 188.09 ns 189.60 ns]
                        change: [-21.513% -20.638% -19.817%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe
update_order_existing_to_random_after_n_elements/10000
                        time:   [271.94 ns 278.12 ns 285.56 ns]
                        change: [-75.123% -74.395% -73.590%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) high mild
  6 (6.00%) high severe

update_order_existing_to_first_after_n_elements/1
                        time:   [54.234 ns 54.270 ns 54.310 ns]
                        change: [+6.9327% +7.0343% +7.1302%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe
update_order_existing_to_first_after_n_elements/10
                        time:   [68.528 ns 68.598 ns 68.678 ns]
                        change: [-10.464% -10.352% -10.242%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) high mild
  2 (2.00%) high severe
update_order_existing_to_first_after_n_elements/100
                        time:   [114.73 ns 114.85 ns 114.98 ns]
                        change: [+1.2309% +1.3810% +1.5303%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
update_order_existing_to_first_after_n_elements/1000
                        time:   [148.20 ns 149.64 ns 151.63 ns]
                        change: [-18.039% -17.103% -15.865%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  2 (2.00%) high severe
update_order_existing_to_first_after_n_elements/10000
                        time:   [195.62 ns 198.87 ns 203.28 ns]
                        change: [-71.482% -70.401% -69.197%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

update_order_new_after_n_elements/0
                        time:   [8.0889 ns 8.0925 ns 8.0961 ns]
                        change: [-84.003% -83.966% -83.943%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
update_order_new_after_n_elements/1
                        time:   [24.156 ns 24.165 ns 24.174 ns]
                        change: [-16.500% -16.448% -16.400%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
update_order_new_after_n_elements/10
                        time:   [26.888 ns 26.906 ns 26.925 ns]
                        change: [-11.976% -11.882% -11.791%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe
update_order_new_after_n_elements/100
                        time:   [34.980 ns 35.024 ns 35.070 ns]
                        change: [-6.7389% -6.5648% -6.3944%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  6 (6.00%) high mild
  1 (1.00%) high severe
update_order_new_after_n_elements/1000
                        time:   [29.538 ns 29.588 ns 29.643 ns]
                        change: [+0.3276% +0.5565% +0.7915%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe
update_order_new_after_n_elements/10000
                        time:   [35.260 ns 35.536 ns 35.873 ns]
                        change: [-2.7360% -1.5665% -0.2156%] (p = 0.01 < 0.05)
                        Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe
```

</details>
2023-09-11 14:50:20 +00:00
Andrew Lamb 74c0851fc2
chore: Update DataFusion pin (#8698)
* chore: Update DataFusion pin

* chore: Update for new API

* fix: fix test

* fix: only check error messages

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-09-11 13:54:24 +00:00
Dom Dwyer a513301e23
fix(router): health probe livelock / missing probe
When an upstream ingester goes offline, the "circuit breaker" detects it
as unhealthy, and prevents further requests being sent to it.
Periodically a small number of requests are allowed ("probe requests")
to check for recovery.

If a write request is selected as a "probe request", it SHOULD be sent -
a limited number of writes are selected as probes, and enough have to be
successful to drive recovery. If no probes are ever sent/successful, the
upstream will never be marked as healthy.

Additionally the RPC handler applies an optimisation: if the number of
ingesters selected to service a write is less than the number needed to
successfully reach the desired replication factor, no requests are sent
and an error is returned immediately, preventing unnecessary system load
for writes that would never succeed.

This optimisation conflicts with the probe request requirement when a
replication factor of >= 2 is specified:

   * All ingesters are offline
   * Write comes in
   * UpstreamSnapshot is populated with a probe request for 1 ingester
     only - no other healthy candidate ingesters exist.
   * Optimisation applied: 1 probe candidate < 2 needed for replication

This results in a probe request never being sent, and in turn, never
allowing further requests to the recovered upstream.

This fix changes the optimisation, applying it only when there are no
probes in the candidate ingester list - the write will always fail, but
it will drive detection of recovered ingesters and maintain liveness of
the system.
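
The change in the short-circuit condition can be sketched as follows. This is a simplified model, not the router code: `SnapshotSketch`, `Decision`, and both function names are hypothetical, and the real `UpstreamSnapshot` carries more state than two fields.

```rust
/// Hypothetical stand-in for the router's view of candidate ingesters.
pub struct SnapshotSketch {
    /// Number of candidate ingesters selected for this write.
    pub candidates: usize,
    /// True if any candidate was selected as a health probe.
    pub contains_probe: bool,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    /// Fail fast without sending any request.
    RejectImmediately,
    /// Send the write to the candidates (it may still fail overall).
    Send,
}

/// Behaviour before the fix: short-circuit whenever there are too few
/// candidates to reach the replication factor.
pub fn decide_before_fix(s: &SnapshotSketch, replication_factor: usize) -> Decision {
    if s.candidates < replication_factor {
        Decision::RejectImmediately
    } else {
        Decision::Send
    }
}

/// Behaviour after the fix: short-circuit only when no probe request is
/// among the candidates, so probes are always sent.
pub fn decide_after_fix(s: &SnapshotSketch, replication_factor: usize) -> Decision {
    if s.candidates < replication_factor && !s.contains_probe {
        Decision::RejectImmediately
    } else {
        Decision::Send
    }
}
```

With one probe candidate and a replication factor of 2, `decide_before_fix` rejects the write and the probe is never sent, while `decide_after_fix` sends it so that recovery can be detected.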
2023-09-11 15:24:40 +02:00
Dom Dwyer 03a15aee62
refactor: UpstreamSnapshot aware of probe requests
Allows the UpstreamSnapshot to be initialised with a "contains probe"
boolean indicator that's passed through to the RPC layer.
2023-09-11 14:06:30 +02:00
Marco Neumann 3bdaafe36a
refactor: optimize hash-collection constructions (#8707)
* refactor: optimize `SortKey` construction

* refactor: optimize column set construction

* refactor: optimize "should cover" calculation for partitions

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-09-11 09:46:47 +00:00
dependabot[bot] e440740aed
chore(deps): Bump croaring from 0.9.0 to 1.0.0 (#8703)
* chore(deps): Bump croaring from 0.9.0 to 1.0.0

Bumps [croaring](https://github.com/RoaringBitmap/croaring-rs) from 0.9.0 to 1.0.0.
- [Release notes](https://github.com/RoaringBitmap/croaring-rs/releases)
- [Commits](https://github.com/RoaringBitmap/croaring-rs/commits/1.0.0)

---
updated-dependencies:
- dependency-name: croaring
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

* refactor: fix croaring upgrade and remove dead code

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Marco Neumann <marco@crepererum.net>
2023-09-11 09:41:16 +00:00
Marco Neumann b741dca8b8
refactor: wrap `CachedPartition` into `Arc` (#8706)
Even though all subfields of `CachedPartition` are `Arc`ed, the size of
this structure grows and copying more and more fields around for every
cache access gets quite expensive. `Arc` the whole thing and simplify
management a bit.
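
A rough illustration of the cost difference, assuming a struct with three `Arc`ed fields. The field names and types here are invented for the example; the real `CachedPartition` layout is not shown in this log.

```rust
use std::sync::Arc;

/// Hypothetical fields; the real `CachedPartition` layout is an unknown here.
#[derive(Clone)]
pub struct PartitionSketch {
    pub sort_key: Arc<Vec<String>>,
    pub column_ranges: Arc<Vec<(i64, i64)>>,
    pub column_names: Arc<Vec<String>>,
}

/// Cloning the struct copies every field and bumps one reference count
/// per field, so the cost grows with the number of fields.
pub fn clone_by_value(p: &PartitionSketch) -> PartitionSketch {
    p.clone()
}

/// Cloning the outer `Arc` copies one pointer and bumps one reference
/// count, independent of how many fields the struct has.
pub fn clone_shared(p: &Arc<PartitionSketch>) -> Arc<PartitionSketch> {
    Arc::clone(p)
}
```

Handing out `Arc<PartitionSketch>` from the cache keeps each access at a single atomic increment even as fields are added to the struct.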

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-09-11 09:22:15 +00:00
dependabot[bot] 9a0332912b
chore(deps): Bump toml from 0.7.6 to 0.7.8 (#8704)
Bumps [toml](https://github.com/toml-rs/toml) from 0.7.6 to 0.7.8.
- [Commits](https://github.com/toml-rs/toml/compare/toml-v0.7.6...toml-v0.7.8)

---
updated-dependencies:
- dependency-name: toml
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-09-11 08:44:31 +00:00
Marco Neumann 3e7fe33c56
feat: add easy-to-use compositions of layers (#8686)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-09-11 08:34:24 +00:00
dependabot[bot] 87bfe52f6b
chore(deps): Bump serde_json from 1.0.105 to 1.0.106 (#8702)
Bumps [serde_json](https://github.com/serde-rs/json) from 1.0.105 to 1.0.106.
- [Release notes](https://github.com/serde-rs/json/releases)
- [Commits](https://github.com/serde-rs/json/compare/v1.0.105...v1.0.106)

---
updated-dependencies:
- dependency-name: serde_json
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-09-11 08:28:31 +00:00
Marco Neumann 806124b700
feat: reconnect layer (#8695)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-09-11 08:21:24 +00:00
dependabot[bot] 5cd9c37519
chore(deps): Bump base64 from 0.21.3 to 0.21.4 (#8701)
Bumps [base64](https://github.com/marshallpierce/rust-base64) from 0.21.3 to 0.21.4.
- [Changelog](https://github.com/marshallpierce/rust-base64/blob/master/RELEASE-NOTES.md)
- [Commits](https://github.com/marshallpierce/rust-base64/compare/v0.21.3...v0.21.4)

---
updated-dependencies:
- dependency-name: base64
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-09-11 08:14:43 +00:00
Andrew Lamb 50799e1f66
chore: Add ticket reference to comment (#8697)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-09-10 10:32:29 +00:00
Joe-Blount 90fc0370ae
chore: adjust compactor catalog query rate limiter for small clusters (#8699) 2023-09-08 18:53:08 +00:00
wiedld c332029d38
fix(k8s-idpe-28726): use backoff on startup authz probe() (#8683)
* Note that an error returned from authz is still a valid response,
so we retry conditionally, only on communication errors.
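
The conditional retry can be pictured with the following synchronous sketch. It is an assumption-laden illustration, not the authz client: `ProbeError`, `probe_with_backoff`, and the doubling delay are all hypothetical, and the real startup probe is likely asynchronous.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical error classification for the probe.
#[derive(Debug, PartialEq, Eq)]
pub enum ProbeError {
    /// The service could not be reached; worth retrying.
    Communication(String),
    /// The service answered with an error; a valid response, so no retry.
    Rejected(String),
}

/// Call `probe` until it succeeds, returns a non-communication error, or
/// `max_attempts` is reached. The delay doubles after each failed attempt.
pub fn probe_with_backoff<F>(
    mut probe: F,
    max_attempts: u32,
    initial: Duration,
) -> Result<(), ProbeError>
where
    F: FnMut() -> Result<(), ProbeError>,
{
    let mut delay = initial;
    let mut attempt = 1;
    loop {
        match probe() {
            // Only communication failures are retried.
            Err(ProbeError::Communication(_)) if attempt < max_attempts => {
                sleep(delay);
                delay = delay.saturating_mul(2);
                attempt += 1;
            }
            // Success, a rejection, or the final failed attempt ends the loop.
            other => return other,
        }
    }
}
```

A rejection is returned on the first attempt, while a refused connection is retried until the attempt budget runs out.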
2023-09-08 10:19:26 -07:00
dependabot[bot] cb2d6d1d25
chore(deps): Bump chrono from 0.4.29 to 0.4.30 (#8693)
Bumps [chrono](https://github.com/chronotope/chrono) from 0.4.29 to 0.4.30.
- [Release notes](https://github.com/chronotope/chrono/releases)
- [Changelog](https://github.com/chronotope/chrono/blob/main/CHANGELOG.md)
- [Commits](https://github.com/chronotope/chrono/compare/v0.4.29...v0.4.30)

---
updated-dependencies:
- dependency-name: chrono
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-09-08 13:03:27 +00:00
Andrew Lamb 45c6bfea9c
chore: Update datafusion, arrow/flight/parquet to `46.0.0` , object_store to `0.7.0` (#8577)
* chore: Update DataFusion pin

* chore: Update for new API

* fix: Update for API

* fix: update compactor test

* fix: Update to patched version of arrow 46.0.0

* fix: map  `DataFusionError::Configuration` to an internal error

* fix: do not use deprecated API

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-09-08 12:49:57 +00:00
dependabot[bot] 7f20b0faa0
chore(deps): Bump bytes from 1.4.0 to 1.5.0 (#8692)
Bumps [bytes](https://github.com/tokio-rs/bytes) from 1.4.0 to 1.5.0.
- [Release notes](https://github.com/tokio-rs/bytes/releases)
- [Changelog](https://github.com/tokio-rs/bytes/blob/master/CHANGELOG.md)
- [Commits](https://github.com/tokio-rs/bytes/compare/v1.4.0...v1.5.0)

---
updated-dependencies:
- dependency-name: bytes
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-09-08 12:17:12 +00:00
Marco Neumann 9b7697e0d0
feat: backoff & retry for i->q V2 client (#8688)
* feat: error classifiers for retries etc.

* feat: backoff-based retries

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-09-08 08:14:28 +00:00
Marco Neumann 7f06f524eb
feat: i->q V2 metrics integration (#8687)
Same metric names as V1, so dashboards don't need any migration.
2023-09-08 08:07:22 +00:00
Marco Neumann 6f1c6fa44a
feat: i->q V2 tracing integration (#8680)
* refactor: improve `TestLayer`

* feat: tracing layer
2023-09-07 09:13:17 +00:00
dependabot[bot] 581276f974
chore(deps): Bump sysinfo from 0.29.9 to 0.29.10 (#8685)
Bumps [sysinfo](https://github.com/GuillaumeGomez/sysinfo) from 0.29.9 to 0.29.10.
- [Changelog](https://github.com/GuillaumeGomez/sysinfo/blob/master/CHANGELOG.md)
- [Commits](https://github.com/GuillaumeGomez/sysinfo/commits)

---
updated-dependencies:
- dependency-name: sysinfo
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-09-07 08:57:26 +00:00
Michael Angerman d87c2ae821
chore: change loglevel to info on No compaction job found (#8684) 2023-09-06 14:49:47 -07:00
Joe-Blount 98960a353c
fix(compactor): prevent sort order mismatches from creating overlapping chains (#8675)
* fix(compactor): prevent sort order mismatches from creating overlapping regions

* chore: test additions for incorrectly created regions

* fix(compactor): more sort order mismatch fixes

* chore: insta updates

* chore: insta updates after merge
2023-09-06 14:53:09 +00:00
Carol (Nichols || Goulding) fdffa871c3
feat: Optionally specify a table name to get just its schema (#8650)
This avoids always having to request all of a namespace's schema and
then filter down to the table you want. It also makes fetching more
consistent with upserting schema by namespace+table.

Fixes #4997.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-09-06 13:33:25 +00:00
Nga Tran de6f710c31
chore: reland have ingester's SortKeyState include sort_key_ids (#8678) 2023-09-06 12:01:07 +00:00
Fraser Savage d05380db09
test(ingester): Cover more WAL reader errors using wal replay test macro 2023-09-06 12:01:16 +01:00
Fraser Savage 36604d30ae
refactor(ingester): Only automatically recover when encountering `IncompleteEntry` during wal replay 2023-09-06 12:01:15 +01:00
Dom 27a106ba8b
Merge pull request #8670 from influxdata/dom/gossip
feat(compactor): gossip compaction completion events
2023-09-06 11:30:16 +01:00
Dom b5a9a6c141
Merge branch 'main' into dom/gossip 2023-09-06 11:24:56 +01:00
Fraser Savage 73533a71fa
refactor(wal): Disambiguate between `UnexpectedEof` and other errors during entry read 2023-09-06 11:10:08 +01:00
Martin Hilton 6056571e74
fix(influxql): FILL(linear) for selectors (#8396)
* fix(influxql): FILL(linear) for selectors

Ensure that selector functions such as FIRST, LAST, MIN and MAX can
use LINEAR filling in the same way as influxdb 1.8.

* chore: review suggestions

Apply suggestions from the review. This adds more tests and support
for interpolation in SQL.

* fix: lint

* fix: lint

* chore: buffered input for struct arrays

For linear interpolation, ensure that buffering the input of a struct
field only stops when there is a non-null struct containing a non-null
value.

* fix: integration test

* fix(iox_query): make clippy happy

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-09-06 09:44:28 +00:00
dependabot[bot] 4f6864c0b9
chore(deps): Bump chrono from 0.4.28 to 0.4.29 (#8677)
* chore(deps): Bump chrono from 0.4.28 to 0.4.29

Bumps [chrono](https://github.com/chronotope/chrono) from 0.4.28 to 0.4.29.
- [Release notes](https://github.com/chronotope/chrono/releases)
- [Changelog](https://github.com/chronotope/chrono/blob/main/CHANGELOG.md)
- [Commits](https://github.com/chronotope/chrono/compare/v0.4.28...v0.4.29)

---
updated-dependencies:
- dependency-name: chrono
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* fix: deprecations

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Marco Neumann <marco@crepererum.net>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-09-06 09:20:58 +00:00
Marco Neumann 720bdc22c8
feat: add "deserialize" layer for i->q V2 client (#8639)
Adds the code that deserializes the gRPC response into proper high level
types. Uses mostly the low level code added in #8347.

For #8349.
2023-09-06 09:15:23 +00:00
Marco Neumann 377d2a8215
feat: network layer for i->q V2 client (#8640)
Adds the actual network IO layer for #8349.

This is a rather simple layer for now, but we may want to tune some
connection settings in the future.
2023-09-06 09:07:17 +00:00
Marco Neumann 4d49be9777
feat: add "serialize" layer for i->q V2 client (#8638)
The layer that serializes our requests. This also contains the logic to
leave out non-serializable filters like the V1 version (same tests, just
slightly differently arranged).

For #8349.
2023-09-06 08:36:33 +00:00
Marco Neumann 260aa0d64c
feat: "logging" layer for i->q V2 client (#8641)
* feat: more `TestResponse` constructors

* feat: "logging" layer for i->q V2 client

Logging layer for #8349. This mostly logs in debug mode but emits errors
to the log. Simple implementation that can be extended later.

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-09-06 08:16:17 +00:00
Marco Neumann d0d355ba4d
refactor: unpack record batches later during query (#8663)
For #8350 we want to be able to stream record batches from the ingester
instead of waiting to buffer them fully before the query starts. Hence
we can no longer inspect the batches in the "display" implementation of
the plan.

This change mostly contains the display change, not the actual streaming
part. I'll do that in a follow-up.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-09-06 08:08:54 +00:00
Nga Tran ecc3a2c416
Merge pull request #8676 from influxdata/ntran/revert-jic
chore: prepare for revert just in case
2023-09-05 23:53:07 -04:00
NGA-TRAN 7380d76993 chore: prepare for revert just in case 2023-09-05 17:40:24 -04:00
Nga Tran 9af06dee9e
feat: have ingester's SortKeyState include sort_key_ids (#8556)
* feat: have ingester's SortKeyState include sort_key_ids

* fix: test failures

* chore: address review comments

* chore: address review comments by adding asserts to catch bugs if any

* chore: fix typo

* test: get column IDs for the tests

* refactor: reuse function

* chore: address review comments
2023-09-05 20:41:15 +00:00
Nga Tran 2a71fcbc76
feat: reland compactor consumes sort_key_ids (#8674) 2023-09-05 18:45:49 +00:00
Nga Tran 5c4ec830c5
Merge pull request #8673 from influxdata/ntran/compact-revert-jic
chore: prepare a revert PR just in case
2023-09-05 13:47:27 -04:00
NGA-TRAN 399c0e257d chore: prepare a revert PR just in case 2023-09-05 13:26:18 -04:00
Nga Tran fb453ede1e
chore: reland 'teach compactor to use sortkey_ids' after catalog migration is fixed (#8575) 2023-09-05 17:05:13 +00:00
Fraser Savage 6e6970cfe1
test(ingester): Cover precedence of `IngestState::read_with_exceptions()` 2023-09-05 17:49:53 +01:00
Fraser Savage c15dfb25b4
test(ingester): Expand `IngestState::read_with_exceptions()` testing
This checks that multiple error states combined with multiple exceptions
return the expected results.
2023-09-05 17:49:51 +01:00
Fraser Savage be9064c75f
feat(ingester): Allow read of `IngestState` with exceptions
This will enable some subsystems to trivially respect any `IngestStateError`
set while ignoring specific errors which they may be responsible for
resolving (such as WAL replay needing to ingest from disk when `DiskFull`
is set).
2023-09-05 16:49:41 +01:00
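
The idea behind reading `IngestState` with exceptions can be illustrated with a minimal sketch. This is not the code from the commit: the bitmask representation, the `set`/`unset`/`read` method names, the `PersistSaturated` variant and the fixed precedence order are all assumptions made for illustration. Only `IngestState`, `IngestStateError`, `DiskFull` and `read_with_exceptions()` are names taken from the commit messages.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical error kinds, one bit each. `DiskFull` is named in the
/// commit message; `PersistSaturated` is an assumed second variant.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum IngestStateError {
    PersistSaturated = 1 << 0,
    DiskFull = 1 << 1,
}

impl IngestStateError {
    fn as_bits(self) -> usize {
        self as usize
    }
}

/// Assumed representation: a bitmap of currently-set error states.
#[derive(Debug, Default)]
pub struct IngestState {
    bits: AtomicUsize,
}

impl IngestState {
    /// Mark an error state as active.
    pub fn set(&self, error: IngestStateError) {
        self.bits.fetch_or(error.as_bits(), Ordering::Relaxed);
    }

    /// Clear an error state.
    pub fn unset(&self, error: IngestStateError) {
        self.bits.fetch_and(!error.as_bits(), Ordering::Relaxed);
    }

    /// Return the highest-precedence active error, if any.
    pub fn read(&self) -> Result<(), IngestStateError> {
        self.read_with_exceptions(&[])
    }

    /// Like `read`, but errors listed in `exceptions` are masked out,
    /// so a caller can ignore states it is itself responsible for fixing.
    pub fn read_with_exceptions(
        &self,
        exceptions: &[IngestStateError],
    ) -> Result<(), IngestStateError> {
        let mask = exceptions.iter().fold(0usize, |acc, e| acc | e.as_bits());
        let active = self.bits.load(Ordering::Relaxed) & !mask;

        // Assumed precedence: earlier entries win when several are set.
        for error in [
            IngestStateError::PersistSaturated,
            IngestStateError::DiskFull,
        ] {
            if active & error.as_bits() != 0 {
                return Err(error);
            }
        }
        Ok(())
    }
}
```

Under these assumptions, a WAL replay task could call `state.read_with_exceptions(&[IngestStateError::DiskFull])` to keep ingesting from disk while `DiskFull` is set, yet still stop on any other active error.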