influxdb

Commit Graph

Author	SHA1	Message	Date
Dom	c0f62ce3e1	Merge pull request #8709 from influxdata/dom/rpcwrite-health-livelock fix(router): health probe livelock / missing probes	2023-09-11 18:07:34 +01:00
Dom	ffa3c39dbc	Merge branch 'main' into dom/rpcwrite-health-livelock	2023-09-11 17:37:07 +01:00
Marco Neumann	a17c9a2bf8	refactor: use `BTreeSet` in addressable heap (#8710 ) * refactor: improve addressable heap benchmarks - don't use the full range for key and order so it is easier to generate new entries at the top/bottom - differentiate between "update order to a new random one" and "update order to the last one", since the latter one is the de facto standard case for the LRU cache * refactor: use `BTreeSet` in addressable heap Our cache system has mostly reads and only a few writes (like the change rate is low or in other words: we have a high cache HIT rate). It makes sense to optimize the data structures accordingly. While investigating https://github.com/influxdata/EAR/issues/4505 it turned out that the bookkeeping for read operations is quite expensive. This PR improves that by optimizing the "addressable heap". That data structure is basically a map K->V but every entry also has an order O which can be used to "pop" the first element. This is used to implement the LRU cache where K is the cache key and O is the "least recently used" time (V is unused, but that's irrelevant for the problem described here). The former queue impl. is very expensive for `update_order`, because you essentially copy large parts of the `VecDeque` twice (once for remove, once for insert). B-trees handle updates of individual keys better, even if they are less efficient in the "tiny case" (i.e. just a few keys). Performance Results =================== The summary is: - `insert_n_elements`: improved, not surprising because inserting into the middle of the vector was pretty slow - `peek_after_n_elements`: quite noisy, maybe regressed slightly (<+30%) which is OK. Peeking the `VecDeque` is pretty cheap compared to the B-tree. 
- `get_existing_after_n_elements`: noisy, mostly identical or slightly slower (<+30%) - `get_new_after_n_elements`: same, there shouldn't be a difference because the `VecDeque` / B-tree isn't touched for elements that are not found, so if there is a difference this is probably LLVM codegen stuff - `pop_n_elements`: regressed by up to +200%, which makes sense given how cheap removing elements in order from the `VecDeque` is compared to the B-tree. However this operation is only used under memory pressure when new data is added to the cache system, which is a rare event and quite expensive to begin with, so this is unlikely to make a noticeable difference in practice - `remove_existing_after_n_elements`: improved up to -70%, which makes sense because removing from the `VecDeque` is very expensive due to the data copy/move - `remove_new_after_n_elements`: mostly noise, should logically be the same because the `VecDeque`/B-tree isn't touched if the element doesn't exist - `replace_after_n_elements`: this is "remove"+"insert", improved accordingly - `update_order_existing_to_random_after_n_elements`: optimized version of "replace" that touches the internal `HashMap` less often, improved accordingly - `update_order_existing_to_first_after_n_elements`: THE prime case that we were aiming for because it is used to update the "least recently used" times and is used for every read. Improved by at least -70%, same argument as "replace" and "remove". 
- `update_order_new_after_n_elements`: mostly noise / same, `VecDeque`/B-tree isn't touched in this case The detailed results are: <details> ``` ❯ cargo bench -p cache_system --bench addressable_heap -- --baseline btree-pre Compiling cache_system v0.1.0 (/home/mneumann/src/influxdb_iox/cache_system) Finished bench [optimized + debuginfo] target(s) in 8.48s Running benches/addressable_heap.rs (target/release/deps/addressable_heap-e6a3281d83b52007) insert_n_elements/0 time: [12.621 ns 12.629 ns 12.640 ns] change: [-3.6323% -3.5654% -3.4848%] (p = 0.00 < 0.05) Performance has improved. Found 5 outliers among 100 measurements (5.00%) 3 (3.00%) high mild 2 (2.00%) high severe insert_n_elements/1 time: [77.540 ns 77.665 ns 77.824 ns] change: [-6.7121% -6.5483% -6.3759%] (p = 0.00 < 0.05) Performance has improved. Found 10 outliers among 100 measurements (10.00%) 4 (4.00%) high mild 6 (6.00%) high severe insert_n_elements/10 time: [565.61 ns 565.89 ns 566.22 ns] change: [-20.328% -20.232% -20.156%] (p = 0.00 < 0.05) Performance has improved. Found 4 outliers among 100 measurements (4.00%) 3 (3.00%) high mild 1 (1.00%) high severe insert_n_elements/100 time: [7.3376 µs 7.3438 µs 7.3499 µs] change: [-2.7056% -2.6147% -2.5146%] (p = 0.00 < 0.05) Performance has improved. insert_n_elements/1000 time: [97.249 µs 97.335 µs 97.435 µs] change: [-13.880% -13.804% -13.717%] (p = 0.00 < 0.05) Performance has improved. Found 5 outliers among 100 measurements (5.00%) 2 (2.00%) high mild 3 (3.00%) high severe insert_n_elements/10000 time: [1.1371 ms 1.1386 ms 1.1403 ms] change: [-61.699% -61.650% -61.594%] (p = 0.00 < 0.05) Performance has improved. Found 9 outliers among 100 measurements (9.00%) 1 (1.00%) low mild 2 (2.00%) high mild 6 (6.00%) high severe peek_after_n_elements/0 time: [7.2367 ns 7.2398 ns 7.2430 ns] change: [-33.159% -33.109% -33.064%] (p = 0.00 < 0.05) Performance has improved. 
Found 2 outliers among 100 measurements (2.00%) 2 (2.00%) high mild peek_after_n_elements/1 time: [27.329 ns 27.344 ns 27.361 ns] change: [-15.114% -15.047% -14.968%] (p = 0.00 < 0.05) Performance has improved. Found 3 outliers among 100 measurements (3.00%) 1 (1.00%) low mild 1 (1.00%) high mild 1 (1.00%) high severe peek_after_n_elements/10 time: [29.520 ns 29.534 ns 29.548 ns] change: [-6.4913% -6.3567% -6.2216%] (p = 0.00 < 0.05) Performance has improved. Found 2 outliers among 100 measurements (2.00%) 2 (2.00%) high mild peek_after_n_elements/100 time: [36.132 ns 36.192 ns 36.264 ns] change: [-2.5660% -2.3400% -2.1099%] (p = 0.00 < 0.05) Performance has improved. Found 7 outliers among 100 measurements (7.00%) 5 (5.00%) high mild 2 (2.00%) high severe peek_after_n_elements/1000 time: [40.246 ns 40.312 ns 40.380 ns] change: [+19.265% +19.583% +19.892%] (p = 0.00 < 0.05) Performance has regressed. Found 7 outliers among 100 measurements (7.00%) 7 (7.00%) high mild peek_after_n_elements/10000 time: [53.840 ns 54.581 ns 55.585 ns] change: [+21.970% +23.892% +26.167%] (p = 0.00 < 0.05) Performance has regressed. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high severe get_existing_after_n_elements/1 time: [29.431 ns 29.451 ns 29.474 ns] change: [-9.8425% -9.7508% -9.6523%] (p = 0.00 < 0.05) Performance has improved. Found 11 outliers among 100 measurements (11.00%) 2 (2.00%) low mild 6 (6.00%) high mild 3 (3.00%) high severe get_existing_after_n_elements/10 time: [28.923 ns 28.950 ns 28.981 ns] change: [-14.267% -14.160% -14.042%] (p = 0.00 < 0.05) Performance has improved. Found 8 outliers among 100 measurements (8.00%) 5 (5.00%) high mild 3 (3.00%) high severe get_existing_after_n_elements/100 time: [36.280 ns 36.328 ns 36.384 ns] change: [+4.3315% +4.5087% +4.6797%] (p = 0.00 < 0.05) Performance has regressed. 
Found 12 outliers among 100 measurements (12.00%) 8 (8.00%) high mild 4 (4.00%) high severe get_existing_after_n_elements/1000 time: [33.089 ns 33.166 ns 33.284 ns] change: [+6.6989% +7.5179% +8.1580%] (p = 0.00 < 0.05) Performance has regressed. Found 8 outliers among 100 measurements (8.00%) 6 (6.00%) high mild 2 (2.00%) high severe get_existing_after_n_elements/10000 time: [36.041 ns 36.145 ns 36.252 ns] change: [-6.3166% -4.6691% -3.3693%] (p = 0.00 < 0.05) Performance has improved. get_new_after_n_elements/0 time: [9.3569 ns 9.3597 ns 9.3626 ns] change: [-38.034% -37.914% -37.820%] (p = 0.00 < 0.05) Performance has improved. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild get_new_after_n_elements/1 time: [25.029 ns 25.054 ns 25.095 ns] change: [-13.193% -13.094% -12.946%] (p = 0.00 < 0.05) Performance has improved. Found 3 outliers among 100 measurements (3.00%) 1 (1.00%) high mild 2 (2.00%) high severe get_new_after_n_elements/10 time: [26.160 ns 26.176 ns 26.191 ns] change: [-13.138% -13.072% -12.999%] (p = 0.00 < 0.05) Performance has improved. get_new_after_n_elements/100 time: [35.086 ns 35.135 ns 35.187 ns] change: [-1.9261% -1.7566% -1.5940%] (p = 0.00 < 0.05) Performance has improved. Found 2 outliers among 100 measurements (2.00%) 2 (2.00%) high mild get_new_after_n_elements/1000 time: [28.929 ns 29.010 ns 29.103 ns] change: [-0.1152% +0.2217% +0.5492%] (p = 0.20 > 0.05) No change in performance detected. Found 3 outliers among 100 measurements (3.00%) 2 (2.00%) high mild 1 (1.00%) high severe get_new_after_n_elements/10000 time: [43.733 ns 44.226 ns 44.830 ns] change: [+23.061% +24.885% +27.016%] (p = 0.00 < 0.05) Performance has regressed. Found 6 outliers among 100 measurements (6.00%) 4 (4.00%) high mild 2 (2.00%) high severe pop_n_elements/0 time: [7.6553 ns 7.6586 ns 7.6619 ns] change: [-27.593% -27.545% -27.501%] (p = 0.00 < 0.05) Performance has improved. 
Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild pop_n_elements/1 time: [43.055 ns 43.089 ns 43.136 ns] change: [+10.568% +10.680% +10.798%] (p = 0.00 < 0.05) Performance has regressed. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high severe pop_n_elements/10 time: [347.19 ns 347.42 ns 347.68 ns] change: [+71.785% +71.998% +72.180%] (p = 0.00 < 0.05) Performance has regressed. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high severe pop_n_elements/100 time: [4.6767 µs 4.6793 µs 4.6819 µs] change: [+104.38% +104.56% +104.74%] (p = 0.00 < 0.05) Performance has regressed. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild pop_n_elements/1000 time: [44.635 µs 44.724 µs 44.872 µs] change: [+186.13% +186.84% +187.92%] (p = 0.00 < 0.05) Performance has regressed. Found 2 outliers among 100 measurements (2.00%) 1 (1.00%) low mild 1 (1.00%) high severe pop_n_elements/10000 time: [479.51 µs 479.92 µs 480.34 µs] change: [+179.63% +180.01% +180.38%] (p = 0.00 < 0.05) Performance has regressed. remove_existing_after_n_elements/1 time: [52.939 ns 52.996 ns 53.077 ns] change: [+16.171% +16.313% +16.529%] (p = 0.00 < 0.05) Performance has regressed. Found 5 outliers among 100 measurements (5.00%) 2 (2.00%) high mild 3 (3.00%) high severe remove_existing_after_n_elements/10 time: [62.905 ns 62.968 ns 63.038 ns] change: [+0.5731% +0.6847% +0.8053%] (p = 0.00 < 0.05) Change within noise threshold. Found 5 outliers among 100 measurements (5.00%) 2 (2.00%) high mild 3 (3.00%) high severe remove_existing_after_n_elements/100 time: [97.206 ns 97.310 ns 97.416 ns] change: [+0.3006% +0.5036% +0.6846%] (p = 0.00 < 0.05) Change within noise threshold. Found 5 outliers among 100 measurements (5.00%) 2 (2.00%) low mild 3 (3.00%) high mild remove_existing_after_n_elements/1000 time: [119.75 ns 120.18 ns 120.63 ns] change: [-10.255% -9.8892% -9.5151%] (p = 0.00 < 0.05) Performance has improved. 
Found 3 outliers among 100 measurements (3.00%) 3 (3.00%) high mild remove_existing_after_n_elements/10000 time: [160.66 ns 162.29 ns 164.06 ns] change: [-71.326% -70.589% -69.801%] (p = 0.00 < 0.05) Performance has improved. Found 5 outliers among 100 measurements (5.00%) 4 (4.00%) high mild 1 (1.00%) high severe remove_new_after_n_elements/0 time: [21.056 ns 21.067 ns 21.080 ns] change: [-13.898% -13.844% -13.783%] (p = 0.00 < 0.05) Performance has improved. Found 3 outliers among 100 measurements (3.00%) 2 (2.00%) high mild 1 (1.00%) high severe remove_new_after_n_elements/1 time: [24.313 ns 24.322 ns 24.332 ns] change: [-13.770% -13.691% -13.623%] (p = 0.00 < 0.05) Performance has improved. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild remove_new_after_n_elements/10 time: [25.672 ns 25.686 ns 25.700 ns] change: [-12.717% -12.650% -12.585%] (p = 0.00 < 0.05) Performance has improved. Found 3 outliers among 100 measurements (3.00%) 3 (3.00%) high mild remove_new_after_n_elements/100 time: [35.031 ns 35.060 ns 35.091 ns] change: [-1.9734% -1.8181% -1.6578%] (p = 0.00 < 0.05) Performance has improved. Found 3 outliers among 100 measurements (3.00%) 3 (3.00%) high mild remove_new_after_n_elements/1000 time: [29.874 ns 29.978 ns 30.109 ns] change: [-5.3132% -4.4315% -3.6840%] (p = 0.00 < 0.05) Performance has improved. Found 5 outliers among 100 measurements (5.00%) 5 (5.00%) high severe remove_new_after_n_elements/10000 time: [36.628 ns 36.872 ns 37.164 ns] change: [-17.430% -16.074% -14.738%] (p = 0.00 < 0.05) Performance has improved. Found 3 outliers among 100 measurements (3.00%) 2 (2.00%) high mild 1 (1.00%) high severe replace_after_n_elements/1 time: [55.542 ns 55.609 ns 55.693 ns] change: [+17.578% +17.732% +17.940%] (p = 0.00 < 0.05) Performance has regressed. 
Found 6 outliers among 100 measurements (6.00%) 3 (3.00%) high mild 3 (3.00%) high severe replace_after_n_elements/10 time: [77.437 ns 77.529 ns 77.642 ns] change: [-13.348% -13.240% -13.114%] (p = 0.00 < 0.05) Performance has improved. Found 5 outliers among 100 measurements (5.00%) 2 (2.00%) high mild 3 (3.00%) high severe replace_after_n_elements/100 time: [155.57 ns 155.74 ns 155.92 ns] change: [+3.9468% +4.2718% +4.5275%] (p = 0.00 < 0.05) Performance has regressed. Found 2 outliers among 100 measurements (2.00%) 2 (2.00%) high mild replace_after_n_elements/1000 time: [192.74 ns 193.47 ns 194.29 ns] change: [-19.267% -18.905% -18.445%] (p = 0.00 < 0.05) Performance has improved. Found 2 outliers among 100 measurements (2.00%) 1 (1.00%) high mild 1 (1.00%) high severe replace_after_n_elements/10000 time: [271.85 ns 275.39 ns 279.82 ns] change: [-75.027% -74.447% -73.816%] (p = 0.00 < 0.05) Performance has improved. Found 5 outliers among 100 measurements (5.00%) 3 (3.00%) high mild 2 (2.00%) high severe update_order_existing_to_random_after_n_elements/1 time: [56.459 ns 56.484 ns 56.515 ns] change: [+15.045% +15.128% +15.219%] (p = 0.00 < 0.05) Performance has regressed. Found 3 outliers among 100 measurements (3.00%) 2 (2.00%) high mild 1 (1.00%) high severe update_order_existing_to_random_after_n_elements/10 time: [79.163 ns 79.229 ns 79.327 ns] change: [-14.176% -14.090% -13.963%] (p = 0.00 < 0.05) Performance has improved. Found 3 outliers among 100 measurements (3.00%) 3 (3.00%) high severe update_order_existing_to_random_after_n_elements/100 time: [138.88 ns 139.28 ns 139.90 ns] change: [-9.5444% -9.2390% -8.7987%] (p = 0.00 < 0.05) Performance has improved. Found 4 outliers among 100 measurements (4.00%) 1 (1.00%) low mild 2 (2.00%) high mild 1 (1.00%) high severe update_order_existing_to_random_after_n_elements/1000 time: [187.01 ns 188.09 ns 189.60 ns] change: [-21.513% -20.638% -19.817%] (p = 0.00 < 0.05) Performance has improved. 
Found 4 outliers among 100 measurements (4.00%) 2 (2.00%) high mild 2 (2.00%) high severe update_order_existing_to_random_after_n_elements/10000 time: [271.94 ns 278.12 ns 285.56 ns] change: [-75.123% -74.395% -73.590%] (p = 0.00 < 0.05) Performance has improved. Found 8 outliers among 100 measurements (8.00%) 2 (2.00%) high mild 6 (6.00%) high severe update_order_existing_to_first_after_n_elements/1 time: [54.234 ns 54.270 ns 54.310 ns] change: [+6.9327% +7.0343% +7.1302%] (p = 0.00 < 0.05) Performance has regressed. Found 5 outliers among 100 measurements (5.00%) 3 (3.00%) high mild 2 (2.00%) high severe update_order_existing_to_first_after_n_elements/10 time: [68.528 ns 68.598 ns 68.678 ns] change: [-10.464% -10.352% -10.242%] (p = 0.00 < 0.05) Performance has improved. Found 6 outliers among 100 measurements (6.00%) 4 (4.00%) high mild 2 (2.00%) high severe update_order_existing_to_first_after_n_elements/100 time: [114.73 ns 114.85 ns 114.98 ns] change: [+1.2309% +1.3810% +1.5303%] (p = 0.00 < 0.05) Performance has regressed. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild update_order_existing_to_first_after_n_elements/1000 time: [148.20 ns 149.64 ns 151.63 ns] change: [-18.039% -17.103% -15.865%] (p = 0.00 < 0.05) Performance has improved. Found 4 outliers among 100 measurements (4.00%) 1 (1.00%) low mild 1 (1.00%) high mild 2 (2.00%) high severe update_order_existing_to_first_after_n_elements/10000 time: [195.62 ns 198.87 ns 203.28 ns] change: [-71.482% -70.401% -69.197%] (p = 0.00 < 0.05) Performance has improved. Found 3 outliers among 100 measurements (3.00%) 2 (2.00%) high mild 1 (1.00%) high severe update_order_new_after_n_elements/0 time: [8.0889 ns 8.0925 ns 8.0961 ns] change: [-84.003% -83.966% -83.943%] (p = 0.00 < 0.05) Performance has improved. 
Found 2 outliers among 100 measurements (2.00%) 2 (2.00%) high mild update_order_new_after_n_elements/1 time: [24.156 ns 24.165 ns 24.174 ns] change: [-16.500% -16.448% -16.400%] (p = 0.00 < 0.05) Performance has improved. Found 2 outliers among 100 measurements (2.00%) 1 (1.00%) low mild 1 (1.00%) high mild update_order_new_after_n_elements/10 time: [26.888 ns 26.906 ns 26.925 ns] change: [-11.976% -11.882% -11.791%] (p = 0.00 < 0.05) Performance has improved. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high severe update_order_new_after_n_elements/100 time: [34.980 ns 35.024 ns 35.070 ns] change: [-6.7389% -6.5648% -6.3944%] (p = 0.00 < 0.05) Performance has improved. Found 7 outliers among 100 measurements (7.00%) 6 (6.00%) high mild 1 (1.00%) high severe update_order_new_after_n_elements/1000 time: [29.538 ns 29.588 ns 29.643 ns] change: [+0.3276% +0.5565% +0.7915%] (p = 0.00 < 0.05) Change within noise threshold. Found 4 outliers among 100 measurements (4.00%) 3 (3.00%) high mild 1 (1.00%) high severe update_order_new_after_n_elements/10000 time: [35.260 ns 35.536 ns 35.873 ns] change: [-2.7360% -1.5665% -0.2156%] (p = 0.01 < 0.05) Change within noise threshold. Found 4 outliers among 100 measurements (4.00%) 3 (3.00%) high mild 1 (1.00%) high severe ``` </details>	2023-09-11 14:50:20 +00:00
Andrew Lamb	74c0851fc2	chore: Update DataFusion pin (#8698 ) * chore: Update DataFusion pin * chore: Update for new API * fix: fix test * fix: only check error messages --------- Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-09-11 13:54:24 +00:00
Dom Dwyer	a513301e23	fix(router): health probe livelock / missing probe When an upstream ingester goes offline, the "circuit breaker" detects it as unhealthy, and prevents further requests being sent to it. Periodically a small number of requests are allowed ("probe requests") to check for recovery. If a write request is selected as a "probe request", it SHOULD be sent - a limited number of writes are selected as probes, and enough have to be successful to drive recovery. If no probes are ever sent/successful, the upstream will never be marked as healthy. Additionally the RPC handler applies an optimisation: if the number of ingesters selected to service a write is less than the number needed to successfully reach the desired replication factor, no requests are sent and an error is returned immediately, preventing unnecessary system load for writes that would never succeed. This optimisation conflicts with the probe request requirement when a replication factor of >= 2 is specified: * All ingesters are offline * Write comes in * UpstreamSnapshot is populated with a probe request for 1 ingester only - no other healthy candidate ingesters exist. * Optimisation applied: 1 probe candidate < 2 needed for replication This results in a probe request never being sent, and in turn, never allowing further requests to the recovered upstream. This fix changes the optimisation, applying it only when there are no probes in the candidate ingester list - the write will always fail, but it will drive detection of recovered ingesters and maintain liveness of the system.	2023-09-11 15:24:40 +02:00
Dom Dwyer	03a15aee62	refactor: UpstreamSnapshot aware of probe requests Allows the UpstreamSnapshot to be initialised with a "contains probe" boolean indicator that's passed through to the RPC layer.	2023-09-11 14:06:30 +02:00
Marco Neumann	3bdaafe36a	refactor: optimize hash-collection constructions (#8707 ) * refactor: optimize `SortKey` construction * refactor: optimize column set construction * refactor: optimize "should cover" calculation for partitions --------- Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-09-11 09:46:47 +00:00
dependabot[bot]	e440740aed	chore(deps): Bump croaring from 0.9.0 to 1.0.0 (#8703 ) * chore(deps): Bump croaring from 0.9.0 to 1.0.0 Bumps [croaring](https://github.com/RoaringBitmap/croaring-rs) from 0.9.0 to 1.0.0. - [Release notes](https://github.com/RoaringBitmap/croaring-rs/releases) - [Commits](https://github.com/RoaringBitmap/croaring-rs/commits/1.0.0) --- updated-dependencies: - dependency-name: croaring dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> * refactor: fix croaring upgrade and remove dead code --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Marco Neumann <marco@crepererum.net>	2023-09-11 09:41:16 +00:00
Marco Neumann	b741dca8b8	refactor: wrap `CachedPartition` into `Arc` (#8706 ) Even though all subfields of `CachedPartition` are `Arc`ed, the size of this structure grows and copying more and more fields around for every cache access gets quite expensive. `Arc` the whole thing and simplify management a bit. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-09-11 09:22:15 +00:00
dependabot[bot]	9a0332912b	chore(deps): Bump toml from 0.7.6 to 0.7.8 (#8704 ) Bumps [toml](https://github.com/toml-rs/toml) from 0.7.6 to 0.7.8. - [Commits](https://github.com/toml-rs/toml/compare/toml-v0.7.6...toml-v0.7.8) --- updated-dependencies: - dependency-name: toml dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-09-11 08:44:31 +00:00
Marco Neumann	3e7fe33c56	feat: add easy-to-use compositions of layers (#8686 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-09-11 08:34:24 +00:00
dependabot[bot]	87bfe52f6b	chore(deps): Bump serde_json from 1.0.105 to 1.0.106 (#8702 ) Bumps [serde_json](https://github.com/serde-rs/json) from 1.0.105 to 1.0.106. - [Release notes](https://github.com/serde-rs/json/releases) - [Commits](https://github.com/serde-rs/json/compare/v1.0.105...v1.0.106) --- updated-dependencies: - dependency-name: serde_json dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-09-11 08:28:31 +00:00
Marco Neumann	806124b700	feat: reconnect layer (#8695 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-09-11 08:21:24 +00:00
dependabot[bot]	5cd9c37519	chore(deps): Bump base64 from 0.21.3 to 0.21.4 (#8701 ) Bumps [base64](https://github.com/marshallpierce/rust-base64) from 0.21.3 to 0.21.4. - [Changelog](https://github.com/marshallpierce/rust-base64/blob/master/RELEASE-NOTES.md) - [Commits](https://github.com/marshallpierce/rust-base64/compare/v0.21.3...v0.21.4) --- updated-dependencies: - dependency-name: base64 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-09-11 08:14:43 +00:00
Andrew Lamb	50799e1f66	chore: Add ticket reference to comment (#8697 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-09-10 10:32:29 +00:00
Joe-Blount	90fc0370ae	chore: adjust compactor catalog query rate limiter for small clusters (#8699 )	2023-09-08 18:53:08 +00:00
wiedld	c332029d38	fix(k8s-idpe-28726): use backoff on startup authz probe() (#8683 ) * Note that an error returned from authz is still a valid response, therefore we are using a conditional retry based on communication errors.	2023-09-08 10:19:26 -07:00
dependabot[bot]	cb2d6d1d25	chore(deps): Bump chrono from 0.4.29 to 0.4.30 (#8693 ) Bumps [chrono](https://github.com/chronotope/chrono) from 0.4.29 to 0.4.30. - [Release notes](https://github.com/chronotope/chrono/releases) - [Changelog](https://github.com/chronotope/chrono/blob/main/CHANGELOG.md) - [Commits](https://github.com/chronotope/chrono/compare/v0.4.29...v0.4.30) --- updated-dependencies: - dependency-name: chrono dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-09-08 13:03:27 +00:00
Andrew Lamb	45c6bfea9c	chore: Update datafusion, arrow/flight/parquet to `46.0.0` , object_store to `0.7.0` (#8577 ) * chore: Update DataFusion pin * chore: Update for new API * fix: Update for API * fix: update compactor test * fix: Update to patched version of arrow 46.0.0 * fix: map `DataFusionError::Configuration` to an internal error * fix: do not use deprecated API --------- Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-09-08 12:49:57 +00:00
dependabot[bot]	7f20b0faa0	chore(deps): Bump bytes from 1.4.0 to 1.5.0 (#8692 ) Bumps [bytes](https://github.com/tokio-rs/bytes) from 1.4.0 to 1.5.0. - [Release notes](https://github.com/tokio-rs/bytes/releases) - [Changelog](https://github.com/tokio-rs/bytes/blob/master/CHANGELOG.md) - [Commits](https://github.com/tokio-rs/bytes/compare/v1.4.0...v1.5.0) --- updated-dependencies: - dependency-name: bytes dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-09-08 12:17:12 +00:00
Marco Neumann	9b7697e0d0	feat: backoff & retry for i->q V2 client (#8688 ) * feat: error classifiers for retries etc. * feat: backoff-based retries --------- Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-09-08 08:14:28 +00:00
Marco Neumann	7f06f524eb	feat: i->q V2 metrics integration (#8687 ) Same metric names as V1, so dashboards don't need any migration.	2023-09-08 08:07:22 +00:00
Marco Neumann	6f1c6fa44a	feat: i->q V2 tracing integration (#8680 ) * refactor: improve `TestLayer` * feat: tracing layer	2023-09-07 09:13:17 +00:00
dependabot[bot]	581276f974	chore(deps): Bump sysinfo from 0.29.9 to 0.29.10 (#8685 ) Bumps [sysinfo](https://github.com/GuillaumeGomez/sysinfo) from 0.29.9 to 0.29.10. - [Changelog](https://github.com/GuillaumeGomez/sysinfo/blob/master/CHANGELOG.md) - [Commits](https://github.com/GuillaumeGomez/sysinfo/commits) --- updated-dependencies: - dependency-name: sysinfo dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-09-07 08:57:26 +00:00
Michael Angerman	d87c2ae821	chore: change loglevel to info on No compaction job found (#8684 )	2023-09-06 14:49:47 -07:00
Joe-Blount	98960a353c	fix(compactor): prevent sort order mismatches from creating overlapping chains (#8675 ) * fix(compactor): prevent sort order mismatches from creating overlapping regions * chore: test additions for incorrectly created regions * fix(compactor): more sort order mismatch fixes * chore: insta updates * chore: insta updates after merge	2023-09-06 14:53:09 +00:00
Carol (Nichols || Goulding)	fdffa871c3	feat: Optionally specify a table name to get just its schema (#8650 ) Rather than always having to request all of a namespace's schema then filtering to the one you want. Will make this more consistent with upserting schema by namespace+table. Fixes #4997. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-09-06 13:33:25 +00:00
Nga Tran	de6f710c31	chore: reland have ingester's SortKeyState include sort_key_ids (#8678 )	2023-09-06 12:01:07 +00:00
Fraser Savage	d05380db09	test(ingester): Cover more WAL reader errors using wal replay test macro	2023-09-06 12:01:16 +01:00
Fraser Savage	36604d30ae	refactor(ingester): Only automatically recover when encountering `IncompleteEntry` during wal replay	2023-09-06 12:01:15 +01:00
Dom	27a106ba8b	Merge pull request #8670 from influxdata/dom/gossip feat(compactor): gossip compaction completion events	2023-09-06 11:30:16 +01:00
Dom	b5a9a6c141	Merge branch 'main' into dom/gossip	2023-09-06 11:24:56 +01:00
Fraser Savage	73533a71fa	refactor(wal): Disambiguate between `UnexpectedEof` and other errors during entry read	2023-09-06 11:10:08 +01:00
Martin Hilton	6056571e74	fix(influxql): FILL(linear) for selectors (#8396 ) * fix(influxql): FILL(linear) for selectors Ensure that selector functions such as FIRST, LAST, MIN and MAX can use LINEAR filling in the same way as influxdb 1.8. * chore: review suggestions Apply suggestions from the review. This adds more tests and support for interpolation in SQL. * fix: lint * fix: lint * chore: buffered input for struct arrays Ensure that for linear interpolation the buffered input of a struct field ensures that buffering only stops when there is a non-null struct containing a non-null value. * fix: integration test * fix(iox_query): make clippy happy --------- Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-09-06 09:44:28 +00:00
dependabot[bot]	4f6864c0b9	chore(deps): Bump chrono from 0.4.28 to 0.4.29 (#8677 ) * chore(deps): Bump chrono from 0.4.28 to 0.4.29 Bumps [chrono](https://github.com/chronotope/chrono) from 0.4.28 to 0.4.29. - [Release notes](https://github.com/chronotope/chrono/releases) - [Changelog](https://github.com/chronotope/chrono/blob/main/CHANGELOG.md) - [Commits](https://github.com/chronotope/chrono/compare/v0.4.28...v0.4.29) --- updated-dependencies: - dependency-name: chrono dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> * fix: deprecations --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Marco Neumann <marco@crepererum.net> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-09-06 09:20:58 +00:00
Marco Neumann	720bdc22c8	feat: add "deserialize" layer for i->q V2 client (#8639 ) Adds the code that deserializes the gRPC response into proper high level types. Uses mostly the low level code added in #8347. For #8349.	2023-09-06 09:15:23 +00:00
Marco Neumann	377d2a8215	feat: network layer for i->q V2 client (#8640 ) Adds the actual network IO layer for #8349. This is a rather simple layer for now, but we may want to tune some connection settings in the future.	2023-09-06 09:07:17 +00:00
Marco Neumann	4d49be9777	feat: add "serialize" layer for i->q V2 client (#8638 ) The layer that serializes our requests. This also contains the logic to leave out non-serializable filters like the V1 version (same tests, just slightly differently arranged). For #8349.	2023-09-06 08:36:33 +00:00
Marco Neumann	260aa0d64c	feat: "logging" layer for i->q V2 client (#8641 ) * feat: more `TestResponse` constructors * feat: "logging" layer for i->q V2 client Logging layer for #8349. This mostly logs in debug mode but emits errors to the log. Simple implementation that can be extended later. --------- Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-09-06 08:16:17 +00:00
Marco Neumann	d0d355ba4d	refactor: unpack record batches later during query (#8663 ) For #8350 we want to be able to stream record batches from the ingester instead of waiting to buffer them fully before the query starts. Hence we can no longer inspect the batches in the "display" implementation of the plan. This change mostly contains the display change, not the actual streaming part. I'll do that in a follow-up. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-09-06 08:08:54 +00:00
Nga Tran	ecc3a2c416	Merge pull request #8676 from influxdata/ntran/revert-jic chore: prepare for revert just in case	2023-09-05 23:53:07 -04:00
NGA-TRAN	7380d76993	chore: prepare for revert just in case	2023-09-05 17:40:24 -04:00
Nga Tran	9af06dee9e	feat: have ingester's SortKeyState include sort_key_ids (#8556 ) * feat: have ingester's SortKeyState include sort_key_ids * fix: test failures * chore: address review comments * chore: address review comments by adding asserts to catch bugs if any * chore: fix typo * test: get column IDs for the tests * refactor: reuse function * chore: address review comments	2023-09-05 20:41:15 +00:00
Nga Tran	2a71fcbc76	feat: reland compactor consumes sort_key_ids (#8674 )	2023-09-05 18:45:49 +00:00
Nga Tran	5c4ec830c5	Merge pull request #8673 from influxdata/ntran/compact-revert-jic chore: prepare a revert PR just in case	2023-09-05 13:47:27 -04:00
NGA-TRAN	399c0e257d	chore: prepare a revert PR just in case	2023-09-05 13:26:18 -04:00
Nga Tran	fb453ede1e	chore: reland 'teach compactor to use sortkey_ids' after catalog migration is fixed (#8575 )	2023-09-05 17:05:13 +00:00
Fraser Savage	6e6970cfe1	test(ingester): Cover precedence of `IngestState::read_with_exceptions()`	2023-09-05 17:49:53 +01:00
Fraser Savage	c15dfb25b4	test(ingester): Expand `IngestState::read_with_exceptions()` testing This covers multiple error states and multiple exceptions, checking that they return the expected results.	2023-09-05 17:49:51 +01:00
Fraser Savage	be9064c75f	feat(ingester): Allow read of `IngestState` with exceptions This will enable some subsystems to trivially respect any `IngestStateError` set while ignoring specific errors which they may be responsible for resolving (such as WAL replay needing to ingest from disk when `DiskFull` is set).	2023-09-05 16:49:41 +01:00
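
The "addressable heap" described in commit a17c9a2bf8 (a map K->V where every entry also has an order O used to pop the first element) can be sketched as follows. This is a minimal illustration under assumptions, not the `cache_system` code: the type name, method names and signatures are hypothetical, and only the general idea (a `HashMap` for lookups plus a `BTreeSet` as the ordered index) comes from the commit message.

```rust
use std::collections::{BTreeSet, HashMap};
use std::hash::Hash;

/// Hypothetical sketch: map K -> (V, O) plus an ordered index over (O, K).
pub struct AddressableHeap<K, V, O>
where
    K: Clone + Eq + Hash + Ord,
    O: Clone + Ord,
{
    /// Key -> (value, current order).
    entries: HashMap<K, (V, O)>,
    /// Ordered index; the first element is the next one to pop.
    queue: BTreeSet<(O, K)>,
}

impl<K, V, O> AddressableHeap<K, V, O>
where
    K: Clone + Eq + Hash + Ord,
    O: Clone + Ord,
{
    pub fn new() -> Self {
        Self { entries: HashMap::new(), queue: BTreeSet::new() }
    }

    /// Insert or replace an entry, returning the previous value and order.
    pub fn insert(&mut self, k: K, v: V, o: O) -> Option<(V, O)> {
        let old = self.entries.insert(k.clone(), (v, o.clone()));
        if let Some((_, old_o)) = &old {
            self.queue.remove(&(old_o.clone(), k.clone()));
        }
        self.queue.insert((o, k));
        old
    }

    /// Change the order of an existing key and return the old order.
    /// With a B-tree this is one removal and one insertion of a single
    /// element, rather than shifting large parts of a `VecDeque`.
    pub fn update_order(&mut self, k: &K, o: O) -> Option<O> {
        let (_, slot) = self.entries.get_mut(k)?;
        let old = std::mem::replace(slot, o.clone());
        self.queue.remove(&(old.clone(), k.clone()));
        self.queue.insert((o, k.clone()));
        Some(old)
    }

    /// Look at the entry with the smallest order without removing it.
    pub fn peek(&self) -> Option<(&K, &O)> {
        self.queue.first().map(|(o, k)| (k, o))
    }

    /// Remove and return the entry with the smallest order.
    pub fn pop(&mut self) -> Option<(K, V, O)> {
        let (o, k) = self.queue.pop_first()?;
        let (v, _) = self.entries.remove(&k)?;
        Some((k, v, o))
    }
}
```

For an LRU cache, K would be the cache key and O the last-used time, so every read calls `update_order` and eviction calls `pop`.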

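
The changed fail-fast rule from commit a513301e23 (router health probe livelock) can be sketched as a single predicate. This is a rough illustration only: the enum, function and parameter names are hypothetical and not taken from the router crate. Only the rule itself comes from the commit message: the "not enough candidates" shortcut applies only when the candidate list contains no probe request.

```rust
/// Hypothetical outcome of the pre-flight check done before sending a write.
#[derive(Debug, PartialEq, Eq)]
pub enum Preflight {
    /// Send the write to the selected candidate ingesters.
    Send,
    /// Fail fast: the replication factor cannot be met and no probe is pending.
    NotEnoughReplicas,
}

/// Decide whether a write should be sent.
///
/// `candidates` is the number of ingesters selected for this write,
/// `replication_factor` is the number of successful writes required, and
/// `contains_probe` says whether any candidate is a health probe.
pub fn preflight(candidates: usize, replication_factor: usize, contains_probe: bool) -> Preflight {
    if candidates < replication_factor && !contains_probe {
        // No probe would be lost, so skip the doomed write.
        Preflight::NotEnoughReplicas
    } else {
        // Either enough candidates exist, or a probe must go out so that a
        // recovered ingester can be detected, even if this write then fails.
        Preflight::Send
    }
}
```

With all ingesters offline and a replication factor of 2, `preflight(1, 2, true)` still sends the single probe request, which is the case the earlier optimisation suppressed.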

49211 Commits (1827866d00f0c77c05967552ff1c55fccd7a896c)