influxdb

Commit Graph

Author	SHA1	Message	Date
Carol (Nichols \|\| Goulding)	0994264152	refactor: Use a test constant rather than redefining	2023-09-15 13:09:35 -04:00
Carol (Nichols \|\| Goulding)	553e34a7f3	refactor: Share some test constants in a common parent module	2023-09-15 13:09:35 -04:00
Fraser Savage	827f4beb05	feat(ingester): Expose metric for finished replay of whole & truncated segment files	2023-09-15 16:47:41 +01:00
Fraser Savage	bc3b421618	fix(ingester): Include truncated WAL file for max sequence number calculation	2023-09-15 16:24:43 +01:00
Martin Hilton	421b78e48b	feat(iox_query): support timezone in gap-filling (#8745 ) When gap-filling make the output time array have the same timezone as the imput time array. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-09-15 14:55:16 +00:00
Fraser Savage	32853b77ed	refactor(ingester): Only discard truncated write during replay for final segment This moves the error handling up the the file level replay loop, being stricter about which files are considered "replayed" when they are truncated. Any files other than the most recent segment file which encounter and unexpected are not considered to be safe to replay and discard.	2023-09-15 15:45:58 +01:00
Carol (Nichols \|\| Goulding)	fb351ac3e1	test: Add a test encoding expected behavior of validate_or_insert_schema (#8738 ) I was confused about whether validate_or_insert_schema should return all columns a table has in the catalog if another process has added some. Dom explained that no, this is by design-- the validate_or_insert_schema function shouldn't be fetching any extra columns from the catalog, only inserting missing columns from the diff set being processed during a write. The NamespaceCache/gossip system takes care of eventually converging schemas at a higher level. To avoid anyone having to go through the understanding path I just did, encode this expected behavior in a test for future reference. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-09-15 14:36:19 +00:00
Martin Hilton	ada65d7389	feat(query_functions): suport timezone in selector_* functions (#8742 ) Update the selector functions to output the selected time in the same timezone as input time array. This will not have any effect on the rest of the system yet as timezones are not used anywhere. This change is being done in preparation for making use of timezones. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-09-15 11:54:28 +00:00
dependabot[bot]	8fddbc395b	chore(deps): Bump mockito from 1.1.0 to 1.1.1 (#8741 ) Bumps [mockito](https://github.com/lipanski/mockito) from 1.1.0 to 1.1.1. - [Release notes](https://github.com/lipanski/mockito/releases) - [Commits](https://github.com/lipanski/mockito/compare/1.1.0...1.1.1) --- updated-dependencies: - dependency-name: mockito dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-09-15 08:51:03 +00:00
kodiakhq[bot]	477e6df639	Merge pull request #8735 from influxdata/dom/persist-threads perf: use half of logical cores for persist exec	2023-09-14 16:05:27 +00:00
Dom Dwyer	1bb4c08067	perf: use half of logical cores for persist exec Changes the default ingester configuration to assign half the logical cores to datafusion for persist execution. Prior to this commit, datafusion always used 4 threads by default. In situations where the ingesters are configured with 4 logical cores or less, the periodic persist can start enough persist jobs to keep the 4 threads assigned to datafusion busy. Because there are enough threads to saturate all CPU cores, these CPU-heavy persist threads can impact write latency by stealing CPU time from the tokio runtime threads. This change assigns exactly half the threads to DF by default, ensuring there's always N/2 cores to service I/O heavy API requests.	2023-09-14 17:54:33 +02:00
Dom Dwyer	7ad26e6d0e	feat: only limit non-empty partitions This changes the per-namespace buffered partition limiter to only consider non-empty partitions when enforcing the partition limit. Non-empty partitions cost a small amount of RAM, but are not added to the persist queue - only non-empty partitions will need persisting, so the limiter only needs to limit non-empty partitions. This commit also significantly improves the consistency properties of the limiter - the limit no longer suffers from a small window of "overrun" due to non-atomic updates w.r.t partition creation - the limit is now exact. As an optimisation, partitions are not created at all if the limit has been reached, preventing an accumulation of empty partitions whilst the limit is being enforced.	2023-09-14 16:50:48 +02:00
Dom Dwyer	3978b07a43	feat(ingester): partition is_empty() Adds an is_empty() method to the PartitionData, returning true iff a subsequent query of the partition would return no rows.	2023-09-14 16:50:23 +02:00
Dom	11604f1f70	Merge pull request #8733 from influxdata/dom/partition-use-builder test: accept PartitionDataBuilder in provider	2023-09-14 15:34:42 +01:00
Dom	b0b93a1225	Merge branch 'main' into dom/partition-use-builder	2023-09-14 15:28:35 +01:00
Nga Tran	ac426fe5e1	feat: ingester reads `sort_key_ids` instead of `sort_key` (#8588 ) * feat: have ingester's SortKeyState include sort_key_ids * fix: test failures * chore: address review comments * feat: first step to compare sort_key_ids * feat: compare sort_key_ids in cas_sort_key * fix: comment typos * feat: ingester reads sort_key_ids instead of sort_key * refactor: use direct assert instead of going true a function * chore: fix typo * test: add tests and comments * chore: fix typos * test: add more test to handle empty sort key * chore: address review comments * fix: type * chore: address review comments --------- Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-09-14 14:06:21 +00:00
Fraser Savage	f5bfe72c7b	refactor(wal): Remove `IncompleteEntry` reader error	2023-09-14 14:59:57 +01:00
Dom Dwyer	429d90cde9	test: accept PartitionDataBuilder in provider Use the PartitionDataBuilder in the MockPartitionProvider, allowing the test caller to specify any necessary parameters, but still allow the mock provider to inject the arguments it was called with.	2023-09-14 15:41:14 +02:00
Fraser Savage	a160b97977	chore(ingester): Address review comments Lower log from error to warn, clarify. Undo rename of `replay_file`. Co-authored-by: Dom <dom@itsallbroken.com>	2023-09-14 12:02:36 +01:00
Dom	75ceafff54	Merge pull request #8723 from influxdata/dom/ingester-partition-bound feat(ingester): buffered partition limit	2023-09-14 11:25:48 +01:00
Dom	0ea2dfbe01	Merge branch 'main' into dom/ingester-partition-bound	2023-09-14 11:19:44 +01:00
kodiakhq[bot]	a07596f05f	Merge pull request #8672 from influxdata/savage/respect-ingest-system-state-during-wal-replay feat(ingester): Allow read of `IngestState` with exceptions	2023-09-14 09:55:59 +00:00
kodiakhq[bot]	dd0ee28e02	Merge branch 'main' into savage/respect-ingest-system-state-during-wal-replay	2023-09-14 09:50:03 +00:00
Fraser Savage	04c5e89c96	test(ingester): Add cover of multi-exceptions back to `IngestState` This adds a level of assurance that multiple error states set are ignored when they are all are present in the exceptions, while disjoint error states and exceptions return an error. Arbitrary sets could be covered, but would like require taking a non-const array for `read_with_exceptions`.	2023-09-14 10:42:36 +01:00
dependabot[bot]	0d51a1ca6f	chore(deps): Bump serde_json from 1.0.106 to 1.0.107 (#8731 ) Bumps [serde_json](https://github.com/serde-rs/json) from 1.0.106 to 1.0.107. - [Release notes](https://github.com/serde-rs/json/releases) - [Commits](https://github.com/serde-rs/json/compare/v1.0.106...v1.0.107) --- updated-dependencies: - dependency-name: serde_json dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-09-14 09:28:30 +00:00
dependabot[bot]	71315d8ab6	chore(deps): Bump toml from 0.7.8 to 0.8.0 (#8730 ) Bumps [toml](https://github.com/toml-rs/toml) from 0.7.8 to 0.8.0. - [Commits](https://github.com/toml-rs/toml/compare/toml-v0.7.8...toml-v0.8.0) --- updated-dependencies: - dependency-name: toml dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-09-14 09:22:18 +00:00
Marco Neumann	4ad21e1eca	feat: decode time portion of the partition key (#8725 ) * refactor: make partition key parsing more flexible * feat: decode time portion of the partition key Helpful for #8705 because we can prune partitions earlier during the query planning w/o having to consider their parquet files at all.	2023-09-14 09:15:11 +00:00
Marco Neumann	b5c0c9c167	feat: allow fallback to generic TS column range for chunk stats (#8724 ) This will be useful for #8705. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-09-14 08:37:50 +00:00
dependabot[bot]	82c45a798c	chore(deps): Bump libc from 0.2.147 to 0.2.148 (#8729 ) Bumps [libc](https://github.com/rust-lang/libc) from 0.2.147 to 0.2.148. - [Release notes](https://github.com/rust-lang/libc/releases) - [Commits](https://github.com/rust-lang/libc/compare/0.2.147...0.2.148) --- updated-dependencies: - dependency-name: libc dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-09-14 08:30:21 +00:00
Marco Neumann	c49c6159ef	refactor: change "normalization" in projected schema cache (#8720 ) * refactor: "projected schema" cache inputs must be normalized Normalizing under the hood and returning normalized schemas w/o the user knowing about it is a good source for subtle bugs. * refactor: do not normalize projected schema by name Normalizing makes it harder to predict the output and potentially requires additional string lookups just to work with the schema. * fix: typos Co-authored-by: Andrew Lamb <alamb@influxdata.com> Co-authored-by: Martin Hilton <mhilton@influxdata.com> --------- Co-authored-by: Andrew Lamb <alamb@influxdata.com> Co-authored-by: Martin Hilton <mhilton@influxdata.com>	2023-09-13 15:25:38 +00:00
Dom Dwyer	9e3f7611bb	feat(ingester): buffered partition limit This commit adds an optional (disabled by default) limit on the number partitions that may be buffered for a namespace at any one time. The exact value is configurable by setting INFLUXDB_IOX_MAX_PARTITIONS_PER_NAMESPACE to a non-zero value, and is disabled unless specified.	2023-09-13 17:14:43 +02:00
Dom Dwyer	db5ad12b9a	docs: remove misleading documentation In an ArcMap, an init() function is called exactly once, this sentence was supposed to suggest threads race to call init, but instead it sounds like they race to initialise a V (via init()) and put it in the map before the other thread, which is incorrect.	2023-09-13 17:14:42 +02:00
Dom	e3e66145da	Merge pull request #8722 from influxdata/dom/table-metadata refactor: move table metadata alongside resolver	2023-09-13 16:14:33 +01:00
Fraser Savage	eaa63c6392	test(ingester): Use simpler set logic for `read_with_exception` test	2023-09-13 15:53:16 +01:00
Dom Dwyer	9379a227c4	refactor: move table metadata alongside resolver We already have a metadata resolver, so lets stick the metadata types next to it.	2023-09-13 15:54:02 +02:00
Fraser Savage	7174262d4b	chore(ingester): DRY `IngestState` mask Co-authored-by: Dom <dom@itsallbroken.com>	2023-09-13 11:46:40 +01:00
dependabot[bot]	2477bdbbee	chore(deps): Bump clap from 4.4.2 to 4.4.3 (#8719 ) Bumps [clap](https://github.com/clap-rs/clap) from 4.4.2 to 4.4.3. - [Release notes](https://github.com/clap-rs/clap/releases) - [Changelog](https://github.com/clap-rs/clap/blob/master/CHANGELOG.md) - [Commits](https://github.com/clap-rs/clap/compare/v4.4.2...v4.4.3) --- updated-dependencies: - dependency-name: clap dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-09-13 09:17:52 +00:00
Nga Tran	a86c77213e	feat: compare sort_key_ids in cas_sort_key (#8579 ) * feat: have ingester's SortKeyState include sort_key_ids * fix: test failures * chore: address review comments * feat: first step to compare sort_key_ids * feat: compare sort_key_ids in cas_sort_key * fix: comment typos * refactor: use direct assert instead of going true a function * chore: fix typo * test: add tests and comments * chore: fix typos * test: add more test to handle empty sort key * chore: address review comments --------- Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-09-12 15:10:57 +00:00
Marco Neumann	c466f469ea	refactor: optimize `PartitionHashId` hashing (#8712 ) There is no need to hash a hash. Found while investigating https://github.com/influxdata/EAR/issues/4505 and the hashing code turned up in the profile. In general, hashing IDs should be pretty cheap. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-09-12 08:35:13 +00:00
Marco Neumann	d31f54754f	refactor: replace tokio `oneshot` w/ futures `oneshot` (#8713 ) Tokio oneshots have A LOT of overhead: `61042b4d90/tokio/src/sync/oneshot.rs (L1091-L1097)` For a particular case that I've debugged (https://github.com/influxdata/EAR/issues/4505), that change alone decreases the "cold" query time from 16s to 11s. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-09-12 08:29:18 +00:00
Joe-Blount	ce34d4ffa3	fix: handle oversized files in compactor (#8700 )	2023-09-12 00:18:56 +00:00
Andrew Lamb	ed2da2a831	Revert "chore: Update DataFusion pin (#8698 )" (#8714 ) This reverts commit `74c0851fc2`. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-09-11 17:19:04 +00:00
Dom	c0f62ce3e1	Merge pull request #8709 from influxdata/dom/rpcwrite-health-livelock fix(router): health probe livelock / missing probes	2023-09-11 18:07:34 +01:00
Dom	ffa3c39dbc	Merge branch 'main' into dom/rpcwrite-health-livelock	2023-09-11 17:37:07 +01:00
Marco Neumann	a17c9a2bf8	refactor: use `BTreeSet` in addressable heap (#8710 ) * refactor: improve addressable heap benchmarks - don't use the full range for key and order so it is easier to generate new entries at the top/bottom - differentiate between "update order to a new random one" and "update order to the last one", since the latter one is the de facto standard case for the LRU cache * refactor: use `BTreeSet` in addressable heap Our cache system has mostly reads and only a few writes (like the change rate is low or in other words: we have a high cache HIT rate). It makes sense to optimize the data structures accordingly. While investigating https://github.com/influxdata/EAR/issues/4505 it turned out that the bookkeeping for read operations is quite expensive. This PR improves that by optimizing the "addressable heap". That data structure is basically a map K->V but every entry also has an order O which can be used to "pop" the first element. This is used to implement the LRU cache where K is the cache key and O is the "least recently used" time (V is unused, but that's irrelevant for the problem described here). The former queue impl. is very expensive for `update_order`, because you essentially copy large parts of the `VecDeque` twice (once for remove, once for insert). B-trees handle updates of individual keys better, even if they are less efficient in the "tiny case" (i.e. just a few keys). Performance Results =================== The summary is: - `insert_n_elements`: improved, not surprising because inserting into the middle of the vector was pretty slow - `peek_after_n_elements`: quite noisy, maybe regressed slightly (<+30%) which is OK. Peeking the `VecDeque` is pretty cheap compared to the B-tree. - `get_existing_after_n_elements`: noisy, mostly identical or slightly slower (<+30%) - `get_new_after_n_elements`: same, there shouldn't be a difference because the `VecDeque` / B-tree isn't touched for elements that are not found, so if there is a difference this is probably LLVM codegen stuff - `pop_n_elements`: regressed by up to +200%, which makes sense given how cheap removing elements in order from the `VecDeque` is compared to the B-tree. However this operation is only used under memory pressure when new data is added to the cache system, which is a rare event and quite expensive to begin with, so this is unlikely to make a noticeable difference in practice - `remove_existing_after_n_elements`: improved up to -70%, which makes sense because removing from the `VecDeque` is very expensive due to the data copy/move - `remove_new_after_n_elements`: mostly noise, should logically be the same because the `VecDeque`/B-tree isn't touched if the element doesn't exist - `replace_after_n_elements`: this is "remove"+"insert", improved accordingly - `update_order_existing_to_random_after_n_elements`: optimized version of "replace" that touches the internal `HashMap` less often, improved accordingly - `update_order_existing_to_first_after_n_elements`: THE prime case that we where aiming for because it is used to update the "least recently used" times and is used for every read. Improved by at least -70%, same argument as "replace" and "remove". - `update_order_new_after_n_elements`: mostly noise / same, `VecDeque`/B-tree isn't touched in this case The detailed results are: <details> ``` ❯ cargo bench -p cache_system --bench addressable_heap -- --baseline btree-pre Compiling cache_system v0.1.0 (/home/mneumann/src/influxdb_iox/cache_system) Finished bench [optimized + debuginfo] target(s) in 8.48s Running benches/addressable_heap.rs (target/release/deps/addressable_heap-e6a3281d83b52007) insert_n_elements/0 time: [12.621 ns 12.629 ns 12.640 ns] change: [-3.6323% -3.5654% -3.4848%] (p = 0.00 < 0.05) Performance has improved. Found 5 outliers among 100 measurements (5.00%) 3 (3.00%) high mild 2 (2.00%) high severe insert_n_elements/1 time: [77.540 ns 77.665 ns 77.824 ns] change: [-6.7121% -6.5483% -6.3759%] (p = 0.00 < 0.05) Performance has improved. Found 10 outliers among 100 measurements (10.00%) 4 (4.00%) high mild 6 (6.00%) high severe insert_n_elements/10 time: [565.61 ns 565.89 ns 566.22 ns] change: [-20.328% -20.232% -20.156%] (p = 0.00 < 0.05) Performance has improved. Found 4 outliers among 100 measurements (4.00%) 3 (3.00%) high mild 1 (1.00%) high severe insert_n_elements/100 time: [7.3376 µs 7.3438 µs 7.3499 µs] change: [-2.7056% -2.6147% -2.5146%] (p = 0.00 < 0.05) Performance has improved. insert_n_elements/1000 time: [97.249 µs 97.335 µs 97.435 µs] change: [-13.880% -13.804% -13.717%] (p = 0.00 < 0.05) Performance has improved. Found 5 outliers among 100 measurements (5.00%) 2 (2.00%) high mild 3 (3.00%) high severe insert_n_elements/10000 time: [1.1371 ms 1.1386 ms 1.1403 ms] change: [-61.699% -61.650% -61.594%] (p = 0.00 < 0.05) Performance has improved. Found 9 outliers among 100 measurements (9.00%) 1 (1.00%) low mild 2 (2.00%) high mild 6 (6.00%) high severe peek_after_n_elements/0 time: [7.2367 ns 7.2398 ns 7.2430 ns] change: [-33.159% -33.109% -33.064%] (p = 0.00 < 0.05) Performance has improved. Found 2 outliers among 100 measurements (2.00%) 2 (2.00%) high mild peek_after_n_elements/1 time: [27.329 ns 27.344 ns 27.361 ns] change: [-15.114% -15.047% -14.968%] (p = 0.00 < 0.05) Performance has improved. Found 3 outliers among 100 measurements (3.00%) 1 (1.00%) low mild 1 (1.00%) high mild 1 (1.00%) high severe peek_after_n_elements/10 time: [29.520 ns 29.534 ns 29.548 ns] change: [-6.4913% -6.3567% -6.2216%] (p = 0.00 < 0.05) Performance has improved. Found 2 outliers among 100 measurements (2.00%) 2 (2.00%) high mild peek_after_n_elements/100 time: [36.132 ns 36.192 ns 36.264 ns] change: [-2.5660% -2.3400% -2.1099%] (p = 0.00 < 0.05) Performance has improved. Found 7 outliers among 100 measurements (7.00%) 5 (5.00%) high mild 2 (2.00%) high severe peek_after_n_elements/1000 time: [40.246 ns 40.312 ns 40.380 ns] change: [+19.265% +19.583% +19.892%] (p = 0.00 < 0.05) Performance has regressed. Found 7 outliers among 100 measurements (7.00%) 7 (7.00%) high mild peek_after_n_elements/10000 time: [53.840 ns 54.581 ns 55.585 ns] change: [+21.970% +23.892% +26.167%] (p = 0.00 < 0.05) Performance has regressed. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high severe get_existing_after_n_elements/1 time: [29.431 ns 29.451 ns 29.474 ns] change: [-9.8425% -9.7508% -9.6523%] (p = 0.00 < 0.05) Performance has improved. Found 11 outliers among 100 measurements (11.00%) 2 (2.00%) low mild 6 (6.00%) high mild 3 (3.00%) high severe get_existing_after_n_elements/10 time: [28.923 ns 28.950 ns 28.981 ns] change: [-14.267% -14.160% -14.042%] (p = 0.00 < 0.05) Performance has improved. Found 8 outliers among 100 measurements (8.00%) 5 (5.00%) high mild 3 (3.00%) high severe get_existing_after_n_elements/100 time: [36.280 ns 36.328 ns 36.384 ns] change: [+4.3315% +4.5087% +4.6797%] (p = 0.00 < 0.05) Performance has regressed. Found 12 outliers among 100 measurements (12.00%) 8 (8.00%) high mild 4 (4.00%) high severe get_existing_after_n_elements/1000 time: [33.089 ns 33.166 ns 33.284 ns] change: [+6.6989% +7.5179% +8.1580%] (p = 0.00 < 0.05) Performance has regressed. Found 8 outliers among 100 measurements (8.00%) 6 (6.00%) high mild 2 (2.00%) high severe get_existing_after_n_elements/10000 time: [36.041 ns 36.145 ns 36.252 ns] change: [-6.3166% -4.6691% -3.3693%] (p = 0.00 < 0.05) Performance has improved. get_new_after_n_elements/0 time: [9.3569 ns 9.3597 ns 9.3626 ns] change: [-38.034% -37.914% -37.820%] (p = 0.00 < 0.05) Performance has improved. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild get_new_after_n_elements/1 time: [25.029 ns 25.054 ns 25.095 ns] change: [-13.193% -13.094% -12.946%] (p = 0.00 < 0.05) Performance has improved. Found 3 outliers among 100 measurements (3.00%) 1 (1.00%) high mild 2 (2.00%) high severe get_new_after_n_elements/10 time: [26.160 ns 26.176 ns 26.191 ns] change: [-13.138% -13.072% -12.999%] (p = 0.00 < 0.05) Performance has improved. get_new_after_n_elements/100 time: [35.086 ns 35.135 ns 35.187 ns] change: [-1.9261% -1.7566% -1.5940%] (p = 0.00 < 0.05) Performance has improved. Found 2 outliers among 100 measurements (2.00%) 2 (2.00%) high mild get_new_after_n_elements/1000 time: [28.929 ns 29.010 ns 29.103 ns] change: [-0.1152% +0.2217% +0.5492%] (p = 0.20 > 0.05) No change in performance detected. Found 3 outliers among 100 measurements (3.00%) 2 (2.00%) high mild 1 (1.00%) high severe get_new_after_n_elements/10000 time: [43.733 ns 44.226 ns 44.830 ns] change: [+23.061% +24.885% +27.016%] (p = 0.00 < 0.05) Performance has regressed. Found 6 outliers among 100 measurements (6.00%) 4 (4.00%) high mild 2 (2.00%) high severe pop_n_elements/0 time: [7.6553 ns 7.6586 ns 7.6619 ns] change: [-27.593% -27.545% -27.501%] (p = 0.00 < 0.05) Performance has improved. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild pop_n_elements/1 time: [43.055 ns 43.089 ns 43.136 ns] change: [+10.568% +10.680% +10.798%] (p = 0.00 < 0.05) Performance has regressed. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high severe pop_n_elements/10 time: [347.19 ns 347.42 ns 347.68 ns] change: [+71.785% +71.998% +72.180%] (p = 0.00 < 0.05) Performance has regressed. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high severe pop_n_elements/100 time: [4.6767 µs 4.6793 µs 4.6819 µs] change: [+104.38% +104.56% +104.74%] (p = 0.00 < 0.05) Performance has regressed. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild pop_n_elements/1000 time: [44.635 µs 44.724 µs 44.872 µs] change: [+186.13% +186.84% +187.92%] (p = 0.00 < 0.05) Performance has regressed. Found 2 outliers among 100 measurements (2.00%) 1 (1.00%) low mild 1 (1.00%) high severe pop_n_elements/10000 time: [479.51 µs 479.92 µs 480.34 µs] change: [+179.63% +180.01% +180.38%] (p = 0.00 < 0.05) Performance has regressed. remove_existing_after_n_elements/1 time: [52.939 ns 52.996 ns 53.077 ns] change: [+16.171% +16.313% +16.529%] (p = 0.00 < 0.05) Performance has regressed. Found 5 outliers among 100 measurements (5.00%) 2 (2.00%) high mild 3 (3.00%) high severe remove_existing_after_n_elements/10 time: [62.905 ns 62.968 ns 63.038 ns] change: [+0.5731% +0.6847% +0.8053%] (p = 0.00 < 0.05) Change within noise threshold. Found 5 outliers among 100 measurements (5.00%) 2 (2.00%) high mild 3 (3.00%) high severe remove_existing_after_n_elements/100 time: [97.206 ns 97.310 ns 97.416 ns] change: [+0.3006% +0.5036% +0.6846%] (p = 0.00 < 0.05) Change within noise threshold. Found 5 outliers among 100 measurements (5.00%) 2 (2.00%) low mild 3 (3.00%) high mild remove_existing_after_n_elements/1000 time: [119.75 ns 120.18 ns 120.63 ns] change: [-10.255% -9.8892% -9.5151%] (p = 0.00 < 0.05) Performance has improved. Found 3 outliers among 100 measurements (3.00%) 3 (3.00%) high mild remove_existing_after_n_elements/10000 time: [160.66 ns 162.29 ns 164.06 ns] change: [-71.326% -70.589% -69.801%] (p = 0.00 < 0.05) Performance has improved. Found 5 outliers among 100 measurements (5.00%) 4 (4.00%) high mild 1 (1.00%) high severe remove_new_after_n_elements/0 time: [21.056 ns 21.067 ns 21.080 ns] change: [-13.898% -13.844% -13.783%] (p = 0.00 < 0.05) Performance has improved. Found 3 outliers among 100 measurements (3.00%) 2 (2.00%) high mild 1 (1.00%) high severe remove_new_after_n_elements/1 time: [24.313 ns 24.322 ns 24.332 ns] change: [-13.770% -13.691% -13.623%] (p = 0.00 < 0.05) Performance has improved. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild remove_new_after_n_elements/10 time: [25.672 ns 25.686 ns 25.700 ns] change: [-12.717% -12.650% -12.585%] (p = 0.00 < 0.05) Performance has improved. Found 3 outliers among 100 measurements (3.00%) 3 (3.00%) high mild remove_new_after_n_elements/100 time: [35.031 ns 35.060 ns 35.091 ns] change: [-1.9734% -1.8181% -1.6578%] (p = 0.00 < 0.05) Performance has improved. Found 3 outliers among 100 measurements (3.00%) 3 (3.00%) high mild remove_new_after_n_elements/1000 time: [29.874 ns 29.978 ns 30.109 ns] change: [-5.3132% -4.4315% -3.6840%] (p = 0.00 < 0.05) Performance has improved. Found 5 outliers among 100 measurements (5.00%) 5 (5.00%) high severe remove_new_after_n_elements/10000 time: [36.628 ns 36.872 ns 37.164 ns] change: [-17.430% -16.074% -14.738%] (p = 0.00 < 0.05) Performance has improved. Found 3 outliers among 100 measurements (3.00%) 2 (2.00%) high mild 1 (1.00%) high severe replace_after_n_elements/1 time: [55.542 ns 55.609 ns 55.693 ns] change: [+17.578% +17.732% +17.940%] (p = 0.00 < 0.05) Performance has regressed. Found 6 outliers among 100 measurements (6.00%) 3 (3.00%) high mild 3 (3.00%) high severe replace_after_n_elements/10 time: [77.437 ns 77.529 ns 77.642 ns] change: [-13.348% -13.240% -13.114%] (p = 0.00 < 0.05) Performance has improved. Found 5 outliers among 100 measurements (5.00%) 2 (2.00%) high mild 3 (3.00%) high severe replace_after_n_elements/100 time: [155.57 ns 155.74 ns 155.92 ns] change: [+3.9468% +4.2718% +4.5275%] (p = 0.00 < 0.05) Performance has regressed. Found 2 outliers among 100 measurements (2.00%) 2 (2.00%) high mild replace_after_n_elements/1000 time: [192.74 ns 193.47 ns 194.29 ns] change: [-19.267% -18.905% -18.445%] (p = 0.00 < 0.05) Performance has improved. Found 2 outliers among 100 measurements (2.00%) 1 (1.00%) high mild 1 (1.00%) high severe replace_after_n_elements/10000 time: [271.85 ns 275.39 ns 279.82 ns] change: [-75.027% -74.447% -73.816%] (p = 0.00 < 0.05) Performance has improved. Found 5 outliers among 100 measurements (5.00%) 3 (3.00%) high mild 2 (2.00%) high severe update_order_existing_to_random_after_n_elements/1 time: [56.459 ns 56.484 ns 56.515 ns] change: [+15.045% +15.128% +15.219%] (p = 0.00 < 0.05) Performance has regressed. Found 3 outliers among 100 measurements (3.00%) 2 (2.00%) high mild 1 (1.00%) high severe update_order_existing_to_random_after_n_elements/10 time: [79.163 ns 79.229 ns 79.327 ns] change: [-14.176% -14.090% -13.963%] (p = 0.00 < 0.05) Performance has improved. Found 3 outliers among 100 measurements (3.00%) 3 (3.00%) high severe update_order_existing_to_random_after_n_elements/100 time: [138.88 ns 139.28 ns 139.90 ns] change: [-9.5444% -9.2390% -8.7987%] (p = 0.00 < 0.05) Performance has improved. Found 4 outliers among 100 measurements (4.00%) 1 (1.00%) low mild 2 (2.00%) high mild 1 (1.00%) high severe update_order_existing_to_random_after_n_elements/1000 time: [187.01 ns 188.09 ns 189.60 ns] change: [-21.513% -20.638% -19.817%] (p = 0.00 < 0.05) Performance has improved. Found 4 outliers among 100 measurements (4.00%) 2 (2.00%) high mild 2 (2.00%) high severe update_order_existing_to_random_after_n_elements/10000 time: [271.94 ns 278.12 ns 285.56 ns] change: [-75.123% -74.395% -73.590%] (p = 0.00 < 0.05) Performance has improved. Found 8 outliers among 100 measurements (8.00%) 2 (2.00%) high mild 6 (6.00%) high severe update_order_existing_to_first_after_n_elements/1 time: [54.234 ns 54.270 ns 54.310 ns] change: [+6.9327% +7.0343% +7.1302%] (p = 0.00 < 0.05) Performance has regressed. Found 5 outliers among 100 measurements (5.00%) 3 (3.00%) high mild 2 (2.00%) high severe update_order_existing_to_first_after_n_elements/10 time: [68.528 ns 68.598 ns 68.678 ns] change: [-10.464% -10.352% -10.242%] (p = 0.00 < 0.05) Performance has improved. Found 6 outliers among 100 measurements (6.00%) 4 (4.00%) high mild 2 (2.00%) high severe update_order_existing_to_first_after_n_elements/100 time: [114.73 ns 114.85 ns 114.98 ns] change: [+1.2309% +1.3810% +1.5303%] (p = 0.00 < 0.05) Performance has regressed. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild update_order_existing_to_first_after_n_elements/1000 time: [148.20 ns 149.64 ns 151.63 ns] change: [-18.039% -17.103% -15.865%] (p = 0.00 < 0.05) Performance has improved. Found 4 outliers among 100 measurements (4.00%) 1 (1.00%) low mild 1 (1.00%) high mild 2 (2.00%) high severe update_order_existing_to_first_after_n_elements/10000 time: [195.62 ns 198.87 ns 203.28 ns] change: [-71.482% -70.401% -69.197%] (p = 0.00 < 0.05) Performance has improved. Found 3 outliers among 100 measurements (3.00%) 2 (2.00%) high mild 1 (1.00%) high severe update_order_new_after_n_elements/0 time: [8.0889 ns 8.0925 ns 8.0961 ns] change: [-84.003% -83.966% -83.943%] (p = 0.00 < 0.05) Performance has improved. Found 2 outliers among 100 measurements (2.00%) 2 (2.00%) high mild update_order_new_after_n_elements/1 time: [24.156 ns 24.165 ns 24.174 ns] change: [-16.500% -16.448% -16.400%] (p = 0.00 < 0.05) Performance has improved. Found 2 outliers among 100 measurements (2.00%) 1 (1.00%) low mild 1 (1.00%) high mild update_order_new_after_n_elements/10 time: [26.888 ns 26.906 ns 26.925 ns] change: [-11.976% -11.882% -11.791%] (p = 0.00 < 0.05) Performance has improved. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high severe update_order_new_after_n_elements/100 time: [34.980 ns 35.024 ns 35.070 ns] change: [-6.7389% -6.5648% -6.3944%] (p = 0.00 < 0.05) Performance has improved. Found 7 outliers among 100 measurements (7.00%) 6 (6.00%) high mild 1 (1.00%) high severe update_order_new_after_n_elements/1000 time: [29.538 ns 29.588 ns 29.643 ns] change: [+0.3276% +0.5565% +0.7915%] (p = 0.00 < 0.05) Change within noise threshold. Found 4 outliers among 100 measurements (4.00%) 3 (3.00%) high mild 1 (1.00%) high severe update_order_new_after_n_elements/10000 time: [35.260 ns 35.536 ns 35.873 ns] change: [-2.7360% -1.5665% -0.2156%] (p = 0.01 < 0.05) Change within noise threshold. Found 4 outliers among 100 measurements (4.00%) 3 (3.00%) high mild 1 (1.00%) high severe ``` </details>	2023-09-11 14:50:20 +00:00
Andrew Lamb	74c0851fc2	chore: Update DataFusion pin (#8698 ) * chore: Update DataFusion pin * chore: Update for new API * fix: fix test * fix: only check error messages --------- Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-09-11 13:54:24 +00:00
Dom Dwyer	a513301e23	fix(router): health probe livelock / missing probe When an upstream ingester goes offline, the "circuit breaker" detects it as unhealthy, and prevents further requests being sent to it. Periodically a small number of requests are allowed ("probe requests") to check for recovery. If a write request is selected as a "probe request", it SHOULD be sent - a limited number writes are selected as probes, and enough have to be successful to drive recovery. If no probes are ever sent/successful, the upstream will never be marked as healthy. Additionally the RPC handler applies an optimisation: if the number of ingesters selected to service a write is less than the number needed to successfully reach the desired replication factor, no requests are sent and an error is returned immediately, preventing unnecessary system load for writes that would never succeed. This optimisation conflicts with the probe request requirement when a replication factor of >= 2 is specified: * All ingesters are offline * Write comes in * UpstreamSnapshot is populated with a probe request for 1 ingester only - no other healthy candidate ingesters exist. * Optimisation applied: 1 probe candidate < 2 needed for replication This results in a probe request never being sent, and in turn, never allowing further requests to the recovered upstream. This fix changes the optimisation, applying it only when there are no probes in the candidate ingester list - the write will always fail, but it will drive detection of recovered ingesters and maintain liveness of the system.	2023-09-11 15:24:40 +02:00
Dom Dwyer	03a15aee62	refactor: UpstreamSnapshot aware of probe requests Allows the UpstreamSnapshot to be initialised with a "contains probe" boolean indicator that's passed through to the RPC layer.	2023-09-11 14:06:30 +02:00
Marco Neumann	3bdaafe36a	refactor: optimize hash-collection constructions (#8707 ) * refactor: optimize `SortKey` construction * refactor: optimize column set construction * refactor: optimize "should cover" calculcation for partitions --------- Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-09-11 09:46:47 +00:00
dependabot[bot]	e440740aed	chore(deps): Bump croaring from 0.9.0 to 1.0.0 (#8703 ) * chore(deps): Bump croaring from 0.9.0 to 1.0.0 Bumps [croaring](https://github.com/RoaringBitmap/croaring-rs) from 0.9.0 to 1.0.0. - [Release notes](https://github.com/RoaringBitmap/croaring-rs/releases) - [Commits](https://github.com/RoaringBitmap/croaring-rs/commits/1.0.0) --- updated-dependencies: - dependency-name: croaring dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> * refactor: fix croaring upgrade and remove dead code --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Marco Neumann <marco@crepererum.net>	2023-09-11 09:41:16 +00:00

... 5 6 7 8 9 ...

49353 Commits (praveen/perf-troubleshoot-flightsql) All Branches Search

49353 Commits (praveen/perf-troubleshoot-flightsql)

All Branches