When an upstream ingester goes offline, the "circuit breaker" detects
it as unhealthy and prevents further requests from being sent to it.
Periodically, a small number of requests ("probe requests") are allowed
through to check for recovery.
If a write request is selected as a "probe request", it SHOULD be sent -
only a limited number of writes are selected as probes, and enough of
them must succeed to drive recovery. If no probes are ever sent (or
none ever succeed), the upstream will never be marked as healthy.
Additionally, the RPC handler applies an optimisation: if the number of
ingesters selected to service a write is fewer than the number needed
to reach the desired replication factor, no requests are sent and an
error is returned immediately, preventing unnecessary system load from
writes that could never succeed.
This optimisation conflicts with the probe request requirement when a
replication factor of >= 2 is specified:
* All ingesters are offline
* Write comes in
* UpstreamSnapshot is populated with a probe request for 1 ingester
only - no other healthy candidate ingesters exist.
* Optimisation applied: 1 probe candidate < 2 needed for replication
This results in the probe request never being sent, which in turn
means the recovered upstream is never marked as healthy and never
receives further requests.
This fix changes the optimisation, applying it only when there are no
probes in the candidate ingester list - the write will always fail, but
it will drive detection of recovered ingesters and maintain liveness of
the system.
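A minimal sketch of the changed fast path, assuming hypothetical names
for the handler's error type and candidate counts:

```rust
#[derive(Debug)]
enum RpcError {
    NotEnoughReplicas,
}

/// Hypothetical helper: decide whether to reject a write without
/// sending any upstream requests.
fn early_reject(
    healthy_candidates: usize,
    probe_candidates: usize,
    replication_factor: usize,
) -> Result<(), RpcError> {
    let total = healthy_candidates + probe_candidates;

    // Only short-circuit when no probes would be suppressed by doing so.
    if total < replication_factor && probe_candidates == 0 {
        return Err(RpcError::NotEnoughReplicas);
    }

    // Otherwise send the requests: a write that cannot reach the
    // replication factor still fails, but any selected probes are
    // delivered, driving recovery detection.
    Ok(())
}
```

In the scenario above, the single probe candidate now causes the
requests to be sent (and the write to fail upstream), rather than the
write being rejected before the probe ever leaves the router.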
Prior to this commit, NamespaceCache was implemented only for
Arc<MemoryNamespaceCache>, rather than for the cache type itself.
In the vast majority of cases, this Arc wrapper is completely
unnecessary - it adds both runtime overhead, and code/type complexity.
This commit impls NamespaceCache for any Arc-wrapped NamespaceCache, and
removes all unnecessary Arc wrapping of the MemoryNamespaceCache.
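A minimal sketch of the blanket impl, assuming a simplified trait shape
(the real trait is richer than this):

```rust
use std::sync::Arc;

trait NamespaceCache {
    fn get_schema(&self, namespace: &str) -> Option<String>;
}

// Any Arc-wrapped NamespaceCache is itself a NamespaceCache,
// delegating to the inner value - callers needing shared ownership can
// still wrap, but a bare MemoryNamespaceCache now satisfies the trait
// directly.
impl<T> NamespaceCache for Arc<T>
where
    T: NamespaceCache,
{
    fn get_schema(&self, namespace: &str) -> Option<String> {
        (**self).get_schema(namespace)
    }
}
```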
* chore(deps): Bump chrono from 0.4.26 to 0.4.27
Bumps [chrono](https://github.com/chronotope/chrono) from 0.4.26 to 0.4.27.
- [Release notes](https://github.com/chronotope/chrono/releases)
- [Changelog](https://github.com/chronotope/chrono/blob/main/CHANGELOG.md)
- [Commits](https://github.com/chronotope/chrono/compare/v0.4.26...v0.4.27)
---
updated-dependencies:
- dependency-name: chrono
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
* chore: Run cargo hakari tasks
* fix: Update deprecated chrono methods to their now-recommended versions
`chrono::DateTime::<Tz>::from_utc` has been deprecated and it's now
recommended to use `chrono::DateTime::from_naive_utc_and_offset`
instead.
<https://github.com/chronotope/chrono/pull/1175>
Note that the `Timestamp` type in `influxdb_influxql_parser` is an alias
for `chrono::DateTime<FixedOffset>`.
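A minimal before/after for a `DateTime<FixedOffset>` such as that
`Timestamp` alias:

```rust
use chrono::{DateTime, FixedOffset, NaiveDate};

fn timestamp() -> DateTime<FixedOffset> {
    let naive = NaiveDate::from_ymd_opt(2023, 8, 28)
        .expect("valid date")
        .and_hms_opt(12, 0, 0)
        .expect("valid time");
    let offset = FixedOffset::east_opt(0).expect("valid offset");

    // Deprecated as of chrono 0.4.27:
    //   DateTime::<FixedOffset>::from_utc(naive, offset)
    // Recommended replacement:
    DateTime::from_naive_utc_and_offset(naive, offset)
}
```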
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: Carol (Nichols || Goulding) <carol.nichols@gmail.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Adds a "mst" (merkle search tree) submodule in anti_entropy, and moves
all the MST code into it.
This makes space for a gossip-based sync primitive to live here too.
Separate the management of the Merkle Search Tree state into an actor
to manage concurrent access.
This moves hashing and tree updates off the hot request path and into
an asynchronous background process, practically eliminating the
overhead of maintaining the MST structure.
This decoupling will allow convergence runs between peers to proceed
without causing contention on the lock in the request hot path.
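A minimal sketch of the actor shape, assuming hypothetical names (the
tree itself is elided):

```rust
use tokio::sync::mpsc;

/// Operations sent from the request path to the actor.
enum MstOp {
    /// Record new content for a cache key.
    Upsert { key: String, content: Vec<u8> },
}

/// Cheap, cloneable handle held by request handlers: recording an
/// update is a non-blocking channel send, not a tree mutation under a
/// lock.
#[derive(Clone)]
struct MstHandle {
    tx: mpsc::Sender<MstOp>,
}

/// The actor exclusively owns the MST and applies updates serially in
/// a background task, off the request hot path.
struct MstActor {
    rx: mpsc::Receiver<MstOp>,
    // The MerkleSearchTree state lives here, with no shared lock.
}

impl MstActor {
    async fn run(mut self) {
        while let Some(MstOp::Upsert { key, content }) = self.rx.recv().await {
            // Hash the content and update the tree here.
            let _ = (key, content);
        }
    }
}
```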
Don't fail compilation / test runs because of an unreachable pub item
or a missing Debug impl - just emit a compiler warning. This lets
compilation complete locally, but such code isn't accepted in PRs, as
CI runs with "deny warnings".
Adds an integration test asserting the derived MST content hashes
accurately track updates to an underlying cache entry merge
implementation.
This ensures the merge implementation and the content hashes do not
fall out of sync.
Adds a (currently unused) NamespaceCache decorator that observes the
post-merge content of the cache to maintain a content hash.
This makes use of a Merkle Search Tree
(https://inria.hal.science/hal-02303490) to track the CRDT content of
the cache in a deterministic structure that allows for synchronisation
between peers (itself a CRDT).
The hash of two routers' NamespaceCache will be equal iff their cache
contents are equal - this can be used to (very cheaply) identify
out-of-sync routers, and trigger convergence. The MST structure used
here provides functionality to compare two compact MST representations,
and identify subsets of the cache that are out-of-sync, allowing for
cheap convergence.
Note this content hash only covers the tables, their set of columns, and
those column schemas - this reflects the fact that only these values may
currently be converged by gossip. Future work will enable full
convergence of all fields.
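A minimal sketch of the decorator shape, assuming simplified types
(string schemas and a plain map stand in for the real CRDT content and
the MST):

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};
use std::sync::Mutex;

trait NamespaceCache {
    /// Merges `schema` into the cached entry, returning the merged
    /// result.
    fn put_schema(&self, namespace: String, schema: String) -> String;
}

/// Observes the *post-merge* content of every put, so the tracked
/// structure always hashes the merged (CRDT) state of the cache.
struct ContentHashDecorator<T> {
    inner: T,
    // Stand-in for the Merkle Search Tree: namespace -> content hash.
    observed: Mutex<BTreeMap<String, u64>>,
}

impl<T: NamespaceCache> NamespaceCache for ContentHashDecorator<T> {
    fn put_schema(&self, namespace: String, schema: String) -> String {
        // Let the inner cache perform the merge first...
        let merged = self.inner.put_schema(namespace.clone(), schema);

        // ...then fold the merged content into the observed structure,
        // so equal cache contents always yield equal hashes.
        let mut hasher = DefaultHasher::new();
        merged.hash(&mut hasher);
        self.observed
            .lock()
            .unwrap()
            .insert(namespace, hasher.finish());

        merged
    }
}
```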
Now that there's a Topic, there's no need for a giant "all message
types" enum.
As part of this shift, the gossip_message::GossipMessage type used for
schema gossiping sounds overly generic. This commit renames it to
schema_message::SchemaMessage and updates the code accordingly.
This is a backwards-compatible change (and if anything goes wrong, the
"old" routers simply log a warning if a message is unreadable).
Adds "topic" support, allowing a node to subscribe to one or more types
of application payloads independently.
A gossip node is optionally initialised with a set of topics (defaulting
to "all topics") and this set of topic interests is propagated
throughout the cluster via the usual PEX mechanism, alongside the
existing connection & identity information.
When broadcasting an application payload, the sender only transmits it
to nodes that have registered an interest in this payload type. This
avoids wasting network bandwidth and CPU on uninterested nodes, and
allows multiple, distinct payload types to be propagated independently,
keeping the subsystems that rely on gossip decoupled from each other
(no giant, brittle payload enum type).
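A minimal sketch of the interest-based send filter, assuming
hypothetical types:

```rust
use std::collections::{HashMap, HashSet};

type Topic = u64;
type PeerId = u64;

struct PeerList {
    /// Topic interests learned for each peer via PEX.
    interests: HashMap<PeerId, HashSet<Topic>>,
}

impl PeerList {
    /// Yields only the peers that registered an interest in `topic`;
    /// everyone else never sees the payload on the wire.
    fn broadcast_targets(&self, topic: Topic) -> impl Iterator<Item = PeerId> + '_ {
        self.interests
            .iter()
            .filter(move |(_, topics)| topics.contains(&topic))
            .map(|(peer, _)| *peer)
    }
}
```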
Configuring the `ERROR_WINDOW` of the router's on-path health check
did not provide a consistent improvement for low-write-volume clusters.
Now that the `NUM_PROBES` parameter is configurable, `ERROR_WINDOW` can
be un-exposed to simplify the configuration options and clean up
boilerplate.
This commit allows schema gossiping to be enabled on router nodes.
Enabling gossiping allows any schema changes made on router A to be sent
to the N-1 other routers, populating their internal caches in
anticipation of handling a similar request.
By populating their cache, they avoid incurring a catalog lookup to
populate their local state upon a cache miss, therefore reducing request
latency, and reducing catalog load.
Enabling gossip on the routers automatically enables schema gossiping -
enabling gossip remains optional, and off by default.
This commit adds the SchemaChangeObserver, the delegate that is handed
a schema diff and is responsible for computing the gossip message and
handing it off to the gossip system.
This sits between the cache layer and the gossip layer, converting
schema things into gossip things.
This isn't connected up, so no messages will be sent.
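A minimal sketch of the delegate's shape, assuming hypothetical types
for the diff and the gossip handle:

```rust
/// Hypothetical diff emitted by the cache layer after a merge.
struct SchemaDiff {
    namespace: String,
    new_columns: Vec<String>,
}

/// Hypothetical handle into the gossip subsystem.
struct GossipHandle;

impl GossipHandle {
    fn broadcast(&self, _payload: Vec<u8>) {
        // Hand the serialised payload to the gossip layer.
    }
}

struct SchemaChangeObserver {
    gossip: GossipHandle,
}

impl SchemaChangeObserver {
    /// Called by the cache layer with the diff produced by a merge;
    /// converts it into a gossip message and hands it off.
    fn observe(&self, diff: SchemaDiff) {
        let payload =
            format!("{}:{}", diff.namespace, diff.new_columns.join(",")).into_bytes();
        self.gossip.broadcast(payload);
    }
}
```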
This commit adds the NamespaceSchemaGossip type, a decorator of
[`NamespaceCache`] implementations utilising peer gossiping to provide
best-effort convergence of the local cache state.
This decorator will sit in the NamespaceCache stack, allowing it to
receive incoming schema gossip messages, and update the local cache
through the regular NamespaceCache abstraction methods.
This currently implements the message handlers only - no messages are
sent yet!
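A minimal sketch of the receive path, assuming hypothetical message
and trait shapes:

```rust
/// Hypothetical application-level gossip messages.
enum SchemaMessage {
    TableCreated {
        namespace: String,
        table: String,
    },
    ColumnsAdded {
        namespace: String,
        table: String,
        columns: Vec<String>,
    },
}

/// Hypothetical subset of the NamespaceCache merge methods.
trait MergeCache {
    fn merge_table(&self, namespace: &str, table: &str);
    fn merge_columns(&self, namespace: &str, table: &str, columns: &[String]);
}

/// Incoming gossip is applied through the same merge path as local
/// writes, so gossip-driven convergence needs no special-case logic.
fn handle_gossip<C: MergeCache>(cache: &C, msg: SchemaMessage) {
    match msg {
        SchemaMessage::TableCreated { namespace, table } => {
            cache.merge_table(&namespace, &table);
        }
        SchemaMessage::ColumnsAdded {
            namespace,
            table,
            columns,
        } => {
            cache.merge_columns(&namespace, &table, &columns);
        }
    }
}
```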
This benchmark covers two axes of performance for calls to the
namespace cache's `put_schema()` stack: the cost of adding varying
numbers of new columns to an existing table in the namespace, and the
cost of adding new tables (each with their own set of columns) to an
existing namespace.
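A sketch of the benchmark shape using criterion, with the hypothetical
setup helpers elided:

```rust
use criterion::{criterion_group, criterion_main, Criterion};

fn bench_put_schema(c: &mut Criterion) {
    let mut group = c.benchmark_group("put_schema");

    // Axis 1: adding N new columns to an existing table.
    for n_columns in [1, 10, 100] {
        group.bench_function(format!("new_columns/{n_columns}"), |b| {
            b.iter(|| {
                // cache.put_schema(existing_table_with(n_columns))
            });
        });
    }

    // Axis 2: adding N new tables (each with its own columns) to an
    // existing namespace.
    for n_tables in [1, 10, 100] {
        group.bench_function(format!("new_tables/{n_tables}"), |b| {
            b.iter(|| {
                // cache.put_schema(namespace_with_new_tables(n_tables))
            });
        });
    }

    group.finish();
}

criterion_group!(benches, bench_put_schema);
criterion_main!(benches);
```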
Adds the supporting types required to integrate the generic gossip crate
into a schema-specific broadcast primitive.
This commit implements the two "halves":
* GossipMessageDispatcher: async processing of incoming gossip msgs
* Handle: the send-side handle for async sending of gossip msgs
These types are responsible for converting between the
application-level / protobuf types and the serialised bytes sent over
the gossip primitive.
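A minimal sketch of the two halves, assuming hypothetical shapes (the
real types are backed by protobuf):

```rust
use tokio::sync::mpsc;

/// Stand-in for the protobuf-derived application message.
struct SchemaMessage(Vec<u8>);

/// Send side: accepts typed messages, serialises them, and hands the
/// bytes off asynchronously for gossiping.
struct Handle {
    tx: mpsc::Sender<Vec<u8>>,
}

impl Handle {
    async fn send(&self, msg: SchemaMessage) {
        let bytes = msg.0; // encode to the wire format here
        let _ = self.tx.send(bytes).await;
    }
}

/// Receive side: decodes incoming gossip bytes back into typed
/// messages and dispatches them to the application handler.
struct GossipMessageDispatcher {
    rx: mpsc::Receiver<Vec<u8>>,
}

impl GossipMessageDispatcher {
    async fn run(mut self) {
        while let Some(bytes) = self.rx.recv().await {
            let msg = SchemaMessage(bytes); // decode from the wire here
            let _ = msg; // hand to the application-level handler
        }
    }
}
```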
This allows routers to be configured with the number of probe requests
that must be collected before the health checker's circuit transitions
a downstream to the healthy/unhealthy state.
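A minimal sketch of the resulting configuration, with hypothetical
field names:

```rust
/// Thresholds for the router's upstream circuit breaker.
struct CircuitBreakerConfig {
    /// Probe requests that must succeed before an unhealthy upstream
    /// is marked healthy again.
    probes_to_mark_healthy: usize,
    /// Probe failures that may be observed before a healthy upstream
    /// is marked unhealthy.
    probes_to_mark_unhealthy: usize,
}
```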
Allows the router to optionally enable and start the gossip subsystem
(disabled by default).
No code uses the gossip system, so no application-level messages are
exchanged, but this allows the gossip subsystem to run and exchange
control frames / perform discovery / etc.