Ensure a "probe" node is always returned as the first candidate, driving
it to recovery faster.
This also includes a fix for the balancer metrics that would report
probe candidate nodes as healthy nodes.
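A minimal sketch of the probe-first ordering idea (`Upstream`, `needs_probe()` and `candidates()` are hypothetical names, not the actual balancer types):

```rust
/// Hypothetical stand-in for a balancer candidate.
struct Upstream {
    healthy: bool,
    probe_due: bool,
}

impl Upstream {
    fn needs_probe(&self) -> bool {
        !self.healthy && self.probe_due
    }
}

/// Order candidates so anything needing a probe is yielded first, giving an
/// unhealthy node the earliest possible chance to recover.
fn candidates(mut upstreams: Vec<Upstream>) -> Vec<Upstream> {
    // sort_by_key() is stable and `false` sorts before `true`, so probe
    // candidates (needs_probe() == true, key == false) come first.
    upstreams.sort_by_key(|u| !u.needs_probe());
    upstreams
}
```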
Similar to https://github.com/influxdata/influxdb_iox/pull/6509, this
forces a constant re-querying of the DNS address of an ingester to drive
rediscovery.
Unlike the above PR, this one only reconnects when errors are observed. This
still isn't ideal - something is wrong with the discovery itself and this
just papers over it.
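As an illustration only, an error-triggered re-resolution could look like the sketch below, using tokio's `lookup_host`; the function name is hypothetical and how the refreshed addresses feed back into the balancer is elided:

```rust
use std::net::SocketAddr;
use tokio::net::lookup_host;

/// Re-resolve the ingester's DNS name after an error is observed; the
/// refreshed addresses are then used to rebuild the connection.
async fn reresolve(host_port: &str) -> std::io::Result<Vec<SocketAddr>> {
    // A fresh lookup avoids a stale view of the upstream's address, e.g.
    // after the pod behind the DNS name has been replaced.
    Ok(lookup_host(host_port).await?.collect())
}
```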
Adds a metric with a per-ingester label recording the current health
state of the upstream ingester from the perspective of the router
instance.
Also logs periodically when one or more ingesters are offline.
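A rough illustration of the periodic offline log, using plain `std`/`tokio` primitives and `tracing`; the real code records per-ingester health through the metric registry, and all names here are hypothetical:

```rust
use std::{
    collections::HashMap,
    sync::{
        atomic::{AtomicBool, Ordering},
        Arc,
    },
    time::Duration,
};

/// Hypothetical per-ingester health map; the real code also exposes this as
/// a labelled health metric rather than only logging it.
type HealthMap = Arc<HashMap<String, AtomicBool>>;

/// Periodically log any ingesters currently considered offline.
async fn log_offline_ingesters(health: HealthMap) {
    let mut ticker = tokio::time::interval(Duration::from_secs(60));
    loop {
        ticker.tick().await;
        let offline: Vec<&str> = health
            .iter()
            .filter(|(_, up)| !up.load(Ordering::Relaxed))
            .map(|(addr, _)| addr.as_str())
            .collect();
        if !offline.is_empty() {
            tracing::warn!(?offline, "ingesters currently offline");
        }
    }
}
```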
Lazily establish connections in the background, instead of using tonic's
connect_lazy().
connect_lazy() causes error handling to take a different path in tonic
compared to "normal" eagerly established connections, which stops
reconnections from being attempted when the endpoint goes away (likely a bug).
It also means the first few write requests won't have to wait while the
connection is dialed, which brings down the P99 as a nice side-effect.
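A sketch of establishing the connection in a background task instead of using connect_lazy(); the retry policy and the watch-channel hand-off are illustrative only, not the actual implementation:

```rust
use std::time::Duration;
use tokio::sync::watch;
use tonic::transport::{Channel, Endpoint};

/// Dial the endpoint in a background task, publishing the Channel once it is
/// up. Callers wait on the watch receiver instead of paying the dial cost on
/// their first write, and failures go through the normal (non-lazy) path.
fn connect_in_background(endpoint: Endpoint) -> watch::Receiver<Option<Channel>> {
    let (tx, rx) = watch::channel(None);
    tokio::spawn(async move {
        loop {
            match endpoint.connect().await {
                Ok(channel) => {
                    let _ = tx.send(Some(channel));
                    return;
                }
                Err(e) => {
                    // Connection refused, DNS failure, etc: retry shortly.
                    tracing::warn!(error = %e, "failed to connect, retrying");
                    tokio::time::sleep(Duration::from_secs(1)).await;
                }
            }
        }
    });
    rx
}
```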
Adds on-path health checking / recording using the CircuitBreaker
construct, stopping requests to unhealthy upstreams (minus the probe
requests) until they recover.
This removes the horrible gRPC balancer hack I added to get us deployed
ASAP, and should eliminate latency spikes and elevated error responses
observed during deployments as a result.
last_probe was "the instant at which the last set of probes started
being sent" in my head, but Carol saw it as "first_probe - the time at
which probes started being sent".
Hopefully probe_window_started_at is less ambiguous.
Implements a "circuit breaker", a construct that tracks the errors &
successes of requests to a remote node, and uses this information to allow
or deny further requests.
This circuit breaker stops sending requests to the remote when the error
count exceeds 80% of requests in a 5 second window. Once this happens,
up to 10 "probe" requests per second are allowed, and when they succeed,
normal operation resumes (though concurrent requests may still be
completing during the probe regime and are counted towards the probe
results).
In the happy path, this circuit breaker is very cheap (lock free; wait-free,
population-oblivious) to evaluate and record request results in, minimising the throughput
penalty. Once the breaker enters an unhealthy state (hopefully a rare
occurrence) it uses a mutex to manage the probe state (with a higher
overhead) for simplicity; it's definitely possible to optimise this away
if high latencies are observed during upstream outages when the circuit
breaker is open/unhealthy.
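A heavily simplified sketch of that policy - the rolling 5 second window, the reset on recovery, and the in-flight request accounting are elided, and while the thresholds mirror the description above, the names and code are illustrative only:

```rust
use std::{
    sync::{
        atomic::{AtomicU64, Ordering},
        Mutex,
    },
    time::{Duration, Instant},
};

/// Open the breaker when more than 80% of observed requests have failed
/// (the real implementation evaluates this over a 5 second window).
const ERROR_RATIO: f64 = 0.8;
/// While open, allow up to this many probe requests per second.
const PROBES_PER_SECOND: u64 = 10;

struct CircuitBreaker {
    ok: AtomicU64,
    err: AtomicU64,
    // Only touched while the breaker is open, so a mutex is acceptable;
    // the happy path never takes this lock.
    probe: Mutex<ProbeWindow>,
}

struct ProbeWindow {
    window_started_at: Instant,
    probes_sent: u64,
}

impl CircuitBreaker {
    /// Record the outcome of a request (cheap, lock free).
    fn observe(&self, ok: bool) {
        let counter = if ok { &self.ok } else { &self.err };
        counter.fetch_add(1, Ordering::Relaxed);
    }

    fn is_open(&self) -> bool {
        let ok = self.ok.load(Ordering::Relaxed) as f64;
        let err = self.err.load(Ordering::Relaxed) as f64;
        err > 0.0 && err / (ok + err) > ERROR_RATIO
    }

    /// Returns true if this request may proceed: always when healthy, or
    /// within the probe budget of the current one-second window when open.
    fn should_allow(&self) -> bool {
        if !self.is_open() {
            return true;
        }
        let mut probe = self.probe.lock().unwrap();
        if probe.window_started_at.elapsed() >= Duration::from_secs(1) {
            probe.window_started_at = Instant::now();
            probe.probes_sent = 0;
        }
        if probe.probes_sent < PROBES_PER_SECOND {
            probe.probes_sent += 1;
            true
        } else {
            false
        }
    }
}
```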
The gRPC node discovery hack spawns a task that outlives the gRPC
balancer - once the balancer stops, the task should stop too (and not
panic sending on the closed channel).
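The fix amounts to treating a failed send as the shutdown signal; a small sketch of the pattern with hypothetical names and types:

```rust
use std::time::Duration;
use tokio::sync::mpsc;

/// Hypothetical discovery task: exits cleanly once the balancer drops the
/// receiving half of the channel, instead of panicking on a failed send.
async fn discovery_task(tx: mpsc::Sender<Vec<String>>, endpoints: Vec<String>) {
    let mut ticker = tokio::time::interval(Duration::from_secs(5));
    loop {
        ticker.tick().await;
        if tx.send(endpoints.clone()).await.is_err() {
            // The balancer is gone; the task has outlived its purpose.
            return;
        }
    }
}
```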
The tonic / tower load-balance implementation discards failed nodes,
even when using a static list - this causes nodes that fail once to
never be retried.
This doesn't happen for the last node for some reason, and leads to all
the load from one router hitting a single ingester instead of load
balancing across all ingesters.
This commit adds a hack to constantly tell the load balancer to probe all
nodes, hopefully causing it to re-discover previously failed nodes. I don't
have the time to do this properly :(
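What such a hack might look like with tonic's `balance_channel`: a sketch of the idea only, not the actual change, and it assumes that re-inserting an endpoint under an existing key refreshes it in the balancer:

```rust
use std::time::Duration;
use tonic::transport::{Channel, Endpoint};
use tower::discover::Change;

/// Build a load-balanced Channel and spawn a task that periodically
/// re-inserts every known endpoint, nudging the balancer into re-dialling
/// nodes it previously discarded.
fn balanced_channel(addrs: Vec<String>) -> Channel {
    let (channel, tx) = Channel::balance_channel(addrs.len().max(1));

    tokio::spawn(async move {
        let mut ticker = tokio::time::interval(Duration::from_secs(5));
        loop {
            ticker.tick().await;
            for addr in &addrs {
                let Ok(endpoint) = Endpoint::from_shared(addr.clone()) else {
                    continue; // skip malformed addresses
                };
                if tx.send(Change::Insert(addr.clone(), endpoint)).await.is_err() {
                    return; // the balancer has been dropped
                }
            }
        }
    });

    channel
}
```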
Allow the routers to start up without requiring full availability of all
downstream ingesters. Previously a single unavailable ingester prevented
the routers from starting up.
This has downsides:
* Lazily initialising a connection will cause the first writes to have
higher latency as the connection is established.
* The routers MAY come up in a state that will never work (e.g. bad
ingester addresses)
* Using the opaque gRPC load balancing mechanism restricts the
visibility into which nodes are up/down (hindering useful log
messages) and prevents us from implementing more advanced circuit
breaking / probing logic / load-balancing strategies.
This change is a quick fix - it leaves the round-robin handler in place,
load-balancing over a single tonic Channel, which internally
load-balances. This will need cleaning up.
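As a rough sketch of that quick fix (names are placeholders), a single tonic Channel balanced over all ingester endpoints might be built like this:

```rust
use tonic::transport::{Channel, Endpoint};

/// Build a single Channel that internally load-balances across all ingester
/// addresses; the round-robin DML handler then wraps this one Channel.
fn ingester_channel(addrs: &[String]) -> Result<Channel, tonic::transport::Error> {
    let endpoints = addrs
        .iter()
        .map(|a| Endpoint::from_shared(a.clone()))
        .collect::<Result<Vec<_>, _>>()?;
    Ok(Channel::balance_list(endpoints.into_iter()))
}
```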
* feat: Add a feature flag to switch to the router RPC write path
Fixes #6242.
* refactor: Remove a weird arc clone/rename that's not needed
I'm sure this was needed at some point, but it doesn't make much sense.
I wasn't going to change this, but I'm now trying to minimize the
differences between this function and the write path init function, so
make this one better too.
* fix: Add the namespace autocreation to the RPC write path too
The topic/query pool don't really apply to this case, but use them anyway to
be able to reuse the existing catalog methods.
Also add a bunch of comments pointing out where the RPC write path
initializer and the old router's initializer are the same and where
they're different, so that perhaps it'll be easier to keep them in sync
while they both exist.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* chore: remove unused/moved ns_autocreation dml handler
* feat(router): expose new ns retention as config
* fix: forgot to set default value for router retention arg
* chore: make new namespace retention param an option
This commit adds the (currently unused) RpcWrite implementation of the
DmlHandler trait, which pushes a write over gRPC to a single, arbitrary
ingester. Requests are round-robined across all available ingesters.
This DmlHandler implementation can be swapped out with the
ShardedWriteBuffer to change how writes are propagated to the ingester.
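A sketch of the round-robin selection only; the client type and names are placeholders, while the actual RpcWrite handler wraps gRPC write service clients:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Round-robins requests across a fixed set of upstream write clients.
struct RoundRobin<C> {
    clients: Vec<C>,
    next: AtomicUsize,
}

impl<C> RoundRobin<C> {
    fn pick(&self) -> &C {
        // Relaxed is sufficient: only a roughly even spread is needed, not
        // a strict ordering between concurrent callers.
        let i = self.next.fetch_add(1, Ordering::Relaxed);
        &self.clients[i % self.clients.len()]
    }
}
```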
* feat: create namespace API call in router
Co-authored-by: Nga Tran <nga-tran@live.com>
* chore: treat retention as ns except in CLI
* fix: overflow in nanosecond calc
* fix: retention test after changing it from hours to ns
* chore: comment clarification in cli; better response type for error in ns API
* fix: correct some rebase mistakes
* chore: merge namespace create & create_with_retention; renamed ns create test helper fn & const
* fix: ns autocreation test was wrong after rebase
* fix: mem catalog has default 1hr retention, accidentally removed in rebase
* chore: remove mem catalog's default 1hr retention; make it settable in sets & router
Co-authored-by: Luke Bond <luke.n.bond@gmail.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: reject writes that are outside the retention period
* feat: add retention validator into handler stack
* chore: Apply suggestions from code review
Co-authored-by: Dom <dom@itsallbroken.com>
* refactor: address review comments
* test: unit tests for retention validation
* chore: address review comments
* test: more unit tests and integration tests
* refactor: make time inside retention period for emphemeral_mode test
* fix: 2 hours
Co-authored-by: Dom <dom@itsallbroken.com>
* chore: move ns api from querier to router
* chore: add explanatory comment in querier about moved namespace API
* fix: add namespace service to router
* fix: querier returns unimplemented error for ns retention, not panic
* chore: reuse namespace -> proto in router ns api
* chore: grpc namespace - consume ns to avoid clone
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>