The gRPC node discovery hack spawns a task that outlives the gRPC
balancer - once the balancer stops, the task should stop too (and not
panic sending on the closed channel).
The tonic / tower load-balance implementation discards failed nodes,
even when using a static list - this causes nodes that fail once to
never be retried.
This doesn't happen for the last node for some reason, and leads to all
the load from one router hitting a single ingester instead of load
balancing across all ingesters.
This commit adds a hack to constantly tell the load balancer to probe
all nodes, hopefully causing them to re-discover previously failed
nodes. I don't have the time to do this properly :(
Allow the routers to start up without requiring full availability of all
downstream ingesters. Previously a single unavailable ingester prevented
the routers from starting up.
This has downsides:
* Lazily initialising a connection will cause the first writes to have
higher latency as the connection is established.
* The routers MAY come up in a state that will never work (i.e. bad
ingester addresses)
* Using the opaque gRPC load balancing mechanism restricts the
visibility into which nodes are up/down (hindering useful log
messages) and prevents us from implementing more advanced circuit
breaking / probing logic / load-balancing strategies.
This change is a quick fix - it leaves the round-robin handler in place,
load-balancing over a single tonic Channel, which internally
load-balances. This will need cleaning up.
* feat: Add a feature flag to switch to the router RPC write path
Fixes#6242.
* refactor: Remove a weird arc clone/rename that's not needed
I'm sure this was needed at some point, but it doesn't make much sense.
I wasn't going to change this, but I'm now trying to minimize the
differences between this function and the write path init function, so
make this one better too.
* fix: Add the namespace autocreation to the RPC write path too
The topic/query pool don't really apply to this case, but use them
anyway to be able to use the existing catalog methods.
Also add a bunch of comments pointing out where the RPC write path
initializer and the old router's initializer are the same and where
they're different, so that perhaps it'll be easier to keep them in sync
while they both exist.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* chore: remove unused/moved ns_autocreation dml handler
* feat(router): expose new ns retention as config
* fix: forgot to set default value for router retention arg
* chore: make new namespace retention param an option
This commit adds the (unused) RpcWrite implementation of the DmlHandler
trait that implements pushing a write over gRPC to a single, arbitrary
ingester. Requests are round-robin'ed across all available ingesters.
This DmlHandler implementation can be swapped out with the
ShardedWriteBuffer to change how writes are propagated to the ingester.
* feat: create namespace API call in router
Co-authored-by: Nga Tran <nga-tran@live.com>
* chore: treat retention as ns except in CLI
* fix: overflow in nanosecond calc
* fix: retention test after changing it from hours to ns
* chore: comment clarification in cli; better response type for error in ns API
* fix: correct some rebase mistakes
* chore: merge namespace create & create_with_retention; renamed ns create test helper fn & const
* fix: ns autocreation test was wrong after rebase
* fix: mem catalog has default 1hr retention, accidently removed in rebase
* chore: remove mem catalogs default 1hr retention; make it settable in sets & router
Co-authored-by: Luke Bond <luke.n.bond@gmail.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: reject writes that are outside the retention period
* feat: add retention validator into handler stack
* chore: Apply suggestions from code review
Co-authored-by: Dom <dom@itsallbroken.com>
* refactor: address review comments
* test: unit tests fot retention validation
* chore: address review comments
* test: more unit tests and integration tests
* refactor: make time inside retention period for emphemeral_mode test
* fix: 2 hours
Co-authored-by: Dom <dom@itsallbroken.com>
* chore: move ns api from querier to router
* chore: add explanatory comment in querier about moved namespace API
* fix: add namespace service to router
* fix: querier returns unimplemented error for ns retention, not panic
* chore: reuse namespace -> proto in router ns api
* chore: grpc namespace - consume ns to avoid clone
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* test: Ensure router's HTTP error messages are stable
If you change the text of an error, the tests will fail.
If you add a new error variant to the `Error` enum but don't add it to
the test, test compilation will fail with a "non-exhaustive patterns"
message.
If you remove an error variant, test compilation will fail with a "no
variant named `RemovedError`" message.
You can get the list of error variants and their current text via
`cargo test -p router -- print_out_error_text --nocapture`.
A step towards accomplishing #5863
Co-authored-by: Carol (Nichols || Goulding) <carol.nichols@gmail.com>
Co-authored-by: Jake Goulding <jake.goulding@integer32.com>
* fix: Remove optional commas and document macro arguments
* docs: Clarify the purpose of the tests the check_errors macro generates
* fix: Add tests for inner mutable batch LP error variants
Co-authored-by: Carol (Nichols || Goulding) <carol.nichols@gmail.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Expose the Table and Namespace IDs encoded within the serialised DML
write (added in #6036).
This makes the IDs available for use in the consumers, ending the
transition period. This commit DOES NOT remove the strings sent over the
wire.
* chore: delete metric duplicate character
* fix: failure ci test case
* fix: failure ci test case
* fix: failure ci test case
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* refactor: make namespace folder for all namesapce's commands
* feat: WIP for add command to set retention period
* feat: more on updating retention period
* feat: grpc for update namespace retention period
* test: end to end test fpr namespace retention
* fix: lint proto
* chore: cleanup
* chore: kick CI run again
* fix: command hierachy
* chore: fix comments
Changes the DmlDelete to contain the NamespaceId for which it should be
applied, propagating this value over the wire.
Like the existing IDs within the DmlWrite, these values are marked
unsafe to use due to avoid the consumers utilising them accidentally
during deployment. Unlike DmlWrite, the DmlDelete is completely unused,
so this is less of an issue.
This commit is part of a two-part change in order to add the table &
namespace IDs to the write buffer wire format. This commit forms the
first half; changing the producer to send the IDs.
In this commit the new ID values are never read on the consumer side,
ensuring there is no consumer dependency on them. This ensures they
remain operational during a rollout, where the consumer may be updated
to the latest code dependent on the IDs before the producer is updated
to send them. This also ensures we have a window of time where where the
consumers can be rolled back after being updated, and still handle
replaying messages in Kafka.
Changes the DML handler transformers to pass through the TableId once it
has been resolved during schema validation.
This value is collated by shard, and then unused. This collated TableId
map will be used in a follow-up PR.
This commit introduces a new (composable) trait; a NamespaceResolver is
an abstraction responsible for taking a string namespace from a user
request, and mapping to it's catalog ID.
This allows the NamespaceId to be injected through the DmlHandler chain
in addition to the namespace name.
As part of this change, the NamespaceAutocreation layer was changed from
an implementator of the DmlHandler trait, to a NamespaceResolver as it
is a more appropriate abstraction for the functionality it provides.