Service limits are enforced on two values:
* Number of tables in a namespace
* Number of columns in a table
This commit labels the existing service limit hit metric with the type
of limit reached, and adds this information to the log lines emitted.
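A minimal sketch of the labelling, using a plain map in place of the IOx `metric` crate's counter type; the type and label names here are illustrative:

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// The two enforced service limits.
#[derive(Debug, Clone, Copy)]
enum ServiceLimit {
    TablesPerNamespace,
    ColumnsPerTable,
}

impl ServiceLimit {
    fn as_label(&self) -> &'static str {
        match self {
            Self::TablesPerNamespace => "table",
            Self::ColumnsPerTable => "column",
        }
    }
}

/// A stand-in for the "service limit hit" counter, keyed by limit type.
#[derive(Default)]
struct LimitHitMetric {
    counts: Mutex<HashMap<&'static str, u64>>,
}

impl LimitHitMetric {
    fn record(&self, limit: ServiceLimit) {
        *self.counts.lock().unwrap().entry(limit.as_label()).or_default() += 1;
        // The same label value is emitted on the log line.
        eprintln!("service limit reached: limit={}", limit.as_label());
    }
}
```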
Ensure a "probe" node is always returned as the first candidate, driving
it to recovery faster.
This also includes a fix for the balancer metrics that would report
probe candidate nodes as healthy nodes.
Similar to https://github.com/influxdata/influxdb_iox/pull/6509, this
forces a constant re-querying of the DNS address of an ingester to drive
rediscovery.
Unlike the above PR, this only reconnects when there are errors
observed. This still isn't ideal - something is wrong with the discovery
itself - this just papers over it.
Adds a metric with a per-ingester label recording the current health
state of the upstream ingester from the perspective of the router
instance.
Also logs periodically when one or more ingesters are offline.
Lazily establish connections in the background, instead of using tonic's
connect_lazy().
connect_lazy() causes error handling to take a different path in tonic
compared to "normal" connections, and this stops reconnections from
being performed when the endpoint goes away (likely a bug).
It also means the first few write requests won't have to wait while the
connection is dialed, which brings down the P99 as a nice side-effect.
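A hedged sketch of the background dialling, assuming tokio and tonic; the address handling and retry interval are illustrative, not the router's actual configuration:

```rust
use std::time::Duration;
use tokio::sync::oneshot;
use tonic::transport::{Channel, Endpoint};

/// Dial `addr` in a background task, handing the established channel
/// back once the endpoint accepts the connection.
fn connect_in_background(addr: &'static str) -> oneshot::Receiver<Channel> {
    let (tx, rx) = oneshot::channel();
    tokio::spawn(async move {
        let endpoint = Endpoint::from_static(addr);
        loop {
            match endpoint.connect().await {
                Ok(channel) => {
                    // The first write requests find an established
                    // channel waiting, rather than paying the dial cost.
                    let _ = tx.send(channel);
                    return;
                }
                Err(e) => {
                    eprintln!("connection to {addr} failed: {e}; retrying");
                    tokio::time::sleep(Duration::from_secs(1)).await;
                }
            }
        }
    });
    rx
}
```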
Adds on-path health checking / recording using the CircuitBreaker
construct, stopping requests to unhealthy upstreams (minus the probe
requests) until they recover.
This removes the horrible gRPC balancer hack I added to get us deployed
ASAP, and should eliminate latency spikes and elevated error responses
observed during deployments as a result.
last_probe was "the instant at which the last set of probes started
being sent" in my head, but Carol saw it as "first_probe - the time at
which probes started being sent".
Hopefully probe_window_started_at is less ambiguous.
Implements a "circuit breaker", a construct that tracks the error &
success of requests to a remote node, and uses this information to allow
or deny further requests.
This circuit breaker stops sending requests to the remote when the error
count exceeds 80% of requests in a 5 second window. Once this happens,
up to 10 "probe" requests per second are allowed, and when they succeed,
normal operation resumes (though concurrent requests may still be
completing during the probe regime and are counted towards the probe
results).
In the happy path, this circuit breaker is very cheap (lock-free;
wait-free population-oblivious, WFPO) to evaluate and record request
results in, minimising the throughput
penalty. Once the breaker enters an unhealthy state (hopefully a rare
occurrence) it uses a mutex to manage the probe state (with a higher
overhead) for simplicity; it's definitely possible to optimise this away
if high latencies are observed during upstream outages when the circuit
breaker is open/unhealthy.
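A much-simplified sketch of the breaker described above: `evaluate_window` is assumed to be driven by a 5 second timer, the real lock-free bookkeeping is reduced to plain atomics, and the names are illustrative:

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::Mutex;
use std::time::{Duration, Instant};

const ERROR_RATIO: f64 = 0.8; // trip when errors exceed 80% of requests
const PROBES_PER_SECOND: u64 = 10;

struct CircuitBreaker {
    // Happy path state: touched with plain atomics only.
    ok: AtomicU64,
    err: AtomicU64,
    healthy: AtomicBool,
    // Probe accounting: only locked once the breaker has tripped.
    probes: Mutex<ProbeState>,
}

struct ProbeState {
    window_started_at: Instant,
    sent: u64,
}

impl CircuitBreaker {
    fn new() -> Self {
        Self {
            ok: AtomicU64::new(0),
            err: AtomicU64::new(0),
            healthy: AtomicBool::new(true),
            probes: Mutex::new(ProbeState {
                window_started_at: Instant::now(),
                sent: 0,
            }),
        }
    }

    /// Record a request outcome. Concurrent requests completing while
    /// probing also land here, so they count towards recovery.
    fn observe(&self, success: bool) {
        let counter = if success { &self.ok } else { &self.err };
        counter.fetch_add(1, Ordering::Relaxed);
    }

    /// Evaluate the 5 second window, tripping or resetting the breaker.
    fn evaluate_window(&self) {
        let ok = self.ok.swap(0, Ordering::Relaxed);
        let err = self.err.swap(0, Ordering::Relaxed);
        let total = ok + err;
        if total > 0 && (err as f64 / total as f64) > ERROR_RATIO {
            self.healthy.store(false, Ordering::Relaxed);
        } else if ok > 0 {
            // A successful probe (or concurrent request) closes the breaker.
            self.healthy.store(true, Ordering::Relaxed);
        }
    }

    /// Decide whether a request may be sent upstream.
    fn should_send(&self) -> bool {
        if self.healthy.load(Ordering::Relaxed) {
            return true; // happy path: a single atomic load
        }
        // Unhealthy: rate-limit probe requests under a mutex.
        let mut p = self.probes.lock().unwrap();
        if p.window_started_at.elapsed() >= Duration::from_secs(1) {
            p.window_started_at = Instant::now();
            p.sent = 0;
        }
        p.sent += 1;
        p.sent <= PROBES_PER_SECOND
    }
}
```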
The gRPC node discovery hack spawns a task that outlives the gRPC
balancer - once the balancer stops, the task should stop too (and not
panic sending on the closed channel).
The tonic / tower load-balance implementation discards failed nodes,
even when using a static list - this causes nodes that fail once to
never be retried.
This doesn't happen for the last node for some reason, and leads to all
the load from one router hitting a single ingester instead of load
balancing across all ingesters.
This commit adds a hack that constantly tells the load balancer to
probe all nodes, hopefully causing it to re-discover previously failed
nodes. I don't have the time to do this properly :(
Allow the routers to start up without requiring full availability of all
downstream ingesters. Previously a single unavailable ingester prevented
the routers from starting up.
This has downsides:
* Lazily initialising a connection will cause the first writes to have
higher latency as the connection is established.
* The routers MAY come up in a state that will never work (e.g. bad
ingester addresses)
* Using the opaque gRPC load balancing mechanism restricts the
visibility into which nodes are up/down (hindering useful log
messages) and prevents us from implementing more advanced circuit
breaking / probing logic / load-balancing strategies.
This change is a quick fix - it leaves the round-robin handler in place,
load-balancing over a single tonic Channel, which internally
load-balances. This will need cleaning up.
* feat: Add a feature flag to switch to the router RPC write path
Fixes #6242.
* refactor: Remove a weird arc clone/rename that's not needed
I'm sure this was needed at some point, but it doesn't make much sense.
I wasn't going to change this, but I'm now trying to minimize the
differences between this function and the write path init function, so
make this one better too.
* fix: Add the namespace autocreation to the RPC write path too
The topic and query pool don't really apply to this case, but use them
anyway to be able to use the existing catalog methods.
Also add a bunch of comments pointing out where the RPC write path
initializer and the old router's initializer are the same and where
they're different, so that perhaps it'll be easier to keep them in sync
while they both exist.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* chore: remove unused/moved ns_autocreation dml handler
* feat(router): expose new ns retention as config
* fix: forgot to set default value for router retention arg
* chore: make new namespace retention param an option
This commit adds the (unused) RpcWrite implementation of the DmlHandler
trait that implements pushing a write over gRPC to a single, arbitrary
ingester. Requests are round-robin'ed across all available ingesters.
This DmlHandler implementation can be swapped out with the
ShardedWriteBuffer to change how writes are propagated to the ingester.
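A minimal sketch of the round-robin selection, with the client type standing in for the real gRPC write clients:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

struct RoundRobin<C> {
    clients: Vec<C>,
    next: AtomicUsize,
}

impl<C> RoundRobin<C> {
    fn new(clients: Vec<C>) -> Self {
        assert!(!clients.is_empty(), "at least one ingester is required");
        Self { clients, next: AtomicUsize::new(0) }
    }

    /// Hand out the next upstream in sequence, wrapping around; a
    /// single atomic increment is the only coordination needed.
    fn pick(&self) -> &C {
        let i = self.next.fetch_add(1, Ordering::Relaxed);
        &self.clients[i % self.clients.len()]
    }
}
```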
* feat: create namespace API call in router
Co-authored-by: Nga Tran <nga-tran@live.com>
* chore: treat retention as ns except in CLI
* fix: overflow in nanosecond calc
* fix: retention test after changing it from hours to ns
* chore: comment clarification in cli; better response type for error in ns API
* fix: correct some rebase mistakes
* chore: merge namespace create & create_with_retention; renamed ns create test helper fn & const
* fix: ns autocreation test was wrong after rebase
* fix: mem catalog has default 1hr retention, accidentally removed in rebase
* chore: remove mem catalogs default 1hr retention; make it settable in sets & router
Co-authored-by: Luke Bond <luke.n.bond@gmail.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: reject writes that are outside the retention period (a sketch of the check follows this list)
* feat: add retention validator into handler stack
* chore: Apply suggestions from code review
Co-authored-by: Dom <dom@itsallbroken.com>
* refactor: address review comments
* test: unit tests for retention validation
* chore: address review comments
* test: more unit tests and integration tests
* refactor: make time inside retention period for emphemeral_mode test
* fix: 2 hours
Co-authored-by: Dom <dom@itsallbroken.com>
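The retention check referenced above reduces to a timestamp comparison; the function signatures here are hypothetical, and the conversion helper shows the kind of checked arithmetic the nanosecond overflow fix implies:

```rust
/// Accept only rows whose timestamp falls inside the retention period;
/// a `None` period means infinite retention.
fn within_retention(timestamp_ns: i64, now_ns: i64, retention_period_ns: Option<i64>) -> bool {
    match retention_period_ns {
        None => true,
        Some(period_ns) => timestamp_ns >= now_ns - period_ns,
    }
}

/// Converting a retention period in hours to nanoseconds can overflow
/// an i64 for large inputs, hence the checked multiplication.
fn hours_to_ns(hours: i64) -> Option<i64> {
    hours.checked_mul(60 * 60 * 1_000_000_000)
}
```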
* chore: move ns api from querier to router
* chore: add explanatory comment in querier about moved namespace API
* fix: add namespace service to router
* fix: querier returns unimplemented error for ns retention, not panic
* chore: reuse namespace -> proto in router ns api
* chore: grpc namespace - consume ns to avoid clone
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* test: Ensure router's HTTP error messages are stable
If you change the text of an error, the tests will fail.
If you add a new error variant to the `Error` enum but don't add it to
the test, test compilation will fail with a "non-exhaustive patterns"
message.
If you remove an error variant, test compilation will fail with a "no
variant named `RemovedError`" message.
You can get the list of error variants and their current text via
`cargo test -p router -- print_out_error_text --nocapture`.
A step towards accomplishing #5863
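The exhaustive-match pattern that produces these compile errors can be sketched as follows; the `Error` variants and texts are hypothetical stand-ins, not the router's actual errors:

```rust
use thiserror::Error;

// Hypothetical variants standing in for the router's real error enum.
#[derive(Debug, Error)]
enum Error {
    #[error("namespace {0} not found")]
    NamespaceNotFound(String),
    #[error("request payload too large")]
    PayloadTooLarge,
}

#[test]
fn error_text_is_stable() {
    // An exhaustive match means adding a variant breaks compilation
    // ("non-exhaustive patterns") until its text is asserted here, and
    // removing one fails with "no variant named ...".
    let check = |e: &Error| match e {
        Error::NamespaceNotFound(_) => {
            assert_eq!(e.to_string(), "namespace ns1 not found")
        }
        Error::PayloadTooLarge => {
            assert_eq!(e.to_string(), "request payload too large")
        }
    };
    check(&Error::NamespaceNotFound("ns1".into()));
    check(&Error::PayloadTooLarge);
}
```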
Co-authored-by: Carol (Nichols || Goulding) <carol.nichols@gmail.com>
Co-authored-by: Jake Goulding <jake.goulding@integer32.com>
* fix: Remove optional commas and document macro arguments
* docs: Clarify the purpose of the tests the check_errors macro generates
* fix: Add tests for inner mutable batch LP error variants
Co-authored-by: Carol (Nichols || Goulding) <carol.nichols@gmail.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Expose the Table and Namespace IDs encoded within the serialised DML
write (added in #6036).
This makes the IDs available for use in the consumers, ending the
transition period. This commit DOES NOT remove the strings sent over the
wire.
* chore: delete metric duplicate character
* fix: failing CI test case
* fix: failing CI test case
* fix: failing CI test case
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* refactor: make namespace folder for all namespace commands
* feat: WIP for add command to set retention period
* feat: more on updating retention period
* feat: grpc for update namespace retention period
* test: end to end test for namespace retention
* fix: lint proto
* chore: cleanup
* chore: kick CI run again
* fix: command hierarchy
* chore: fix comments
Changes the DmlDelete to contain the NamespaceId for which it should be
applied, propagating this value over the wire.
Like the existing IDs within the DmlWrite, these values are marked
unsafe to use in order to avoid consumers utilising them accidentally
during deployment. Unlike the DmlWrite, the DmlDelete is completely
unused, so this is less of an issue.
This commit is part of a two-part change in order to add the table &
namespace IDs to the write buffer wire format. This commit forms the
first half; changing the producer to send the IDs.
In this commit the new ID values are never read on the consumer side,
ensuring there is no consumer dependency on them. This ensures they
remain operational during a rollout, where the consumer may be updated
to the latest code dependent on the IDs before the producer is updated
to send them. This also ensures we have a window of time where the
consumers can be rolled back after being updated, and still handle
replaying messages in Kafka.
Changes the DML handler transformers to pass through the TableId once it
has been resolved during schema validation.
This value is collated by shard and currently goes unused; the collated
TableId map will be put to use in a follow-up PR.
This commit introduces a new (composable) trait; a NamespaceResolver is
an abstraction responsible for taking a string namespace from a user
request and mapping it to its catalog ID.
This allows the NamespaceId to be injected through the DmlHandler chain
in addition to the namespace name.
As part of this change, the NamespaceAutocreation layer was changed
from an implementer of the DmlHandler trait to a NamespaceResolver, as
that is a more appropriate abstraction for the functionality it
provides.
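A minimal sketch of the abstraction, assuming the `async_trait` crate; the trait shape and names are illustrative rather than the exact IOx definitions:

```rust
use async_trait::async_trait;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct NamespaceId(i64);

/// Map a user-provided namespace string to its catalog ID.
#[async_trait]
trait NamespaceResolver: Send + Sync {
    async fn get_namespace_id(
        &self,
        namespace: &str,
    ) -> Result<NamespaceId, Box<dyn std::error::Error + Send + Sync>>;
}

/// Resolvers compose: the autocreation layer wraps an inner resolver
/// and creates the namespace in the catalog when the lookup misses,
/// rather than acting as a DmlHandler.
struct NamespaceAutocreation<T: NamespaceResolver> {
    inner: T,
    // ... catalog handle, creation defaults, etc.
}
```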
Changes the DmlWrite type to require a PartitionKey be specified,
instead of accepting an Option.
This requirement was already in place - the write buffer upheld an
invariant that all writes contained a partition key value (i.e. it was
not "None"), panicking at runtime when attempting to enqueue a write
without one.
It is now possible to encode this invariant in the type system, which is
what this change does.
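A before/after sketch of the type change; the field and type names are illustrative:

```rust
/// A partition key is now required to construct a write.
struct PartitionKey(String);

/// Before: the invariant was only upheld at runtime - enqueueing a
/// write with `partition_key: None` panicked in the write buffer.
struct DmlWriteBefore {
    partition_key: Option<PartitionKey>,
    // ...
}

/// After: a write cannot exist without a partition key.
struct DmlWriteAfter {
    partition_key: PartitionKey,
    // ...
}
```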
* fix: Avoid some allocations by collecting instead of inserting into a vec
* refactor: Encode that adding columns is for one table at a time
* test: Add another test of column limits
* test: Add below/above limit tests for create_or_get_many
* fix: Explicitly DO NOT check column limits when inserting many columns
* feat: Cache the max_columns_per_table on the NamespaceSchema
* feat: Add a function to validate column limits in-memory (sketched after this list)
* fix: Provide more useful information when over column limits
* fix: Swap types to remove intermediate allocation
* docs: Explain the interactions of the cache and the column limits
* test: Actually set up test that showcases column limit race condition
* fix: Allow writing to existing columns even if table is over column limit
Co-authored-by: Dom <dom@itsallbroken.com>
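A hedged sketch of the in-memory check implied by the bullets above; the signatures and types are hypothetical:

```rust
use std::collections::BTreeSet;

struct TableSchema {
    columns: BTreeSet<String>,
}

/// Validate a write against the (cached) column limit: writes touching
/// only existing columns always pass, even if the table is already over
/// the limit; only adding new columns can exceed it.
fn validate_column_limits(
    table: &TableSchema,
    write_columns: &BTreeSet<String>,
    max_columns_per_table: usize,
) -> Result<(), String> {
    let new_columns = write_columns.difference(&table.columns).count();
    if new_columns == 0 {
        return Ok(());
    }
    if table.columns.len() + new_columns > max_columns_per_table {
        return Err(format!(
            "adding {new_columns} column(s) exceeds the limit of {max_columns_per_table}"
        ));
    }
    Ok(())
}
```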
* refactor: Remove grpc WriteService
* fix: update end to end test
* fix: Update generated_types/protos/influxdata/pbdata/v1/influxdb_pb_data_protocol.proto
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Criterion comes with some extra cargo tooling called `cargo criterion`
which can be used instead of `cargo bench`. The advantage is that we
don't need to compile the entire reporting infrastructure into our
benchmarks. So let's embrace this separation of concerns.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Implements the ShardService to expose the shard mapping produced by
routers to external clients.
This impl uses an internal cache to eliminate unnecessary Catalog
queries, as the Kafka partition/Sequencer/Shard mapping is static once a
router has initialised.
This doesn't really need to be fallible, but making it so forces
propagation of a ton of error handling - having no shards is always a
sign of something being very wrong, and it can be caught in the caller
if it is for some reason an acceptable state / can be recovered from.
The Sequencer (which will be renamed shortly) is a type that represents
a single sequencer/shard/kafka partition in the router.
In order to minimise confusion with all the various IDs floating around,
we have a KafkaPartition - this commit changes the Sequencer to return
the Kafka partition index as a typed value, rather than a usize, to
help eliminate any inconsistencies.
As a side effect of these conversion changes, I've tightened up the
casting to ensure we assert on any overflows - we juggle a lot of
numeric types!
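A small sketch of the typed index and the overflow-asserting cast; names are illustrative:

```rust
/// A Kafka partition index, distinct from the other IDs in play.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct KafkaPartition(i32);

fn partition_from_index(idx: usize) -> KafkaPartition {
    // Assert on overflow instead of silently truncating the value.
    KafkaPartition(i32::try_from(idx).expect("partition index overflows i32"))
}
```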
Went through and replaced all lazy_static uses with once_cell (while
waiting for the project to compile). There are still dependencies using
lazy_static, so it is still in the crate graph, but at least there isn't
an explicit dependency on it (and it will be easier to update to
`std::lazy::Lazy` once that is stable).
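For illustration, the shape of the swap; the static and its contents are made up:

```rust
use once_cell::sync::Lazy;
use std::collections::HashMap;

// lazy_static! { static ref DEFAULTS: HashMap<&'static str, u64> = { ... }; }
// becomes:
static DEFAULTS: Lazy<HashMap<&'static str, u64>> = Lazy::new(|| {
    let mut m = HashMap::new();
    m.insert("example_key", 42);
    m
});
```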
Changes the kafka message wire format to include the partition key for
serialised DML writes on the wire.
After this commit, the kafka messages will contain the partition key for
each op, but this information will go unused in the ingester - this
enables us to roll out the producer side, before making the value's
presence necessary on the consumer side.
A follow-up PR will change the ingester to utilise this embedded
partition key.
This has the unfortunate side effect of making the partition key part of
the public gRPC write API:
https://github.com/influxdata/influxdb_iox/issues/4866
* feat: Log time spent requesting ingester partitions
Fixes #4558.
* feat: Record a metric for the duration queriers wait on ingesters
* fix: Use DurationHistogram instead of U64 Histogram
* test: Add a test for the ingester ms metric
* feat: Add back the logging to provide both logging and metrics for ingester duration
* refactor: Use sample_count method on metrics
* feat: Record ingester duration separately for success or failure
* fix: Create a separate test for the ingester metrics
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Changes the JumpHash sharder and modifies the ShardedWriteBuffer's
DmlHandler::delete() impl in order to enqueue delete ops across all
shards if no table name is specified.
If a table name is specified, it is sharded as before: a delete for a
given table & namespace always maps to the same shard as a write to the
same table & namespace.
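A minimal sketch of this sharding rule, with a closure standing in for the JumpHash sharder:

```rust
struct Shard(u32);

fn shards_for_delete<'a>(
    table: Option<&str>,
    namespace: &str,
    all_shards: &'a [Shard],
    hash: impl Fn(&str, &str) -> usize, // JumpHash stand-in
) -> Vec<&'a Shard> {
    match table {
        // A table-scoped delete maps to the same shard as writes to
        // that table & namespace.
        Some(t) => vec![&all_shards[hash(t, namespace) % all_shards.len()]],
        // No table name: the delete is enqueued on every shard.
        None => all_shards.iter().collect(),
    }
}
```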
These were found by iterating over all of the dependencies of each
Cargo.toml, then grepping that crate for the dependency's name. If it
didn't show up, I attempted to remove it.
I left a few dependencies that this process flagged:
* generated_types
- `pbjson`, `serde`. Apparently used by the generated code.
* grpc-router-test-gen
- `prost`. Apparently used by the generated code.
* influxdb_iox
- `heappy`. Doesn't appear used, but is behind enough feature
flags that I don't care to reason about them, and it's already optional.
- `tikv_jemalloc_sys`. Appears to be setting a feature flag of an
indirect dependency.
* iox_gitops_adapter
- `k8s_openapi`. Appears to be setting a feature flag of an indirect
dependency.
All features are now covered by rskafka. This also removes the need to
specify a server ID for write buffer consumers. This was only used for
rdkafka since there we needed to specify a consumer group, even though
we did not use any transactions.
* feat: Add db_name/namespace to DmlWrite and DmlDelete
This is required for the new ingester to be able to work with the write
buffer. The protobuf that gets serialized over Kafka already includes
the database name; it just wasn't getting carried through to the
marshaled Dml operation.
* fix: database != namespace, propagation through write buffer
Co-authored-by: Marco Neumann <marco@crepererum.net>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Some crates listed features they don't use; other crates were relying on
feature flags enabled by something else. I tested these changes by
disabling the workspace hack crate and testing each crate.
Move sharding implementation from router to `dml` (where the types can
be consumed w/o cloning the data) and only support the new sharding
configs.
The old sharding configs will be removed soon.
The direction was required when a database could read or write from/to
a write buffer. Now it is clear from the usage context of a write buffer
which of the two applications is meant (databases read, routers write),
so the direction flag is no longer required.
IOx is the only consumer of this API so we might just use the serialized
form. Cloud2 uses the HTTP API which supports SQL-like predicates.
Fixes #3192.
* feat: migrate server to DbWrite (#2724)
* chore: print perf log output
* fix: don't suppress CI status code
* chore: review feedback
* fix: don't error on empty line protocol write payloads
* fix: test
* fix: test