The NamespaceResolver was using its own very similar look-aside caching
to the DML handlers, this commit leverages the read-through cache
implementation to deduplicate more code and makes the read through
behavioural expectation explicit for namespace autocreation.
This removes the look-aside cache from the retention_validation
and schema_validation DML handlers, instead setting up the new
NamespaceCache decorator and using that to handle cache misses.
This commit refactors the NamespaceCache trait to return a result
instead of an option for calls to `get_schema()`, allowing callers and
decorators to differentiate between cache misses, namespaces not
existing and transient I/O errors. This allows implementations to
interact with backend catalog storage.
In order to implement a read-through NamespaceCache
decorator the `get_cache()` call will need to interact
with async catalog methods, so this allows implementations
to call await within the `get_cache()` body.
Part of the wider effort to consistently use tht term "database"
for the user-facing terminology, update the authorization system.
Whilst this system is technically user-facing, it is unlikely many
users will see it. It is however new enough that the change is
relatively little effort.
Adds tests that drive the multi-tenant HTTP write request parser.
Note that in addition to these unit tests, there's still considerable
integration converge of the HTTP write endpoint in multi-tenant mode in
http.rs (test_write_handler!) that asserts the system is unchanged in
the "default" run-mode of multi-tenant.
Adds a single-tenant mode (CST) to the IOx routers.
Single-tenancy mode differs in two main ways:
* V1 write endpoint is partially supported
* V2 write endpoint ignores "org" parameter
The "normal" mode is "multi tenant" which is the default operational
mode, and all existing behaviour remains unchanged. Single tenant mode
can be enabled by specifying INFLUXDB_IOX_SINGLE_TENANCY=true.
Request parsing is delegated to two implementations of the
WriteParamExtractor trait, one each for CST and MT - the logic of each
"mode" is defined within these files and all other functionality is
common between the two.
This commit also renames some of the error types for clarity
(NoSpecified -> NoOrgBucketSpecified, other NotSpecified ->
NoQueryParams, etc).
Note: single tenant code requires testing
This commit also cleans up the code formatting for the gRPC handler and
simplifies some of the gRPC handler tests for the new update service
limit API.
This adds additional testing coverage for updates to service protection
limits to a namespace, and how they affect subsequent writes that
exceed the limits.
There was a mix of different ways of returning errors - this commit
unifies them, adds some documentation to the returned errors, and
removes the capitalisation.
Errors should be lower-case so they compose nicely like this:
"something failed: super important error: inner error"
rather than:
"something failed: Super important error: Inner error"
Changes the org/bucket to NamespaceName calls to move the values into
the constructor, allowing it to reuse them if they do not require
encoding (the common case) instead of forcing them to be cloned to
obtain a 'static NamespaceName.
* feat(service_grpc_flight): optional query authorization
Add support for requiriing namespace-level authorization for
arrow flight based query requests. These are the flight SQL commands
as well as the IOx-specific SQL over flight and InfluxQL over flight
protocols.
Supports the optional configuration of an authorization sidecar,
in the same manner as is used in the router. If this is configured
then all arrow flight gRPC requests that are implemented will require
a valid authorization token to be supplied in the request. For a
multi-legged operation such as GetFlightInfo + DoGet required for
FlightSQL then a valid authorization is required for every request.
Ideally this support would be implemented using some sort of
interceptor, however the namespace isn't known until the request
processing has been started. The authorization check is performed
as soon as possible once the desired operation is known.
The legacy "storage" API has no authorization checks. Care should
be taken to ensure this API is never exposed to an untrusted network.
* chore(service_grpc_flight): review suggestions
Implement some suggestions from reviewers. The main change is adding
authorization checks to the handshake command.
* chore(service_grpc_flight): remove authorization of handshake
The Handshake call is used by existing clients to verify the
connection. These clients do not send a namespace header with the
request meaning there is nothing to authorize against. Remove this
authorization for now to avoid breaking existing clients.
* refactor: implement Authorizer trait on Option
Based on a suggestion from Dom implement the Authorizer trait on
Option<T: Authorizer> so that the call sites no longer need to check
if an authorizer is configured. This simplifies the code at the
call sites.
To maximise the utility the signature has changed so that a optional
token is now used. When no authorizer is configured this will not
be looked at. When a token is required a new error will be returned
if no token was supplied.
* fix: suggestions from clippy
* test: Add an e2e test for write replication
* fix: Pass through rpc_write_replicas configuration to RpcWrite handler
---------
Co-authored-by: Dom <dom@itsallbroken.com>
This commit implements replication for the router's RpcWrite handler.
The desired number of replica copies is specified at startup time, and
each user write will be fanned-out with the specified replication factor
(replicas + 1).
A failure to write to any upstreams returns the write error, but a
failure to obtain enough ACKs (enough successful writes) after at least
1 ACK will return a "partial write" error - this differentiation is
important, as the user's write will be readable after a partial write
error has occurred.
This currently writes to upstreams serially; this is clearly an
opportunity for improvement! A follow-on PR will parallelise writes
across the desired number of replicas while maintaining the "at most one
ack'd write to one host" invariant.
Note that replication is currently hard-coded as disabled.
Whenever an RPC write to an upstream ingester fails, it is retried after
an increasing delay, until the RPC_TIMEOUT is hit. Because of this, any
RPC write error would be returned as a "timeout", masking the underling
reason the write actually failed.
This commit pushes down the timeout logic, and retains the most recently
observed RPC write error, returning it to the user instead of the
timeout error.
This PR changes the RpcWrite DmlHandler to facilitate testing using
mocked RPC client circuit breaker states, and assert the error response
for no healthy upstreams.
This simplifies the semantics of upstream RPC endpoint snapshots; an
empty snapshot will never be returned, and therefore if the caller has a
snapshot, it contains at least one endpoint to attempt an RPC call
against.
This moves the "no upstream" error handling from a write-time error, to
a pre-write error.
Most of the test cases set up a context with or without namespace
autocreation, while few set a retention period. This removes the
requirement to pass a retention period when enabling autocreation
or vice versa.
Using the builder pattern enables a more ergonomic workflow
for setting optional namespace creation semantics when constructing
the TestContext for the router.
Changes the router's RPC balancer to return a iterator of elements
starting from the given offset, that can remove elements from the
infinite/cycling iterator to prevent them from being yielded again.
* feat(authz): add authorization client.
Add a new authz crate to provide the interface for making authorization
checks from within IOx. This includes the default client that uses
the influxdata.iox.authz.v1 gRPC protocol. This feature is not used
by any IOx component yet.
* feat: optional authorization on write path
Support optionally enabling authorization checks on the /api/v2/write
handler. If an authrorizer is configured then the handler will
attempt to retrieve a token from the request's Authorization header.
If no such token exists then a response with a 401 error code is
returned. If the token is not valid, or does not have write permission
for the requested namespace then a response with a 403 error is
returned.
* chore: add unit test for authz in write handler
Add unit tests that test the correct functioning of the /api/v2/write
handler when an Authorizer is configured.
* chore(authz): use lazy connection
Change the initialization of the authz client to use a lazy connection.
This allows the client to be initialised synchronously.
* chore: Run cargo hakari tasks
* fix(authz): protolint complaints
* fix: authz tests
* fix: benches and lint
* chore: Update clap_blocks/src/authz.rs
Co-authored-by: Marko Mikulicic <mkm@influxdata.com>
* chore: Update authz/src/lib.rs
Co-authored-by: Marko Mikulicic <mkm@influxdata.com>
* chore: Update clap_blocks/src/authz.rs
Co-authored-by: Marko Mikulicic <mkm@influxdata.com>
* chore: review suggestions
* chore: review suggestions
Apply a number of suggestions from review comments. The main
behavioural change is that if the authz service is configured
applictions will perform a probe request to ensure it can communicate
before continuing startup.
* chore: Update router/src/server/http.rs
Co-authored-by: Dom <dom@itsallbroken.com>
---------
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: Marko Mikulicic <mkm@influxdata.com>
Co-authored-by: Dom <dom@itsallbroken.com>
This implements the RPC "delete_namespace" handler, allowing a namespace
to be soft-deleted.
Adds unit coverage for all handlers & e2e test coverage for the new
handler (the rest were already covered).
The tests also highlight the caching issue documented here:
https://github.com/influxdata/influxdb_iox/issues/6175
This commit adds initial support for "soft" namespace deletion, where
the actual records & data remain, but are no longer queryable /
writeable.
Soft deletion is eventually consistent - users can expect to continue
writing to and reading from a bucket after issuing a soft delete call,
until the various components either restart, or have their caches
flushed.
The components treat soft-deleted namespaces differently:
* router: ignore soft deleted namespaces
* ingester: accept soft deleted namespaces
* compactor: accept soft deleted namespaces
* querier: ignore soft deleted namespaces
* various gRPC services: ignore soft deleted namespaces
This ensures that the ingester & compactor do not see rows "vanishing"
from the database, and continue to make forward progress.
Writes for the deleted namespace that are buffered in the ingester will
be persisted as normal, allowing us to support "un-delete" operations
where the system is restored to a the state at which the delete was
issued (rather than loosing the buffered data).
Follow-on work is required to ensure GC drops the orphaned parquet files
after the configured GC time, and optimisations such as not compacting
parquet from soft-deleted namespaces seems like a trivial win.
Use the namespace schema cache in the router to enforce the
per-namespace table limit (service protection limit), adding O(1)
overhead to the existing column limit evaluation logic.
Prior to this commit, each request that would breach the table limit
would be (potentially partially) applied to the catalog and return an
error. Every subsequent request creating a new table continued to cause
a catalog query, unnecessarily adding load proportional to request
counts.
After this commit, catalog requests are sent when the router instance
can determine (to the best of it's ability, see below) that the request
will not cause the namespace to exceed the table limit.
Because this uses cached schemas, the actual state set of tables may
have changed - this will cause inconsistent enforcement and spurious
errors in the same way it currently does for the column limit. For more
details (and to track a resolution) see:
https://github.com/influxdata/influxdb_iox/issues/5957
The maximum number of tables is part of the Namespace, which is already
loaded in its entirety. This commit copies the value into the
NamespaceSchema, making it available for the router to utilise.
Envoy will connect to an endpoint on demand, and return an
application-level error if it fails with a gRPC status code of
"Unavailable".
It also embeds a metadata entry of {"server": "envoy"} - this commit
uses the two signals (error status code + metadata entry) to drive an
immediate reconnection when observed, assuming the connection is bad.
Prior to this commit, namespaces that had been created on one router
could not be used on another router until the latter was restarted.
Effectively, newly created namespaces couldn't be used.
After this commit, the catalog is also checked when a cache miss occurs,
ensuring the router discovers new, not-yet-cached namespaces.
Service limits are enforced on two values:
* Number of tables in a namespace
* Number of columns in a table
This commit labels the existing service limit hit metric with the type
of limit reached, and adds this information to the log lines emitted.
Ensure a "probe" node is always returned as the first candidate, driving
it to recovery faster.
This also includes a fix for the balancer metrics that would report
probe candidate nodes as healthy nodes.
Similar to https://github.com/influxdata/influxdb_iox/pull/6509, this
forces a constant re-querying of the DNS address of an ingester to drive
rediscovery.
Unlike the above PR, this only reconnects when there are errors
observed. This still isn't ideal - something is wrong with the discovery
itself - this just papers over it.
Adds a metric with a per-ingester label recording the current health
state of the upstream ingester from the perspective of the router
instance.
Also logs periodically when one or more ingesters are offline.
Lazily establish connections in the background, instead of using tonic's
connect_lazy().
connect_lazy() causes error handling to take a different path in tonic
compared to "normal" connections, and this stops reconnections from
being performed when the endpoint goes away (likely a bug).
It also means the first few write requests won't have to wait while the
connection is dialed, which brings down the P99 as a nice side-effect.
Adds on-path health checking / recording using the CircuitBreaker
construct, stopping requests to unhealthy upstreams (minus the probe
requests) until they recover.
This removes the horrible gRPC balancer hack I added to get us deployed
ASAP, and should eliminate latency spikes and elevated error responses
observed during deployments as a result.
Adds on-path health checking / recording using the CircuitBreaker
construct, stopping requests to unhealthy upstreams (minus the probe
requests) until they recover.
This removes the horrible gRPC balancer hack I added to get us deployed
ASAP, and should eliminate latency spikes and elevated error responses
observed during deployments as a result.
last_probe was "the instant at which the last set of probes started
being sent" in my head, but Carol saw it as "first_probe - the time at
which probes started being sent".
Hopefully probe_window_started_at is less ambiguous.
Implements a "circuit breaker", a construct that tracks the error &
success of requests to a remote node, and uses this information to allow
or deny further requests.
This circuit breaker stops sending requests to the remote when the error
count exceeds 80% of requests in a 5 second window. Once this happens,
up to 10 "probe" requests per second are allowed, and when they succeed,
normal operation resumes (though concurrent requests may still be
completing during the probe regime and are counted towards the probe
results).
In the happy path, this circuit breaker is very cheap (lock free; WFPO)
to evaluate and record request results in, minimising the throughput
penalty. Once the breaker enters an unhealthy state (hopefully a rare
occurrence) it uses a mutex to manage the probe state (with a higher
overhead) for simplicity; it's definitely possible to optimise this away
if high latencies are observed during upstream outages when the circuit
breaker is open/unhealthy.
The gRPC node discovery hack spawns a task that outlives the gRPC
balancer - once the balancer stops, the task should stop too (and not
panic sending on the closed channel).
The tonic / tower load-balance implementation discards failed nodes,
even when using a static list - this causes nodes that fail once to
never be retried.
This doesn't happen for the last node for some reason, and leads to all
the load from one router hitting a single ingester instead of load
balancing across all ingesters.
This commit adds a hack to constantly tell the load balancer to probe
all nodes, hopefully causing them to re-discover previously failed
nodes. I don't have the time to do this properly :(
Allow the routers to start up without requiring full availability of all
downstream ingesters. Previously a single unavailable ingester prevented
the routers from starting up.
This has downsides:
* Lazily initialising a connection will cause the first writes to have
higher latency as the connection is established.
* The routers MAY come up in a state that will never work (i.e. bad
ingester addresses)
* Using the opaque gRPC load balancing mechanism restricts the
visibility into which nodes are up/down (hindering useful log
messages) and prevents us from implementing more advanced circuit
breaking / probing logic / load-balancing strategies.
This change is a quick fix - it leaves the round-robin handler in place,
load-balancing over a single tonic Channel, which internally
load-balances. This will need cleaning up.
* feat: Add a feature flag to switch to the router RPC write path
Fixes#6242.
* refactor: Remove a weird arc clone/rename that's not needed
I'm sure this was needed at some point, but it doesn't make much sense.
I wasn't going to change this, but I'm now trying to minimize the
differences between this function and the write path init function, so
make this one better too.
* fix: Add the namespace autocreation to the RPC write path too
The topic/query pool don't really apply to this case, but use them
anyway to be able to use the existing catalog methods.
Also add a bunch of comments pointing out where the RPC write path
initializer and the old router's initializer are the same and where
they're different, so that perhaps it'll be easier to keep them in sync
while they both exist.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* chore: remove unused/moved ns_autocreation dml handler
* feat(router): expose new ns retention as config
* fix: forgot to set default value for router retention arg
* chore: make new namespace retention param an option
This commit adds the (unused) RpcWrite implementation of the DmlHandler
trait that implements pushing a write over gRPC to a single, arbitrary
ingester. Requests are round-robin'ed across all available ingesters.
This DmlHandler implementation can be swapped out with the
ShardedWriteBuffer to change how writes are propagated to the ingester.
* feat: create namespace API call in router
Co-authored-by: Nga Tran <nga-tran@live.com>
* chore: treat retention as ns except in CLI
* fix: overflow in nanosecond calc
* fix: retention test after changing it from hours to ns
* chore: comment clarification in cli; better response type for error in ns API
* fix: correct some rebase mistakes
* chore: merge namespace create & create_with_retention; renamed ns create test helper fn & const
* fix: ns autocreation test was wrong after rebase
* fix: mem catalog has default 1hr retention, accidently removed in rebase
* chore: remove mem catalogs default 1hr retention; make it settable in sets & router
Co-authored-by: Luke Bond <luke.n.bond@gmail.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: reject writes that are outside the retention period
* feat: add retention validator into handler stack
* chore: Apply suggestions from code review
Co-authored-by: Dom <dom@itsallbroken.com>
* refactor: address review comments
* test: unit tests fot retention validation
* chore: address review comments
* test: more unit tests and integration tests
* refactor: make time inside retention period for emphemeral_mode test
* fix: 2 hours
Co-authored-by: Dom <dom@itsallbroken.com>
* chore: move ns api from querier to router
* chore: add explanatory comment in querier about moved namespace API
* fix: add namespace service to router
* fix: querier returns unimplemented error for ns retention, not panic
* chore: reuse namespace -> proto in router ns api
* chore: grpc namespace - consume ns to avoid clone
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>