* feat(idpe-17265): authorization should occur as part of the single_tenant specific mod
* authz service is accessed only through the single_tenant mod handler
* authz service is wrapped in auth mod
* move auth integration test into auth mod
* push down the authorize() call into the query params parser call, in order to access query params in the extract_token
* provide configuration error when authz or single_tenant mode are not co-presented
* update authz e2e fixtures
* feat(idpe-17265): extract tokens based upon preferred ordering in spec, and write tests to verify behavior.
* chore(idpe-17265): update naming conventions for a unifying parser
* test: make MockAuthorizer have default, and add a test_delegate_to_authz for CST
* chore: record authz duration metric, and include in delegation test.
* chore: use authz terminology instead of auth_service
* chore: more explicit naming
* Revert "chore: record authz duration metric, and include in delegation test."
This reverts commit 05c36888ca7247b6953343d759a5185098fae679.
* refactor: extract_header_token versus the else condition
* refactor: make single_tenant mod and move auth within
* chore: make unreachable explicitly panic in the build
* test: make token values be const, to be consumed when MockAuthorizer is used
* test: use locking for calls_counter in test
* fix: add base64 encoding as expected for Basic header
* fix: merge conflict resolution. The AuthorizationHeaderExtension is now under the authz::http mod, which is a required feature for router package.
* chore: run rustfmt nightly with preferred import handling, on files with modified imports
* chore: code cleanup, to have minimal code needed
Provide a configuration item for the router (in RPC mode) that controls
the maximum outgoing RPC message size when communicating with an
Ingester.
Raises the maximum from the default 4MiB to 100MiB. This does not
increase exposure to memory-based DOS, as writes are size-limited by the
HTTP layer to 10MiB, preventing a user from submitting a write this
large (or larger!) across the RPC boundary.
The NamespaceResolver was using its own very similar look-aside caching
to the DML handlers, this commit leverages the read-through cache
implementation to deduplicate more code and makes the read through
behavioural expectation explicit for namespace autocreation.
This removes the look-aside cache from the retention_validation
and schema_validation DML handlers, instead setting up the new
NamespaceCache decorator and using that to handle cache misses.
In order to implement a read-through NamespaceCache
decorator the `get_cache()` call will need to interact
with async catalog methods, so this allows implementations
to call await within the `get_cache()` body.
Adds a single-tenant mode (CST) to the IOx routers.
Single-tenancy mode differs in two main ways:
* V1 write endpoint is partially supported
* V2 write endpoint ignores "org" parameter
The "normal" mode is "multi tenant" which is the default operational
mode, and all existing behaviour remains unchanged. Single tenant mode
can be enabled by specifying INFLUXDB_IOX_SINGLE_TENANCY=true.
Request parsing is delegated to two implementations of the
WriteParamExtractor trait, one each for CST and MT - the logic of each
"mode" is defined within these files and all other functionality is
common between the two.
This commit also renames some of the error types for clarity
(NoSpecified -> NoOrgBucketSpecified, other NotSpecified ->
NoQueryParams, etc).
Note: single tenant code requires testing
* test: Add an e2e test for write replication
* fix: Pass through rpc_write_replicas configuration to RpcWrite handler
---------
Co-authored-by: Dom <dom@itsallbroken.com>
This commit implements replication for the router's RpcWrite handler.
The desired number of replica copies is specified at startup time, and
each user write will be fanned-out with the specified replication factor
(replicas + 1).
A failure to write to any upstreams returns the write error, but a
failure to obtain enough ACKs (enough successful writes) after at least
1 ACK will return a "partial write" error - this differentiation is
important, as the user's write will be readable after a partial write
error has occurred.
This currently writes to upstreams serially; this is clearly an
opportunity for improvement! A follow-on PR will parallelise writes
across the desired number of replicas while maintaining the "at most one
ack'd write to one host" invariant.
Note that replication is currently hard-coded as disabled.
* feat(authz): add authorization client.
Add a new authz crate to provide the interface for making authorization
checks from within IOx. This includes the default client that uses
the influxdata.iox.authz.v1 gRPC protocol. This feature is not used
by any IOx component yet.
* feat: optional authorization on write path
Support optionally enabling authorization checks on the /api/v2/write
handler. If an authrorizer is configured then the handler will
attempt to retrieve a token from the request's Authorization header.
If no such token exists then a response with a 401 error code is
returned. If the token is not valid, or does not have write permission
for the requested namespace then a response with a 403 error is
returned.
* chore: add unit test for authz in write handler
Add unit tests that test the correct functioning of the /api/v2/write
handler when an Authorizer is configured.
* chore(authz): use lazy connection
Change the initialization of the authz client to use a lazy connection.
This allows the client to be initialised synchronously.
* chore: Run cargo hakari tasks
* fix(authz): protolint complaints
* fix: authz tests
* fix: benches and lint
* chore: Update clap_blocks/src/authz.rs
Co-authored-by: Marko Mikulicic <mkm@influxdata.com>
* chore: Update authz/src/lib.rs
Co-authored-by: Marko Mikulicic <mkm@influxdata.com>
* chore: Update clap_blocks/src/authz.rs
Co-authored-by: Marko Mikulicic <mkm@influxdata.com>
* chore: review suggestions
* chore: review suggestions
Apply a number of suggestions from review comments. The main
behavioural change is that if the authz service is configured
applictions will perform a probe request to ensure it can communicate
before continuing startup.
* chore: Update router/src/server/http.rs
Co-authored-by: Dom <dom@itsallbroken.com>
---------
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: Marko Mikulicic <mkm@influxdata.com>
Co-authored-by: Dom <dom@itsallbroken.com>
Fixes#6418.
Makes sure the querier, the router, and the ingest replica CLI all
accept and validate ingester addresses the same, except whether or not
at least one value is required.
Adds a metric with a per-ingester label recording the current health
state of the upstream ingester from the perspective of the router
instance.
Also logs periodically when one or more ingesters are offline.
Lazily establish connections in the background, instead of using tonic's
connect_lazy().
connect_lazy() causes error handling to take a different path in tonic
compared to "normal" connections, and this stops reconnections from
being performed when the endpoint goes away (likely a bug).
It also means the first few write requests won't have to wait while the
connection is dialed, which brings down the P99 as a nice side-effect.
Adds on-path health checking / recording using the CircuitBreaker
construct, stopping requests to unhealthy upstreams (minus the probe
requests) until they recover.
This removes the horrible gRPC balancer hack I added to get us deployed
ASAP, and should eliminate latency spikes and elevated error responses
observed during deployments as a result.
Adds on-path health checking / recording using the CircuitBreaker
construct, stopping requests to unhealthy upstreams (minus the probe
requests) until they recover.
This removes the horrible gRPC balancer hack I added to get us deployed
ASAP, and should eliminate latency spikes and elevated error responses
observed during deployments as a result.
Prior to this commit, the (happy path) shutdown sequence of an IOx
process was hard coded to:
1. Stop gRPC & HTTP servers
2. Stop backend server (i.e. ingester2)
After this commit, the execution of step 1 is delegated to the handler
for step 2; the server implementation (router / ingester / querier /
etc) now chooses when to shut down the RPC & HTTP servers.
This allows the server shutdown delegate to correctly sequence the
shutdown of all components of the IOx server. This allows ingester2 to
correctly sequence the shutdown of the query RPC server w.r.t the
graceful stop & persist, ensuring queries continue to be serviced.
Allow the routers to start up without requiring full availability of all
downstream ingesters. Previously a single unavailable ingester prevented
the routers from starting up.
This has downsides:
* Lazily initialising a connection will cause the first writes to have
higher latency as the connection is established.
* The routers MAY come up in a state that will never work (i.e. bad
ingester addresses)
* Using the opaque gRPC load balancing mechanism restricts the
visibility into which nodes are up/down (hindering useful log
messages) and prevents us from implementing more advanced circuit
breaking / probing logic / load-balancing strategies.
This change is a quick fix - it leaves the round-robin handler in place,
load-balancing over a single tonic Channel, which internally
load-balances. This will need cleaning up.
* feat: Add a feature flag to switch to the router RPC write path
Fixes#6242.
* refactor: Remove a weird arc clone/rename that's not needed
I'm sure this was needed at some point, but it doesn't make much sense.
I wasn't going to change this, but I'm now trying to minimize the
differences between this function and the write path init function, so
make this one better too.
* fix: Add the namespace autocreation to the RPC write path too
The topic/query pool don't really apply to this case, but use them
anyway to be able to use the existing catalog methods.
Also add a bunch of comments pointing out where the RPC write path
initializer and the old router's initializer are the same and where
they're different, so that perhaps it'll be easier to keep them in sync
while they both exist.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* chore: remove unused/moved ns_autocreation dml handler
* feat(router): expose new ns retention as config
* fix: forgot to set default value for router retention arg
* chore: make new namespace retention param an option