Now that there is a Topic, there is no need for a giant "all message
types" enum.
As part of this shift, the gossip_message::GossipMessage name used for
schema gossiping sounds overly generic. This commit renames it to
schema_message::SchemaMessage and updates the code accordingly.
This is a backwards-compatible change (and if anything goes wrong, the
"old" routers simply log a warning if a message is unreadable).
Adds "topic" support, allowing a node to subscribe to one or more types
of application payloads independently.
A gossip node is optionally initialised with a set of topics (defaulting
to "all topics") and this set of topic interests is propagated
throughout the cluster via the usual PEX mechanism, alongside the
existing connection & identity information.
When broadcasting an application payload, the sender only transmits it
to nodes that have registered an interest in that payload type. This
prevents wasted network bandwidth and CPU for all nodes, and allows
multiple, distinct payload types to be propagated independently to keep
subsystems that rely on gossip decoupled from each other (no giant,
brittle payload enum type).
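To illustrate the topic filtering described above, here is a minimal Rust sketch; the `Topic` variants, `Peer` fields, and `broadcast` function are hypothetical stand-ins, not the real gossip crate API.

```rust
use std::collections::HashSet;

/// A distinct application payload type a node may subscribe to
/// (variants here are illustrative).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Topic {
    SchemaChanges,
    Compaction,
}

/// Per-peer state propagated via PEX alongside connection/identity info.
struct Peer {
    addr: String,
    interests: HashSet<Topic>,
}

/// Broadcast a payload only to peers that registered interest in `topic`.
fn broadcast(peers: &[Peer], topic: Topic, payload: &[u8]) {
    for peer in peers.iter().filter(|p| p.interests.contains(&topic)) {
        // Hypothetical send; the real crate hands this to its transport.
        println!("sending {} bytes of {:?} to {}", payload.len(), topic, peer.addr);
    }
}

fn main() {
    let peers = vec![
        Peer { addr: "10.0.0.1:4242".into(), interests: [Topic::SchemaChanges].into() },
        Peer { addr: "10.0.0.2:4242".into(), interests: [Topic::Compaction].into() },
    ];
    // Only the first peer receives this payload.
    broadcast(&peers, Topic::SchemaChanges, b"schema diff");
}
```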
Configuring the `ERROR_WINDOW` of the router's on-path health check
did not provide a consistent improvement for low-write-volume clusters.
Now that the `NUM_PROBES` parameter is configurable, `ERROR_WINDOW` can
be un-exposed to simplify the configuration options and clean up
boilerplate.
This commit allows schema gossiping to be enabled on router nodes.
Enabling gossiping allows any schema changes made on router A to be sent
to the N-1 other routers, populating their internal caches in
anticipation of handling a similar request.
With their caches pre-populated, these routers avoid incurring a catalog
lookup on what would otherwise be a cache miss, reducing both request
latency and catalog load.
Enabling gossip on the routers automatically enables schema gossiping;
enabling gossip itself remains optional, and off by default.
This commit adds the SchemaChangeObserver, the delegate that is handed
a schema diff and is responsible for computing the corresponding gossip
message and handing it off to the gossip system.
It sits between the cache layer and the gossip layer, converting
schema-domain types into gossip messages.
Nothing is wired up to it yet, so no messages are sent.
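A rough sketch of the observer's position in that pipeline, using simplified stand-in types (`SchemaDiff`, `GossipHandle`); the real code builds a protobuf SchemaMessage rather than the ad-hoc encoding shown here.

```rust
/// Stand-in for the diff produced by a schema merge.
struct SchemaDiff {
    namespace: String,
    new_tables: Vec<String>,
}

/// Stand-in for the send side of the gossip subsystem.
trait GossipHandle {
    fn broadcast(&self, payload: Vec<u8>);
}

struct SchemaChangeObserver<G> {
    gossip: G,
}

impl<G: GossipHandle> SchemaChangeObserver<G> {
    /// Called by the cache layer with the diff produced by a schema merge.
    fn observe(&self, diff: &SchemaDiff) {
        // The real code encodes a protobuf SchemaMessage; a debug encoding
        // stands in here.
        let payload = format!("{}:{}", diff.namespace, diff.new_tables.join(",")).into_bytes();
        self.gossip.broadcast(payload);
    }
}
```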
This commit adds the NamespaceSchemaGossip type, a decorator of
[`NamespaceCache`] implementations utilising peer gossiping to provide
best-effort convergence of the local cache state.
This decorator will sit in the NamespaceCache stack, allowing it to
receive incoming schema gossip messages and update the local cache
through the regular NamespaceCache abstraction methods.
This currently implements the message handlers only - no messages are
sent yet!
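A minimal sketch of the decorator shape described above; the trait shown is a simplified stand-in for the real [`NamespaceCache`] API, and `handle_gossip` is an illustrative name for the incoming-message handler.

```rust
/// Simplified stand-in for the real NamespaceCache trait.
trait NamespaceCache {
    fn put_schema(&self, namespace: &str, schema: String);
    fn get_schema(&self, namespace: &str) -> Option<String>;
}

/// Decorates an inner cache, applying incoming gossip messages through the
/// same abstraction the write path uses.
struct NamespaceSchemaGossip<C> {
    inner: C,
}

impl<C: NamespaceCache> NamespaceSchemaGossip<C> {
    /// Handler for an incoming schema gossip message (receive side only).
    fn handle_gossip(&self, namespace: &str, schema: String) {
        self.inner.put_schema(namespace, schema);
    }
}

// The decorator remains a drop-in NamespaceCache, delegating to the inner
// implementation for normal cache traffic.
impl<C: NamespaceCache> NamespaceCache for NamespaceSchemaGossip<C> {
    fn put_schema(&self, namespace: &str, schema: String) {
        self.inner.put_schema(namespace, schema)
    }
    fn get_schema(&self, namespace: &str) -> Option<String> {
        self.inner.get_schema(namespace)
    }
}
```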
This benchmark covers two axes of performance for calls to the
namespace cache's `put_schema()` stack: the cost of adding varying
numbers of new columns to an existing table in the namespace, and the
cost of adding new tables, each with its own set of columns, to an
existing namespace.
Adds the supporting types required to integrate the generic gossip crate
into a schema-specific broadcast primitive.
This commit implements the two "halves":
* GossipMessageDispatcher: async processing of incoming gossip msgs
* Handle: the send-side handle for async sending of gossip msgs
These types are responsible for converting between the serialised bytes
sent over the gossip primitive and the application-level / protobuf
types.
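A simplified sketch of the two halves, assuming a tokio mpsc channel as the seam between the gossip transport and the application; the type and method names are illustrative, and the real code encodes/decodes protobuf rather than passing raw bytes through.

```rust
use tokio::sync::mpsc;

/// Application-level message; the real code uses protobuf types.
#[derive(Debug)]
struct SchemaMessage(Vec<u8>);

/// Send-side handle: serialises application messages into gossip bytes.
#[derive(Clone)]
struct Handle {
    tx: mpsc::Sender<Vec<u8>>,
}

impl Handle {
    async fn send(&self, msg: SchemaMessage) {
        // Serialisation happens here (protobuf encode in the real code).
        let _ = self.tx.send(msg.0).await;
    }
}

/// Receive side: deserialises gossip bytes back into application messages
/// and hands them to a handler as they arrive.
struct GossipMessageDispatcher {
    rx: mpsc::Receiver<Vec<u8>>,
}

impl GossipMessageDispatcher {
    async fn run(mut self, mut handler: impl FnMut(SchemaMessage)) {
        while let Some(bytes) = self.rx.recv().await {
            handler(SchemaMessage(bytes));
        }
    }
}
```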
This allows routers to be configured with the number of probe requests
that must be collected before the health checker's circuit state
transitions a downstream to healthy or unhealthy.
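A toy sketch of the probe-count gating, assuming a simple three-state circuit; the state names, `NUM_PROBES` constant, and transition rule below are illustrative only, not the real health checker.

```rust
#[derive(Debug, PartialEq)]
enum CircuitState {
    Healthy,
    Probing { successes: usize },
    Unhealthy,
}

/// Stand-in for the configurable number of successful probe requests
/// required before an unhealthy downstream is marked healthy again.
const NUM_PROBES: usize = 10;

/// Advance the circuit state after a successful probe request.
fn on_probe_success(state: CircuitState) -> CircuitState {
    match state {
        // Enough successful probes collected: transition to healthy.
        CircuitState::Probing { successes } if successes + 1 >= NUM_PROBES => {
            CircuitState::Healthy
        }
        // Still collecting probes.
        CircuitState::Probing { successes } => CircuitState::Probing { successes: successes + 1 },
        // Healthy/unhealthy states are unaffected by a single probe here.
        other => other,
    }
}
```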
Allows the router to optionally enable and start the gossip subsystem
(disabled by default).
No code uses the gossip system, so no application-level messages are
exchanged, but this allows the gossip subsystem to run and exchange
control frames / perform discovery / etc.
During the schema merge, the new tables are already iterated over (to
find which tables and columns are new), so the counts needed for the
metrics can be pre-computed there, sparing two extra loops over the new
tables and new columns returned in `ChangeStats`.
This adds some computational overhead during the merging of new
namespace schema with what's in the router's local cache, but will allow
gossiping of changes.
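The idea, sketched with stand-in types and field names: count the new tables and columns in the same pass that discovers them, rather than re-walking the results afterwards.

```rust
/// Stand-in for the stats returned from the merge.
struct ChangeStats {
    new_tables: usize,
    new_columns: usize,
}

/// `new_tables` pairs a table name with its column count; `cached` lists
/// tables already present in the local cache.
fn merge(new_tables: &[(&str, usize)], cached: &[&str]) -> ChangeStats {
    let mut stats = ChangeStats { new_tables: 0, new_columns: 0 };
    for &(name, column_count) in new_tables {
        if !cached.contains(&name) {
            // The existing iteration already visits each new table, so the
            // metric counters are updated here instead of in extra loops.
            stats.new_tables += 1;
            stats.new_columns += column_count;
        }
    }
    stats
}
```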
Exposes the `ERROR_WINDOW` parameter that controls the router's
downstream error-gate health check behaviour as an environment
variable/command line flag. This allows tuning, per-environment, the
period over which the error rate of 80% must be exceeded to cause an
ingester to appear unhealthy.
* test case for proposed new behavior in the v1 write endpoint
* autogen and default are equivalent reserved words for rp
* have the write endpoint match the query endpoint, in that db and rp are always concatenated
This test asserts that the span context is correctly encoded into an
RPC write request, so long as the [`TraceHeaderParser`] is responsible
for decorating the request's extensions with the added information.
Changes the proptest for the router's partitioner handler to use a
timestamp generation strategy that more accurately models the
distribution of timestamps in real-world requests.
Asserts that the partitioning code within the router (which drives the
low-level partitioning logic) generates partitions whose rows have
timestamps that belong in those partitions.
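For illustration, a proptest strategy of this shape biases timestamps towards a recent window rather than drawing uniformly from all of i64; the constants and test body below are placeholders, not the actual strategy used.

```rust
use proptest::prelude::*;

/// Nanosecond timestamps clustered within a day either side of a fixed
/// "now", which is closer to real write workloads than a uniform i64.
fn realistic_timestamp() -> impl Strategy<Value = i64> {
    const NOW_NS: i64 = 1_700_000_000_000_000_000;
    const DAY_NS: i64 = 86_400_000_000_000;
    (NOW_NS - DAY_NS)..(NOW_NS + DAY_NS)
}

proptest! {
    #[test]
    fn rows_land_in_matching_partitions(ts in realistic_timestamp()) {
        // Placeholder assertion; the real test drives the router's
        // partitioner and checks each row lands in the expected partition.
        prop_assert!(ts > 0);
    }
}
```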
* feat: Allow passing service protection limits in create db gRPC call
* fix: Move the impl into the catalog namespace trait
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
- Create data_types::partition_template::ValidationError
- Make creation of NamespacePartitionTemplateOverride and
TablePartitionTemplateOverride fallible
- Move SerializationWrapper into a module to make its inner field
  private to force creation through one fallible constructor; this is
  where the validation logic will go to be shared among all uses of
  partition templates (see the sketch below)
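A minimal sketch of the "private field behind a fallible constructor" pattern referenced in the last point; the error variant and validation rule are placeholders, not the real ones.

```rust
mod serialization {
    #[derive(Debug)]
    pub enum ValidationError {
        EmptyTemplate,
    }

    /// The inner field is private, so the only way to obtain a value is
    /// via the fallible constructor, which is where shared validation
    /// lives.
    pub struct SerializationWrapper(String);

    impl SerializationWrapper {
        pub fn try_new(template: String) -> Result<Self, ValidationError> {
            if template.is_empty() {
                return Err(ValidationError::EmptyTemplate);
            }
            Ok(Self(template))
        }

        pub fn inner(&self) -> &str {
            &self.0
        }
    }
}
```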
Adds a benchmark that exercises the router's partitioning DmlHandler
implementation against a set of three files (very small, small, medium)
with 4 different partitioning schemes:
* Single tag, which occurs in all rows
* Single tag, which does not occur in any row
* Default strftime formatter (YYYY-MM-DD)
* Long and complicated strftime formatter
This covers the entire partitioning overhead - building the formatters,
evaluating each row, grouping the values into per-partition buckets, and
returning to the caller, where it normally would be passed to the next
handler in the pipeline.
Note that only one template part is evaluated in each case - this
measures the overhead of each type of formatter. In reality, we'd expect
partitioning with custom schemes to utilise more than one part,
increasing the cost of partitioning proportionally. This is a
lower-bound measurement!
An integration test asserting that a router returns an error when
attempting to partition a write with an invalid strftime partition
formatter, rather than panicking.
Changes the partitioning logic to be fallible. This prevents an invalid
partition template from causing a panic, previously possible through two
known code paths:
* TagValue formatter referencing a non-tag column
* Time formatter using an invalid strftime format string
If either occurs, the write attempt is now aborted and an error is
returned to the user with an HTTP 500 status code.
Additionally, unexpected partitioner errors now map to a catch-all error
instead of panicking.
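Sketching the shape of the change with illustrative names: the partitioner returns a `Result` whose error enumerates the known failure modes plus a catch-all, and every variant currently maps to an HTTP 500.

```rust
/// Illustrative error type; the real variants and names differ.
#[derive(Debug)]
enum PartitionError {
    /// TagValue formatter referenced a column that is not a tag.
    NonTagColumn(String),
    /// Time formatter used an invalid strftime format string.
    InvalidStrftime(String),
    /// Catch-all for unexpected partitioner errors (no longer a panic).
    Unexpected(String),
}

/// Resolve a TagValue template part, returning an error instead of
/// panicking when the referenced column is not a tag in the row.
fn tag_value_part(column: &str, row_tags: &[(&str, &str)]) -> Result<String, PartitionError> {
    row_tags
        .iter()
        .find(|(name, _)| *name == column)
        .map(|(_, value)| value.to_string())
        .ok_or_else(|| PartitionError::NonTagColumn(column.to_string()))
}

/// All partitioning failures currently abort the write with a 500.
fn status_code(e: &PartitionError) -> u16 {
    match e {
        PartitionError::NonTagColumn(_)
        | PartitionError::InvalidStrftime(_)
        | PartitionError::Unexpected(_) => 500,
    }
}
```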