Commit Graph

424 Commits (196c589ef64f73677eb3e89e60b219f862bde19a)

Author SHA1 Message Date
Dom Dwyer ee063057b3
refactor(router): use gossip_schema
Replace the bespoke schema gossip logic in the router with the reusable
gossip_schema crate.
2023-08-23 12:42:33 +02:00
Dom Dwyer d35bd48f65
refactor(gossip): rename GossipMessage
Now there's a Topic, there's no need for a giant "all message types"
enum.

As part of this shift, the gossip_message::GossipMessage used for schema
gossiping sounds overly generic. This commit changes the name to
schema_message::SchemaMessage and updates the code.

This is a backwards-compatible change (and if anything goes wrong, the
"old" routers simply log a warning if a message is unreadable).
2023-08-22 12:06:49 +02:00
Dom Dwyer ca29e9b0d8
feat(gossip): topic support
Adds "topic" support, allowing a node to subscribe to one or more types
of application payloads independently.

A gossip node is optionally initialised with a set of topics (defaulting
to "all topics") and this set of topic interests is propagated
throughout the cluster via the usual PEX mechanism, alongside the
existing connection & identity information.

When broadcasting an application payload, the sender only transmits it
to nodes that have registered an interest in this payload type. This
prevents wasted network bandwidth and CPU for all nodes, and allows
multiple, distinct payload types to be propagated independently to keep
subsystems that rely on gossip decoupled from each other (no giant,
brittle payload enum type).
2023-08-17 14:53:40 +02:00
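The topic filtering described in the commit above can be illustrated with a minimal sketch. It is not the gossip crate's actual API: `Topic`, `PeerId`, `PeerInterests`, `register` and `recipients` are hypothetical names, and representing "all topics" as `None` is an assumption made for brevity.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical topic identifier; the real crate's representation may differ.
type Topic = u8;
/// Hypothetical peer identity, a plain string for this sketch.
type PeerId = &'static str;

/// Tracks which topics each known peer has registered an interest in.
#[derive(Default)]
struct PeerInterests {
    // `None` models the "all topics" default for a peer (an assumption).
    peers: HashMap<PeerId, Option<HashSet<Topic>>>,
}

impl PeerInterests {
    /// Records (or replaces) the topic interests advertised by `peer`.
    fn register(&mut self, peer: PeerId, topics: Option<HashSet<Topic>>) {
        self.peers.insert(peer, topics);
    }

    /// Returns the peers a payload of `topic` should be sent to.
    /// The result is sorted so the output is deterministic.
    fn recipients(&self, topic: Topic) -> Vec<PeerId> {
        let mut out: Vec<PeerId> = self
            .peers
            .iter()
            .filter(|(_, interests)| match interests {
                None => true,
                Some(set) => set.contains(&topic),
            })
            .map(|(peer, _)| *peer)
            .collect();
        out.sort_unstable();
        out
    }
}
```

A sender would call `recipients(topic)` before broadcasting and skip every peer not in the returned list, which is where the bandwidth and CPU saving would come from.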
Fraser Savage a3ab4d33da
refactor(router): Revert configurable health-check `ERROR_WINDOW`
Configuring the `ERROR_WINDOW` of the router's on-path health check
did not provide a consistent improvement for low write volume clusters.
Now that the `NUM_PROBES` parameter is configurable, this can be
un-exposed to simplify configuration options and clean up boilerplate.
2023-08-08 17:49:14 +01:00
Dom Dwyer a017d1d7f9
test: simplify integration test setup
Remove the redundant async mutex (previously required, but I refactored
the code to make it unnecessary) and DRY the node setup.
2023-08-03 18:02:33 +02:00
Dom Dwyer 757ecc1d03
perf(router): schema gossip between peers
This commit allows schema gossiping to be enabled on router nodes.

Enabling gossiping allows any schema changes made on router A to be sent
to the N-1 other routers, populating their internal caches in
anticipation of handling a similar request.

By populating their cache, they avoid incurring a catalog lookup to
populate their local state upon a cache miss, therefore reducing request
latency, and reducing catalog load.

Enabling gossip on the routers automatically enables schema gossiping -
enabling gossip remains optional, and off by default.
2023-08-03 17:10:17 +02:00
Dom Dwyer 8928c838a8
test: schema gossip w/ default partition keys
Ensure gossiping namespace & tables with empty partition keys is
correct.
2023-08-03 17:10:16 +02:00
Dom Dwyer 00542f7041
test: schema gossip integration
Adds an integration test ensuring the schema gossip layer added to one
instance ("node A") propagates schema diffs to another ("node B").
2023-08-03 17:10:16 +02:00
Dom Dwyer 16c115d5cb
docs(router): gossip subsystem types / topology
Describes the router's schema gossiping types and how they fit together.
2023-08-03 17:10:15 +02:00
Dom Dwyer 3133318e16
refactor: remove redundant NamespaceCache impl
The NamespaceCache does not need to be a decorator itself - it can
operate using a reference to the cache without needing access to cache
requests.
2023-08-03 17:10:14 +02:00
Dom Dwyer b1cdb928f6
refactor: always log error message
Always log the actual error as it may change.
2023-08-03 16:59:06 +02:00
Dom Dwyer fc903b8102
test: preserve duplicates in column set assertions
Don't collect into a BTreeSet for sorting as it drops duplicates.
2023-08-03 16:56:56 +02:00
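The pitfall named in the commit above is easy to reproduce with the standard library alone. The sketch below is not the test code from the repository; the function names and the column values are made up for illustration.

```rust
use std::collections::BTreeSet;

/// Sorts column names for comparison while keeping duplicates.
fn sorted_preserving_duplicates(mut cols: Vec<&str>) -> Vec<&str> {
    cols.sort_unstable();
    cols
}

/// Sorts by collecting into a set, which silently discards duplicates.
fn sorted_via_set(cols: Vec<&str>) -> Vec<&str> {
    cols.into_iter()
        .collect::<BTreeSet<_>>()
        .into_iter()
        .collect()
}
```

An assertion built on `sorted_via_set` would pass even if a column were emitted twice, so sorting a `Vec` is the safer way to get an order-independent comparison.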
Dom Dwyer 7a4ed257a2
feat: send-side schema gossip implementation
This commit adds the SchemaChangeObserver, the delegate which is handed
a schema diff, and is responsible for computing the gossip message and
handing it off to the gossip system.

This sits between the cache layer and the gossip layer, converting
schema things into gossip things.

This isn't connected up, so no messages will be sent.
2023-08-03 12:42:16 +02:00
Dom a32c3d0fa8
Merge branch 'main' into dom/gossip-namespace-cache
2023-08-02 16:39:32 +01:00
Fraser Savage ff207ec158
fix(router): Use BatchSize::NumIterations(1) for namespace schema cache benchmark
Batches share the same set-up step between iterations, so using a batch
size of more than 1 per setup provides inaccurate readings.
2023-08-02 13:35:55 +01:00
Dom Dwyer 10a3a048d8
feat: NamespaceSchemaGossip cache decorator
This commit adds the NamespaceSchemaGossip type, a decorator of
[`NamespaceCache`] implementations utilising peer gossiping to provide
best-effort convergence of the local cache state.

This decorator will sit in the NamespaceCache stack, allowing it to
receive incoming schema gossip messages, and update the local cache
through the regular NamespaceCache abstraction methods.

This currently implements the message handlers only - no messages are
sent yet!
2023-08-02 14:08:06 +02:00
Fraser Savage 33e4098cf8
perf(router): Add benchmark for additions to namespace schema cache
This benchmark covers two axes of performance for calls to the
namespace cache's `put_schema()` stack. These are the cost of adding
varying numbers of new columns to an existing table in the namespace, as
well as adding new tables with their own set of columns to an existing
namespace.
2023-08-02 12:45:30 +01:00
Dom Dwyer 41c9604e46
feat(router): schema gossip skeleton
Adds the supporting types required to integrate the generic gossip crate
into a schema-specific broadcast primitive.

This commit implements the two "halves":

    * GossipMessageDispatcher: async processing of incoming gossip msgs
    * Handle: the send-side handle for async sending of gossip msgs

These types are responsible for converting between the serialised bytes
sent over the gossip primitive and application-level / protobuf types.
2023-08-01 17:11:09 +02:00
Fraser Savage df2c1850fb
refactor(router): Try to fix rustfmt having a nap
2023-08-01 14:51:20 +01:00
Fraser Savage e643014900
docs(router): Fix typo in circuit breaker document comment
2023-08-01 14:46:17 +01:00
Fraser Savage e4a5d2efaa
feat(router): Expose `num_probes` request count used to health-check ingesters as config option
This allows routers to be configured to mark downstreams as
healthy/unhealthy with a requirement for the number of probe requests
which can/must be collected to transition the health checker's circuit
state to healthy/unhealthy.
2023-08-01 14:21:56 +01:00
Dom Dwyer 8da08fa574
feat(router): optionally enable gossip subsystem
Allows the router to optionally enable and start the gossip subsystem
(disabled by default).

No code uses the gossip system, so no application-level messages are
exchanged, but this allows the gossip subsystem to run and exchange
control frames / perform discovery / etc.
2023-07-31 11:01:30 +02:00
Fraser Savage a930be45f7
refactor(router): Use map & sum over values instead of fold over iter
Also add a nice comment explaining what the string keys are for
[`ChangeStats`].

Co-authored-by: Dom <dom@itsallbroken.com>
2023-07-28 15:11:13 +01:00
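The refactor in the commit above swaps one iterator idiom for another. As a rough sketch, assuming a map keyed by table name whose values are lists of new column names (the real `ChangeStats` layout is not shown in this log, so the types and function names here are hypothetical):

```rust
use std::collections::BTreeMap;

/// Counts new columns with an explicit fold over the key/value iterator.
fn total_with_fold(new_columns_per_table: &BTreeMap<String, Vec<String>>) -> usize {
    new_columns_per_table
        .iter()
        .fold(0, |acc, (_table, cols)| acc + cols.len())
}

/// Computes the same total by mapping each value to its length and summing.
fn total_with_sum(new_columns_per_table: &BTreeMap<String, Vec<String>>) -> usize {
    new_columns_per_table.values().map(Vec::len).sum()
}
```

Both functions return the same number; the second avoids naming an accumulator and ignores the keys it never needed.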
Fraser Savage e00a5cab13
perf(router): Pre-compute `ChangeStats` new column total during schema merge
During the schema merge the new tables are iterated over already (to find
which tables and columns are new), so the number needed for the metrics
can be pre-computed to spare two extra loops over the new tables and new
columns returned in `ChangeStats`.
2023-07-27 14:01:50 +01:00
Fraser Savage 5453ad8ba4
feat(router): Include table/column diff for namespace schema cache update
This adds some computational overhead during the merging of new
namespace schema with what's in the router's local cache, but will allow
gossiping of changes.
2023-07-27 13:37:47 +01:00
Fraser Savage c818f90aef
docs(router): Remove code doc ref from router CLI flag text
2023-07-26 11:01:13 +01:00
Fraser Savage 61e79374e0
feat(router): Expose circuit breaker healthcheck config
Exposes the `ERROR_WINDOW` parameter that controls the router's
downstream error-gate health check behaviour as an environment
variable/command line flag. This allows tuning, per-environment, the
period over which the error rate of 80% must be exceeded to cause an
ingester to appear unhealthy.
2023-07-26 09:48:55 +01:00
Fraser Savage c834ec171f
test(router): Custom partition template API create using `time` tag value is rejected
This removes the double negative from the error message and adds
coverage at the router's gRPC API level for the rejection of the bad
TagValue value.
2023-07-24 13:07:04 +01:00
wiedld efae0f108a
feat(idpe-17887): enable `/` in db name for v1 write. (#8235)
* test case for proposed new behavior in v1 write endpoint.
* autogen and default are equivalent reserved words for rp
* have write endpoint match query endpoint, in that db and rp are always concatenated
2023-07-18 09:36:25 -07:00
dependabot[bot] e33a078128
chore(deps): Bump paste from 1.0.13 to 1.0.14 (#8244)
Bumps [paste](https://github.com/dtolnay/paste) from 1.0.13 to 1.0.14.
- [Release notes](https://github.com/dtolnay/paste/releases)
- [Commits](https://github.com/dtolnay/paste/compare/1.0.13...1.0.14)

---
updated-dependencies:
- dependency-name: paste
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-07-17 16:10:02 +00:00
Carol (Nichols || Goulding) 10a0f8e3bf
fix: Remove ::default() when constructing unit structs
As recommended by https://rust-lang.github.io/rust-clippy/master/index.html#default_constructed_unit_structs
2023-07-14 10:50:55 -04:00
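For readers unfamiliar with the lint referenced in the commit above, here is a small illustration. `NopHandler` and the two builder functions are made-up names, not types from the repository.

```rust
/// A unit struct: it has no fields, so its name is also its only value.
#[derive(Debug, Default, PartialEq)]
struct NopHandler;

/// The form the lint flags: calling `default()` on a unit struct.
fn build_before() -> NopHandler {
    // Flagged by clippy's `default_constructed_unit_structs` lint.
    NopHandler::default()
}

/// The form the lint suggests: name the unit struct directly.
fn build_after() -> NopHandler {
    NopHandler
}
```

Both functions produce the same value, so the change is purely stylistic.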
Carol (Nichols || Goulding) d40bc54b71
fix: Remove unneeded double derefs found with new lint suspicious_double_ref_op
2023-07-14 10:25:21 -04:00
Fraser Savage 7e595eca88
test(router): Assert RPC write span contexts can be parsed as encoded
This test aims to add some assertion that the span context is correctly
encoded into an RPC write request as long as the [`TraceHeaderParser`]
is responsible for decorating the request's extensions with the added
information.
2023-07-12 16:41:40 +01:00
Fraser Savage 5a37c92c2c
feat(router): Send tracing SpanContext header to ingester during RPC write
2023-07-12 11:30:50 +01:00
dependabot[bot] 8b000862e1
chore(deps): Bump pretty_assertions from 1.3.0 to 1.4.0 (#8182)
Bumps [pretty_assertions](https://github.com/rust-pretty-assertions/rust-pretty-assertions) from 1.3.0 to 1.4.0.
- [Release notes](https://github.com/rust-pretty-assertions/rust-pretty-assertions/releases)
- [Changelog](https://github.com/rust-pretty-assertions/rust-pretty-assertions/blob/main/CHANGELOG.md)
- [Commits](https://github.com/rust-pretty-assertions/rust-pretty-assertions/compare/v1.3.0...v1.4.0)

---
updated-dependencies:
- dependency-name: pretty_assertions
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-07-07 09:35:18 +00:00
dependabot[bot] bc6bf2d8e5
chore(deps): Bump smallvec from 1.10.0 to 1.11.0 (#8164)
Bumps [smallvec](https://github.com/servo/rust-smallvec) from 1.10.0 to 1.11.0.
- [Release notes](https://github.com/servo/rust-smallvec/releases)
- [Commits](https://github.com/servo/rust-smallvec/compare/v1.10.0...v1.11.0)

---
updated-dependencies:
- dependency-name: smallvec
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-07-06 09:43:27 +00:00
dependabot[bot] 9a03d9c9fe
chore(deps): Bump paste from 1.0.12 to 1.0.13 (#8139)
Bumps [paste](https://github.com/dtolnay/paste) from 1.0.12 to 1.0.13.
- [Release notes](https://github.com/dtolnay/paste/releases)
- [Commits](https://github.com/dtolnay/paste/compare/1.0.12...1.0.13)

---
updated-dependencies:
- dependency-name: paste
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-07-04 07:57:41 +00:00
Dom Dwyer 8dd159456a
test: assert partitioner row counts
Assert the number of rows yielded by the partitioner matches the number
of input rows.
2023-06-16 14:14:03 +02:00
Dom Dwyer f92b866979
test: better proptest timestamp for DML partition
Changes the proptest for the router's partitioner handler to use a
timestamp generation strategy that more accurately models the
distribution of timestamps in real-world requests.
2023-06-16 12:40:33 +02:00
Dom Dwyer 0a2a315b91
chore: limit chrono features
See https://rustsec.org/advisories/RUSTSEC-2020-0071
2023-06-15 16:41:20 +02:00
Dom Dwyer 5388e49734
test: router partition handler
Asserts the partitioning code within the router (that drives the
low-level partitioning logic) generates partitions with rows with
timestamps that belong in those partitions.
2023-06-15 14:54:46 +02:00
Marco Neumann 335d9f7357
chore: minimize proptest features (#7993)
2023-06-14 12:28:18 +00:00
dependabot[bot] 2ffa9f3cda
chore(deps): Bump crossbeam-utils from 0.8.15 to 0.8.16
Bumps [crossbeam-utils](https://github.com/crossbeam-rs/crossbeam) from 0.8.15 to 0.8.16.
- [Release notes](https://github.com/crossbeam-rs/crossbeam/releases)
- [Changelog](https://github.com/crossbeam-rs/crossbeam/blob/master/CHANGELOG.md)
- [Commits](https://github.com/crossbeam-rs/crossbeam/compare/crossbeam-utils-0.8.15...crossbeam-utils-0.8.16)

---
updated-dependencies:
- dependency-name: crossbeam-utils
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-06-13 02:00:14 +00:00
Marko Mikulicic d26ad8e079
feat: Allow passing service protection limits in create db gRPC call (#7941)
* feat: Allow passing service protection limits in create db gRPC call

* fix: Move the impl into the catalog namespace trait

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-06-08 14:28:32 +00:00
Carol (Nichols || Goulding) d0db1194e2
feat: Validate custom partition templates on their creation
Make sure custom partition templates have:

- At least one part
- No more than 8 parts
- Only nonempty, valid strftime formats
2023-06-07 11:38:12 -04:00
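The three rules listed in the commit above could be checked along the following lines. This is a sketch, not the code from `data_types::partition_template`: the `TemplatePart` enum, the error variants and `validate` are hypothetical, and the strftime validity check is deliberately left out because it would need a date/time library.

```rust
/// Hypothetical stand-in for a partition template part.
#[derive(Debug, Clone, PartialEq)]
enum TemplatePart {
    TagValue(String),
    TimeFormat(String),
}

/// Hypothetical error type; variant names are illustrative only.
#[derive(Debug, PartialEq)]
enum ValidationError {
    NoParts,
    TooManyParts { found: usize },
    EmptyTimeFormat,
}

/// Upper bound on template parts, taken from the commit message.
const MAX_PARTS: usize = 8;

/// Applies the part-count and non-empty checks to a template.
fn validate(parts: &[TemplatePart]) -> Result<(), ValidationError> {
    if parts.is_empty() {
        return Err(ValidationError::NoParts);
    }
    if parts.len() > MAX_PARTS {
        return Err(ValidationError::TooManyParts { found: parts.len() });
    }
    for part in parts {
        if let TemplatePart::TimeFormat(fmt) = part {
            if fmt.is_empty() {
                return Err(ValidationError::EmptyTimeFormat);
            }
            // Checking that `fmt` is a valid strftime string is omitted
            // in this sketch.
        }
    }
    Ok(())
}
```

Returning a dedicated error per rule lets the caller tell the user which constraint was violated.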
Carol (Nichols || Goulding) ac26ceef91
feat: Make a place to do partition template validation
- Create data_types::partition_template::ValidationError
- Make creation of NamespacePartitionTemplateOverride and
  TablePartitionTemplateOverride fallible
- Move SerializationWrapper into a module to make its inner field
  private to force creation through one fallible constructor; this is
  where the validation logic will go to be shared among all uses of
  partition templates
2023-06-07 11:38:12 -04:00
Dom Dwyer 8e61dc5aef
refactor: remove InvalidStrftime value
It's big, it's annoying, it's already available to the user.
2023-06-05 11:31:02 +02:00
Dom Dwyer a873e119c4
test(bench): router partitioner
Adds a benchmark that exercises the router's partitioning DmlHandler
implementation against a set of three files (very small, small, medium)
with 4 different partitioning schemes:

    * Single tag, which occurs in all rows
    * Single tag, which does not occur in any row
    * Default strftime formatter (YYYY-MM-DD)
    * Long and complicated strftime formatter

This covers the entire partitioning overhead - building the formatters,
evaluating each row, grouping the values into per-partition buckets, and
returning to the caller, where it normally would be passed to the next
handler in the pipeline.

Note that only one template part is evaluated in each case - this
measures the overhead of each type of formatter. In reality, we'd expect
partitioning with custom schemes to utilise more than one part,
increasing the cost of partitioning proportionally. This is a
lower-bound measurement!
2023-06-02 16:04:09 +02:00
Dom Dwyer f0832818ee
test(router): invalid strftime partition template
An integration test asserting that a router returns an error when
attempting to partition a write with an invalid strftime partition
formatter, rather than panicking.
2023-06-01 17:44:44 +02:00
Dom Dwyer 47214ec9a0
fix: prevent panics in partitioning logic
Changes the partitioning logic to be fallible. This prevents an invalid
partition template from causing a panic, previously possible through two
known code paths:

    * TagValue formatter referencing a non-tag column
    * Time formatter using an invalid strftime format string

If either occurs, the write attempt is now aborted and an error returned
to the user with an HTTP 500 status code.

Additionally, unexpected partitioner errors now map to a catch-all error
instead of panicking.
2023-06-01 17:44:44 +02:00
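The shift from panicking to returning an error, as described in the commit above, might look roughly like this for the TagValue case. Everything here is an illustrative assumption: `PartitionError`, `ColumnType`, `partition_key` and `http_status` are invented names, and the handling of a row that lacks the tag is an arbitrary choice for the sketch.

```rust
use std::collections::HashMap;

/// Hypothetical error type for the partitioner.
#[derive(Debug, PartialEq)]
enum PartitionError {
    /// The template's TagValue part names a column that is not a tag.
    NotATagColumn(String),
    /// Catch-all for unexpected partitioner failures.
    Unexpected(String),
}

#[derive(Debug, Clone, Copy)]
enum ColumnType {
    Tag,
    Field,
}

/// Derives a partition key from the named tag column of a row,
/// returning an error instead of panicking on a bad template.
fn partition_key(
    schema: &HashMap<&str, ColumnType>,
    row: &HashMap<&str, &str>,
    tag_column: &str,
) -> Result<String, PartitionError> {
    match schema.get(tag_column) {
        Some(ColumnType::Tag) => {}
        Some(_) => return Err(PartitionError::NotATagColumn(tag_column.to_string())),
        None => {
            return Err(PartitionError::Unexpected(format!(
                "unknown column {}",
                tag_column
            )))
        }
    }
    // A row without the tag yields an empty key here; this is an
    // arbitrary choice for the sketch, not the router's behaviour.
    Ok(row.get(tag_column).map_or_else(String::new, |v| v.to_string()))
}

/// Maps any partitioner error to the status code named in the commit.
fn http_status(err: &PartitionError) -> u16 {
    match err {
        PartitionError::NotATagColumn(_) | PartitionError::Unexpected(_) => 500,
    }
}
```

The caller can then abort the write and surface the error, rather than taking down the request handler.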