69 Commits (e4fb227c6e04daa01328294a554180e965d3fa93)

Author SHA1 Message Date
Dom Dwyer 7c5ba34d44 refactor: enable gRPC handler
Plumbs the gRPC write handler into the existing router2 server.
2022-03-04 14:51:43 +00:00
Dom Dwyer 26c43f0a2c feat: grpc write handler
Implements the gRPC write handler endpoint.
2022-03-04 14:51:42 +00:00
Dom Dwyer 14d90d1011 feat: schema validation benchmarks
Useful for confirming the scalability of the schema check algorithm.
2022-03-03 23:40:13 +00:00
Dom Dwyer 6b5283bf36 refactor: NamespaceCache latency histograms
Switches the get/put counters to latency histograms to record the
duration of each call - this might be interesting!
2022-03-03 23:40:13 +00:00
Dom Dwyer bb9b140f4b refactor: sequencer metrics
Records per-sequencer (kafka partition) enqueue latency / counts broken
down by operation success/error.
2022-03-03 23:40:13 +00:00
Dom Dwyer e00986c563 refactor: write table count metric
Record the number of tables in each write - this will let us observe the
total number of tables a router instance has observed, which when
combined with the existing metrics helps us understand the shape
(distribution of tables/lines/fields) of the workload hitting the
routers.
2022-03-03 23:40:13 +00:00
Dom Dwyer 8de453edd1 feat: batch column upsert for schema validation
Uses the new ColumnRepo::create_or_get_many() catalog method to perform
a bulk upsert of (potentially) new columns to the catalog during schema
validation.
2022-03-03 11:18:29 +00:00
Carol (Nichols || Goulding) 3f2a58b47f
refactor: pub use data_types from data_types2
So it's clearer which parts of data_types the NG design is using, and
which types can be cleaned up eventually.
2022-03-02 13:55:31 -05:00
Carol (Nichols || Goulding) 8f3e44bf76
refactor: Extract a crate for shared data types in the new design 2022-03-02 12:16:15 -05:00
Dom Dwyer bd64f55658 feat: http ingest metrics
Records LP line count, field count & request body size (decompressed,
byte size) for writes, and request body byte size for deletes.
2022-03-02 13:05:55 +00:00
Marco Neumann 48722783f9
feat: offer metrics for in-mem catalog (#3876)
This can be quite helpful to test certain caching behavior w/o writing
yet-another abstraction layer.
2022-03-01 11:33:54 +00:00
Dom Dwyer b07f15bec7 refactor: parallel column resolution
A quick change to perform the ColumnRepo::create_or_get() calls in
parallel (up to a maximum of 3 in-flight at any one time) in order to
mitigate the latency of the call and reduce the overall schema
validation call duration.

The in-flight limit is enforced to avoid starving the DB connection pool
of connections.
2022-02-24 21:04:25 +00:00
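Note on b07f15bec7: a minimal sketch of bounding the number of in-flight calls. The commit itself presumably uses async futures; this sketch swaps in scoped OS threads from the standard library so that it stays self-contained. `resolve_all` and its parameters are hypothetical names, not the router's API.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

/// Applies `f` to every item using at most `max_in_flight` concurrent
/// calls, returning the results in input order.
/// (Hypothetical helper, not part of the router code.)
fn resolve_all<T, R, F>(items: &[T], max_in_flight: usize, f: F) -> Vec<R>
where
    T: Sync,
    R: Send,
    F: Fn(&T) -> R + Sync,
{
    assert!(max_in_flight > 0, "at least one call must be allowed in flight");

    // Index of the next item to claim; each worker takes one item at a time.
    let next = AtomicUsize::new(0);
    let results: Mutex<Vec<Option<R>>> =
        Mutex::new((0..items.len()).map(|_| None).collect());

    thread::scope(|s| {
        // The worker count is the in-flight limit.
        for _ in 0..max_in_flight.min(items.len()) {
            s.spawn(|| loop {
                let i = next.fetch_add(1, Ordering::Relaxed);
                if i >= items.len() {
                    break;
                }
                let value = f(&items[i]);
                let mut slots = results.lock().unwrap();
                slots[i] = Some(value);
            });
        }
    });

    results
        .into_inner()
        .unwrap()
        .into_iter()
        .map(|r| r.expect("every slot is filled by a worker"))
        .collect()
}
```

Calling `resolve_all(&columns, 3, create_or_get)` with a hypothetical `create_or_get` closure would mirror the limit of 3 described above, so a single write cannot occupy more than 3 pooled connections.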
Dom Dwyer 3d77cf5845 test: validate metrics for adding namespace 2022-02-24 16:07:02 +00:00
Dom Dwyer 0ddc35ce73 feat: instrument namespace cache contents
Adds two new metrics:

    * namespace_cache_table_count: total number of tables in cache
    * namespace_cache_column_count: total number of columns in cache

The metric decorator keeps a running total of each of the table and
column counts as namespaces are inserted into the cache, and adjusts the
value accordingly when an existing namespace is overwritten.
2022-02-24 15:11:14 +00:00
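Note on 0ddc35ce73: a rough sketch of how the running totals could be adjusted when a namespace is overwritten. The types are simplified stand-ins (a schema is reduced to a map of table name to column count), and `CountingCache` is a hypothetical name; the real decorator reports to the metric registry rather than to plain fields.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified schema: table name -> number of columns.
type NamespaceSchema = HashMap<String, usize>;

#[derive(Default)]
struct CountingCache {
    inner: HashMap<String, NamespaceSchema>,
    /// Stand-in for namespace_cache_table_count.
    table_count: i64,
    /// Stand-in for namespace_cache_column_count.
    column_count: i64,
}

impl CountingCache {
    /// Inserts or overwrites a namespace, keeping the running totals in sync.
    fn put(&mut self, namespace: &str, schema: NamespaceSchema) {
        let (new_tables, new_columns) = Self::counts(&schema);
        // `insert` returns the previous schema if the namespace was cached,
        // which is subtracted so an overwrite is not counted twice.
        let (old_tables, old_columns) = self
            .inner
            .insert(namespace.to_string(), schema)
            .map(|old| Self::counts(&old))
            .unwrap_or((0, 0));
        self.table_count += new_tables - old_tables;
        self.column_count += new_columns - old_columns;
    }

    fn counts(schema: &NamespaceSchema) -> (i64, i64) {
        (schema.len() as i64, schema.values().sum::<usize>() as i64)
    }
}
```

Tracking deltas on `put` keeps the totals correct without rescanning the whole cache on every update.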
Dom Dwyer 4024e95ce9 refactor: borrow metric registry
There's no need for the namespace metrics to take (shared) ownership of
the metric registry, so lend it at the call site instead of cloning the
arc.
2022-02-24 15:04:49 +00:00
Dom Dwyer d7eda88581 refactor: early schema validation
Changes the configuration of the router request pipeline to move schema
validation before partitioning.

This reduces the concurrency of calls into the schema validator when a
single write is split into one or more partitions, reducing contention
and cache thrashing. It also ensures we don't bother partitioning the
writes if the request will fail.
2022-02-23 18:59:14 +00:00
Dom Dwyer 9707d85e5e test: InstrumentationDecorator DML handler impls 2022-02-23 17:23:02 +00:00
Dom Dwyer b20dce80a2 feat: emit trace spans for router stages
Configures the instrumentation decorator to emit a trace span covering
the duration of the decorated handler's execution, recording the
success/error result and an error message, if any.
2022-02-23 10:39:13 +00:00
Luke Bond e19609ab7b
feat: routing service protection (#3807)
* chore: db migration for namespace table & column limits

* feat: impl table & column limits in catalog

* chore: improved comment in catalog

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-22 17:26:37 +00:00
Dom Dwyer 497615d715 test: router handler stack integration test
Adds an integration test covering the router's HTTP handler stack.

Given a well-formed HTTP write, the test asserts:

    * Write passes through the stack without error
    * Response code sent to client
    * Write buffer message is enqueued
    * Catalog namespace record is created
    * Metric handler is invoked and the hit is recorded
2022-02-18 14:39:03 +00:00
Dom Dwyer bb132b61ad refactor: chain DML handlers
The router is composed of several DML handlers called in sequence in
order to construct the full request handling pipeline. Prior to this
commit, each handler nested the next handler it calls internally,
producing a nested call chain that resulted in metrics (added in #3764)
recording cumulative latency like this:

              ┌ ─

              │     ┌───────────────┐
                    │  NS Creation  │
              │     └───────────────┘
                            │  ┌───────────────┐
              │             │  │  Partitioner  │
                            │  └───────────────┘
              │             │          │
                            │          │
 Cumulative   │             │          │  ┌───────────────┐
   Timings                1.5s        1s  │    etc...     │
              │             │          │  └───────────────┘
                            │          │
              │             │          │
                            │  ┌───────────────┐
              │             │  │  Partitioner  │
                            │  └───────────────┘
              │     ┌───────────────┐
                    │  NS Creation  │
              │     └───────────────┘

              └ ─

This meant it was hard to determine the latency of a single handler
without knowing (and subtracting the latency of) all the child handlers
it calls.

This commit replaces the intrusive nested handler call chain with an
external Chain combinator type to compose together individual handlers,
resulting in correct per-handler timings and simpler code/tests:

          ┌───────────────┐
          │  NS Creation  │
          └───────────────┘
                  │
                 .5s       ┌───────────────┐
                  └───────▶│  Partitioner  │
                           └───────────────┘
                                   │
                                  1s    ┌───────────────┐
                                   └───▶│    etc...     │
                                        └───────────────┘
2022-02-18 14:19:53 +00:00
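Note on bb132b61ad: a minimal sketch of an external chain combinator, assuming a synchronous, heavily simplified handler trait. The real DmlHandler trait is richer; `Handler`, `RejectEmpty` and `SplitLines` are hypothetical names used only for illustration.

```rust
/// Hypothetical, simplified handler: transforms an input into an output.
trait Handler {
    type Input;
    type Output;
    fn handle(&self, input: Self::Input) -> Result<Self::Output, String>;
}

/// Runs `first`, then feeds its output into `second`. Neither handler
/// knows about the other, so each can be timed and tested in isolation.
struct Chain<A, B> {
    first: A,
    second: B,
}

impl<A, B> Handler for Chain<A, B>
where
    A: Handler,
    B: Handler<Input = A::Output>,
{
    type Input = A::Input;
    type Output = B::Output;

    fn handle(&self, input: Self::Input) -> Result<Self::Output, String> {
        let intermediate = self.first.handle(input)?;
        self.second.handle(intermediate)
    }
}

/// Example stage: rejects blank payloads, passes everything else through.
struct RejectEmpty;

impl Handler for RejectEmpty {
    type Input = String;
    type Output = String;

    fn handle(&self, input: String) -> Result<String, String> {
        if input.trim().is_empty() {
            Err("empty write".to_string())
        } else {
            Ok(input)
        }
    }
}

/// Example stage: changes the payload type for downstream handlers.
struct SplitLines;

impl Handler for SplitLines {
    type Input = String;
    type Output = Vec<String>;

    fn handle(&self, input: String) -> Result<Vec<String>, String> {
        Ok(input.lines().map(str::to_string).collect())
    }
}
```

With this shape, `Chain { first: RejectEmpty, second: SplitLines }` is itself a `Handler`, and a timing decorator wrapped around either stage measures only that stage rather than the cumulative latency of everything it calls.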
Dom Dwyer 52fd2af851 refactor: DML handler metric name labels
Emit metrics labelled with "handler=<name>" and a common metric name,
instead of constructing metrics prefixed with the DML handler name.
2022-02-17 15:11:20 +00:00
Dom Dwyer 40e5b19301 feat: metric instrumentation for DML handlers
Adds a decorator type over a DmlHandler implementation that records call
latency for writes & deletes, broken down by result (success/error).
2022-02-16 14:00:49 +00:00
Dom Dwyer 92fe507e52 feat: instrumented namespace cache
Decorates the NamespaceCache with a set of cache get hit/miss counters,
and put insert/update counters to expose cache behaviour.
2022-02-16 14:00:49 +00:00
Dom Dwyer e055800039 refactor: enable Partitioner in request pipeline
Adds the Partitioner DML handler into the handler stack, modifying the
input types of down-stream handlers to accept the partitioned data.
2022-02-15 11:34:33 +00:00
Dom Dwyer c64e9f0d40 refactor: namespace auto-creator generic input
Changes the NamespaceAutocreation handler to be generic over any
WriteInput.

This allows the NamespaceAutocreation layer to be placed anywhere in the
handler stack, without needing a prior transformation or specific write
type.
2022-02-15 11:29:33 +00:00
Dom Dwyer 92218ce8aa feat: write partitioner
Implements a write partitioning DML handler that splits per-table
MutableBatch instances into per-partition, per-table MutableBatch and
concurrently calls the inner DML handler with each.
2022-02-15 11:29:32 +00:00
Dom Dwyer 5c254339fa test: MockDmlHandler generic over write input
Allow the MockDmlHandler to capture any input type given to the write()
method. This lets us reuse the mock across all handler implementations,
regardless of their expected write input type.
2022-02-15 11:27:16 +00:00
Dom Dwyer e99922d518 refactor: parametrise DML handler input type
Allow a DML handler to specify the write input type on which it
operates.

This allows us to construct a write handler pipeline that transforms the
request as it passes through the various handlers. We'll use this to
implement a handler that annotates a normal set of table writes with the
partition key, modifying downstream handlers to expect this annotated
input.
2022-02-15 11:23:45 +00:00
Marco Neumann c6e374a025
feat: allow catalog access w/o a transaction (#3735)
* feat: allow catalog access w/o a transaction

Now the caller has the full control if they want to use a transaction or
not.

* fix: remove non-transaction-safe `create_many`

* fix: remove unnecessary transactions
2022-02-15 10:15:36 +00:00
dependabot[bot] f23574bc5f
chore(deps): bump futures from 0.3.19 to 0.3.21 (#3706)
Bumps [futures](https://github.com/rust-lang/futures-rs) from 0.3.19 to 0.3.21.
- [Release notes](https://github.com/rust-lang/futures-rs/releases)
- [Changelog](https://github.com/rust-lang/futures-rs/blob/master/CHANGELOG.md)
- [Commits](https://github.com/rust-lang/futures-rs/compare/0.3.19...0.3.21)

---
updated-dependencies:
- dependency-name: futures
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-10 09:19:19 +00:00
kodiakhq[bot] ace76cef14
Merge branch 'main' into dom/sharded-cache 2022-02-08 16:09:48 +00:00
Marco Neumann 5de4d6203f
refactor: catalog transaction (#3660)
* refactor: catalog Unit of Work (= transaction)

Set up an interface to handle Units of Work within our catalog. Previously
both the Postgres and the in-mem backend used "mini-transactions on
demand". Now the caller has a clear way to establish boundaries and
gets read and write isolation. A single `Arc<dyn Catalog>` can create as
many `Box<dyn UnitOfWork>` as you like, but note that depending on the
backend you may not scale infinitely (postgres will likely impose
certain limits and the in-mem backend limits concurrency to 1 to keep
things simple).

* docs: improve wording

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* refactor: rename Unit of Work to Transaction

* test: improve `test_txn_isolation`

* feat: clarify transaction drop semantics

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-08 13:38:33 +00:00
Dom Dwyer 45f9ef82ba feat: shard namespace cache
Adds a simple wrapper type that maps the namespace keyspace over a set
of N namespace schema caches, thereby reducing cache lock contention by
a factor of N (in a perfect world).

This will help smooth out latency of workloads that include new
namespace requests or incremental schema additions. It should also
significantly help latency during initial cache warming of a freshly
booted router.
2022-02-04 16:12:45 +00:00
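Note on 45f9ef82ba: a sketch of mapping the namespace keyspace over N independently locked caches. The shard selection here is an assumption (standard library hasher modulo N) and `ShardedCache` is a hypothetical name; the actual wrapper may select shards differently.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::RwLock;

/// Hypothetical sharded cache: each shard has its own lock, so operations
/// on namespaces in different shards do not contend with each other.
struct ShardedCache<V> {
    shards: Vec<RwLock<HashMap<String, V>>>,
}

impl<V: Clone> ShardedCache<V> {
    fn new(n: usize) -> Self {
        assert!(n > 0, "need at least one shard");
        Self {
            shards: (0..n).map(|_| RwLock::new(HashMap::new())).collect(),
        }
    }

    /// Maps a namespace name to a shard index (assumed scheme: hash mod N).
    fn shard_index(&self, namespace: &str) -> usize {
        let mut hasher = DefaultHasher::new();
        namespace.hash(&mut hasher);
        (hasher.finish() % self.shards.len() as u64) as usize
    }

    /// Inserts a value, returning the previous one if present.
    fn put(&self, namespace: &str, value: V) -> Option<V> {
        let idx = self.shard_index(namespace);
        let mut shard = self.shards[idx].write().unwrap();
        shard.insert(namespace.to_string(), value)
    }

    fn get(&self, namespace: &str) -> Option<V> {
        let idx = self.shard_index(namespace);
        let shard = self.shards[idx].read().unwrap();
        shard.get(namespace).cloned()
    }
}
```

A writer populating one namespace only blocks readers of namespaces that hash to the same shard, which is where the "factor of N" reduction in contention comes from when keys are evenly spread.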
Dom Dwyer 026a557c0b refactor: rename TableNamespaceSharder
Rename to JumpHash and expose the hashing internals for reuse (outside
of only table & namespace sharding).
2022-02-04 15:56:09 +00:00
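Note on 026a557c0b: assuming JumpHash refers to the jump consistent hash algorithm of Lamping and Veach (an assumption, the commit does not say so), the core bucket selection can be sketched as below. The function name and signature are illustrative, not the type's real API.

```rust
/// Jump consistent hash: maps a 64-bit key to a bucket in `0..num_buckets`.
/// When the bucket count grows from n to n + 1, a key either keeps its
/// bucket or moves to the new bucket n.
fn jump_hash(mut key: u64, num_buckets: u32) -> u32 {
    assert!(num_buckets > 0, "need at least one bucket");
    let mut bucket: i64 = -1;
    let mut next: i64 = 0;
    while next < i64::from(num_buckets) {
        bucket = next;
        // Linear congruential step acting as the pseudo-random generator.
        key = key.wrapping_mul(2862933555777941757).wrapping_add(1);
        // Jump forward to the next bucket count at which the key would move.
        let ratio = (1u64 << 31) as f64 / ((key >> 33) + 1) as f64;
        next = ((bucket + 1) as f64 * ratio) as i64;
    }
    bucket as u32
}
```

In practice the key would come from hashing the table or namespace name first; exposing the hashing internals lets other callers reuse the same mapping for their own keys.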
Dom Dwyer aefc70a9ea feat(router2): namespace auto-creation
Decorate the existing request handler pipeline with a layer that
implicitly creates the namespace when a write request is received.
2022-02-04 15:34:15 +00:00
kodiakhq[bot] 3197ea945b
Merge branch 'main' into dom/extract-ns-cache 2022-02-03 12:30:37 +00:00
Dom 2e9b97a4ab
docs: fix typo
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
2022-02-03 12:30:16 +00:00
Paul Dix ce46bbaada
feat: wire up the write buffer to the ingester process (#3533)
This adds the scaffolding for the ingester server to consume data from Kafka. This ingests data in an in memory structure while creating records in the catalog for any partitions that don't yet exist.

I've removed catalog_update.rs in ingester for now. That was mostly a placeholder and will be going in a combination of handler.rs and data.rs on my next PR which will have some primitive lifecycle wired up.

There's one ugly bit here where the DML write is cloned because it's getting borrowed to output spans and metrics. I'll need to follow up with a refactor to make it so that the DML write's tables can be consumed without it gumming up the metrics stuff.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-03 11:47:28 +00:00
Dom Dwyer 3cc4481616 refactor: extract NamespaceSchema cache
Breaks the in-memory cache of NamespaceSchema out into a decoupled type
that can be shared across multiple DML handlers.
2022-02-03 10:01:07 +00:00
Dom Dwyer 26c033d529 style: return directly 2022-02-02 15:20:48 +00:00
Dom Dwyer 4744c5804e refactor: remove Dashmap
Swap Dashmap for a regular RwLock<HashMap<..,>> due to soundness issues:

    https://rustsec.org/advisories/RUSTSEC-2022-0002
2022-02-02 14:04:53 +00:00
Dom Dwyer 6598023726 feat: cache NamespaceSchema in validator
Adds an in-memory cache of table schemas to the SchemaValidator DML
handler.

The cache pulls from the global catalog when observing a column for the
first time, and pushes the column type to set it for subsequent requests
if it does not exist (this pull & push is done atomically by the
catalog in an "upsert" call).

The in-memory cache is sharded by namespace, with each shard guarded by
an individual lock to minimise contention between readers (the expected
average case) and writers (only when adding new columns/tables).

Relies on the catalog to serialise new column creation and validate
parallel creation requests.
2022-02-02 13:04:53 +00:00
Dom Dwyer c81f207298 feat: schema validation
Implements a write schema validation DML handler, denying requests that
conflict with the schema within the global catalog. Additive schema
changes are accepted, incrementally updating the global catalog schema.

Deletes are passed through unchanged and unvalidated.
2022-02-02 13:04:53 +00:00
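Note on c81f207298: a sketch of the accept/deny rule for a single table, under simplifying assumptions. `ColumnType`, `TableSchema` and `validate_and_merge` are hypothetical names, and the two-pass "reject before mutating" behaviour is a choice made for this sketch rather than something the commit states.

```rust
use std::collections::HashMap;

/// Hypothetical column types, reduced to three variants for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ColumnType {
    Tag,
    Float,
    Integer,
}

/// Hypothetical table schema: column name -> column type.
type TableSchema = HashMap<String, ColumnType>;

/// Denies a write whose column types conflict with the existing schema,
/// otherwise adds any new columns. Returns the number of columns added.
fn validate_and_merge(
    catalog: &mut TableSchema,
    write: &[(&str, ColumnType)],
) -> Result<usize, String> {
    // First pass: look for conflicts without touching the schema.
    for (name, ty) in write {
        if let Some(existing) = catalog.get(*name) {
            if existing != ty {
                return Err(format!(
                    "column {name} is {existing:?}, write has {ty:?}"
                ));
            }
        }
    }
    // Second pass: additive changes only.
    let mut added = 0;
    for (name, ty) in write {
        if !catalog.contains_key(*name) {
            catalog.insert((*name).to_string(), *ty);
            added += 1;
        }
    }
    Ok(added)
}
```

A write that introduces `count: Integer` alongside an existing `usage: Float` is accepted and extends the schema; a write that sends `usage` as an `Integer` is denied and leaves the schema unchanged.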
Marco Neumann 22778a3a80
chore: upgrade rskafka and parking_lot (#3592) 2022-02-01 11:50:42 +00:00
Luke Bond 011b297f28
feat: more benchmarks of router2 (#3575) 2022-01-28 17:44:10 +00:00
Dom Dwyer b38deaa721 refactor: decouple error types for DmlHandler
Allows the DmlHandler to return different types for each method.

This enables a DmlHandler implementation decorating an inner handler to
return the inner handler's error directly, avoiding any "wrapper"
errors.
2022-01-28 11:01:06 +00:00
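Note on b38deaa721: a simplified, synchronous sketch of per-method associated error types. The trait below only borrows the DmlHandler name; its method signatures, `Sink`, `CallCounter` and the error types are all hypothetical.

```rust
use std::cell::Cell;

/// Simplified stand-in for the handler trait: each method has its own
/// associated error type.
trait DmlHandler {
    type WriteError;
    type DeleteError;
    fn write(&self, lines: &str) -> Result<(), Self::WriteError>;
    fn delete(&self, predicate: &str) -> Result<(), Self::DeleteError>;
}

#[derive(Debug, PartialEq)]
struct EmptyWrite;

#[derive(Debug, PartialEq)]
struct BadPredicate(String);

/// Innermost handler with distinct write and delete errors.
struct Sink;

impl DmlHandler for Sink {
    type WriteError = EmptyWrite;
    type DeleteError = BadPredicate;

    fn write(&self, lines: &str) -> Result<(), EmptyWrite> {
        if lines.is_empty() { Err(EmptyWrite) } else { Ok(()) }
    }

    fn delete(&self, predicate: &str) -> Result<(), BadPredicate> {
        if predicate.contains('=') {
            Ok(())
        } else {
            Err(BadPredicate(predicate.to_string()))
        }
    }
}

/// Decorator that counts calls and returns the inner handler's errors
/// directly, with no wrapper error type of its own.
struct CallCounter<T> {
    inner: T,
    calls: Cell<usize>,
}

impl<T: DmlHandler> DmlHandler for CallCounter<T> {
    type WriteError = T::WriteError;
    type DeleteError = T::DeleteError;

    fn write(&self, lines: &str) -> Result<(), Self::WriteError> {
        self.calls.set(self.calls.get() + 1);
        self.inner.write(lines)
    }

    fn delete(&self, predicate: &str) -> Result<(), Self::DeleteError> {
        self.calls.set(self.calls.get() + 1);
        self.inner.delete(predicate)
    }
}
```

Because `CallCounter` re-exports `T::WriteError` and `T::DeleteError`, callers match on the inner handler's errors as if the decorator were not there.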
Luke Bond 4a96e52290
feat: router2 sharder benchmarking (#3558)
* feat: benchmarking the router2 sharder

* chore: added throughput to sharder benchmarks; vary num buckets
2022-01-27 18:09:16 +00:00
Andrew Lamb 2062267d0f
chore: Update hashbrown (#3551)
* chore: Update hashbrown

* fix: hakari

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-01-27 15:34:10 +00:00
Dom 5447554aee
refactor(router2): DML handler stack (#3549)
* refactor: composable DmlHandler stack

Changes the DmlHandler trait to allow composition of handler logic in
order to construct the complete request processing pipeline.

* feat: debug log write/delete requests

Log requests hitting the HTTP endpoint at DEBUG.

* refactor: dml_handler -> dml_handlers

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-01-27 14:54:27 +00:00