42 Commits (5c254339fab690bcf6d2a44ece4624c00383cc1e)

Author SHA1 Message Date
Dom Dwyer 5c254339fa test: MockDmlHandler generic over write input
Allow the MockDmlHandler to capture any input type given to the write()
method. This lets us reuse the mock across all handler implementations,
regardless of their expected write input type.
2022-02-15 11:27:16 +00:00
Dom Dwyer e99922d518 refactor: parametrise DML handler input type
Allow a DML handler to specify the write input type on which it
operates.

This allows us to construct a write handler pipeline that transforms the
request as it passes through the various handlers. We'll use this to
implement a handler that annotates a normal set of table writes with the
partition key, modifying downstream handlers to expect this annotated
input.
2022-02-15 11:23:45 +00:00
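A minimal sketch of the idea in e99922d518, assuming a simplified, synchronous handler trait. The names `TableWrites`, `Partitioned`, `Partitioner` and `Recorder` are hypothetical and only illustrate how an associated input type lets one handler transform the request before passing it downstream; the real trait is likely async and richer.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

/// Hypothetical, synchronous stand-in for the DML handler trait.
trait DmlHandler {
    /// The input type this handler's `write()` operates on.
    type WriteInput;
    type WriteError: std::fmt::Debug;

    fn write(&self, namespace: &str, input: Self::WriteInput) -> Result<(), Self::WriteError>;
}

/// Table name -> lines; a placeholder for the real batch type.
type TableWrites = HashMap<String, Vec<String>>;

/// Table writes annotated with a partition key (assumed shape).
#[derive(Debug, PartialEq)]
struct Partitioned {
    partition_key: String,
    writes: TableWrites,
}

/// Accepts plain writes, annotates them, and forwards them to an inner
/// handler that is required to expect the annotated input.
struct Partitioner<H> {
    inner: H,
    partition_key: String,
}

impl<H> DmlHandler for Partitioner<H>
where
    H: DmlHandler<WriteInput = Partitioned>,
{
    type WriteInput = TableWrites;
    type WriteError = H::WriteError;

    fn write(&self, namespace: &str, input: TableWrites) -> Result<(), Self::WriteError> {
        let annotated = Partitioned {
            partition_key: self.partition_key.clone(),
            writes: input,
        };
        self.inner.write(namespace, annotated)
    }
}

/// Terminal handler that records every annotated write it receives.
#[derive(Default)]
struct Recorder {
    seen: RefCell<Vec<Partitioned>>,
}

impl DmlHandler for Recorder {
    type WriteInput = Partitioned;
    type WriteError = std::convert::Infallible;

    fn write(&self, _namespace: &str, input: Partitioned) -> Result<(), Self::WriteError> {
        self.seen.borrow_mut().push(input);
        Ok(())
    }
}
```

The `H: DmlHandler<WriteInput = Partitioned>` bound makes the compiler reject a pipeline whose downstream handler does not expect the annotated input.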
Marco Neumann c6e374a025
feat: allow catalog access w/o a transaction (#3735)
* feat: allow catalog access w/o a transaction

Now the caller has full control over whether to use a transaction or
not.

* fix: remove non-transaction-safe `create_many`

* fix: remove unnecessary transactions
2022-02-15 10:15:36 +00:00
dependabot[bot] f23574bc5f
chore(deps): bump futures from 0.3.19 to 0.3.21 (#3706)
Bumps [futures](https://github.com/rust-lang/futures-rs) from 0.3.19 to 0.3.21.
- [Release notes](https://github.com/rust-lang/futures-rs/releases)
- [Changelog](https://github.com/rust-lang/futures-rs/blob/master/CHANGELOG.md)
- [Commits](https://github.com/rust-lang/futures-rs/compare/0.3.19...0.3.21)

---
updated-dependencies:
- dependency-name: futures
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-10 09:19:19 +00:00
kodiakhq[bot] ace76cef14
Merge branch 'main' into dom/sharded-cache 2022-02-08 16:09:48 +00:00
Marco Neumann 5de4d6203f
refactor: catalog transaction (#3660)
* refactor: catalog Unit of Work (= transaction)

Set up an interface to handle Units of Work within our catalog. Previously
both the Postgres and the in-mem backend used "mini-transactions on
demand". Now the caller has a clear way to establish boundaries and
gets read and write isolation. A single `Arc<dyn Catalog>` can create as
many `Box<dyn UnitOfWork>` as you like, but note that depending on the
backend you may not scale infinitely (postgres will likely impose
certain limits and the in-mem backend limits concurrency to 1 to keep
things simple).

* docs: improve wording

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* refactor: rename Unit of Work to Transaction

* test: improve `test_txn_isolation`

* feat: clarify transaction drop semantics

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-08 13:38:33 +00:00
Dom Dwyer 45f9ef82ba feat: shard namespace cache
Adds a simple wrapper type that maps the namespace keyspace over a set
of N namespace schema caches, thereby reducing cache lock contention by
a factor of N (in a perfect world).

This will help smooth out latency of workloads that include new
namespace requests or incremental schema additions. It should also
significantly help latency during initial cache warming of a freshly
booted router.
2022-02-04 16:12:45 +00:00
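The wrapper described in 45f9ef82ba could look roughly like the sketch below. It is an illustration only: `ShardedCache`, `Schema` and the modulo-of-hash shard selection are assumptions, not the committed code (which may well reuse the JumpHash sharder instead).

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::RwLock;

/// Placeholder for the cached per-namespace schema.
type Schema = Vec<String>;

/// Maps the namespace keyspace over N independently locked caches.
struct ShardedCache {
    shards: Vec<RwLock<HashMap<String, Schema>>>,
}

impl ShardedCache {
    fn new(n: usize) -> Self {
        assert!(n > 0, "need at least one shard");
        Self {
            shards: (0..n).map(|_| RwLock::new(HashMap::new())).collect(),
        }
    }

    /// Picks the shard for a namespace (assumption: hash modulo N).
    fn shard_index(&self, namespace: &str) -> usize {
        let mut hasher = DefaultHasher::new();
        namespace.hash(&mut hasher);
        (hasher.finish() % self.shards.len() as u64) as usize
    }

    /// Reads take only the read lock of the one shard that owns the key.
    fn get(&self, namespace: &str) -> Option<Schema> {
        let guard = self.shards[self.shard_index(namespace)].read().unwrap();
        guard.get(namespace).cloned()
    }

    /// Writes lock a single shard, leaving the other N-1 uncontended.
    fn put(&self, namespace: &str, schema: Schema) {
        let mut guard = self.shards[self.shard_index(namespace)].write().unwrap();
        guard.insert(namespace.to_string(), schema);
    }
}
```

With evenly distributed namespaces, a writer adding a new namespace blocks readers of only about 1/N of the keyspace, which is the "factor of N (in a perfect world)" the message refers to.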
Dom Dwyer 026a557c0b refactor: rename TableNamespaceSharder
Rename to JumpHash and expose the hashing internals for reuse (outside
of only table & namespace sharding).
2022-02-04 15:56:09 +00:00
Dom Dwyer aefc70a9ea feat(router2): namespace auto-creation
Decorate the existing request handler pipeline with a layer that
implicitly creates the namespace when a write request is received.
2022-02-04 15:34:15 +00:00
kodiakhq[bot] 3197ea945b
Merge branch 'main' into dom/extract-ns-cache 2022-02-03 12:30:37 +00:00
Dom 2e9b97a4ab
docs: fix typo
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
2022-02-03 12:30:16 +00:00
Paul Dix ce46bbaada
feat: wire up the write buffer to the ingester process (#3533)
This adds the scaffolding for the ingester server to consume data from Kafka. This ingests data in an in memory structure while creating records in the catalog for any partitions that don't yet exist.

I've removed catalog_update.rs in ingester for now. That was mostly a placeholder and will be going in a combination of handler.rs and data.rs on my next PR which will have some primitive lifecycle wired up.

There's one ugly bit here where the DML write is cloned because it's getting borrowed to output spans and metrics. I'll need to follow up with a refactor to make it so that the DML write's tables can be consumed without it gumming up the metrics stuff.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-03 11:47:28 +00:00
Dom Dwyer 3cc4481616 refactor: extract NamespaceSchema cache
Breaks the in-memory cache of NamespaceSchema out into a decoupled type
that can be shared across multiple DML handlers.
2022-02-03 10:01:07 +00:00
Dom Dwyer 26c033d529 style: return directly 2022-02-02 15:20:48 +00:00
Dom Dwyer 4744c5804e refactor: remove Dashmap
Swap Dashmap for a regular RwLock<HashMap<..,>> due to soundness issues:

    https://rustsec.org/advisories/RUSTSEC-2022-0002
2022-02-02 14:04:53 +00:00
Dom Dwyer 6598023726 feat: cache NamespaceSchema in validator
Adds an in-memory cache of table schemas to the SchemaValidator DML
handler.

The cache pulls from the global catalog when observing a column for the
first time, and pushes the column type to set it for subsequent requests
if it does not exist (this pull & push is done atomically by the
catalog in an "upsert" call).

The in-memory cache is sharded by namespace, with each shard guarded by
an individual lock to minimise contention between readers (the expected
average case) and writers (only when adding new columns/tables).

Relies on the catalog to serialise new column creation and validate
parallel creation requests.
2022-02-02 13:04:53 +00:00
Dom Dwyer c81f207298 feat: schema validation
Implements a write schema validation DML handler, denying requests that
conflict with the schema within the global catalog. Additive schema
changes are accepted, incrementally updating the global catalog schema.

Deletes are passed through unchanged and unvalidated.
2022-02-02 13:04:53 +00:00
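The accept/deny rule from c81f207298 can be sketched as follows, assuming a single table's schema is a plain map from column name to type. `ColumnType`, `Conflict` and `validate_write` are hypothetical names; the real handler works against the catalog rather than a local map.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ColumnType {
    Tag,
    Float,
    Integer,
}

/// Describes why a write was denied.
#[derive(Debug, PartialEq, Eq)]
struct Conflict {
    column: String,
    existing: ColumnType,
    requested: ColumnType,
}

/// Stand-in for one table's schema in the catalog: column name -> type.
type TableSchema = HashMap<String, ColumnType>;

/// Denies writes whose column types conflict with the schema, and
/// otherwise applies any additive changes (new columns) to it.
fn validate_write(
    schema: &mut TableSchema,
    write: &[(&str, ColumnType)],
) -> Result<(), Conflict> {
    // First pass: look for conflicts without mutating the schema, so a
    // rejected write leaves no partial changes behind.
    for (name, ty) in write {
        if let Some(existing) = schema.get(*name) {
            if existing != ty {
                return Err(Conflict {
                    column: name.to_string(),
                    existing: *existing,
                    requested: *ty,
                });
            }
        }
    }
    // Second pass: record columns observed for the first time.
    for (name, ty) in write {
        schema.entry(name.to_string()).or_insert(*ty);
    }
    Ok(())
}
```

Checking before mutating is a design choice of this sketch; the commit message does not say whether the real handler applies additive changes from a request that is later rejected.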
Marco Neumann 22778a3a80
chore: upgrade rskafka and parking_lot (#3592) 2022-02-01 11:50:42 +00:00
Luke Bond 011b297f28
feat: more benchmarks of router2 (#3575) 2022-01-28 17:44:10 +00:00
Dom Dwyer b38deaa721 refactor: decouple error types for DmlHandler
Allows the DmlHandler to return different types for each method.

This enables a DmlHandler implementation decorating an inner handler to
return the inner handler's error directly, avoiding any "wrapper"
errors.
2022-01-28 11:01:06 +00:00
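One way to read b38deaa721 is as a trait with one associated error type per method. The sketch below assumes a synchronous trait with string inputs; `WriteRejected`, `DeleteUnsupported`, `Inner` and `Counting` are hypothetical and exist only to show a decorator re-exporting its inner handler's errors unchanged.

```rust
use std::cell::Cell;
use std::fmt::Debug;

/// Hypothetical handler trait with a distinct error type per method.
trait DmlHandler {
    type WriteError: Debug;
    type DeleteError: Debug;

    fn write(&self, line: &str) -> Result<(), Self::WriteError>;
    fn delete(&self, predicate: &str) -> Result<(), Self::DeleteError>;
}

#[derive(Debug, PartialEq)]
struct WriteRejected(String);

#[derive(Debug, PartialEq)]
struct DeleteUnsupported;

/// An inner handler with its own concrete error types.
struct Inner;

impl DmlHandler for Inner {
    type WriteError = WriteRejected;
    type DeleteError = DeleteUnsupported;

    fn write(&self, line: &str) -> Result<(), WriteRejected> {
        if line.is_empty() {
            Err(WriteRejected("empty write".to_string()))
        } else {
            Ok(())
        }
    }

    fn delete(&self, _predicate: &str) -> Result<(), DeleteUnsupported> {
        Err(DeleteUnsupported)
    }
}

/// Decorator that counts calls and returns the inner handler's errors
/// directly, with no wrapper error enum of its own.
struct Counting<H> {
    inner: H,
    calls: Cell<usize>,
}

impl<H: DmlHandler> DmlHandler for Counting<H> {
    type WriteError = H::WriteError;
    type DeleteError = H::DeleteError;

    fn write(&self, line: &str) -> Result<(), Self::WriteError> {
        self.calls.set(self.calls.get() + 1);
        self.inner.write(line)
    }

    fn delete(&self, predicate: &str) -> Result<(), Self::DeleteError> {
        self.calls.set(self.calls.get() + 1);
        self.inner.delete(predicate)
    }
}
```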
Luke Bond 4a96e52290
feat: router2 sharder benchmarking (#3558)
* feat: benchmarking the router2 sharder

* chore: added throughput to sharder benchmarks; vary num buckets
2022-01-27 18:09:16 +00:00
Andrew Lamb 2062267d0f
chore: Update hashbrown (#3551)
* chore: Update hashbrown

* fix: hakari

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-01-27 15:34:10 +00:00
Dom 5447554aee
refactor(router2): DML handler stack (#3549)
* refactor: composable DmlHandler stack

Changes the DmlHandler trait to allow composition of handler logic in
order to construct the complete request processing pipeline.

* feat: debug log write/delete requests

Log requests hitting the HTTP endpoint at DEBUG.

* refactor: dml_handler -> dml_handlers

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-01-27 14:54:27 +00:00
Paul Dix 16d584b2ff
feat: Add db_name/namespace to DmlWrite and DmlDelete (#3531)
* feat: Add db_name/namespace to DmlWrite and DmlDelete

This is required for the new ingester to be able to work with the write buffer. The protobuf that gets serialized over Kafka already includes the database name, it just wasn't getting carried through to the marshaled Dml operation.

* fix: database != namespace, propagation through write buffer

Co-authored-by: Marco Neumann <marco@crepererum.net>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-01-27 14:12:20 +00:00
Luke Bond 107f39d53c
feat: add trace collector to router2 (#3529)
* feat: add trace collector to router2

* chore: fmt
2022-01-26 11:51:17 +00:00
Dom 6b0f7e6b2b
feat: initialise ShardedWriteBuffer (#3528)
Initialises a ShardedWriteBuffer for the hard-coded "iox_shared" topic.

Adds the following CLI flags:

    * --write-buffer: type of buffer [kafka, rskafka, file]
    * --write-buffer-addr: write buffer endpoint address

The server uses these config options to initialise the appropriate write
buffer backend, and configure the TableNamespaceSharder to shard
operations over the set of sequencers exposed by the write buffer.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-01-26 10:49:34 +00:00
Dom b846ead320
feat(router2): shard writes/deletes into write buffer (#3499)
* feat: Sequencer wrapper

This type wraps an underlying WriteBufferWriter implementation, tagging
it with a sequencer ID it should use when enqueuing operations to the
buffer.

* feat: mock sharder

Implements a mock Sharder impl that returns pre-configured responses to
shard(), and captures the input to the call.

* feat: sharded write buffer

Implements sharding of ops into an underlying WriteBuffer.

Writes are sharded by some abstract Sharder impl, collated per shard to
maximise the size of each op (and therefore compression efficiency),
converted into a DML operation and then enqueued in parallel to the
underlying WriteBuffer implementation.

Deletes are modelled as being mapped to a single write buffer shard,
which is the case while we support sharding based on the table &
namespace only. Deletes will be extended to support (potentially)
multiple shards when column overrides are implemented.

* refactor: runtime write buffers

Switch from using static dispatch, to using a runtime specified
WriteBufferWriting implementation.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-01-25 15:19:48 +00:00
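The "collated per shard" step from b846ead320 can be sketched as below. This covers only the grouping; conversion to a DML operation and the parallel enqueue are omitted. The `Sharder` signature, `ByLength` and `collate` are assumptions made for illustration and do not mirror the real trait.

```rust
use std::collections::BTreeMap;

/// Placeholder for a per-table batch of rows.
type Batch = Vec<String>;

/// Hypothetical sharder: maps a (table, namespace) pair to a shard index.
trait Sharder {
    fn shard(&self, table: &str, namespace: &str) -> usize;
}

/// Toy sharder used only to make the example deterministic.
struct ByLength {
    shards: usize,
}

impl Sharder for ByLength {
    fn shard(&self, table: &str, _namespace: &str) -> usize {
        table.len() % self.shards
    }
}

/// Groups per-table batches by destination shard, so each shard receives
/// one combined operation instead of one operation per table.
fn collate<S: Sharder>(
    sharder: &S,
    namespace: &str,
    writes: Vec<(String, Batch)>,
) -> BTreeMap<usize, Vec<(String, Batch)>> {
    let mut per_shard = BTreeMap::new();
    for (table, batch) in writes {
        let shard = sharder.shard(&table, namespace);
        per_shard
            .entry(shard)
            .or_insert_with(Vec::new)
            .push((table, batch));
    }
    per_shard
}
```

Each map entry would then become a single operation enqueued to the write buffer for that shard, which is what keeps individual operations large.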
Dom d63b35d2b5 refactor: remove redundant T: Ord bounds 2022-01-20 12:01:04 +00:00
Dom d710ea48e1 test: hash bucket fixture test
Ensures mapping key K to bucket B remains stable.
2022-01-20 11:35:13 +00:00
Dom 36d50d083f feat: sharder trait & impl
This commit defines the Sharder trait that should allow us to implement
multiple sharding strategies over a defined set of input types (such as
a MutableBatch for writes, DeletePredicate for deletes, etc).

This commit also includes a jump hash implementation that consistently
shards (table name, namespace) tuples to a given shard for all input
types.
2022-01-20 11:10:37 +00:00
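For reference, the widely published jump consistent hash algorithm (Lamping and Veach) is sketched below; whether 36d50d083f matches it line for line is not confirmed by the commit message. `shard_for` and the use of `DefaultHasher` to hash the tuple are assumptions. The standard library does not guarantee `DefaultHasher` output across Rust releases, so a mapping that must stay stable would need a fixed hash function.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Jump consistent hash: maps a 64-bit key to a bucket in
/// `0..num_buckets`, moving only about 1/(n+1) of the keys when the
/// bucket count grows from n to n+1.
fn jump_hash(mut key: u64, num_buckets: u32) -> u32 {
    assert!(num_buckets > 0, "need at least one bucket");
    let mut b: i64 = -1;
    let mut j: i64 = 0;
    while j < num_buckets as i64 {
        b = j;
        // Linear congruential step producing the next pseudo-random value.
        key = key.wrapping_mul(2862933555777941757).wrapping_add(1);
        // Jump forward to the next bucket this key would move to.
        j = ((b + 1) as f64 * ((1u64 << 31) as f64 / ((key >> 33) + 1) as f64)) as i64;
    }
    b as u32
}

/// Shards a (table name, namespace) tuple (assumed hashing scheme).
fn shard_for(table: &str, namespace: &str, num_shards: u32) -> u32 {
    let mut hasher = DefaultHasher::new();
    (table, namespace).hash(&mut hasher);
    jump_hash(hasher.finish(), num_shards)
}
```

Because the jump sequence for a key does not depend on the bucket count, adding a bucket either leaves a key where it was or moves it to the new bucket.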
Dom 3122aec71a refactor: use static DatabaseName instances 2022-01-20 11:10:37 +00:00
Dom 2cd063698f refactor: API agnostic DML delete handler
Changes the DmlHandler::delete() trait method to accept required params,
and accept a DeletePredicate instead of a HttpDeleteRequest so that it
can be re-used in the gRPC handler.
2022-01-20 11:10:37 +00:00
Dom 6f2e10cab6 feat(router2): implement delete API handler
Adds support for the HTTP v2 delete API endpoint.
2022-01-17 14:57:36 +00:00
Dom 1b7369e743 docs: fix broken doc link 2022-01-17 11:57:32 +00:00
Dom 7badf37250 refactor: db_name -> namespace
Renames all "database name" references to "namespace".
2022-01-17 11:57:32 +00:00
Dom 885c831aff refactor: avoid constructing DmlOperation
Instead of converting the set of MutableBatches into a DmlOperation to
shard into more DmlOperation instances, the sharder can operate directly
on the MutableBatches.
2022-01-17 11:57:32 +00:00
Dom 7f99d18dd1 refactor: clippy 2022-01-17 11:57:31 +00:00
Dom 40a290f6f7 feat: router2 HTTP handlers
Implements the HTTP v2 write API endpoint for router2.
2022-01-17 11:57:28 +00:00
Dom 80b12d417c feat: abstract DML handler
Defines the DmlHandler trait responsible for processing a request in
some abstract way, decoupling the HTTP/gRPC request handlers from the
underlying routing logic.
2022-01-17 11:56:04 +00:00
Andrew Lamb dd23056efd
chore: update datafusion, arrow, prost, tonic, pbjson, etc (#3455)
* chore: update datafusion, arrow, prost, tonic, etc

* fix: update pprof as well

* chore: update hakari

* fix: update pbjson

* chore: update heappy

* fix: hakari

* fix: workaround https://github.com/influxdata/influxdb_iox/issues/3458

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-01-13 17:07:15 +00:00
Dom b9bee7f735 build: update workspace-hack 2022-01-12 15:09:06 +00:00
Dom a8cb8755de feat: new router2 crate
This commit adds an almost-empty router2 crate containing enough of a
skeleton to plumb into the IOx CLI/server runner.
2022-01-12 14:43:10 +00:00