Commit Graph

129 Commits (3f6bb3e330a8b6ba14dec6d30903ecca9d2c4620)

Author SHA1 Message Date
Dom Dwyer f40885d4ca
refactor(wal): remove needless async/await
Obtaining a rotation handle isn't async.
2022-12-01 16:03:52 +01:00
kodiakhq[bot] 76e500cb31
Merge branch 'main' into dom/wal-replay 2022-12-01 14:34:51 +00:00
Andrew Lamb 14a9bc92e9
Revert "Revert "chore: Update Datafusion and arrow/arrow-flight/parquet to `28.0.0` (#6279)" (#6294)" (#6296)
This reverts commit b7e52c0d8d.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-12-01 14:20:43 +00:00
Dom Dwyer cb248c75d5
feat(ingester2): WAL entry replay at startup
Replay the WAL log, if any, at startup.

Op replay is performed synchronously, during initialisation of the
ingester2 instance, and passes all ops through the "normal" write path
that the system uses once replay is complete, minus the WAL writer layer
- this helps to DRY the write path and minimise different behaviours.
2022-12-01 15:01:43 +01:00
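The replay approach described in cb248c75d5 can be sketched roughly as below. This is a minimal, synchronous illustration only: `Op`, `Sink`, `Buffer` and `replay` are hypothetical names invented for the example, not the ingester2 types. The point it shows is that replayed ops go through the same write path used after startup, with no WAL writer layer in front of it.

```rust
/// Hypothetical stand-in for a DML operation (not the real ingester2 type).
#[derive(Debug, Clone, PartialEq)]
pub struct Op(pub u64);

/// Hypothetical, synchronous stand-in for the "normal" write path.
pub trait Sink {
    fn apply(&mut self, op: Op) -> Result<(), String>;
}

/// An in-memory buffer playing the role of the write path.
#[derive(Debug, Default)]
pub struct Buffer {
    pub applied: Vec<Op>,
}

impl Sink for Buffer {
    fn apply(&mut self, op: Op) -> Result<(), String> {
        self.applied.push(op);
        Ok(())
    }
}

/// Push every previously logged op through `sink`, in log order, stopping
/// at the first error. Returns the number of ops replayed.
///
/// The sink passed in here is NOT wrapped in a WAL writer, so replayed ops
/// are not appended to the log a second time.
pub fn replay<S: Sink>(
    entries: impl IntoIterator<Item = Op>,
    sink: &mut S,
) -> Result<usize, String> {
    let mut replayed = 0;
    for op in entries {
        sink.apply(op)?;
        replayed += 1;
    }
    Ok(replayed)
}
```

Once `replay` returns, a caller could wrap the same sink in a WAL-writing layer before accepting new writes.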
Dom Dwyer 1be9ffb409
refactor(ingester2): cache more hot partitions
Now that partition cache entries are smaller, the number of entries held in
memory can be increased - this now uses ~2MiB of memory and drains the
cache during execution, amortising to 0.
2022-12-01 13:45:19 +01:00
kodiakhq[bot] 047bcc6e7e
Merge branch 'main' into dom/remove-ordering 2022-12-01 12:37:54 +00:00
Andrew Lamb b7e52c0d8d
Revert "chore: Update Datafusion and arrow/arrow-flight/parquet to `28.0.0` (#6279)" (#6294)
This reverts commit 039a45ddd1.
2022-12-01 11:38:42 +00:00
Dom Dwyer 6639568665
refactor: remove unimplemented!()
This lets us keep the existing test coverage for the new impl.
2022-12-01 11:35:00 +01:00
Dom Dwyer 464242ebc6
refactor: remove ordering asserts, sequence ranges
This commit removes the invariant asserts of monotonicity carried over
from the "ingester" crate - ingester2 does not define any ordering of
writes within the system.

This commit also removes the SequenceNumberRange as it is no longer
useful to indirectly check the equality of two sets of ops -
non-monotonic writes mean overlapping ranges do not guarantee a full
set of overlapping operations (gaps may be present). Likewise, bounding
values (such as "min persisted") provide no useful meaning in an
out-of-order system.
2022-12-01 10:38:20 +01:00
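A small illustration of the reasoning in 464242ebc6: two sets of sequence numbers can share identical min/max bounds while containing different ops, so comparing ranges says nothing about set equality. The `bounds` helper is hypothetical and written only for this example.

```rust
use std::collections::BTreeSet;

/// Returns the inclusive (min, max) bounds of a set of sequence numbers,
/// or `None` if the set is empty.
pub fn bounds(ops: &BTreeSet<u64>) -> Option<(u64, u64)> {
    let min = *ops.iter().next()?;
    let max = *ops.iter().next_back()?;
    Some((min, max))
}

/// Two sets with the same bounds but different members.
pub fn example_sets() -> (BTreeSet<u64>, BTreeSet<u64>) {
    let a: BTreeSet<u64> = vec![1, 2, 5].into_iter().collect();
    let b: BTreeSet<u64> = vec![1, 3, 5].into_iter().collect();
    (a, b)
}
```

Both sets returned by `example_sets` span 1 to 5, yet op 2 is missing from one and op 3 from the other.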
Dom Dwyer 50d5e4a6f1
docs: arbitrary reordering & sequence numbers
Document the arbitrary reordering of concurrent writes in an ingester,
and the potential divergence of WAL entries / buffered state.

Also documents that in ingester2, sequence numbers only identify writes,
not their ordering.
2022-12-01 10:38:20 +01:00
Dom Dwyer f34d4db994
docs: reference WAL cancellation safety ticket
Reference https://github.com/influxdata/influxdb_iox/issues/6281
2022-11-30 16:37:06 +01:00
Dom Dwyer d2a3d0920b
feat(ingester2): commit writes to write-ahead log
Adds WalSink, an implementation of the DmlSink trait that commits DML
operations to the write-ahead log before passing the DML op into the
decorated inner DmlSink chain.

By structuring the WAL logic as a decorator, the chain of inner DmlSink
handlers within it is only ever invoked after the WAL commit has
successfully completed, which keeps all the WAL commit code in a
singly-responsible component. This also lets us layer on the WAL commit
logic to the DML sink chain after replaying any existing WAL files,
avoiding a circular WAL mess.

The application-logic level WAL code abstracts over the underlying WAL
implementation & codec through the WalAppender trait. This decouples the
business logic from the WAL implementation both for testing purposes,
and for trying different WAL implementations in the future.
2022-11-30 16:37:05 +01:00
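The decorator structure described in d2a3d0920b might look something like the sketch below. The names `DmlSink`, `WalSink` and `WalAppender` come from the commit message, but the signatures here are simplified assumptions (synchronous, `String` errors, a toy `Op` type) and `MemWal` / `MemSink` are hypothetical test doubles; none of this is the actual ingester2 code.

```rust
/// Toy stand-in for a DML operation (assumption for this sketch).
#[derive(Debug, Clone, PartialEq)]
pub struct Op(pub u64);

/// Simplified, synchronous version of a DML sink.
pub trait DmlSink {
    fn apply(&mut self, op: Op) -> Result<(), String>;
}

/// Abstraction over the underlying WAL implementation and codec.
pub trait WalAppender {
    fn append(&mut self, op: &Op) -> Result<(), String>;
}

/// Decorator: commit the op to the WAL, then delegate to the inner sink.
pub struct WalSink<T, W> {
    inner: T,
    wal: W,
}

impl<T, W> WalSink<T, W> {
    pub fn new(inner: T, wal: W) -> Self {
        Self { inner, wal }
    }

    /// Unwrap the decorator, returning the inner sink and the WAL.
    pub fn into_parts(self) -> (T, W) {
        (self.inner, self.wal)
    }
}

impl<T: DmlSink, W: WalAppender> DmlSink for WalSink<T, W> {
    fn apply(&mut self, op: Op) -> Result<(), String> {
        // The inner chain is only reached if the WAL append succeeded.
        self.wal.append(&op)?;
        self.inner.apply(op)
    }
}

/// Hypothetical in-memory WAL that can be told to fail.
#[derive(Debug, Default)]
pub struct MemWal {
    pub entries: Vec<Op>,
    pub fail: bool,
}

impl WalAppender for MemWal {
    fn append(&mut self, op: &Op) -> Result<(), String> {
        if self.fail {
            return Err("wal append failed".to_string());
        }
        self.entries.push(op.clone());
        Ok(())
    }
}

/// Hypothetical in-memory inner sink.
#[derive(Debug, Default)]
pub struct MemSink {
    pub applied: Vec<Op>,
}

impl DmlSink for MemSink {
    fn apply(&mut self, op: Op) -> Result<(), String> {
        self.applied.push(op);
        Ok(())
    }
}
```

Because `WalSink` is generic over `WalAppender`, a test can substitute a failing appender and check that the inner sink never sees the op.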
Andrew Lamb 039a45ddd1
chore: Update Datafusion and arrow/arrow-flight/parquet to `28.0.0` (#6279)
* chore: Update Datafusion and arrow/arrow-flight/parquet to `28.0.0`

* chore: Update thrift to 0.17

* fix: use workspace arrow-flight in ingester2

* chore: Update for API changes

* fix: test

* chore: Update hakari

* chore: Update hakari again

* chore: Update trace_exporters to latest thrift

* fix: update test

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-30 14:12:30 +00:00
Dom Dwyer 5f1635bfc5
feat(ingester2): sequence write ops
Sequence all gRPC write requests to (internally) order the resulting DML
operations.

These sequence numbers are assigned from a timestamp oracle and passed
through to the downstream DmlSink implementers.
2022-11-30 10:40:26 +01:00
Dom Dwyer ace4b7f669
feat: operation timestamp sequencer
Adds a TimestampOracle to provide an ingester-internal ordering to
incoming DmlOperations using a logical clock.
2022-11-30 10:40:22 +01:00
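One plausible shape for a logical-clock oracle like the one ace4b7f669 describes is an atomic counter. The sketch below is an assumption about how such a type could be built, not the actual `TimestampOracle` implementation; the method names and the choice of `u64` are illustrative.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// A logical clock handing out unique, increasing sequence numbers.
#[derive(Debug, Default)]
pub struct TimestampOracle(AtomicU64);

impl TimestampOracle {
    /// Create an oracle whose first issued value will be `last + 1`.
    pub fn new(last: u64) -> Self {
        Self(AtomicU64::new(last))
    }

    /// Return the next sequence number.
    ///
    /// `fetch_add` is atomic, so concurrent callers never observe the same
    /// value. Relaxed ordering is enough for uniqueness; it does not order
    /// any other memory accesses around the call.
    pub fn next(&self) -> u64 {
        self.0.fetch_add(1, Ordering::Relaxed) + 1
    }
}
```

A caller would take `oracle.next()` once per incoming write and attach the result to the op before passing it downstream.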
Dom Dwyer 9648207f01
feat(ingester2): initialise an ingester2 instance
Adds a public constructor to initialise an ingester2 instance.
2022-11-29 17:05:42 +01:00
Dom 0b1449e908
Merge branch 'main' into dom/buffer-tree-query 2022-11-29 15:06:41 +00:00
Andrew Lamb fc520e0c0f
refactor: Remove unnecessary optimize_record_batch (#6262)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-29 13:35:46 +00:00
Dom Dwyer 95216055d8
perf(ingester2): stream BufferTree partition data
This commit implements the QueryExec trait for the BufferTree, allowing it
to be queried for the partition data it contains. With this change, the
BufferTree now provides "read your writes" functionality.

Notably the implementation streams the contents of individual partitions
to the caller on demand (pull-based execution), deferring acquiring the
partition lock until actually necessary and minimising the duration of
time a strong reference to a specific RecordBatch is held in order to
minimise the memory overhead.

During query execution a client sees a consistent snapshot of
partitions: once a client begins streaming the query response, incoming
writes that create new partitions do not become visible. However
incoming writes to an existing partition that forms part of the snapshot
set become visible iff they are ordered before the acquisition of the
partition lock when streaming that partition data to the client.
2022-11-29 12:01:47 +01:00
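The visibility semantics described in 95216055d8 can be illustrated with a toy, pull-based stream. Everything below is a simplified assumption: partitions hold plain `u64` rows instead of RecordBatch data, rows are copied out rather than shared by reference, and this `BufferTree` / `PartitionStream` pair only borrows the names from the commits. The set of partitions is fixed when the query starts, but each partition's lock is taken only when the caller pulls that partition.

```rust
use std::sync::{Arc, Mutex};

/// A partition is a shared, lockable buffer of rows (toy representation).
type Partition = Arc<Mutex<Vec<u64>>>;

#[derive(Default)]
pub struct BufferTree {
    partitions: Mutex<Vec<Partition>>,
}

impl BufferTree {
    /// Append `row` to partition `idx`, creating partitions as needed.
    pub fn write(&self, idx: usize, row: u64) {
        let partition = {
            let mut parts = self.partitions.lock().unwrap();
            while parts.len() <= idx {
                parts.push(Partition::default());
            }
            Arc::clone(&parts[idx])
        };
        partition.lock().unwrap().push(row);
    }

    /// Snapshot the current set of partitions. No partition lock is taken
    /// here; data is read lazily as the stream is consumed.
    pub fn query(&self) -> PartitionStream {
        let snapshot = self.partitions.lock().unwrap().clone();
        PartitionStream {
            remaining: snapshot.into_iter(),
        }
    }
}

pub struct PartitionStream {
    remaining: std::vec::IntoIter<Partition>,
}

impl Iterator for PartitionStream {
    type Item = Vec<u64>;

    fn next(&mut self) -> Option<Vec<u64>> {
        let partition = self.remaining.next()?;
        // The lock is acquired only now, and released once the copy is made.
        let rows = partition.lock().unwrap().clone();
        Some(rows)
    }
}
```

With this shape, a write to an existing partition that lands before the stream reaches it is returned, while a partition created after `query()` is not part of the snapshot.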
Dom Dwyer de6f0468d8
refactor: associated QueryExec return type
Allow the return type of the QueryExec trait's query_exec() method to be
parametrised by the implementer.

This allows the trait to be reused across different data sources that
return differing concrete types.
2022-11-29 12:01:36 +01:00
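As a rough sketch of what de6f0468d8 describes, a trait with an associated response type lets each implementer pick its own concrete return type. The trait and method names follow the commit message, but the signature is a simplified assumption (a single `&str` argument and `String` errors), and `StaticRows` / `RowCount` are hypothetical implementers.

```rust
/// Query execution, parametrised by the implementer's response type.
pub trait QueryExec {
    type Response;

    fn query_exec(&self, table: &str) -> Result<Self::Response, String>;
}

/// A data source that returns its rows directly.
pub struct StaticRows(pub Vec<u64>);

impl QueryExec for StaticRows {
    type Response = Vec<u64>;

    fn query_exec(&self, _table: &str) -> Result<Vec<u64>, String> {
        Ok(self.0.clone())
    }
}

/// A wrapper that reuses the same trait but returns a different type.
pub struct RowCount<T>(pub T);

impl<T> QueryExec for RowCount<T>
where
    T: QueryExec<Response = Vec<u64>>,
{
    type Response = usize;

    fn query_exec(&self, table: &str) -> Result<usize, String> {
        Ok(self.0.query_exec(table)?.len())
    }
}
```

Callers that need a specific response type can constrain it in a bound, as `RowCount` does with `QueryExec<Response = Vec<u64>>`.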
Dom Dwyer 1a379f5f16
perf(ingester2): streaming query data sourcing
Changes the query code (taken from the ingester crate) to stream data
for query execution, tidying up unnecessary Result types and removing
unnecessary indirection/boxing.

Previously the query data sourcing would collect the set of RecordBatch
for a query response during execution, prior to sending the data to the
caller. Any data that was dropped or modified during this time meant the
underlying ref-counted data could not be released from memory until all
outstanding queries referencing it completed. When faced with multiple
concurrent queries and ongoing ingest, this meant multiple copies of
data could be held in memory at any one time.

After this commit, data is streamed to the user, minimising the duration
of time a reference to specific partition data is held, and therefore
eliminating the memory overhead of holding onto all the data necessary
for a query for as long as the client takes to read the data.

When combined with an upcoming PR to stream RecordBatch out of the
BufferTree, this should provide performant query execution with minimal
memory overhead, even for a maliciously slow reading client.
2022-11-29 11:08:06 +01:00
Dom Dwyer 2ed9780f6b
refactor(ingester2): explicit PartitionStream type
Simplify the streaming types by introducing explicitly named wrappers to
improve visibility.
2022-11-29 11:08:02 +01:00
dependabot[bot] b5aa39db4b
chore(deps): Bump tonic from 0.8.2 to 0.8.3 (#6249)
Bumps [tonic](https://github.com/hyperium/tonic) from 0.8.2 to 0.8.3.
- [Release notes](https://github.com/hyperium/tonic/releases)
- [Changelog](https://github.com/hyperium/tonic/blob/master/CHANGELOG.md)
- [Commits](https://github.com/hyperium/tonic/compare/v0.8.2...v0.8.3)

---
updated-dependencies:
- dependency-name: tonic
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-29 09:53:30 +00:00
Dom Dwyer 443ec49f24
feat: query exec tracing spans
Implement a QueryExec decorator that emits named tracing spans covering
the inner delegate's query_exec() execution.

Captures the result, emitting the error string in the span on failure.
2022-11-28 13:29:18 +01:00
Dom Dwyer a54326d1ae
refactor: rename Response -> QueryResponse
Both more descriptive, and less conflict-y! This seems like a more
sensible name for a system with many Responses.
2022-11-28 13:29:18 +01:00
Dom Dwyer a0ab78298f
feat(ingester2): gRPC methods & type-erased init
This commit implements the gRPC direct-write RPC interface (largely
copied from the ingester crate), and adds a much improved RPC query
handler.

Compared to the ingester crate, the query API is now split into two
defined halves - the API handler side, and types necessary to support it
(server/grpc/query.rs) and the Ingester query execution side (a stub in
query/exec.rs). These two halves maintain a separation of concerns, and
are interfaced by an abstract QueryExec trait (in query/trait.rs).

I also added the catalog RPC interface as it is currently exposed on the
ingester, though I am unsure if it is used by anything.

This commit also introduces the "init" module, and the
IngesterRpcInterface trait within it. This trait forms the public
ingester2 crate API, defining the complete set of methods external
crates can expect to utilise in a stable, unchanging and decoupled way.

The IngesterRpcInterface trait also serves as a method of type-erasure
on the underlying handler implementations, avoiding the need to
expose/pub the types, abstractions, and internal implementation details
of the ingester to external crates.
2022-11-25 12:40:01 +01:00
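The type-erasure idea in a0ab78298f can be sketched with a public trait and a constructor returning `impl Trait`, so the concrete handler type never leaves the module. The trait name is taken from the commit message; its methods (`write`, `buffered`), the private `Ingester` struct and the `new` function are hypothetical and exist only to make the example compile.

```rust
pub mod init {
    use std::sync::Mutex;

    /// The only surface external crates see (methods are illustrative).
    pub trait IngesterRpcInterface {
        fn write(&self, line: &str) -> Result<(), String>;
        fn buffered(&self) -> usize;
    }

    /// Private implementation type: not nameable outside this module.
    struct Ingester {
        buffer: Mutex<Vec<String>>,
    }

    impl IngesterRpcInterface for Ingester {
        fn write(&self, line: &str) -> Result<(), String> {
            if line.is_empty() {
                return Err("empty write".to_string());
            }
            self.buffer.lock().unwrap().push(line.to_string());
            Ok(())
        }

        fn buffered(&self) -> usize {
            self.buffer.lock().unwrap().len()
        }
    }

    /// Construct an ingester, exposing it only through the trait.
    pub fn new() -> impl IngesterRpcInterface {
        Ingester {
            buffer: Mutex::new(Vec::new()),
        }
    }
}
```

External code can call the trait methods on the returned value, but cannot depend on `Ingester` itself, which leaves the internals free to change.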
Dom Dwyer 522ae6c2a3
docs: document possible FSM panic
Document a possible panic if the data within a partition FSM cannot be
converted to an Arrow RecordBatch.
2022-11-25 10:46:44 +01:00
CircleCI[bot] 44eeab7e2b chore: Run cargo hakari tasks 2022-11-24 14:51:21 +00:00
Dom Dwyer a66fc0b645
feat(ingester): ingester2 init
Adds an ingester2 crate to hold the MVP of the Kafkaless project.

This was necessary due to the tight coupling of the ingester internals
with tests in external crates, and eases the parallel development of two
versions of the ingester.

This commit contains various changes from the "ingester" crate, mostly
removing the concept/references to a "shard" or "ShardId" where
possible.

This commit does not copy over all of the "ingester" crate - only those
components that are definitely needed. I will drag across more as
functionality is implemented.
2022-11-24 15:34:02 +01:00