This adds extra test coverage for the ingester's WAL replay & RPC write
paths, as well as the WAL E2E tests, to ensure that all sequence numbers
present in a WriteOperation/WalOperation are encoded and present when
decoded.
This commit asks the oracle for a new sequence number for each table
batch of a write operation (and thus each partition of a write) when
handling an RPC write operation before appending the operation to the
WAL. The ingester now honours the sequence numbers per-partition when
WAL replay is performed.
This commit removes the op-level sequence number from the proto
definition, now reading and writing solely to the per table (and thus
per partition) sequence number map. Tables/partitions within the same
write op are still assigned the same number for now, so there should be
no semantic different
This adds a little extra layer of type safety and should be optimised
by the compiler. This commit also makes sure the ingester's WAL sink
tests assert the behaviour for partitioned sequence numbering on an
operation that hits multiple tables & thus partitions.
Writes are partitioned before being placed in the buffer tree. This
has the effect of splitting up the persistence of a DmlWrite's contents
and thus the persistence of data referred to by write operations placed
into a single WAL entry for a write op.
This change associates the currently assigned sequence number
with every `TableId` in the write, so that persist events for a single
write can be tracked on a per table/partition level.
Making this partial change enables a transition period where changes
can be rolled back and WAL files can still be processed.
A future change will produce a new sequence number per table
ID.
Instead, the type responsible for initialising it handles namespaced
`Write` initialisation and management, as well as the failure paths that
may need handling. This commit introduces a `NamespaceDemultiplexer`
type with a generic implementation allowing fallible `async` lazy init
of any type from a given `NamespaceId`. This paves the way for catalog-aware
initialisation of `LineProtoWriter`s.
This commit fixes loads of crates (47!) had unused dependencies, or
mis-configured dependencies (test deps as normal deps).
I added the "unused_crate_dependencies" to all crates to help prevent
this mess from growing again!
https://doc.rust-lang.org/beta/nightly-rustc/rustc_lint_defs/builtin/static.UNUSED_CRATE_DEPENDENCIES.html
This has the minor downside of false-positives when specifying
dev-dependencies for test/bench binaries - these are files in /test or
/benches (not normal tests). This commit includes a workaround,
importing them in lib.rs (gated by a feature flag). I think the
trade-off of better dependency management is worth it!
WAL read errors encountered by the new WAL WriteOpDecoder were being
discarded and presented as a "happy path" end of file to callers due
to a bug in handling a nested result type. This moves the test for
decoding a tail-corrupt WAL into the crate itself and asserts the
error is reported correctly.
Having a ginormous error enum returned for this method means that
the catch-all behaviour gets leaked into the error naming and
semantics of callers. The decoder is a new type and could benefit from
not adding to the existing error enum.
This adds a new crate with a type capable of converting
decoded WAL Write Op entries to line protocol and writing
the result to a namespaced destination. The wal crate
now exports a type which reads the sequenced wal ops and
decodes them as namespaced table batch writes.
There's no need to sub 1 from the batch length to shrink the buffer over
time - the capacity of the new batch will be the length of the last. A
large batch followed by a small batch will cause the pre-allocated next
batch to be small too.
Changes the WAL to maintain a SequenceNumberSet containing every ID
wrote to the currently open segment file.
The sets are derived from batched data for efficiency, rather than
recorded per write, to prevent any overhead in the hot path. The batch
set is merged with the file set off the hot path, in a separate I/O
thread (not the async runtime).
This change causes the WAL to pre-allocate the write batch buffer,
reducing the reallocations & copies that occur in the hot path (this
buffer can grow to be moderately large).
This should automatically size to the correct capacity and (slowly)
reduce buffer overrun.
Although not a problem in conventional usage, leaking this task prevents
the memory used by the wal (which can be substantial) from ever being
deallocated. In turn, this prevents the WAL writer I/O thread from
stopping too.
Eliminate buffer allocation (& growing) in the WAL file writer by
reusing a single buffer each time.
This implementation shrinks the buffer size down to 128KiB if it grows
above that amount to prevent one large write from consuming memory
forever more (128KiB should be plenty more than the common write size).
Each WAL entry is prepended by a two field header, followed by the
payload bytes. Previously a syscall was made for each header field, and
then another to write the payload bytes (or in reality, at least one
call is made).
This commit reduces the syscalls down to a single write call by building
the entire record in memory before calling write(). This adds 8 bytes to
the in-memory buffer size compared to prior to this commit.
This is effectively a reimplementation of a BufWriter but optimised for
our expected memory usage and (more importantly) capable of issuing the
fsync calls necessary for WAL durability.
Change the WAL buffer flusher to use a dedicated I/O thread instead of
performing serialisation & blocking file I/O on the async runtime
threads. This should reduce runtime blocking / latency variance on the
async threads.
The added overhead is 1 channel send, but this is per WAL batch of
writes (not per DML write, or worse, per file write). This impl also
amortises allocation of the serialisation buffer, rather than growing
one incrementally for each batch.
Deriving debug is highly encouraged so that Result::unwrap() and friends
can print the state of an object if it is causing a panic (it's
impossible to call unwrap() otherwise!)
The correctness of data checksumming is validated by the tests as a
reader property (corrupt checksum -> error), the actual value of the
checksum is irrelevant.
* feat: Updating to new services for all-in-one
* fix: Use correct shard id for ingester2
* fix: clippy
* fix: use wal directory
* fix: end to end tests
* fix: Update tracing cases for new ingest reality
* fix: update metrics test
* fix: Use rpc mode
---------
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: Andrew Lamb <alamb@influxdata.com>