There's no need to subtract 1 from the batch length to shrink the buffer over
time - the capacity of the new batch will simply be the length of the last. A
large batch followed by a small batch will cause the next pre-allocated
batch to be small too.
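A minimal sketch of that sizing rule, with WriteOp standing in for the real queued-write type (illustrative names, not the WAL's actual API):

```rust
/// Illustrative stand-in for a queued WAL write operation.
struct WriteOp {
    payload: Vec<u8>,
}

/// Pre-allocate the next batch to the length of the last: no `- 1`
/// adjustment needed, and a large batch followed by a small one yields
/// a small pre-allocation for the batch after it.
fn next_batch_buffer(last_batch: &[WriteOp]) -> Vec<WriteOp> {
    Vec::with_capacity(last_batch.len())
}
```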
Changes the WAL to maintain a SequenceNumberSet containing every ID
written to the currently open segment file.
The sets are derived from the batched data rather than recorded per
write, avoiding any overhead in the hot path. The batch set is merged
into the file set off the hot path, on a separate I/O thread (not the
async runtime).
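A hedged sketch of that flow, using std's HashSet<u64> as a stand-in for SequenceNumberSet (whose real API may differ):

```rust
use std::collections::HashSet;
use std::sync::mpsc;
use std::thread;

// Stand-in for SequenceNumberSet; the real type is more compact.
type SeqSet = HashSet<u64>;

fn main() {
    let (tx, rx) = mpsc::channel::<SeqSet>();

    // Off-hot-path I/O thread: fold each batch's set into the set of
    // IDs written to the currently open segment file.
    let merger = thread::spawn(move || {
        let mut file_set = SeqSet::new();
        for batch_set in rx {
            file_set.extend(batch_set);
        }
        file_set
    });

    // Hot path: derive the set from the already-batched writes and send
    // it once per batch - one channel send, not one update per write.
    let batch_ids: SeqSet = [1u64, 2, 3].into_iter().collect();
    tx.send(batch_ids).unwrap();
    drop(tx);

    let ids_in_segment = merger.join().unwrap();
    assert_eq!(ids_in_segment.len(), 3);
}
```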
This change causes the WAL to pre-allocate the write batch buffer,
reducing the reallocations & copies that occur in the hot path (this
buffer can grow to be moderately large).
This should automatically size the buffer to the correct capacity and
(slowly) reduce buffer overruns - batches that outgrow the pre-allocated
capacity and force a reallocation.
Although not a problem in conventional usage, leaking this task prevents
the memory used by the WAL (which can be substantial) from ever being
deallocated. In turn, this prevents the WAL writer I/O thread from
stopping too.
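One shape the fix could take, assuming the flusher is a tokio task (illustrative names, not the WAL's actual types):

```rust
struct Wal {
    flusher: tokio::task::JoinHandle<()>,
}

impl Drop for Wal {
    fn drop(&mut self) {
        // Abort the flusher task so the WAL memory it owns can be
        // dropped; the writer I/O thread then observes a closed channel
        // and stops too.
        self.flusher.abort();
    }
}
```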
Eliminate buffer allocation (& growing) in the WAL file writer by
reusing a single buffer for each write.
This implementation shrinks the buffer back down to 128KiB if it grows
above that amount, preventing one large write from holding onto that
memory forevermore (128KiB should be comfortably larger than the common
write size).
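A minimal sketch of the reuse-and-shrink logic described above:

```rust
/// Soft cap on the retained buffer; one unusually large write should
/// not pin its memory for the lifetime of the writer.
const SOFT_CAP: usize = 128 * 1024; // 128KiB

fn recycle(buf: &mut Vec<u8>) {
    // Keep the allocation for the next write...
    buf.clear();
    // ...but give memory back if a large write inflated it.
    if buf.capacity() > SOFT_CAP {
        buf.shrink_to(SOFT_CAP);
    }
}
```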
Each WAL entry is preceded by a two-field header, followed by the
payload bytes. Previously a syscall was made for each header field, and
then another to write the payload bytes (or in reality, at least one
call each).
This commit reduces the syscalls to a single write call by building
the entire record in memory before calling write(). This adds 8 bytes to
the in-memory buffer size compared to before this commit.
This is effectively a reimplementation of a BufWriter but optimised for
our expected memory usage and (more importantly) capable of issuing the
fsync calls necessary for WAL durability.
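A sketch of that single-write path, assuming the two header fields are a u32 checksum and a u32 payload length (8 bytes total) - the real field layout may differ:

```rust
use std::fs::File;
use std::io::{self, Write};

/// Minimal sketch of the fsync-capable buffered writer described above.
struct SegmentWriter {
    file: File,
    buf: Vec<u8>, // reused across writes
}

impl SegmentWriter {
    /// Build header + payload in memory, then issue a single write call
    /// instead of one per header field plus one for the payload.
    fn write_entry(&mut self, checksum: u32, payload: &[u8]) -> io::Result<()> {
        self.buf.clear();
        self.buf.extend_from_slice(&checksum.to_be_bytes());
        self.buf.extend_from_slice(&(payload.len() as u32).to_be_bytes());
        self.buf.extend_from_slice(payload);
        self.file.write_all(&self.buf)
    }

    /// The part std::io::BufWriter cannot do for us: force the bytes to
    /// disk for WAL durability.
    fn sync(&mut self) -> io::Result<()> {
        self.file.sync_all()
    }
}
```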
Change the WAL buffer flusher to use a dedicated I/O thread instead of
performing serialisation & blocking file I/O on the async runtime
threads. This should reduce runtime blocking / latency variance on the
async threads.
The added overhead is one channel send, but this is per WAL batch of
writes (not per DML write or, worse, per file write). This impl also
amortises allocation of the serialisation buffer, rather than growing
one incrementally for each batch.
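A hedged sketch of that dedicated-thread shape (illustrative types, not the actual WAL API); the async side's only cost is one tx.send(batch) per batch:

```rust
use std::sync::mpsc;
use std::thread;

/// Illustrative batch of serialised WAL operations.
struct WalBatch {
    ops: Vec<Vec<u8>>,
}

fn spawn_flusher() -> mpsc::Sender<WalBatch> {
    let (tx, rx) = mpsc::channel::<WalBatch>();
    // Dedicated I/O thread: serialisation and blocking file I/O happen
    // here, never on the async runtime threads.
    thread::spawn(move || {
        let mut buf = Vec::new(); // amortised: reused across batches
        for batch in rx {
            buf.clear();
            for op in &batch.ops {
                buf.extend_from_slice(op);
            }
            // write `buf` to the segment file and fsync here
        }
    });
    tx
}
```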
Deriving Debug is highly encouraged so that Result::unwrap() and friends
can print the state of an object when it causes a panic (it's
impossible to call unwrap() otherwise!).
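A small illustration of why: Result::unwrap() is only defined when the error type implements Debug (names here are made up):

```rust
#[derive(Debug)] // without this, the unwrap() below does not compile
struct SegmentError {
    offset: u64,
}

fn read_entry() -> Result<(), SegmentError> {
    Err(SegmentError { offset: 42 })
}

fn main() {
    // Panics, printing the Debug representation of SegmentError.
    read_entry().unwrap();
}
```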
The correctness of data checksumming is validated by the tests as a
reader property (corrupt checksum -> error); the actual value of the
checksum is irrelevant.
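A toy version of that reader property, using a 1-byte additive checksum purely for illustration (a stand-in for the real checksum framing):

```rust
/// Toy encoder: 1-byte additive checksum followed by the payload.
fn encode_entry(payload: &[u8]) -> Vec<u8> {
    let sum = payload.iter().fold(0u8, |a, b| a.wrapping_add(*b));
    let mut out = vec![sum];
    out.extend_from_slice(payload);
    out
}

fn decode_entry(bytes: &[u8]) -> Result<&[u8], &'static str> {
    let (sum, payload) = bytes.split_first().ok_or("empty")?;
    let expect = payload.iter().fold(0u8, |a, b| a.wrapping_add(*b));
    if *sum == expect { Ok(payload) } else { Err("checksum mismatch") }
}

#[test]
fn corrupt_checksum_is_an_error() {
    let mut bytes = encode_entry(b"hello");
    bytes[0] ^= 0xFF; // corrupt the checksum byte
    // The property under test: corruption -> error. The checksum's
    // concrete value is never asserted.
    assert!(decode_entry(&bytes).is_err());
}
```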
* feat: Updating to new services for all-in-one
* fix: Use correct shard id for ingester2
* fix: clippy
* fix: use wal directory
* fix: end to end tests
* fix: Update tracing cases for new ingest reality
* fix: update metrics test
* fix: Use rpc mode
---------
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
* feat: optimize wal with batching
Simplified the WAL writer so that it batches up write operations. Currently it waits 10ms between fsync calls; we can pull this out into a config variable later if we want, but I think this is good enough for now.
Also updated the reader to be a simpler blocking reader without the extra tasks and channels, as they weren't really getting us anything that I know of.
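A rough sketch of that batching loop (illustrative, with the 10ms interval hard-coded as described):

```rust
use std::fs::File;
use std::io::Write;
use std::sync::mpsc;
use std::time::{Duration, Instant};

/// Writes accumulate between fsync calls; one fsync covers the batch.
fn flush_loop(rx: mpsc::Receiver<Vec<u8>>, mut segment: File) {
    while let Ok(first) = rx.recv() {
        segment.write_all(&first).expect("wal write");
        // Batch everything that arrives before the next sync point.
        let deadline = Instant::now() + Duration::from_millis(10);
        loop {
            let now = Instant::now();
            if now >= deadline {
                break;
            }
            match rx.recv_timeout(deadline - now) {
                Ok(more) => segment.write_all(&more).expect("wal write"),
                Err(_) => break,
            }
        }
        // One fsync covers the whole batch of writes.
        segment.sync_all().expect("wal fsync");
    }
}
```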
* chore: cleanup wal code for PR feedback
Rather than naming WAL files with a UUID, give them a number that
indicates the order they were created in so that they can be read back
in order.
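A minimal sketch of such a naming scheme (the format string is illustrative): zero-padded, monotonically increasing IDs sort lexicographically in creation order, so a plain directory listing replays the files correctly.

```rust
/// 20 digits is enough for any u64, so names sort in creation order.
fn segment_file_name(id: u64) -> String {
    format!("{id:020}.wal")
}

fn main() {
    assert!(segment_file_name(9) < segment_file_name(10));
}
```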
Fixes #6227.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>