40 Commits (66035ada480fb4b7d0ef02ea087ee209b819aa24)

Author SHA1 Message Date
Dom Dwyer 0c0a38c484 refactor: more verbose shard reset logs
Adds a little more context to the "shard reset" logs.
2022-10-19 12:28:02 +02:00
Dom Dwyer c63312ce12 refactor: use histogram to record TTBR
Changes the TTBR metric from a gauge to a histogram so that observations
maintain a time dimension.
2022-10-18 16:29:09 +02:00
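The difference between the two metric kinds can be sketched as follows. This is a minimal illustration, not the ingester's metric code: the `Gauge` and `Histogram` types, the millisecond unit, and the bucket bounds are all assumptions made for the example. A gauge retains only the last value set, so observations made between scrapes are overwritten, while a histogram counts every observation.

```rust
/// Illustrative gauge: keeps only the most recent observation.
#[derive(Debug, Default)]
struct Gauge {
    last: Option<u64>,
}

impl Gauge {
    fn set(&mut self, value_ms: u64) {
        // Any earlier value is overwritten and lost.
        self.last = Some(value_ms);
    }
}

/// Illustrative fixed-bucket histogram: counts every observation.
#[derive(Debug)]
struct Histogram {
    /// Inclusive upper bounds, sorted ascending.
    bounds_ms: Vec<u64>,
    /// One count per bound, plus a final overflow bucket.
    counts: Vec<u64>,
}

impl Histogram {
    fn new(bounds_ms: Vec<u64>) -> Self {
        let counts = vec![0; bounds_ms.len() + 1];
        Self { bounds_ms, counts }
    }

    fn record(&mut self, value_ms: u64) {
        // Find the first bucket whose bound covers the value,
        // falling back to the overflow bucket.
        let idx = self
            .bounds_ms
            .iter()
            .position(|&bound| value_ms <= bound)
            .unwrap_or(self.bounds_ms.len());
        self.counts[idx] += 1;
    }

    fn total(&self) -> u64 {
        self.counts.iter().sum()
    }
}
```

After four observations the gauge holds one value, whereas the histogram still accounts for all four, which is what lets a query look at how the distribution changed over a time window.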
Luke Bond 475c8a0704
fix: only emit ttbr metric for applied ops (#5854)
* fix: only emit ttbr metric for applied ops

* fix: move DmlApplyAction to s/w accessible

* chore: test for skipped ingest; comments and log improvements

* fix: fixed ingester test re skipping write

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-14 12:06:49 +00:00
Dom Dwyer 9c40d80032 refactor(ingester): log shard_id in op result
Include the shard ID in the op apply result to correlate it with other
log messages.
2022-10-13 15:41:48 +02:00
Dom Dwyer dbcbb5b824 refactor: include sequence numbers in apply() logs
Include the op sequence number in the error/success apply() log
messages.
2022-10-13 14:19:02 +02:00
Dom Dwyer c4f542bbe2 refactor(ingester): remove tombstone support
This commit removes tombstone support from the ingester, and deletes
associated code/helpers/tests. This commit does NOT remove tombstone
support from any other service, but MAY include removing overlapping
test coverage.

This also removes the tombstone support from the Ingester -> Querier RPC
response message.

This has the nice side effect of removing a whole lot of thread spawning
in the ingester tests for the Executor, speeding everything up!
2022-10-11 13:10:04 +02:00
Luke Bond fda1479db0
chore: add trace log to ingester to aid debugging (#5829) 2022-10-11 10:33:42 +00:00
Dom Dwyer 5f2f735c7e fix: spurious watermark < read offset panic
In staging we observed an ingester panic due to the write buffer stream
yielding a WriteBufferErrorKind::SequenceNumberAfterWatermark,
suggesting the ingester was attempting to read from an offset that
exceeds the current max write offset in Kafka (high watermark offset).

This turned out not to be the case - the partition had a single write at
offset 2, and the ingester was attempting to seek to offset 1. The first
read would fail (offset 1 does not exist) and the error handling did not
account for the high watermark not being correctly set (-1 in the
response).

I have no idea why rskafka returns this watermark / doesn't retry / etc
but this change will allow the ingesters to recover.
2022-09-28 15:22:34 +02:00
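The failure mode described in this commit can be sketched as a small classification step. This is a hypothetical sketch only: `SeekDecision`, `classify`, and the exact comparison against the watermark are assumptions for illustration and are not taken from the ingester or from rskafka. The one detail drawn from the commit message is that a watermark of -1 means "not set" and should not be read as evidence that the read offset is too high.

```rust
/// Hypothetical outcome of inspecting a failed read; not the ingester's real type.
#[derive(Debug, PartialEq)]
enum SeekDecision {
    /// The requested offset really is past the reported high watermark.
    BeyondWatermark,
    /// The watermark was not reported, so nothing can be concluded from it.
    WatermarkUnknown,
    /// The offset is within range; the failure has some other cause.
    WithinRange,
}

/// Classifies a failed read given the requested offset and the reported
/// high watermark. A negative watermark is treated as "unset" (assumption
/// based on the -1 value mentioned in the commit message).
fn classify(read_offset: i64, high_watermark: i64) -> SeekDecision {
    if high_watermark < 0 {
        return SeekDecision::WatermarkUnknown;
    }
    if read_offset > high_watermark {
        SeekDecision::BeyondWatermark
    } else {
        SeekDecision::WithinRange
    }
}
```

With the staging scenario from the message (seek to offset 1, watermark reported as -1), `classify(1, -1)` yields `WatermarkUnknown`, which a caller could treat as recoverable instead of panicking.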
Dom Dwyer b873297fad refactor(ingester): limit visibility
Marks many internal data structures as non-pub.

Many remain as they're used across tests / from multiple callers
"peeking", but this limits the scope of false sharing in the future.
2022-09-27 14:27:32 +02:00
Dom Dwyer ee8cdb48af style(ingester): fmt imports & long strings
Rewrite the imports to be a consistent order; std, external, crate and
merge all crate-level imports into one use statement.
2022-09-14 14:20:19 +02:00
Dom Dwyer 2a19606456 feat(ingester): restrict partition row count
This limit restricts a single partition to containing at most N rows
before it is marked for persistence (note: being marked for persistence
does not currently prevent further ingest for that partition.)
2022-08-31 15:48:18 +02:00
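The row limit described above can be sketched as a per-partition counter. The `PartitionRows` type and its fields are hypothetical names for illustration, and whether the real check uses `>=` or `>` is an assumption. The sketch mirrors the note in the message: reaching the limit only marks the partition, it does not reject further rows.

```rust
/// Hypothetical per-partition row counter; names are illustrative only.
#[derive(Debug)]
struct PartitionRows {
    rows: usize,
    max_rows: usize,
    marked_for_persist: bool,
}

impl PartitionRows {
    fn new(max_rows: usize) -> Self {
        Self {
            rows: 0,
            max_rows,
            marked_for_persist: false,
        }
    }

    /// Records `n` newly ingested rows. Returns `true` only on the call
    /// that first reaches the limit. Rows are always accepted, so ingest
    /// continues after the partition has been marked.
    fn ingest(&mut self, n: usize) -> bool {
        self.rows += n;
        if !self.marked_for_persist && self.rows >= self.max_rows {
            self.marked_for_persist = true;
            return true;
        }
        false
    }
}
```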
Carol (Nichols || Goulding) dbd27f648f
refactor: Rename more mentions of Kafka to their other name where appropriate 2022-08-29 14:27:02 -04:00
Carol (Nichols || Goulding) 74c9529062
fix: Rename KafkaPartition to ShardIndex 2022-08-29 14:07:18 -04:00
Marco Neumann 6b8b922fe7
fix: do not lose data when Kafka reports that offset is above watermark (#5322)
* fix: do not lose data when Kafka reports that offset is above watermark

This can happen in certain cluster rebalance settings.

This is also linked to https://github.com/influxdata/rskafka/issues/147
but for the upstream issue I currently have no idea how to fix it, so
let's at least harden IOx against it.

Fixes #5128.

* refactor: panic for `SequenceNumberAfterWatermark`
2022-08-11 07:32:04 +00:00
Marco Neumann 9fbc95c3ad
feat: add sequencer reset count metric and log to ingester (#5286)
Split out from #5253.

Helps with #5128.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-03 13:00:36 +00:00
Andrew Lamb 8f5210ea3e
test: add test for "duration since production" in kafka `write_buffer` implementation (#5043)
* test: add test for timestamps in kafka write buffer

* refactor: move timestamp batching test to generic tests

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-07 10:27:27 +00:00
Markus Westerlind edf3f08e81 refactor: Replace all uses of lazy_static with once_cell
Went through and replaced all lazy_static uses with once_cell (while waiting for the project to compile). There are still dependencies using lazy_static, so it is still in the crate graph, but at least there isn't an explicit dependency on it (and it is easier to update to `std::lazy::Lazy` once that is stable).
2022-06-29 16:22:02 +02:00
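For readers unfamiliar with the pattern, lazy one-time initialisation of a static can be sketched with the standard library alone. This example does not use `once_cell` or `lazy_static`; it uses `std::sync::OnceLock`, which was stabilised in Rust 1.70 (the related `std::sync::LazyLock` followed in 1.80). The function name and the label values are made up for the example.

```rust
use std::sync::OnceLock;

/// Returns a lazily initialised, process-wide list of labels.
/// The closure runs at most once, on first access.
fn default_labels() -> &'static Vec<&'static str> {
    static LABELS: OnceLock<Vec<&'static str>> = OnceLock::new();
    LABELS.get_or_init(|| vec!["ingester", "write_buffer"])
}
```

Every call returns a reference to the same allocation, so the initialisation cost is paid once regardless of how many callers there are.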
Dom Dwyer 75a3fd5e1e refactor: use propagated partition key in ingester
Changes the ingester to use the partition key derived in the router, and
transmitted over through the kafka API boundary.

This should have no observable behavioural change, but be more resilient
as we're no longer assuming the partitioning algorithm produces the same
value in both the router (where data is partitioned) and the ingester
(where data is persisted, segregated by partition key).

This is a pre-requisite to allowing the user to specify partitioning
schemes.
2022-06-21 15:57:30 +01:00
Andrew Lamb 74f4006580
fix(ingester): make ingester metrics start with `ingester` (#4870)
* fix(ingester): make ingester metrics start with `ingester`

* fix: Update ingester/src/stream_handler/handler.rs

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-16 12:46:37 +00:00
Dom Dwyer 4df2964566 refactor: store PartitionKey in DmlWrite
Carry the PartitionKey in the DmlWrite, allowing the batch to be
associated with a specific partition key.
2022-06-15 15:48:54 +01:00
kodiakhq[bot] dd8d44e24f
Merge branch 'main' into cn/duration 2022-06-10 14:23:09 +00:00
Andrew Lamb 50697906b1
refactor: Make `DMLWrite::sequence_number` a `SequenceNumber` (#4817) 2022-06-09 19:36:37 +00:00
Carol (Nichols || Goulding) 1c7cbaf5ae
refactor: Use DurationHistogram in more places 2022-06-09 14:20:51 -04:00
Andrew Lamb dde3c3922c
refactor: use consistent spelling of serialize (#4717) 2022-05-27 14:42:59 +00:00
Carol (Nichols || Goulding) 6ce6a38094
fix: Make metric names potentially less confusing 2022-05-25 10:04:39 -04:00
Carol (Nichols || Goulding) 05bd9de4d3
test: Add a test for the sequence number skipping metric
Ok, so... this needed lots of... channels. Channels everywhere.

The stream method on TestWriteBufferStreamHandler previously assumed it
would only be called once. In a test where reset_to_earliest is called,
stream might be called again to get the reset stream.

We want to be able to control which of the streams gets which
operations, so that's why the macro now takes a vec of vec of
operations-- one vec of operations per expected call to stream, and the
stream will send all the operations in its vec.

The test thread needs to wait for the handler stream to consume the last
item from the last receiver stream, so when the
TestWriteBufferStreamHandler has set up the last expected call to
stream, pass back the last transmitter and have it wait until it's at
full expected capacity (which means all operations have been consumed by
the receiver).
2022-05-20 20:50:02 -04:00
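The "one vec of operations per expected call to stream" idea can be sketched with standard library channels. This is a simplified stand-in, not the test helper from the commit: `MockStreamSource` is a hypothetical name, operations are plain `u64` values, and the step where the test waits for the final stream to be fully consumed is left out.

```rust
use std::collections::VecDeque;
use std::sync::mpsc::{channel, Receiver};

/// Hypothetical mock: hands out one pre-loaded stream per call to `stream()`.
struct MockStreamSource {
    /// One batch of operations for each expected call, in call order.
    per_call: VecDeque<Vec<u64>>,
}

impl MockStreamSource {
    fn new(per_call: Vec<Vec<u64>>) -> Self {
        Self {
            per_call: per_call.into(),
        }
    }

    /// Returns a receiver that yields the next batch of operations.
    /// Panics if called more often than batches were configured.
    fn stream(&mut self) -> Receiver<u64> {
        let ops = self
            .per_call
            .pop_front()
            .expect("stream() called more often than expected");
        let (tx, rx) = channel();
        for op in ops {
            tx.send(op).expect("receiver is still alive");
        }
        // `tx` is dropped here, so the receiver ends after the batch.
        rx
    }
}
```

A test that triggers a reset can then configure two batches and check that the operations after the reset come from the second stream.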
Carol (Nichols || Goulding) bda231051a
feat: Record metrics when resetting the write buffer and skipping sequence numbers 2022-05-20 20:48:17 -04:00
Carol (Nichols || Goulding) bcbf7b4f46
refactor: Move error handling logic to be all together 2022-05-20 20:48:17 -04:00
Carol (Nichols || Goulding) ab72c93a5e
docs: Updating wrapping, content, and grammar of comments 2022-05-20 10:51:07 -04:00
Carol (Nichols || Goulding) c811bebdb7
feat: Add ingester CLI option to skip to oldest available WB seq num
The default behavior of the ingester is to panic if the min unpersisted
sequence number in the catalog is unknown to the write buffer due to the
retention policies having evicted that sequence number.

Specifying `--skip-to-oldest-available` changes this behavior to skip to
the oldest sequence number the write buffer does have available and go
from there.

Fixes #4624.
2022-05-20 10:51:07 -04:00
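The choice this flag controls can be sketched as a single decision function. The function name, parameter names, and the use of `Result` in place of the panic are assumptions for illustration; only the behaviour (fail by default, skip forward when the flag is set) comes from the commit message.

```rust
/// Decides which sequence number to start reading from.
///
/// `min_unpersisted` is the sequence number the catalog wants next and
/// `oldest_available` is the oldest one the write buffer still retains.
/// An `Err` stands in for the default panic described in the commit.
fn resolve_start(
    min_unpersisted: u64,
    oldest_available: u64,
    skip_to_oldest_available: bool,
) -> Result<u64, String> {
    if min_unpersisted >= oldest_available {
        // The wanted sequence number is still retained.
        return Ok(min_unpersisted);
    }
    if skip_to_oldest_available {
        // Accept the gap and continue from what is available.
        Ok(oldest_available)
    } else {
        Err(format!(
            "sequence number {min_unpersisted} was evicted; oldest available is {oldest_available}"
        ))
    }
}
```

Skipping forward means the operations between the two sequence numbers are never applied, which is why the behaviour is opt-in.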
Carol (Nichols || Goulding) b3f97bdb9d
test: Capture existing behavior for unknown sequence number 2022-05-20 10:51:06 -04:00
Dom Dwyer 7f3473e19f refactor(ingester): emit per-op debugging info
Emit a TRACE level log containing the op offset & other helpful fields.

This will allow us to identify which messages were last successfully
decoded, and which caused errors so we can pull them for analysis.
2022-05-11 16:35:35 +01:00
Carol (Nichols || Goulding) 068096e7e1
fix: Rename data_types2 to data_types 2022-05-06 14:45:39 -04:00
Marco Neumann bd600bbac6
refactor: allow ingester to be integrated into query tests (#4427)
* refactor: improve `IngesterData` public interface

* feat: impl `Debug` for `Test{Namespace,Sequencer}`

* refactor: trait interface for `LifecycleHandle`

This is required to mock the lifecycle for query tests.

* refactor: trait for partitioner
2022-04-26 13:44:30 +00:00
二手掉包工程师 4b47d723b1
refactor: Rename time to iox_time (#4416)
Signed-off-by: hi-rustin <rustin.liu@gmail.com>

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-04-26 00:19:59 +00:00
Dom Dwyer 71a278ac7e refactor: accept !Sync write buffer streams
Removes the Sync bound on the SequencedStreamHandler input stream type, as the
BoxStream returned by the WriteBufferStreamHandler is not Sync.

This change means the SequencedStreamHandler is not Sync either, but is
still Send and therefore can be moved into tokio tasks.
2022-04-08 11:28:39 +01:00
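The distinction between `Send` and `Sync` relied on here can be shown with a small standard library example. It uses `std::thread::spawn` as a stand-in for a tokio task (both require the spawned work to be `Send`), and the `Handler` type is a made-up placeholder, not the `SequencedStreamHandler`.

```rust
use std::cell::Cell;
use std::thread;

/// `Cell<u64>` is `Send` but not `Sync`, so this struct is too:
/// it can be moved to another thread, but not shared by reference.
struct Handler {
    processed: Cell<u64>,
}

impl Handler {
    fn handle(&self) {
        self.processed.set(self.processed.get() + 1);
    }
}

/// Moves the handler into a worker thread and returns how many
/// operations it processed. Moving requires only `Send`.
fn run_on_worker(handler: Handler, ops: u64) -> u64 {
    thread::spawn(move || {
        for _ in 0..ops {
            handler.handle();
        }
        handler.processed.get()
    })
    .join()
    .expect("worker thread panicked")
}
```

Sharing `&Handler` across threads would not compile, because that requires `Sync`; transferring ownership into a single task does not.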
Dom Dwyer aaa677dec8 docs: describe graceful shutdown behaviour 2022-04-05 11:31:55 +01:00
Dom Dwyer 8edefc415d refactor: rename ttbr -> write_time in tests 2022-04-05 11:31:55 +01:00
Dom Dwyer f15275cf96 feat: expose ingest sequencer errors
Instruments the SequencedStreamHandler with a series of new metrics that
record the various error classes observable in the stream handler.

These metrics are labelled with potential_data_loss=true where relevant
to surface potential data loss events for alerting & further review.
2022-04-05 11:31:55 +01:00
Dom Dwyer 083ff1f8e3 refactor: ingest stream handler
Refactors the stream_in_sequenced_entries() into a new impl in the
SequencedStreamHandler type, decoupling the reading / decoding of ops
from Kafka (and associated error handling) from the "what happens to
those ops" concern to ease testing, encapsulate the specifics of "how to
get an op" and improve flexibility.

This is intended to provide robust error handling within what is
reasonably possible (unexpected errors are always unexpected!) while
retaining the existing metrics and functionality. I've also separated
out code that exists in the current impl specifically to drive tests
from the prod code path, instead driving those behaviours through mocks.

As of this commit, the handler is not used - this commit simply adds the
new impl.
2022-04-05 11:31:54 +01:00