This commit fixes loads of crates (47!) had unused dependencies, or
mis-configured dependencies (test deps as normal deps).
I added the "unused_crate_dependencies" to all crates to help prevent
this mess from growing again!
https://doc.rust-lang.org/beta/nightly-rustc/rustc_lint_defs/builtin/static.UNUSED_CRATE_DEPENDENCIES.html
This has the minor downside of false-positives when specifying
dev-dependencies for test/bench binaries - these are files in /test or
/benches (not normal tests). This commit includes a workaround,
importing them in lib.rs (gated by a feature flag). I think the
trade-off of better dependency management is worth it!
The old tests used partial error string matching, with the whole error
message! So when I added more to the error message, the fixture tests
didn't fail.
This changes the tests to match the full string, and validate the
timestamps are included.
Include the minimum acceptable timestamp (the retention bound) and the
observed timestamp that exceeds this bound in the retention enforcement
write error response.
The replication flag defines the total number of copies of each write -
slightly less confusing than the additional copies it was previously,
and matches with the actual code.
Changes the UpstreamSnapshot to be suitable for concurrent use. This
type contains the core logic to enable a caller to uphold the
responsibility of ensuring replicated writes land on distinct ingesters
in the presence of concurrent replication.
The clients within the snapshot are returned to at most one concurrent
caller at a time, by tracking the state of each client as a FSM:
┌────────────────┐
┌─▶│ Available │
│ └────────────────┘
│ │
drop next()
│ │
│ ▼
│ ┌────────────────┐
└──│ Yielded │
└────────────────┘
│
remove
│
▼
┌────────────────┐
│ Used │
└────────────────┘
Once a client has been yielded it will not be yielded again until it is
dropped (transitioning the FSM from "yielded" to "available" again,
returning it to the candidate pool of clients) or removed (transitioning
to "used", permanently preventing it from being yielded to another
caller).
Changes then UpstreamSnapshot to return owned clients, instead of
references to those clients.
This will allow the snapshot to have a 'static lifetime, suitable for
use across tasks.
Because the number of candidate upstreams is checked to exceed the
number of desired data copies before starting the write loop, and
because the parallelism of the write loop matches the number of desired
data copies, it's not possible for any thread to observe an empty
snapshot.
This commit removes the unreachable error condition for clarity.
Adds a property-based test of the RPC write handler's replication logic,
ensuring:
1. If the number of healthy upstreams is 0, NoHealthyUpstreams is
returned and no requests are attempted.
2. Given N healthy upstreams (> 0) and a replication factor of R:
if N < R, "not enough replicas" is returned and no requests are
attempted.
3. Upstreams that return an error are retried until the entire
write succeeds or times out.
4. Writes are replicated to R distinct upstreams successfully, or
an error is returned.
5. One an upstream write is ack'd as successful, it is never
requested again.
6. An upstream reporting as unhealthy at the start of the write is
never requested (excluding probe requests).
These properties describe a mixture of invariants (don't replicate your
two copies of a write to the same ingester) and expected behaviour of
the replication logic (optimisations like "don't try writes when you
already know they'll fail").
This passes for the single-threaded replication logic used at the time
of this commit, and will be used to validate correctness of a concurrent
replication implementation - a concurrent approach should uphold these
properties the same way a single-threaded implementation does.
Renames NoUpstreams -> NoHealthyUpstreams as it's confusing because we
also have "not enough replicas" which could be no upstreams? This has a
slightly clearer meaning.
Refactors the From<BtreeMap> impl that accepted a &str name for
ColumnsByName construction, instead allowing only the owned String, and
updating the test that makes use of it appropriately.
So that the different kinds aren't mixed up. Also extracts the logic
having to do with which template takes precedence onto the
PartitionTemplate type itself.
This commit splits out the RPC-request-centric errors in RpcWriteError
into their own RpcWriteClientError type.
This improves the separation of concerns - an RpcWriteError comes from
the RPC write handler, whereas an RpcWriteClientError comes from an
underlying client. It's definitely less confusing!