This adds new implementations of QueryChunk for Enterprise:
- `ReplicaBufferChunk` (includes the node-id)
- `ReplicaParquetChunk` (includes the node-id)
- `CompactedParquetChunk`
These are logged with a `trace!` when chunks are retrieved from the buffer.
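As a rough illustration only, a sketch of the shape these types might
take and the kind of trace emitted when chunks are retrieved; the field
and function names here are assumptions, and the real types also
implement the query engine's QueryChunk trait:

```rust
use std::sync::Arc;
use tracing::trace;

// Hypothetical shape of a replica chunk carrying its originating node id;
// the real type also implements the query engine's QueryChunk trait.
struct ReplicaBufferChunk {
    node_id: Arc<str>,
    // ...chunk data elided
}

// Illustrative trace emitted when chunks are retrieved from the buffer;
// the log fields shown here are assumptions, not the actual ones.
fn log_retrieved(chunks: &[ReplicaBufferChunk]) {
    for chunk in chunks {
        trace!(node_id = %chunk.node_id, "retrieved replica buffer chunk");
    }
}
```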
* simplify FieldValue types by making load generator functions generic
over RngCore and passing the RNG into methods rather than depending on
it being available on every type instance that needs it (see the sketch
after this list)
* expose influxdb3_load_generator as a library crate
* export config, spec, and measurement types publicly to support use in
the antithesis-e2e crate
* fix a bug that surfaced whenever the cardinality value was less than
the lines per sample value by forcing LP lines in a set of samples to be
distinct from one another with nanosecond increments
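The RNG-passing pattern in the first bullet might look roughly like the
following sketch; the trimmed-down enum variants and method name are
illustrative assumptions rather than the actual influxdb3_load_generator
API:

```rust
use rand::RngCore;

// Hypothetical, trimmed-down FieldValue; the real enum has more variants.
enum FieldValue {
    F64 { min: f64, max: f64 },
    Bool,
}

impl FieldValue {
    // The RNG is passed in generically rather than stored on each instance.
    fn generate<R: RngCore>(&self, rng: &mut R) -> String {
        match self {
            Self::F64 { min, max } => {
                // Map a random u64 into [min, max).
                let unit = rng.next_u64() as f64 / u64::MAX as f64;
                format!("{}", min + unit * (max - min))
            }
            Self::Bool => (rng.next_u64() % 2 == 0).to_string(),
        }
    }
}
```

A caller can then thread a single seeded RNG (for example one built with
rand's SeedableRng::seed_from_u64) through every generator to make runs
reproducible.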
The compaction producer runs a background job to pick up new ingest
nodes that have been registered in the cluster.
A test was added to check that this works as expected.
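A minimal sketch of such a background job, with entirely hypothetical
type and method names standing in for the real producer and catalog:

```rust
use std::{collections::HashSet, sync::Arc, time::Duration};

// Hypothetical stubs standing in for the real producer and catalog types.
struct CompactionProducer;
impl CompactionProducer {
    fn add_ingest_node(&self, node_id: &str) {
        println!("now compacting data from {node_id}");
    }
}

struct ClusterCatalog;
impl ClusterCatalog {
    fn ingest_node_ids(&self) -> Vec<String> {
        vec!["node0".into(), "node1".into()]
    }
}

// Poll the cluster's registered ingest nodes and hand any newly seen
// node to the compaction producer.
async fn watch_for_new_ingest_nodes(
    producer: Arc<CompactionProducer>,
    catalog: Arc<ClusterCatalog>,
) {
    let mut known: HashSet<String> = HashSet::new();
    let mut ticker = tokio::time::interval(Duration::from_secs(10));
    loop {
        ticker.tick().await;
        for node_id in catalog.ingest_node_ids() {
            if known.insert(node_id.clone()) {
                producer.add_ingest_node(&node_id);
            }
        }
    }
}
```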
refactor: change to Arc<str> for node snapshot marker
This adds a sleep so that the parquet cache has a little time to
populate before we make another request to the query buffer. Without it,
the cache sometimes has not populated yet and the new request races
ahead and goes to object store instead. This is acceptable in practice,
since filling the cache takes time in production as well. I haven't seen
the test fail since adding the sleep, but the condition is hard to
trigger in the first place and does not happen often in practice.
When starting up a new cluster in Enterprise we might have multiple
nodes starting at the same time, which can leave us with multiple
catalogs holding different UUIDs in their in-memory representations.
For example:
- Say we have node0 and node1
- node0 and node1 start at the same time and both check object storage
to see if there is a catalog to load
- They both see there is no catalog
- They both create a new one by generating a UUID and persisting it to
object storage
- Whichever node's write lands second ends up with the correct UUID in
its in-memory representation, while the other node will likely not have
the correct one until it is restarted
In practice this isn't an issue today, as Trevor notes in
https://github.com/influxdata/influxdb_pro/issues/600, but it could
become one once we start using `--cluster-id` for licensing purposes. To
prevent it, we make the write to object storage use a create-only put
mode. If a catalog already exists, the write fails and the node that
lost the race simply loads the other node's catalog. For example, if
node1 wins the race, node0 will load the catalog created by node1 and
use that UUID instead.
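As a rough illustration, here is a minimal sketch of the idea using the
object_store crate's conditional put; the function name and signature
are hypothetical and the exact crate API varies by version:

```rust
use object_store::{path::Path, ObjectStore, PutMode, PutOptions, PutPayload};

// Sketch: persist a freshly generated catalog only if none exists yet;
// if another node already won the race, load and adopt its catalog.
async fn init_catalog_bytes(
    store: &dyn ObjectStore,
    path: &Path,
    new_catalog: Vec<u8>,
) -> object_store::Result<Vec<u8>> {
    // Create-only put: fails if the object already exists.
    let opts = PutOptions::from(PutMode::Create);
    match store
        .put_opts(path, PutPayload::from(new_catalog.clone()), opts)
        .await
    {
        // This node won the race; its UUID becomes the cluster's UUID.
        Ok(_) => Ok(new_catalog),
        // Another node persisted a catalog first; load and use that one.
        Err(object_store::Error::AlreadyExists { .. }) => {
            Ok(store.get(path).await?.bytes().await?.to_vec())
        }
        Err(e) => Err(e),
    }
}
```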
Because this is hard to test, since it requires the race to actually
occur, I have not included a test: we could never really be sure the
race was exercised, and we rely on the underlying object store we are
writing to to handle the conditional put for us. It is also unlikely to
happen in the first place, since it can only occur when a new cluster is
being initialized for the first time.
The replication TTBR test was flaking due to timing issues. This pins
the time providers used for the primary and replicated buffers in the
test, respectively, so that the measured TTBR is deterministic.
This removes the sleeps in some of the last/distinct cache tests that
were in place to wait for catalog updates to broadcast. These are no
longer needed because the catalog broadcast is now ACK'd.
This creates a CatalogUpdateMessage type that is used to send
CatalogUpdates; this type performs the send on the oneshot Sender so
that the consumer of the message does not need to do so.
Subscribers to the catalog get a CatalogSubscription, which uses the
CatalogUpdateMessage type to ACK the message broadcast from the catalog.
This means that catalog message broadcast can fail, but this commit does
not provide any means of rolling back a catalog update.
A test was added to check that it works.
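For illustration only, here is a minimal sketch of the ACK'd broadcast
pattern described above, using a per-subscriber channel and a oneshot
ACK; aside from the CatalogUpdateMessage and CatalogSubscription names,
everything here (types, fields, methods) is an assumption rather than
the actual implementation:

```rust
use std::sync::Arc;
use tokio::sync::{mpsc, oneshot};

// Hypothetical stand-in for the real catalog update payload.
#[derive(Debug)]
struct CatalogUpdate;

// Wraps an update together with a oneshot sender; `ack` performs the
// send so the consumer never touches the oneshot directly.
struct CatalogUpdateMessage {
    update: Arc<CatalogUpdate>,
    ack: oneshot::Sender<()>,
}

impl CatalogUpdateMessage {
    fn ack(self) {
        // Ignore the error if the catalog stopped waiting for the ACK.
        let _ = self.ack.send(());
    }
}

// A subscriber's receiving end.
struct CatalogSubscription {
    rx: mpsc::Receiver<CatalogUpdateMessage>,
}

struct Catalog {
    subscribers: Vec<mpsc::Sender<CatalogUpdateMessage>>,
}

impl Catalog {
    fn subscribe(&mut self) -> CatalogSubscription {
        let (tx, rx) = mpsc::channel(4);
        self.subscribers.push(tx);
        CatalogSubscription { rx }
    }

    // Broadcast an update and wait for every subscriber to ACK it. A
    // failed send or ACK surfaces as an error; nothing here rolls the
    // catalog update back.
    async fn broadcast(&self, update: CatalogUpdate) -> Result<(), &'static str> {
        let update = Arc::new(update);
        let mut acks = Vec::with_capacity(self.subscribers.len());
        for tx in &self.subscribers {
            let (ack_tx, ack_rx) = oneshot::channel();
            let msg = CatalogUpdateMessage {
                update: Arc::clone(&update),
                ack: ack_tx,
            };
            tx.send(msg).await.map_err(|_| "subscriber dropped")?;
            acks.push(ack_rx);
        }
        for ack in acks {
            ack.await.map_err(|_| "subscriber did not ACK")?;
        }
        Ok(())
    }
}
```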