The interplay between mutable_linger_seconds, late_arrive_window_seconds and persist_age_threshold_seconds can be tricky to reason about. I realized that the lifecycle rules can be simplified by removing mutable_linger_seconds and using late_arrive_window_seconds for the same purpose. Semantically, the two mean essentially the same thing: we want to give data roughly this amount of time to arrive before the system persists it, which gives the system more opportunity to persist non-overlapping data.
When a partition goes cold for writes, after we've waited past this window, we should compact and persist that partition. This removes one unnecessary knob from the lifecycle configuration and also removes the potential for conflicting configuration options.
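A minimal sketch of the rule described above, assuming a trimmed-down `LifecycleRules` struct that carries only the surviving knob. The struct layout and the `should_persist` helper are hypothetical illustrations, not the actual IOx lifecycle code.

```rust
use std::time::{Duration, Instant};

/// Hypothetical, trimmed-down lifecycle rules: only the knob discussed above.
#[derive(Debug, Clone, Copy)]
pub struct LifecycleRules {
    /// How long to wait for late-arriving data before persisting.
    pub late_arrive_window_seconds: u32,
}

/// Returns true once a partition has been cold for writes for longer than
/// the late-arrival window. (Assumed helper, for illustration only.)
pub fn should_persist(rules: &LifecycleRules, last_write: Instant, now: Instant) -> bool {
    let window = Duration::from_secs(u64::from(rules.late_arrive_window_seconds));
    // saturating_duration_since yields zero if `now` is earlier than `last_write`.
    now.saturating_duration_since(last_write) > window
}
```

With a single window there is no second setting that could disagree with it.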
A database on one IOx server can, exclusively:
- Not interact with Kafka at all
- Send writes to Kafka
- Read writes from Kafka
Notably, a database on a particular server will never write *and* read from Kafka at the same time.
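One way to make the three options mutually exclusive is a plain enum. The sketch below assumes a hypothetical `KafkaMode` type; the name and its methods are illustrative and not taken from the IOx codebase.

```rust
/// Hypothetical: how a database on one server relates to Kafka.
/// Exactly one variant applies at a time.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum KafkaMode {
    /// Does not interact with Kafka at all.
    Disabled,
    /// Sends writes to Kafka.
    Writing,
    /// Reads writes from Kafka.
    Reading,
}

impl KafkaMode {
    pub fn writes_to_kafka(self) -> bool {
        matches!(self, KafkaMode::Writing)
    }

    pub fn reads_from_kafka(self) -> bool {
        matches!(self, KafkaMode::Reading)
    }
}
```

Because the mode is a single value, a database that both reads and writes cannot be expressed.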
Instead of waiting for the server ID to be set and then marking the server
as errored, directly check the object store on startup. This is
important so that we fail fast when Istio isn't up and running yet.
Before this change we loaded databases eagerly when a serverID was
passed on startup BEFORE starting up the gRPC server. Since loading
(esp. at its current state without checkpoints and with too many small
parquet files) can take very long, K8s thinks IOx is unhealthy. With
this change we are now loading databases in the server background worker
once a serverID is available. Until then we block all DB-related
interactions including adding new databases (since without inspecting
the object store there is no way we can check if the DB already
exists).
Furthermore, we now load databases regardless of whether the serverID
was passed on startup (via CLI or environment variable) or was set later
via gRPC call. Before this change the latter case was somewhat forgotten.
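The gating described above can be sketched as a small state machine. Everything here (`InitState`, `Server`, `Error::DatabasesNotLoaded`, and the method names) is a hypothetical simplification; the real server does the loading in a background worker and talks to the object store, which this sketch leaves out.

```rust
/// Hypothetical initialization states for the server.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InitState {
    /// No server ID yet; nothing can be loaded.
    AwaitingServerId,
    /// Server ID known; the background worker is loading databases.
    LoadingDatabases { server_id: u32 },
    /// Databases are loaded; DB-related calls are allowed.
    Ready { server_id: u32 },
}

#[derive(Debug, PartialEq, Eq)]
pub enum Error {
    DatabasesNotLoaded,
}

pub struct Server {
    state: InitState,
}

impl Server {
    /// The ID may be passed on startup (CLI or environment) or set later.
    pub fn new(server_id: Option<u32>) -> Self {
        let state = match server_id {
            Some(server_id) => InitState::LoadingDatabases { server_id },
            None => InitState::AwaitingServerId,
        };
        Server { state }
    }

    pub fn state(&self) -> InitState {
        self.state
    }

    /// Setting the ID later (e.g. via gRPC) takes the same path as startup.
    pub fn set_server_id(&mut self, server_id: u32) {
        if self.state == InitState::AwaitingServerId {
            self.state = InitState::LoadingDatabases { server_id };
        }
    }

    /// Called by the (not shown) background worker once loading completes.
    pub fn finish_loading(&mut self) {
        if let InitState::LoadingDatabases { server_id } = self.state {
            self.state = InitState::Ready { server_id };
        }
    }

    /// DB-related calls are rejected until loading has finished.
    pub fn create_database(&self, _name: &str) -> Result<(), Error> {
        match self.state {
            InitState::Ready { .. } => Ok(()),
            _ => Err(Error::DatabasesNotLoaded),
        }
    }
}
```

Both ways of supplying the ID lead into `LoadingDatabases`, so neither path can skip loading.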
This changes the hierarchy from
```
database -> partition -> chunk -> table
```
to
```
database -> partition -> table -> chunk
```
Only the high-level APIs are changed for now. The chunk states (like
MutableBuffer and ReadBuffer) still multiplex tables, although they will
always only get a single table assigned (or no table if no data was
presented yet).
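To illustrate the new nesting, here is a rough sketch using nested maps. The struct definitions, key types, and the `chunk_mut` helper are assumptions made for illustration; the actual catalog types are more involved.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a chunk; it belongs to exactly one table.
#[derive(Debug, Default)]
pub struct Chunk {
    pub rows: usize,
}

#[derive(Debug, Default)]
pub struct Table {
    /// Chunks keyed by chunk ID.
    pub chunks: BTreeMap<u32, Chunk>,
}

#[derive(Debug, Default)]
pub struct Partition {
    /// Tables keyed by table name.
    pub tables: BTreeMap<String, Table>,
}

#[derive(Debug, Default)]
pub struct Database {
    /// Partitions keyed by partition key.
    pub partitions: BTreeMap<String, Partition>,
}

impl Database {
    /// Walks database -> partition -> table -> chunk, creating missing levels.
    pub fn chunk_mut(&mut self, partition_key: &str, table: &str, chunk_id: u32) -> &mut Chunk {
        self.partitions
            .entry(partition_key.to_string())
            .or_default()
            .tables
            .entry(table.to_string())
            .or_default()
            .chunks
            .entry(chunk_id)
            .or_default()
    }
}
```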
Closes #1256.
Rationale
---------
We use `u32` throughout the codebase to reference interned dictionary strings.
We also use `u32` for other purposes, and it would be nice to get some help from the compiler
to avoid mixing them up.
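A minimal sketch of the newtype idea, assuming a hypothetical `DictId` wrapper and a toy `Dictionary` interner. The names and methods are illustrative and may not match what the change actually introduces.

```rust
use std::collections::HashMap;

/// Hypothetical newtype: a dictionary ID can no longer be confused with
/// an arbitrary `u32` such as a row count or a chunk ID.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct DictId(u32);

impl DictId {
    pub fn as_u32(self) -> u32 {
        self.0
    }
}

/// Toy string interner that hands out `DictId`s.
#[derive(Debug, Default)]
pub struct Dictionary {
    strings: Vec<String>,
    ids: HashMap<String, DictId>,
}

impl Dictionary {
    /// Returns the existing ID for `value`, or assigns the next one.
    pub fn intern(&mut self, value: &str) -> DictId {
        if let Some(id) = self.ids.get(value) {
            return *id;
        }
        let id = DictId(self.strings.len() as u32);
        self.strings.push(value.to_string());
        self.ids.insert(value.to_string(), id);
        id
    }

    /// Looks up the string for an ID; a bare `u32` is rejected at compile time.
    pub fn resolve(&self, id: DictId) -> Option<&str> {
        self.strings.get(id.0 as usize).map(String::as_str)
    }
}
```

With this in place, a call such as `dict.resolve(5)` fails to compile, which is the kind of help from the compiler the rationale asks for.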
* feat: Rework Db to use Catalog for chunk state
* docs: Update server/src/db.rs
* fix: fmt
* fix: fmt
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: Management API + CLI command to close a chunk and move to read buffer
* refactor: Less copy-pasta
* fix: track only once, use `let _` instead of `.ok()`
* docs: Apply suggestions from code review
fix comments ( 🤦♀️ for copy/pasta)
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
* docs: Update server/src/lib.rs
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
* refactor: Use DatabaseName rather than impl Into<String>
* fix: Fixup logical merge conflicts
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
* refactor: pull Scenario code out of main module
* refactor: break out http into tests
* refactor: use random org_id and bucket_id
* refactor: port read_api to be independent
* refactor: port last test
* refactor: convenience methods to create different clients in end-to-end tests
* fix: improve comments
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
The replication, query, and subscription concepts here are going to be significantly different. I thought it would be best to remove this cruft for now to avoid confusion.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>