Since adding new features like "sequencer replay" or init retries would
make the current code too complex, a refactor is required:
Config:
The config struct now holds a `DatabaseState`, a simple linear state
machine representing the different stages of database init.
Init:
The init module now has a fixpoint loop that looks at the state,
decides what to do based on it, and repeats until either the DB is
initialized or an error occurs. This also makes it easier to continue
the init process "in the middle", e.g. when the preserved catalog is
broken or the sequencer (e.g. Kafka) could not be reached.
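A minimal sketch of the idea, with illustrative (not the actual) type
and function names: a linear state enum plus a loop that advances it
one step at a time until the DB is initialized or a step fails.
```rust
// Linear init state machine (illustrative variants, not the real ones).
#[derive(Debug)]
enum DatabaseState {
    Known,         // rules are known, nothing loaded yet
    CatalogLoaded, // preserved catalog loaded from object storage
    Initialized,   // ready to serve traffic
}

#[derive(Debug)]
struct InitError(String);

fn load_catalog() -> Result<(), InitError> {
    // placeholder for loading (or rebuilding) the preserved catalog
    Ok(())
}

fn replay_sequencer() -> Result<(), InitError> {
    // placeholder for replaying from the sequencer (e.g. Kafka)
    Ok(())
}

fn init_database(mut state: DatabaseState) -> Result<DatabaseState, InitError> {
    loop {
        state = match state {
            DatabaseState::Known => {
                load_catalog()?;
                DatabaseState::CatalogLoaded
            }
            DatabaseState::CatalogLoaded => {
                replay_sequencer()?;
                DatabaseState::Initialized
            }
            // fixpoint reached: nothing left to do
            DatabaseState::Initialized => return Ok(DatabaseState::Initialized),
        };
    }
}

fn main() {
    // Because the loop is driven purely by the state, init can resume
    // "in the middle", e.g. from CatalogLoaded after a sequencer outage.
    match init_database(DatabaseState::Known) {
        Ok(state) => println!("init finished: {state:?}"),
        Err(e) => eprintln!("init failed: {e:?}"),
    }
}
```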
When a DB is created AFTER the server is initialized, we can assume
it is a new DB (because the rules file did not exist beforehand). We
shall treat it as a new DB with no data and should not try to load a
leftover or stale preserved catalog for it. We do not know how such a
catalog came into existence, and it was certainly not properly
managed by IOx. So we error if a catalog already exists.
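A rough sketch of that rule; `catalog_exists` and the error variant
are hypothetical stand-ins, not the real creation API.
```rust
#[derive(Debug)]
enum CreateDbError {
    /// A preserved catalog already exists even though the DB is brand new;
    /// it was not written by this server, so refuse to load it.
    CatalogAlreadyExists { db_name: String },
}

fn catalog_exists(_db_name: &str) -> bool {
    // placeholder: would check object storage for catalog transaction files
    false
}

fn create_database(db_name: &str) -> Result<(), CreateDbError> {
    if catalog_exists(db_name) {
        return Err(CreateDbError::CatalogAlreadyExists {
            db_name: db_name.to_string(),
        });
    }
    // start from an empty catalog; no preserved data, no sequencer replay
    Ok(())
}

fn main() {
    assert!(create_database("my_db").is_ok());
}
```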
Furthermore, the old implementation was somewhat broken since it
loaded the preserved catalog synchronously with the gRPC call that
issued the DB creation (we only have a delayed init concept for DBs
that are loaded on instance startup). In production that would very
likely provoke nasty timeouts.
On top of that, the new behavior is also somewhat saner when we think
about sequencer (e.g. Kafka) replays. We certainly do not want to do
any replays for newly created DBs.
TLDR: New behavior for DBs created via gRPC is "new empty DB". This does
NOT affect DBs loaded on instance startup (aka existing DBs).
* fix: Do not sequence local writes
* fix: Update server/src/db.rs
Co-authored-by: Edd Robinson <me@edd.io>
* fix: review comments
* fix: restore passing sequence information down to mutable buffer
* fix: store min/max times even when there are no sequence numbers
Co-authored-by: Edd Robinson <me@edd.io>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Instead of (ab)using the transaction lock to prevent the cleanup job
from removing just-written parquet files, use a dedicated lock. This
will later allow us to write parquet files before starting a transaction
(i.e. w/o holding the transaction lock).
This will help with #1821.
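A rough sketch of the idea, with illustrative field and method names
rather than the actual catalog types: writers of not-yet-committed
parquet files hold the dedicated lock in shared mode, and the cleanup
job takes it exclusively.
```rust
use std::sync::RwLock;

#[derive(Default)]
struct PreservedCatalog {
    /// Guards transactions against the preserved catalog.
    transaction_lock: RwLock<()>,
    /// Guards parquet files that have been written but are not yet
    /// referenced by a committed transaction, so the cleanup job does
    /// not delete them.
    cleanup_lock: RwLock<()>,
}

impl PreservedCatalog {
    fn write_parquet_file(&self) {
        // hold the cleanup lock (shared) while writing; no transaction
        // (and thus no transaction lock) is needed yet
        let _guard = self.cleanup_lock.read().unwrap();
        // ... upload the parquet file to object storage ...
    }

    fn cleanup_unreferenced_files(&self) {
        // the cleanup job takes the same lock exclusively, so it cannot
        // race with an in-flight write
        let _guard = self.cleanup_lock.write().unwrap();
        // ... list object store, delete files the catalog does not reference ...
    }
}

fn main() {
    let catalog = PreservedCatalog::default();
    catalog.write_parquet_file();
    catalog.cleanup_unreferenced_files();
}
```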
This brings the persistence windows into the catalog partition. It adds a helper method on TableBatch to get the min and max times for a given write. Finally, it wires this into the db so that persistence windows are updated on every write while the partition write lock is held.
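A simplified sketch of the helper and the update path; `TableBatch`,
`PersistenceWindows`, and `add_range` are stand-ins for the real types
and signatures.
```rust
struct TableBatch {
    /// nanosecond timestamps of the rows in this write
    timestamps: Vec<i64>,
}

impl TableBatch {
    /// Returns (min, max) time of the batch, or None if it has no rows.
    fn timestamp_summary(&self) -> Option<(i64, i64)> {
        let min = *self.timestamps.iter().min()?;
        let max = *self.timestamps.iter().max()?;
        Some((min, max))
    }
}

#[derive(Default)]
struct PersistenceWindows {
    ranges: Vec<(i64, i64)>,
}

impl PersistenceWindows {
    fn add_range(&mut self, min: i64, max: i64) {
        self.ranges.push((min, max));
    }
}

fn main() {
    let batch = TableBatch { timestamps: vec![30, 10, 20] };
    let mut windows = PersistenceWindows::default();

    // In the db this would happen while the partition write lock is held.
    if let Some((min, max)) = batch.timestamp_summary() {
        windows.add_range(min, max);
    }
    assert_eq!(windows.ranges, vec![(10, 30)]);
}
```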
Change schema from
```text
<server_id>/<db_name>/data/<part_key>/<chunk_id>/<table_name>.parquet
```
to
```text
<server_id>/<db_name>/data/<table_name>/<part_key>/<chunk_id>.<uuid>.parquet
```
So parquet files will NEVER be overwritten. This is especially helpful
when dealing with old catalog leftovers (i.e. a parquet file that
belonged to an old but wiped catalog). It also simplifies reasoning
about file references in the future and follows what other dataset
formats usually do (i.e. never replace files).
Also use `ChunkAddr` where it makes sense.
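A sketch of how such a key could be built; the helper and its
parameters are illustrative (in the real code the pieces would largely
come from a `ChunkAddr` plus a freshly generated UUID, e.g. from the
`uuid` crate, which is passed in here so the snippet has no
dependencies).
```rust
fn parquet_path(
    server_id: u32,
    db_name: &str,
    table_name: &str,
    partition_key: &str,
    chunk_id: u32,
    uuid: &str,
) -> String {
    // <server_id>/<db_name>/data/<table_name>/<part_key>/<chunk_id>.<uuid>.parquet
    format!("{server_id}/{db_name}/data/{table_name}/{partition_key}/{chunk_id}.{uuid}.parquet")
}

fn main() {
    let path = parquet_path(
        1,
        "my_db",
        "cpu",
        "2021-07-20",
        42,
        "5fd7c9eb-7f1f-4a3c-9a59-3bd4f4b4a0a1",
    );
    // Because the UUID differs for every write, the same chunk can be
    // written again without ever overwriting an existing file.
    println!("{path}");
}
```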
Don't mix commit+checkpoint in a single call; otherwise the caller has
to reason about the error type and which of the two operations has
failed. Splitting them also makes it easier to create the correct
checkpoint data.
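A sketch of the split API, with illustrative types and method names
rather than the actual transaction interface: commit and checkpoint
return their own error types, so the caller always knows which of the
two steps failed.
```rust
#[derive(Debug)]
struct CommitError(String);

#[derive(Debug)]
struct CheckpointError(String);

/// Data collected at commit time that a checkpoint can be built from.
struct CheckpointData;

struct TransactionHandle;

impl TransactionHandle {
    /// Commit the transaction only; no checkpoint is written here.
    fn commit(self) -> Result<CheckpointData, CommitError> {
        Ok(CheckpointData)
    }
}

fn checkpoint(_data: CheckpointData) -> Result<(), CheckpointError> {
    Ok(())
}

fn main() {
    let tx = TransactionHandle;
    match tx.commit() {
        Ok(data) => {
            // commit already succeeded; a checkpoint failure is reported separately
            if let Err(e) = checkpoint(data) {
                eprintln!("checkpoint failed: {e:?}");
            }
        }
        Err(e) => eprintln!("commit failed: {e:?}"),
    }
}
```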