This hooks the WAL buffer up to the server. When segments go over their max size, they will be persisted if that configuration is set. A follow-on commit will add persistence based on time (how long the segment has been open).
Adds serialization with compression and checksum for WAL buffer segments.
This required a weird structure where the flatbuffer bytes of ReplicatedWrite were kept as a raw payload. I did this because otherwise each of the replicated writes would have had to be rebuilt in the segment.
The other thing that isn't ideal is that deserializing a segment actually marshals it into a Rust struct as opposed to keeping the entire thing as raw flatbuffers. We could update this later to have a concept of an open segment (a regular Rust struct) and closed segments that are just the flatbuffers.
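For illustration only, a serialization pass along those lines might look like the sketch below. The snappy codec (via the snap crate), the CRC32 checksum (via crc32fast), and the length-prefixed framing are assumptions for the example, not a description of the actual on-disk format.

```rust
use snap::raw::Encoder;

/// Illustrative sketch: frame each ReplicatedWrite's raw flatbuffer bytes,
/// compress the segment body, and append a checksum so corruption can be
/// detected when the segment is read back.
fn serialize_segment(payloads: &[Vec<u8>]) -> Result<Vec<u8>, snap::Error> {
    // Keep each ReplicatedWrite as an opaque, length-prefixed payload so the
    // flatbuffer bytes never have to be rebuilt.
    let mut body = Vec::new();
    for payload in payloads {
        body.extend_from_slice(&(payload.len() as u32).to_le_bytes());
        body.extend_from_slice(payload);
    }

    // Compress the whole segment body in one shot.
    let mut compressed = Encoder::new().compress_vec(&body)?;

    // Append a CRC32 checksum over the compressed bytes.
    let mut hasher = crc32fast::Hasher::new();
    hasher.update(&compressed);
    compressed.extend_from_slice(&hasher.finalize().to_le_bytes());

    Ok(compressed)
}
```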
Adds a new endpoint /iox/api/v1/id that accepts a JSON object in the form:
{"id":42}
And calls into the server's set_id method to assign the writer ID to the server.
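As a rough sketch of what the handler has to do (the routing and error plumbing are elided, and the u32 ID type plus the helper names are assumptions made for the example):

```rust
use serde::Deserialize;

/// Shape of the request body shown above.
#[derive(Debug, Deserialize)]
struct WriterId {
    id: u32,
}

/// Hypothetical handler body: parse the JSON and hand the value to the server.
fn handle_set_writer_id(body: &[u8], server: &Server) -> Result<(), serde_json::Error> {
    let WriterId { id } = serde_json::from_slice(body)?;
    server.set_id(id);
    Ok(())
}

/// Stand-in type so the sketch is self-contained; set_id is the server method
/// referred to above.
struct Server;
impl Server {
    fn set_id(&self, _id: u32) {}
}
```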
* fix: do not overwrite databases
Do not overwrite an existing database when attempting to create a DB with an
existing name.
This should probably update the existing database config, but for the time being
this prevents the existing database from being silently dropped.
Fixes #643
* test: ensure duplicate DB names returns an error
Covers #643 with a test case.
Now you have to designate whether you're adding a directory or a file
name, with some assumptions based on paths coming from cloud object
storage or the file system.
A notable difference: checking to see if "apple/b" is a prefix of
"apple/bear/cow.json" will now say no; only whole directories are
matched.
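To make the new behaviour concrete, here's a minimal sketch of directory-wise prefix matching (the real path types model directories and file names explicitly; this only demonstrates the comparison described above):

```rust
/// Match a prefix against a path component-by-component rather than
/// byte-by-byte, so only whole directories count as prefixes.
fn is_dir_prefix(prefix: &str, path: &str) -> bool {
    let prefix_parts: Vec<&str> = prefix.split('/').collect();
    let path_parts: Vec<&str> = path.split('/').collect();

    prefix_parts.len() <= path_parts.len()
        && prefix_parts.iter().zip(&path_parts).all(|(a, b)| a == b)
}

fn main() {
    // "apple/b" is no longer treated as a prefix of "apple/bear/cow.json"...
    assert!(!is_dir_prefix("apple/b", "apple/bear/cow.json"));
    // ...but the whole directory "apple/bear" still is.
    assert!(is_dir_prefix("apple/bear", "apple/bear/cow.json"));
}
```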
* feat: add create and get database to API
This commit is the start of the IOx-specific API. It puts everything under /iox/api/v1 as this is the beginning of the IOx API. Creating a database is done with a PUT request, and a GET request can retrieve the DatabaseRules details.
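As a purely hypothetical usage sketch (the exact resource path and the DatabaseRules payload aren't spelled out here, so treat them as placeholders), the flow looks roughly like:

```rust
use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), reqwest::Error> {
    let client = reqwest::Client::new();
    // Placeholder path: assumes databases are addressed by name under /iox/api/v1.
    let url = "http://127.0.0.1:8080/iox/api/v1/databases/mydb";

    // PUT creates the database; the rules body is left minimal here.
    client.put(url).json(&json!({})).send().await?;

    // GET retrieves the DatabaseRules details.
    let rules: serde_json::Value = client.get(url).send().await?.json().await?;
    println!("{}", rules);

    Ok(())
}
```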
* feat: add defaults for DatabaseRules for create_database
This pulls the different backing implementations into their own modules. They're about to get more complex, so it felt like it was time to separate them out rather than building towards a single multi-thousand-line lib.rs. The error type is only defined in lib.rs and imported by the individual modules, which I think makes it easier to work with.
* refactor: Create database with mutable buffer, read buffer and parquet files
* docs: Apply suggestions from code review
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
* fix: rename planners to clarify what they are
* refactor: simplify traits
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
* feat: implement partition chunk rollover + ids and timestamps
* feat: add last_write_timestamp
* refactor: Use DateTime<Utc> rather than Instant
* refactor: avoid use of structure to generate ids
* refactor: Update docs, remove unused field
* refactor: rename partition -> chunk
* feat: Introduce new partition, which is a holder for Chunks
* refactor: Remove use of wal from mutable database
* refactor: cleanups, remove last direct use of chunks
* fix: delete old benchmarks
* fix: clippy sacrifice
* docs: tidy up comments
* refactor: remove unused error types
* chore: remove commented out tests
This moves the HTTP API over to Routerify, which has the basic route parsing logic that will enable the API design for IOx.
I had a little trouble with the error handling in Routerify, so I ended up creating a macro for constructing error responses in the HTTP API. I'm not sure what I think of this pattern, so I'm interested in what others think. Another option would be to have two functions for each API endpoint: one, x_handler, with a Routerify function signature, and another, x, with a Result<Response<Body>, ApplicationError> return type, which would make the ? operator usable in those functions. That would eliminate the need for the return_err macro.
I'm happy to refactor to that if people prefer it.
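For reference, a minimal sketch of that two-function shape (ApplicationError and the route are placeholders; the outer function is the one that would be registered with the router):

```rust
use hyper::{Body, Request, Response, StatusCode};

/// Placeholder application-level error carrying a status code and message.
#[derive(Debug)]
struct ApplicationError(StatusCode, String);

/// The inner function returns a Result, so `?` works on anything that can be
/// converted into ApplicationError.
async fn get_thing(req: Request<Body>) -> Result<Response<Body>, ApplicationError> {
    let query = req
        .uri()
        .query()
        .ok_or_else(|| ApplicationError(StatusCode::BAD_REQUEST, "missing query".into()))?;
    Ok(Response::new(Body::from(format!("thing: {}", query))))
}

/// The thin wrapper with the handler-style signature: turn the application
/// error into an HTTP response instead of bubbling it up.
async fn get_thing_handler(req: Request<Body>) -> Result<Response<Body>, hyper::Error> {
    Ok(match get_thing(req).await {
        Ok(resp) => resp,
        Err(ApplicationError(status, msg)) => Response::builder()
            .status(status)
            .body(Body::from(msg))
            .unwrap(),
    })
}
```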
Moves the server ID value outside of the mutex-wrapped `Config` type and
into the `Server` struct, synchronising threads using an atomic instead.
This prevents any calls to `require_id()` from having to contend for a
read lock on the config mutex while maintaining linearisation of ID
values - all threads see the change as soon as it is set.
On x86 this synchronisation is effectively "free": the hardware already
provides strong memory ordering (total store order), so this compiles
down to a compiler fence that prevents LLVM from reordering the accesses
at compile time, with no runtime overhead at all.
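A minimal sketch of the shape this takes (a u32 ID with zero as the "not set" sentinel is an assumption here, and the ordering shown is illustrative):

```rust
use std::sync::atomic::{AtomicU32, Ordering};

struct Server {
    // Held directly on Server, outside the mutex-wrapped Config.
    id: AtomicU32, // 0 = not set
}

impl Server {
    fn set_id(&self, id: u32) {
        // Release pairs with the Acquire load below, so every thread observes
        // the ID as soon as it has been stored.
        self.id.store(id, Ordering::Release);
    }

    fn require_id(&self) -> Option<u32> {
        // No read lock on the config is needed to answer this.
        match self.id.load(Ordering::Acquire) {
            0 => None,
            id => Some(id),
        }
    }
}
```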
Previously the ID value was written as part of the config, but this is no
longer the case. It should be easy to add back in, but it appears to be
redundant as the ID is required to construct the path to the config when
loading in the first place.
* chore: Connect server crate to http routes and server command
This updates the http_routes and main server to use the Server crate. This is the first step in a larger effort to start hooking up the initial IOx API and get things running end to end with an in-memory database, WAL buffer, and object storage.
For the time being, this disables the previous disk-based WAL. Or rather, it uses the WriteBufferDb without it. That means this IOx server has no persistence until later. Because of this, the restart step in the end-to-end test was removed.
Later PRs will add the WAL buffer and restart logic that loads from object store. We can opt to bring the local disk-based WAL back later, but it will likely require some refactoring to work with how the WAL buffer will operate.
* feat: Implement write buffer to Parquet snapshotting
This introduces snapshot to the server package to manage snapshotting. It also introduces a new trait for representing a Partition. There is a very crude API wired up in http_routes for testing purposes. Follow-on work will bring the server package into http_routes and rework the snapshot API.
Previously, $ORG and $BUCKET were joined as:
$ORG + "_" + $BUCKET
Which is fine unless either $ORG or $BUCKET includes a "_", such as:
$ORG = "org_a"
$BUCKET = "bucket"
and
$ORG = "org"
$BUCKET = "a_bucket"
This change continues to join $ORG and $BUCKET with an underscore, but
disallows underscores in either $ORG or $BUCKET. It appears these values
are non-zero u64s in the gRPC protocol converted to their base-10 string
representations for the DB name, so this seems safe to enforce.
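Sketched out (the function name and the simplified error handling are illustrative), the join plus validation looks like:

```rust
/// Join org and bucket with an underscore, rejecting underscores in either
/// part so that distinct inputs can never map to the same database name.
fn org_and_bucket_to_database(org: &str, bucket: &str) -> Option<String> {
    if org.contains('_') || bucket.contains('_') {
        return None;
    }
    Some(format!("{}_{}", org, bucket))
}

fn main() {
    assert_eq!(
        org_and_bucket_to_database("org", "bucket").as_deref(),
        Some("org_bucket")
    );
    // Previously these two would have collided on "org_a_bucket"; now both are rejected.
    assert_eq!(org_and_bucket_to_database("org_a", "bucket"), None);
    assert_eq!(org_and_bucket_to_database("org", "a_bucket"), None);
}
```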
In addition, this change introduces a `DatabaseName` type to avoid
passing bare strings around, and allow consuming code to ensure only
valid database names are provided at compile time. This type works with
both owned & borrowed content so doesn't force a string copy where we
can avoid it, and derefs to `str` to make it easier to use with existing
code.
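A rough sketch of the shape of that type (the validation rule shown is just a placeholder; the real rules aren't spelled out here):

```rust
use std::borrow::Cow;
use std::ops::Deref;

/// Validated database name that works with owned or borrowed strings and
/// derefs to str so it slots into existing code that expects &str.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DatabaseName<'a>(Cow<'a, str>);

impl<'a> DatabaseName<'a> {
    pub fn new(name: impl Into<Cow<'a, str>>) -> Result<Self, String> {
        let name = name.into();
        // Placeholder validation only.
        if name.is_empty() {
            return Err("database names may not be empty".to_string());
        }
        Ok(Self(name))
    }
}

impl<'a> Deref for DatabaseName<'a> {
    type Target = str;

    fn deref(&self) -> &str {
        &self.0
    }
}
```

Because `new` accepts both `&str` and `String`, callers that already own their name aren't forced into a copy.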
I've been minimally invasive in pushing the `DatabaseName` through the
existing code and figured I'd see what the sentiment is first.
Candidates for conversion from `str` to `DatabaseName` that seem to make
sense to me include:
- `DatabaseStore` trait
- `RemoteServer` trait
- Others? Basically anywhere other than the "edge" API inputs
Fixes #436 (thanks @zeebo)
* chore(server): add logs for dropped WAL segments
Added logging for dropped writes and old segments in rollover scenarios.
Also includes a dependency on tracing and a dev-dependency on test_helpers.
Refs: #466
* chore(server): Add more context to logs
Minor cleanup around remove_oldest_segment usage
Suggestions from @alamb's review