This commit introduces 4 new types in the paths module for the
influxdb3_write crate. They are:
- ParquetFilePath
- CatalogFilePath
- SegmentInfoFilePath
- SegmentWalFilePath
Each of these corresponds to an object store path (or, for the WAL file,
an on-disk path) that we can use to address the needed files in a
consistent way, without duplicating path construction logic wherever
these files are accessed.
These types also implement Deref/AsRef to the object_store::path::Path
type (or the std::path::Path type for the WAL) so that they can be used
in places that expect that type, such as various object_store and std::fs
APIs, and so that we can use the underlying type's methods without
reimplementing them for each type, since they are just thin wrappers
around those types.
This commit adds some tests to make sure that the path construction
works as intended and also updates the `wal.rs` file to use the new
`SegmentWalFilePath` instead of just a `PathBuf`.
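A minimal sketch of the thin-wrapper pattern described above, shown for the WAL case over `std::path::PathBuf` so it needs no external crates (the object store types would wrap `object_store::path::Path` the same way). The `new(dir, segment_id)` constructor and the `{segment_id:010}.wal` file naming scheme are assumptions for illustration, not taken from the actual code.

```rust
use std::ops::Deref;
use std::path::{Path, PathBuf};

/// Hypothetical sketch of a WAL segment path newtype.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SegmentWalFilePath(PathBuf);

impl SegmentWalFilePath {
    /// Builds the on-disk path for a segment inside `dir`.
    /// ASSUMPTION: zero-padded segment id plus a `.wal` extension.
    pub fn new(dir: impl AsRef<Path>, segment_id: u32) -> Self {
        Self(dir.as_ref().join(format!("{segment_id:010}.wal")))
    }
}

/// Deref lets callers use `Path` methods (`file_name`, `extension`, ...) directly.
impl Deref for SegmentWalFilePath {
    type Target = Path;
    fn deref(&self) -> &Path {
        &self.0
    }
}

/// AsRef lets the type be passed to std::fs functions that take `impl AsRef<Path>`.
impl AsRef<Path> for SegmentWalFilePath {
    fn as_ref(&self) -> &Path {
        &self.0
    }
}
```

Because of the `AsRef<Path>` impl, a value can be handed straight to calls such as `std::fs::File::open(&path)`, which is the "use it where the underlying type is expected" property the commit describes.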
Closes: #24578
* feat: add basic wal implementation for Edge
This WAL implementation uses some of the code from the wal crate, but departs pretty significantly from it in many ways. For now it uses simple JSON encoding for the serialized ops, but we may want to switch that to Protobuf at some point in the future. This version of the wal doesn't have its own buffering. That will be implemented higher up in the BufferImpl, which will use the wal and SegmentWriter to make data in the buffer durable.
In the write flow, writes will come into the buffer and be validated against (and update) an in-memory Catalog. Once validated, writes will be buffered in memory and then flushed into the WAL periodically (likely every 10-20ms). After being flushed to the WAL, the entire batch of writes will be put into the in-memory queryable buffer. After that, responses will be sent back to the clients. This should considerably reduce write lock pressure on the in-memory buffer.
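The ordering of that write flow (validate, buffer, flush the batch to the WAL, make it queryable, then respond) can be sketched as a single-threaded toy. Everything here is hypothetical: the catalog is reduced to a set of table names, the WAL and queryable buffer are plain `Vec`s, "validation" only rejects empty writes, and each client waits on a `std::sync::mpsc` channel for its response.

```rust
use std::collections::HashSet;
use std::sync::mpsc::{channel, Receiver, Sender};

/// Response sent to a client once its write is durable and queryable.
pub type Ack = Result<(), String>;

/// Hypothetical sketch of the write flow; all names are illustrative.
#[derive(Default)]
pub struct WriteFlow {
    /// Stand-in for the in-memory catalog: just the set of known tables.
    pub catalog: HashSet<String>,
    /// Validated writes waiting for the next flush, each with its client's ack channel.
    pending: Vec<(String, Sender<Ack>)>,
    /// Stand-in for the WAL: one entry per flushed batch.
    pub wal: Vec<Vec<String>>,
    /// Stand-in for the in-memory queryable buffer.
    pub queryable: Vec<String>,
}

impl WriteFlow {
    /// Validates a write against the catalog (registering new tables), then buffers it.
    /// The returned receiver yields the client's response.
    pub fn write(&mut self, table: &str, line: &str) -> Receiver<Ack> {
        let (tx, rx) = channel();
        if line.is_empty() {
            // Rejected writes are answered immediately and never reach the WAL.
            let _ = tx.send(Err("empty write".to_string()));
        } else {
            self.catalog.insert(table.to_string());
            self.pending.push((format!("{table} {line}"), tx));
        }
        rx
    }

    /// Meant to be called periodically (the text suggests every 10-20ms).
    pub fn flush(&mut self) {
        if self.pending.is_empty() {
            return;
        }
        let (ops, acks): (Vec<String>, Vec<Sender<Ack>>) =
            std::mem::take(&mut self.pending).into_iter().unzip();
        // 1. Make the whole batch durable first.
        self.wal.push(ops.clone());
        // 2. Then make it visible to queries.
        self.queryable.extend(ops);
        // 3. Only now respond to the clients.
        for ack in acks {
            let _ = ack.send(Ok(()));
        }
    }
}
```

Batching many writes into one WAL append per flush is what lets the queryable buffer be locked once per batch rather than once per write, which is the lock-pressure reduction the paragraph anticipates.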
In this PR:
- Update the Wal, WalSegmentWriter, and WalSegmentReader traits to line up with new design/understanding
- Implement wal (mainly just a way to identify segment files in a directory)
- Implement WalSegmentWriter (write header, op batch with crc, and track sequence number in segment, re-open existing file)
- Implement WalSegmentReader
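A sketch of what "write header, op batch with crc, and track sequence number" can look like at the byte level, with a matching reader that rejects corrupted batches. The layout (4 magic bytes, then per batch a little-endian `u64` sequence number, `u32` payload length, `u32` CRC-32 of the payload, and the payload) and the `WAL1` magic are assumptions; the real segments encode ops as JSON, while this sketch treats payloads as opaque bytes and writes to an in-memory buffer instead of a file.

```rust
/// ASSUMPTION: hypothetical magic bytes identifying a segment file.
const HEADER: &[u8; 4] = b"WAL1";

/// Bitwise CRC-32 (IEEE polynomial, reflected), small but slow.
pub fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Writes the header once, then appends batches while tracking sequence numbers.
pub struct SegmentWriter {
    buf: Vec<u8>,
    next_sequence: u64,
}

impl SegmentWriter {
    pub fn new() -> Self {
        Self { buf: HEADER.to_vec(), next_sequence: 1 }
    }

    /// Appends one batch and returns the sequence number assigned to it.
    pub fn write_batch(&mut self, payload: &[u8]) -> u64 {
        let seq = self.next_sequence;
        self.buf.extend_from_slice(&seq.to_le_bytes());
        self.buf.extend_from_slice(&(payload.len() as u32).to_le_bytes());
        self.buf.extend_from_slice(&crc32(payload).to_le_bytes());
        self.buf.extend_from_slice(payload);
        self.next_sequence += 1;
        seq
    }

    pub fn bytes(&self) -> &[u8] {
        &self.buf
    }
}

/// Reads every batch back, failing on a bad header, truncation, or CRC mismatch.
pub fn read_segment(bytes: &[u8]) -> Result<Vec<(u64, Vec<u8>)>, String> {
    let mut rest = bytes.strip_prefix(HEADER.as_slice()).ok_or("bad header")?;
    let mut batches = Vec::new();
    while !rest.is_empty() {
        if rest.len() < 16 {
            return Err("truncated entry header".into());
        }
        let seq = u64::from_le_bytes(rest[0..8].try_into().unwrap());
        let len = u32::from_le_bytes(rest[8..12].try_into().unwrap()) as usize;
        let crc = u32::from_le_bytes(rest[12..16].try_into().unwrap());
        let payload = rest.get(16..16 + len).ok_or("truncated payload")?;
        if crc32(payload) != crc {
            return Err(format!("crc mismatch in batch {seq}"));
        }
        batches.push((seq, payload.to_vec()));
        rest = &rest[16 + len..];
    }
    Ok(batches)
}
```

Storing the CRC next to each batch means a torn or corrupted write is detected per batch on replay rather than silently applied.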
* refactor: make Wal return impl reader/writer
* refactor: clean up wal segment open
* fix: WriteBuffer and Wal usage
Turn wal and write buffer references into a concrete type, rather than dyn.
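One way to express "Wal returns impl reader/writer" together with concrete (non-`dyn`) references is sketched below. The trait shapes, the in-memory types, and the `write_two` helper are hypothetical simplifications. Returning `impl Trait` from a trait method requires Rust 1.75 or later.

```rust
use std::sync::Arc;

/// Hypothetical, heavily simplified writer trait.
pub trait WalSegmentWriter {
    /// Writes one op and returns its sequence number.
    fn write_batch(&mut self, op: &str) -> u64;
}

/// The WAL hands back *some* concrete writer type without naming it.
pub trait Wal {
    fn open_segment_writer(&self) -> impl WalSegmentWriter;
}

pub struct InMemoryWriter {
    next: u64,
}

impl WalSegmentWriter for InMemoryWriter {
    fn write_batch(&mut self, _op: &str) -> u64 {
        self.next += 1;
        self.next
    }
}

pub struct InMemoryWal;

impl Wal for InMemoryWal {
    fn open_segment_writer(&self) -> impl WalSegmentWriter {
        InMemoryWriter { next: 0 }
    }
}

/// Generic over a concrete `W: Wal` rather than holding `Arc<dyn Wal>`.
pub struct WriteBufferImpl<W: Wal> {
    wal: Arc<W>,
}

impl<W: Wal> WriteBufferImpl<W> {
    pub fn new(wal: Arc<W>) -> Self {
        Self { wal }
    }

    /// Writes two ops through a fresh segment writer; returns the last sequence number.
    pub fn write_two(&self, a: &str, b: &str) -> u64 {
        let mut writer = self.wal.open_segment_writer();
        writer.write_batch(a);
        writer.write_batch(b)
    }
}
```

A trait whose methods return `impl Trait` is not dyn-compatible (object safe), so `Arc<dyn Wal>` would not compile with this shape; holding a concrete type, as above via a generic parameter, is consistent with the two changes going together.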
* fix: have wal loading ignore invalid files
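A sketch of identifying segment files while ignoring anything that does not look like one. It takes file names rather than a directory so it stays filesystem-free. The `{segment_id:010}.wal` naming scheme and the `segment_ids` helper are hypothetical.

```rust
/// Returns the ids of valid segment files in ascending order.
/// ASSUMPTION: segment files are named `<digits>.wal`; every other name
/// is skipped instead of failing the whole load.
pub fn segment_ids<'a>(file_names: impl IntoIterator<Item = &'a str>) -> Vec<u32> {
    let mut ids: Vec<u32> = file_names
        .into_iter()
        .filter_map(|name| name.strip_suffix(".wal"))
        // Require plain ASCII digits so names like "+5.wal" are rejected too.
        .filter(|stem| !stem.is_empty() && stem.bytes().all(|b| b.is_ascii_digit()))
        // Digit strings that overflow u32 are also skipped.
        .filter_map(|stem| stem.parse::<u32>().ok())
        .collect();
    ids.sort_unstable();
    ids
}
```

With a real directory, the names would come from `std::fs::read_dir`, keeping only entries whose `file_name()` converts to valid UTF-8 via `to_str()`.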
We currently don't need or want to deploy influxdb as we're still
building out the Edge product. Maybe later for a demo, but for now it
just breaks CI and so this removes it.
This commit changes the CircleCI config to use influxdb3 rather than
iox in our CI config script, as the repo is influxdb, not influxdb_iox.
While we could probably strip out a lot more here, this will do just
fine as a first attempt to get this to build release images and push
them on main.
Now that we're transitioning the repo so that influxdb3 Edge, not IOx,
is what lives here, we can update the Dockerfile to build influxdb3.
This is mostly just updating which version of Rust to use, changing the
command that's run when docker runs the container to serve, and changing
influxdb_iox to influxdb3 everywhere in the file.
* fix: build, upgrade rustc, and deps
This commit upgrades Rust to 1.75.0, the latest release. We also
upgraded our dependencies to stay up to date and to clear out any
unneeded deps from the lockfile. To make sure everything works, this
also fixes the build by upgrading the workspace-hack crate using
cargo hakari and removing the `workspace.lint` that was in
influxdb3_write and didn't need to be there, probably from a merge
issue.
With this we can build influxdb3 as our default on main, but this alone
is not enough to fix CI and will be addressed in future commits.
* fix: warnings for influxdb3 build
This commit fixes the warnings emitted by `cargo build` when compiling
influxdb3. Mainly it adds needed lifetimes and removes unnecessary
imports and function calls.
* fix: all of the clippy lints
For the most part, this just applies the fixes suggested by clippy, with
a few exceptions:
- Generated type crates had additional allows added since we can't
control what code gets made
- Things that couldn't be fixed automatically were fixed manually, in
particular adding a Send bound to traits that created a Future that
should be Send
We also had to fix a build issue by adding a feature for tokio-compat
due to the upgrade of deps. The workspace crate was updated accordingly.
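The Send-bound fix mentioned above can be illustrated with a small hypothetical example (the `Persister` trait and helpers are not from the codebase). A future that holds an `Arc<dyn Trait>` is only `Send` if `dyn Trait` is `Send + Sync`, and executors such as `tokio::spawn` require `Send` futures.

```rust
use std::future::Future;
use std::sync::Arc;

/// Hypothetical trait. Without the `Send + Sync` supertraits,
/// `Arc<dyn Persister>` is not `Send`, and neither is any future holding it.
pub trait Persister: Send + Sync {
    fn persist(&self, data: &str) -> usize;
}

/// The returned future captures `persister`, so it is `Send` only if
/// `Arc<dyn Persister>` is `Send`.
pub async fn flush(persister: Arc<dyn Persister>, data: String) -> usize {
    persister.persist(&data)
}

/// Compile-time check: this call fails to compile if the future is not `Send`.
pub fn assert_send<F: Future + Send>(future: F) -> F {
    future
}

/// Toy implementation used only to exercise the trait.
pub struct LenPersister;

impl Persister for LenPersister {
    fn persist(&self, data: &str) -> usize {
        data.len()
    }
}
```

Removing `: Send + Sync` from `Persister` makes `assert_send(flush(...))` a compile error, which is the kind of failure such a bound prevents.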
* fix: failing test due to rust panic message change
Between rustc 1.72 and rustc 1.75, the way that error messages are
displayed when panicking changed. One of our tests depended on that
output, so this commit updates the expected error message to the new
form so that the test passes.
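For context, Rust 1.73 changed the default panic hook's output from `thread 'main' panicked at 'boom', src/main.rs:2:5` to `thread 'main' panicked at src/main.rs:2:5:` with `boom` on the following line. One way to make a test less brittle, sketched here with a hypothetical `panic_message` helper, is to inspect the panic payload instead of the hook's formatted text.

```rust
use std::panic::{self, UnwindSafe};

/// Runs `f` and returns its panic message, or `None` if it did not panic.
/// The payload is the message itself, independent of how the default
/// panic hook formats its stderr output.
pub fn panic_message<F: FnOnce() + UnwindSafe>(f: F) -> Option<String> {
    let payload = panic::catch_unwind(f).err()?;
    // Literal panics typically carry a `&'static str`; formatted ones a `String`.
    if let Some(s) = payload.downcast_ref::<&str>() {
        return Some((*s).to_string());
    }
    payload.downcast_ref::<String>().cloned()
}
```

The default hook still prints to stderr when the closure panics; only the returned message is asserted on.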
* fix: broken cargo doc link
* fix: cargo formatting run
* fix: add workspace-hack to influxdb3 crates
This was the last change needed to make sure that the workspace-hack
crate CI lint would pass.
* fix: remove tests that can not run anymore
We removed IOx code from this code base, and as a result some tests can
no longer run, so this commit removes them so that we can get a green
build.
* WIP: basic influxdb3 command and http server
* WIP: write lp, buffer, query out
* WIP: test write & query on influxdb3_server, fix warnings
* WIP: pull write buffer and catalog into separate crate
* WIP: sketch out types used for write: buffer, wal, persister
* WIP: remove a bunch of old IOx stuff and fmt
Before switching to Rust-based IOx, influxdb was a Go project which
dependabot tracked. After the switch, dependabot would issue alerts for
Go files that no longer exist. Tell dependabot to ignore "gomod".
- Extract some shared values
- Remove an unneeded Arc::clone
- Change expects that don't provide much clarity to unwraps
- Give the test a more distinctive and less redundant name
Isolate the actual client from the query planning parts
(`Ingester{Chunk,Partition}`) so we can hook up the V2 client in #8350.
The PR looks large, but it just moves code around and decouples the
error handling.
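The kind of decoupling described above, where planning code depends on a narrow client interface and converts client errors at the boundary so another client version can be swapped in, can be sketched as follows. Every name here (`IngesterClient`, `ClientError`, `PlanError`, `plan_partitions`, `FakeClient`) is hypothetical and only stands in for the real `Ingester{Chunk,Partition}` planning code.

```rust
/// Hypothetical client interface the planning code depends on.
pub trait IngesterClient {
    fn query(&self, table: &str) -> Result<Vec<String>, ClientError>;
}

#[derive(Debug, PartialEq)]
pub struct ClientError(pub String);

/// Planning-side errors; client errors are converted at the boundary.
#[derive(Debug, PartialEq)]
pub enum PlanError {
    Client(String),
    EmptyTable(String),
}

impl From<ClientError> for PlanError {
    fn from(e: ClientError) -> Self {
        PlanError::Client(e.0)
    }
}

/// Stand-in for building partitions; works with any client implementation.
pub fn plan_partitions<C: IngesterClient>(client: &C, table: &str) -> Result<usize, PlanError> {
    let rows = client.query(table)?; // `?` maps ClientError into PlanError
    if rows.is_empty() {
        return Err(PlanError::EmptyTable(table.to_string()));
    }
    Ok(rows.len())
}

/// Toy client for exercising the planner without any network I/O.
pub struct FakeClient;

impl IngesterClient for FakeClient {
    fn query(&self, table: &str) -> Result<Vec<String>, ClientError> {
        match table {
            "cpu" => Ok(vec!["row1".into(), "row2".into()]),
            "empty" => Ok(vec![]),
            _ => Err(ClientError(format!("unknown table {table}"))),
        }
    }
}
```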
Adds initialisation code to the routers to instantiate an
AntiEntropyActor, pre-populate the Merkle Search Tree during schema
warmup, and maintain it at runtime.
Allow an owned, compact content summary snapshot of the merkle search
tree state to be read from the MST actor.
This snapshot describes the structure of the MST in a compact/efficient
representation suitable for exchanging over the network between peers.
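To illustrate why a compact summary is useful for anti-entropy, here is a deliberately simpler stand-in. It is **not** a Merkle Search Tree: it hashes a key/value map into a fixed number of buckets, and peers compare only the bucket digests to find which parts of their state diverge. The `snapshot` and `differing_buckets` helpers and the map-of-versions state are hypothetical. `DefaultHasher` output is not guaranteed stable across Rust releases, so a real network protocol would need a fixed hash function.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Summarises `state` as one 64-bit digest per bucket.
/// A change to one key only alters the digest of that key's bucket.
pub fn snapshot(state: &BTreeMap<String, u64>, buckets: usize) -> Vec<u64> {
    assert!(buckets > 0, "need at least one bucket");
    let mut hashers: Vec<DefaultHasher> = (0..buckets).map(|_| DefaultHasher::new()).collect();
    // BTreeMap iterates in key order, so each bucket hashes its entries deterministically.
    for (key, value) in state {
        let mut key_hasher = DefaultHasher::new();
        key.hash(&mut key_hasher);
        let bucket = (key_hasher.finish() % buckets as u64) as usize;
        key.hash(&mut hashers[bucket]);
        value.hash(&mut hashers[bucket]);
    }
    hashers.into_iter().map(|h| h.finish()).collect()
}

/// Indices of buckets whose digests differ (both snapshots must use the same bucket count).
pub fn differing_buckets(a: &[u64], b: &[u64]) -> Vec<usize> {
    a.iter()
        .zip(b)
        .enumerate()
        .filter(|(_, (x, y))| x != y)
        .map(|(i, _)| i)
        .collect()
}
```

In this toy, the snapshot is recomputed from scratch; an MST is instead maintained incrementally as entries change (as in the runtime maintenance described above) and its tree structure lets peers narrow down differences without a fixed bucket count.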