IOx Query Layer
The IOx query layer is responsible for translating query requests from different query languages into plans, and for executing those plans against Chunks stored across the various IOx storage systems.
Query Frontends
- SQL
- Storage gRPC
- Flux (possibly in the future)
- InfluxQL (possibly in the future)
- Others (possibly in the future)
Sources of Chunk data
- ReadBuffer
- MutableBuffer
- Parquet Files
- Others (possibly in the future, like Remote Chunk?)
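The goal is to use a shared query / plan representation so that we avoid implementing N*M combinations of query language and Chunk source: each frontend translates into the shared representation and each Chunk source is queried through it, so N frontends and M sources need N+M integrations rather than N*M.
Thus query planning is implemented in terms of traits, and each chunk source provides its own implementation of those traits.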
Among other things, this means that this crate should not depend directly on the ReadBuffer or the MutableBuffer.
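The trait-based approach can be sketched roughly as follows. This is a minimal, illustrative sketch, not the crate's actual API: the `Predicate` struct, the `QueryChunk` trait, the `RowChunk` / `ColumnChunk` types, and the `execute` and `demo_chunks` functions are all hypothetical names chosen for the example. The point it shows is that `execute` is written only against the trait, so it compiles without knowing about any concrete chunk source.

```rust
use std::collections::BTreeMap;

/// Hypothetical shared predicate: keep values of `column` that are >= `min`.
#[derive(Debug, Clone)]
pub struct Predicate {
    pub column: String,
    pub min: i64,
}

/// Hypothetical trait the planner works against. Each chunk source implements
/// it, so planning code never names a concrete chunk type.
pub trait QueryChunk {
    /// Return the values of `predicate.column` that satisfy `predicate`.
    fn scan(&self, predicate: &Predicate) -> Vec<i64>;
}

/// Stand-in for a row-oriented, in-memory chunk (roughly the role of a mutable buffer).
pub struct RowChunk {
    pub rows: Vec<BTreeMap<String, i64>>,
}

impl QueryChunk for RowChunk {
    fn scan(&self, p: &Predicate) -> Vec<i64> {
        self.rows
            .iter()
            .filter_map(|row| row.get(&p.column).copied())
            .filter(|v| *v >= p.min)
            .collect()
    }
}

/// Stand-in for a column-oriented chunk (roughly the role of a read buffer or Parquet file).
pub struct ColumnChunk {
    pub columns: BTreeMap<String, Vec<i64>>,
}

impl QueryChunk for ColumnChunk {
    fn scan(&self, p: &Predicate) -> Vec<i64> {
        self.columns
            .get(&p.column)
            .map(|values| values.iter().copied().filter(|v| *v >= p.min).collect())
            .unwrap_or_default()
    }
}

/// Planner/executor written only against the trait: it has no dependency on
/// `RowChunk` or `ColumnChunk`. Results are sorted to make output deterministic.
pub fn execute(chunks: &[Box<dyn QueryChunk>], predicate: &Predicate) -> Vec<i64> {
    let mut out: Vec<i64> = chunks.iter().flat_map(|c| c.scan(predicate)).collect();
    out.sort();
    out
}

/// Build one chunk of each kind, both holding a `cpu` column.
pub fn demo_chunks() -> Vec<Box<dyn QueryChunk>> {
    let rows = vec![
        BTreeMap::from([("cpu".to_string(), 7)]),
        BTreeMap::from([("cpu".to_string(), 2)]),
    ];
    let columns = BTreeMap::from([("cpu".to_string(), vec![5, 9, 1])]);
    let chunks: Vec<Box<dyn QueryChunk>> = vec![
        Box::new(RowChunk { rows }),
        Box::new(ColumnChunk { columns }),
    ];
    chunks
}

fn main() {
    let predicate = Predicate { column: "cpu".to_string(), min: 5 };
    println!("{:?}", execute(&demo_chunks(), &predicate)); // prints [5, 7, 9]
}
```

In this sketch, adding a new source means adding one more `impl QueryChunk`; `execute` and anything that calls it stay unchanged, which is the property that lets the query crate avoid depending on the ReadBuffer or MutableBuffer directly.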
```
┌───────────────┐  ┌────────────────┐    ┌──────────────┐       ┌──────────────┐
│Mutable Buffer │  │  Read Buffer   │    │Parquet Files │  ...  │Future Source │
│               │  │                │    │              │       │              │
└───────────────┘  └────────────────┘    └──────────────┘       └──────────────┘
        ▲                   ▲                    ▲                      ▲
        └───────────────────┴──────────┬─────────┴──────────────────────┘
                                       │
                                       │
                      ┌─────────────────────────────────┐
                      │          Shared Common          │
                      │   Predicate, Plans, Execution   │
                      └─────────────────────────────────┘
                                       ▲
                                       │
                                       │
                ┌──────────────────────┼──────────────────────────┐
                │                      │                          │
                │                      │                          │
                │                      │                          │
      ┌───────────────────┐  ┌──────────────────┐       ┌──────────────────┐
      │   SQL Frontend    │  │   gRPC Storage   │  ...  │ Future Frontend  │
      │                   │  │     Frontend     │       │ (e.g. InfluxQL)  │
      └───────────────────┘  └──────────────────┘       └──────────────────┘
```
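On the frontend side of the diagram, each query language only has to be translated into the shared representation. The sketch below assumes a hypothetical `Predicate` type and two hypothetical translators: `sql_to_predicate`, which accepts only the toy form `<column> >= <integer>` (nothing like real SQL parsing), and `grpc_to_predicate`, which takes an invented `StorageRequest` struct rather than the real storage gRPC messages.

```rust
/// Hypothetical shared predicate produced by every frontend.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Predicate {
    pub column: String,
    pub min: i64,
}

/// Hypothetical, drastically simplified "SQL frontend":
/// accepts only `<column> >= <integer>` and returns None for anything else.
pub fn sql_to_predicate(expr: &str) -> Option<Predicate> {
    let (column, value) = expr.split_once(">=")?;
    let column = column.trim();
    if column.is_empty() {
        return None;
    }
    let min = value.trim().parse::<i64>().ok()?;
    Some(Predicate { column: column.to_string(), min })
}

/// Hypothetical request shape standing in for a storage gRPC request.
pub struct StorageRequest {
    pub field: String,
    pub lower_bound: i64,
}

/// Hypothetical "gRPC storage frontend": maps request fields onto the shared type.
pub fn grpc_to_predicate(req: &StorageRequest) -> Predicate {
    Predicate { column: req.field.clone(), min: req.lower_bound }
}

fn main() {
    let from_sql = sql_to_predicate("cpu >= 5").expect("valid expression");
    let from_grpc = grpc_to_predicate(&StorageRequest {
        field: "cpu".to_string(),
        lower_bound: 5,
    });
    // Both frontends yield the same shared value.
    assert_eq!(from_sql, from_grpc);
    println!("{:?}", from_sql);
}
```

Because both paths end in the same `Predicate`, none of the chunk sources at the top of the diagram needs to know which frontend a request came from.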
We are trying to avoid ending up with something like this:
[Diagram: the SQL Frontend, gRPC Storage Frontend, and future frontends (e.g. InfluxQL) each wired directly to the Mutable Buffer, Read Buffer, Parquet Files, and future sources, producing a tangle of point-to-point connections between every frontend and every chunk source.]