influxdb/iox_query
Sam Arnold 05657ea068
fix: optimizations for metadata fetch and chunk pruning (#5467)
* fix: hoist repeated computation out of chunk creation

We have hundreds of chunks per table, so it is beneficial to only
do common work once.

* chore: remove TableCache as it is no longer used

* fix: prune chunks both before and after metadata fetch

Fetching the metadata for all the chunks in a table is expensive,
especially when we have a narrow time range query that only
needs a few chunks.

* chore: fix clippy

* fix: fix up some last tests

* fix: review comments

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-29 14:59:05 +00:00
..
src fix: optimizations for metadata fetch and chunk pruning (#5467) 2022-08-29 14:59:05 +00:00
Cargo.toml chore: Update datafusion / arrow/parquet/arrow-flight and prost/tonic ecosystem (#5360) 2022-08-09 17:30:44 +00:00
README.md ci: fix cargo deny (#4629) 2022-05-18 09:38:35 +00:00

README.md

IOx Query Layer

The IOx query layer is responsible for translating query requests from different query languages and planning and executing them against Chunks stored across various IOx storage systems.

Query Frontends

  • SQL
  • Storage gRPC
  • Flux (possibly in the future)
  • InfluxQL (possibly in the future)
  • Others (possibly in the future)

Sources of Chunk data

  • ReadBuffer
  • MutableBuffer
  • Parquet Files
  • Others (possibly in the future, like Remote Chunk?)

The goal is to use the shared query / plan representation in order to avoid N*M combinations of language and Chunk source.

Thus query planning is implemented in terms of traits, and those traits are implemented by different chunk implementations.

Among other things, this means that this crate should not depend directly on the ReadBuffer or the MutableBuffer.

┌───────────────┐  ┌────────────────┐    ┌──────────────┐      ┌──────────────┐
│Mutable Buffer │  │  Read Buffer   │    │Parquet Files │  ... │Future Source │
│               │  │                │    │              │      │              │
└───────────────┘  └────────────────┘    └──────────────┘      └──────────────┘
        ▲                   ▲                    ▲                     ▲
        └───────────────────┴─────────┬──────────┴─────────────────────┘
                                      │
                                      │
                     ┌─────────────────────────────────┐
                     │          Shared Common          │
                     │   Predicate, Plans, Execution   │
                     └─────────────────────────────────┘
                                      ▲
                                      │
                                      │
               ┌──────────────────────┼─────────────────────────┐
               │                      │                         │
               │                      │                         │
               │                      │                         │
     ┌───────────────────┐  ┌──────────────────┐      ┌──────────────────┐
     │   SQL Frontend    │  │   gRPC Storage   │ ...  │ Future Frontend  │
     │                   │  │     Frontend     │      │ (e.g. InfluxQL)  │
     └───────────────────┘  └──────────────────┘      └──────────────────┘

We are trying to avoid ending up with something like this:

                          ┌─────────────────────────────────────────────────┐
                          │                                                 │
                          ▼                                                 │
                   ┌────────────┐                                           │
                   │Read Buffer │                  ┌────────────────────────┤
        ┌──────────┼────────────┼─────┬────────────┼────────────────────────┤
        │          └────────────┘     │            ▼                        │
        ▼                 ▲           │    ┌──────────────┐                 │
┌───────────────┐         │           │    │Parquet Files │                 │
│Mutable Buffer │         │           ├───▶│              │...              │
│               │◀────────┼───────────┤    └──────────────┘   ┌─────────────┼┐
└───────────────┘         │           │            ▲          │Future Source││
        ▲                 │           ├────────────┼─────────▶│             ││◀─┐
        │                 │           │            │          └─────────────┼┘  │
        │                 │           │            │                        │   │
        │                 │           │            │                        │   │
        │      ┌──────────┘           │            │                        │   │
        │      │                      │            │                        │   │
        │      ├──────────────────────┼────────────┘                        │   │
        └──────┤                      │                                     │   │
               │                      │                                     │   │
               │                      │                                     │   │
               │                      │                                     │   │
               │                      │                                     │   │
               │                      │                                     │   │
               │                      │                                     │   │
               │                      │                                     │   │
     ┌───────────────────┐  ┌──────────────────┐      ┌──────────────────┐  │   │
     │   SQL Frontend    │  │   gRPC Storage   │ ...  │ Future Frontend  │  │   │
     │                   │  │     Frontend     │      │ (e.g. InfluxQL)  │──┴───┘
     └───────────────────┘  └──────────────────┘      └──────────────────┘