# InfluxDB IOx -- Query Processing This document illustrates query processing for SQL and InfluxQL. > **Note** > > There is another query interface called InfluxRPC (implemented in [`iox_query_influxrpc`]) which mostly reflects the old TSM storage API. The planning there works significantly different and this is NOT part of this document. ## Basic Flow 1. Query arrives from the user (e.g. SQL, InfluxQL) 2. The query engine creates a [`LogicalPlan`] by consulting the Catalog to find: - Tables referenced in the query, and their schema and column details 3. The query engine creates a [`ExecutionPlan`] by determining the Chunks that contain data: 1. Contacts the ingester for any unpersisted data 2. Consults the catalog for the name/location of parquet files 3. Prunes (discards at this step) any parquet files 4. Starts the [`ExecutionPlan`] and streams the results back to the client Some objects cached, especially the schema information, information about parquet file existence and parquet file content. A graphical representation may look like this: ```mermaid flowchart LR classDef intermediate color:#020A47,fill:#D6F622,stroke-width:0 classDef processor color:#FFFFFF,fill:#D30971,stroke-width:0 classDef systemIO color:#020A47,fill:#5EE4E4,stroke-width:0 Query[Query Text]:::systemIO LogicalPlanner[Logical Planner]:::processor LogicalPlan[Logical Plan]:::intermediate PhysicalPlanner[Physical Planner]:::processor ExecutionPlan[Execution Plan]:::intermediate QueryExec[QueryExecution]:::processor Result[Result]:::systemIO Query --> LogicalPlanner --> LogicalPlan --> PhysicalPlanner --> ExecutionPlan --> QueryExec --> Result ``` ## Code Organization The IOx query layer is responsible for translating query requests from different query languages and planning and executing them against chunks stored across various IOx storage systems. Query Frontends: - SQL - InfluxQL - Others (possibly in the future) Sources of chunk data: - Ingester Data - Parquet Files - Others (possibly in the future) The goal is to use the shared query / plan representation in order to avoid N*M combinations of language and chunk source. While each frontend has their own plan construction and each chunk may be lowered to a different [`ExecutionPlan`], the frontends and the chunks sources should not interact directly. This is achieved by first creating a [`LogicalPlan`] from the frontend without knowing the chunk sources and only during physical planning -- i.e. when the [`ExecutionPlan`] is constructed -- the chunks are transformed into appropriate [`DataFusion`] nodes. So we should end up with roughly this picture: ```mermaid flowchart TB classDef out color:#020A47,fill:#9394FF,stroke-width:0 classDef intermediate color:#020A47,fill:#D6F622,stroke-width:0 classDef in color:#020A47,fill:#5EE4E4,stroke-width:0 SQL[SQL]:::in InfluxQL[InfluxQL]:::in OtherIn["Other (possibly in the future)"]:::in LogicalPlan[Logical Plan]:::intermediate IngesterData[Ingester Data]:::out ParquetFile[Parquet File]:::out OtherOut["Other (possibly in the future)"]:::out SQL --> LogicalPlan InfluxQL --> LogicalPlan OtherIn --> LogicalPlan LogicalPlan --> IngesterData LogicalPlan --> ParquetFile LogicalPlan --> OtherOut ``` We are trying to avoid ending up with something like this: ```mermaid flowchart TB classDef out color:#020A47,fill:#9394FF,stroke-width:0 classDef intermediate color:#020A47,fill:#D6F622,stroke-width:0 classDef in color:#020A47,fill:#5EE4E4,stroke-width:0 SQL[SQL]:::in InfluxQL[InfluxQL]:::in OtherIn["Other (possibly in the future)"]:::in IngesterData[Ingester Data]:::out ParquetFile[Parquet File]:::out OtherOut["Other (possibly in the future)"]:::out SQL --> IngesterData SQL --> ParquetFile SQL --> OtherOut InfluxQL --> IngesterData InfluxQL --> ParquetFile InfluxQL --> OtherOut OtherIn --> IngesterData OtherIn --> ParquetFile OtherIn --> OtherOut ``` ## Frontend We accept queries via an [Apache Arrow Flight] based native protocol (see [`service_grpc_flight::FlightService`]), or via the standard [Apache Arrow Flight SQL]. Note that we stream data back to the client while [DataFusion] is still executing the query. This way we can emit rather large results without large buffer usage. Also see: - ["Flight SQL"] ## Logical Planning Logical planning transforms the query text into a [`LogicalPlan`]. The steps are the following: 1. Parse text representation is parsed into some intermediate representation 2. Lower intermediate representation into [`LogicalPlan`] 3. Apply logical optimizer passes to the [`LogicalPlan`] ### SQL For SQL queries, we just use [`datafusion-sql`] to generate the [`LogicalPlan`] from the query text. ### InfluxQL For InfluxQL queries, we use [`iox_query_influxql`] to generate the [`LogicalPlan`] from the query text. ### Logical Optimizer We have a few logical optimizer passes that are specific to IOx. These can be split into two categories: *optimizing* and *functional*. The *optimizing* only change to plan to make it run faster. They do not implement any functionality. These passes are: - [`influx_regex_to_datafusion_regex`]: Replaces InfluxDB-specific regex operator with [DataFusion] regex operator. The *functional* passes implement features that are NOT offered by [DataFusion] by transforming the [`LogicalPlan`] accordingly. These passes are: - [`handle_gapfill`]: enables gap-filling semantics for SQL queries that contain calls to `DATE_BIN_GAPFILL()` and related functions like `LOCF()`. The IOx-specific passes are executed AFTER the [DataFusion] builtin passes. ## Physical Planning Physical planning transforms the [`LogicalPlan`] into a [`ExecutionPlan`]. These are the steps: 1. [DataFusion] lowers [`LogicalPlan`] to [`ExecutionPlan`] - While doing so it calls IOx code to transform table scans into concrete physical operators 2. Apply physical optimizer passes to the [`ExecutionPlan`] For more details, see: - ["IOx Physical Plan Construction"] - ["Ingester ⇔ Querier Query Protocol"] - ["Deduplication"] ## Data Flow This is a detailled data flow from the querier point of view: ```mermaid flowchart TB classDef cache color:#020A47,fill:#9394FF,stroke-width:0 classDef external color:#FFFFFF,fill:#9B2AFF,stroke-width:0 classDef intermediate color:#020A47,fill:#D6F622,stroke-width:0 classDef processor color:#FFFFFF,fill:#D30971,stroke-width:0 classDef systemIO color:#020A47,fill:#5EE4E4,stroke-width:0 NamespaceName[Namespace Name]:::systemIO SqlQuery[SQL Query]:::systemIO Result[Result]:::systemIO Catalog[/Catalog/]:::external Ingester[/Ingester/]:::external ObjectStore[/Object Store/]:::external NamespaceCache[Namespace Cache]:::cache OSCache[Object Store Cache]:::cache ParquetCache[Parquet File Cache]:::cache PartitionCache[Partition Cache]:::cache ProjectedSchemaCache[Projected Schema Cache]:::cache CachedNamespace[Cached Namespace]:::intermediate LogicalPlan[Logical Plan]:::intermediate ExecutionPlan[Execution Plan]:::intermediate ParquetBytes[Parquet Bytes]:::intermediate LogicalPlanner[LogicalPlanner]:::processor PhysicalPlanner[PhysicalPlanner]:::processor QueryExec[Query Execution]:::processor %% help layout engine a bit ProjectedSchemaCache --- PartitionCache linkStyle 0 stroke-width:0px Catalog --> NamespaceCache Catalog --> ParquetCache Catalog --> PartitionCache ObjectStore --> OSCache OSCache --> ParquetBytes NamespaceName --> NamespaceCache NamespaceCache --> CachedNamespace SqlQuery --> LogicalPlanner LogicalPlanner --> LogicalPlan CachedNamespace --> CachedTable LogicalPlan --> IngesterRequest IngesterRequest --> Ingester Ingester --> IngesterResponse ParquetCache --> ParquetFileMD1 PartitionCache --> ColumnRanges PartitionCache --> SortKey ProjectedSchemaCache --> ProjectedSchema subgraph table [Querier Table] ArrowSchema[ArrowSchema]:::intermediate CachedTable[Cached Table]:::intermediate ColumnRanges[Column Ranges]:::intermediate IngesterChunks[Ingester Chunks]:::intermediate IngesterRequest[Ingester Request]:::intermediate IngesterResponse[Ingester Partitions]:::intermediate IngesterWatermark[Ingester Watermark]:::intermediate ParquetChunks[Parquet Chunks]:::intermediate ParquetFileMD1[Parquet File MD]:::intermediate ParquetFileMD2[Parquet File MD]:::intermediate ProjectedSchema[ProjectedSchema]:::intermediate SortKey[SortKey]:::intermediate QueryChunks1[Query Chunks]:::intermediate QueryChunks2[Query Chunks]:::intermediate ChunkAdapter[ChunkAdapter]:::processor IngesterDecoder[Ingester Decoder]:::processor PreFilter[Pre-filter]:::processor Pruning[Pruning]:::processor CachedTable --> ArrowSchema ColumnRanges --> IngesterDecoder IngesterResponse --> IngesterDecoder IngesterDecoder --> IngesterChunks IngesterDecoder --> IngesterWatermark ParquetFileMD1 --> PreFilter PreFilter --> ParquetFileMD2 ParquetFileMD2 --> ChunkAdapter ColumnRanges --> ChunkAdapter SortKey --> ChunkAdapter ProjectedSchema --> ChunkAdapter ChunkAdapter --> ParquetChunks IngesterChunks --> QueryChunks1 ParquetChunks --> QueryChunks1 QueryChunks1 --> Pruning Pruning --> QueryChunks2 end style table color:#020A47,fill:#00000000,stroke:#020A47,stroke-dasharray:20 ArrowSchema --> LogicalPlanner CachedTable --> PartitionCache CachedTable --> ProjectedSchemaCache IngesterChunks -.-> PartitionCache ParquetFileMD2 -.-> PartitionCache IngesterWatermark -.-> ParquetCache LogicalPlan -.-> NamespaceCache ParquetFileMD1 -.-> NamespaceCache QueryChunks2 --> PhysicalPlanner LogicalPlan --> PhysicalPlanner PhysicalPlanner --> ExecutionPlan ExecutionPlan --> QueryExec ParquetBytes --> QueryExec QueryExec --> Result ``` Legend: ```mermaid flowchart TB classDef cache color:#020A47,fill:#9394FF,stroke-width:0 classDef external color:#FFFFFF,fill:#9B2AFF,stroke-width:0 classDef intermediate color:#020A47,fill:#D6F622,stroke-width:0 classDef processor color:#FFFFFF,fill:#D30971,stroke-width:0 classDef systemIO color:#020A47,fill:#5EE4E4,stroke-width:0 classDef helper color:#020A47,fill:#020A47,stroke-width:0 n_c[Cache]:::cache n_e[/External System/]:::external n_i[Intermediate Result]:::intermediate n_p[Processor]:::processor n_s[System Input and Output]:::systemIO a((xxx)):::helper -->|data flow| b((xxx)):::helper c((xxx)):::helper -.->|cache invalidation| d((xxx)):::helper ``` ## Caches Each querier process has a set of in-memory caches. These are: | Name | Pool | Backing System | Key | Value | Invalidation / TTL / Refreshes | Notes | | ---- | ---- | -------------- | --- | ----- | ------------------------------ | ----- | | Namespace | Metadata | Catalog | Namespace Name | `CachedNamespace` | refresh policy, TTL, invalidation by unknown table/columns | Unknown entries NOT cached (assumes upstream DDoS protection) | | Object Store | Data | Object Store | Path | Raw object store bytes for the entire object | -- | | | Parquet File | Metadata | Catalog | Table ID | Parquet files (all the data that the catalog has, i.e. the entire row) for all files that are NOT marked for deletion. | TTL, but no refresh yet (see #5718), can be invalided by ingester watermark. | | | Partition | Metadata | Catalog | Partition ID | `CachedPartition` | Invalided if ingester data or any parquet files has columns that are NOT covered by the sort key. | Needs `CachedTable` for access | | Projected Schema | Metadata | Querier | Table ID, Column IDs | `ProjectedSchema` | -- | Needs `CachedTable` for access | Note that ALL caches have a LRU eviction policy bound to the specified pool. ### Cached Objects The following objects are stored within the aforementioned caches. #### `CachedNamespace` - namespace ID - retention policy - map from `Arc`ed table name to `Arc`ed `CachedTable` #### `CachedPartition` - sort key - column ranges (decoded from partition key using the partition template) #### `CachedTable` - table ID - schema - column ID => colum name map - column name => column ID map (i.e. the reverse of the above) - column IDs of primary key columns - partition template #### `ProjectedSchema` Arrow schema projected from the table schema for a specific subset of columns (since some chunks do not contain all the columns). Mostly done to optimize memory usage, i.e. some form of interning. [Apache Arrow Flight]: https://arrow.apache.org/docs/format/Flight.html [Apache Arrow Flight SQL]: https://arrow.apache.org/docs/format/FlightSql.html [DataFusion]: https://arrow.apache.org/datafusion/ [`datafusion-sql`]: https://github.com/apache/arrow-datafusion/tree/main/datafusion/sql ["Deduplication"]: ./dedup_and_sort.md [`ExecutionPlan`]: https://docs.rs/datafusion/26.0.0/datafusion/physical_plan/trait.ExecutionPlan.html ["Flight SQL"]: ./flightsql.md [`handle_gapfill`]: https://github.com/influxdata/influxdb_iox/blob/main/iox_query/src/logical_optimizer/handle_gapfill.rs [`influx_regex_to_datafusion_regex`]: https://github.com/influxdata/influxdb_iox/blob/main/iox_query/src/logical_optimizer/influx_regex_to_datafusion_regex.rs ["Ingester ⇔ Querier Query Protocol"]: ./ingester_querier_protocol.md ["IOx Physical Plan Construction"]: ./physical_plan_construction.md [`iox_query_influxql`]: https://github.com/influxdata/influxdb_iox/tree/main/iox_query_influxql [`iox_query_influxrpc`]: https://github.com/influxdata/influxdb_iox/tree/main/iox_query_influxrpc [`LogicalPlan`]: https://docs.rs/datafusion-expr/26.0.0/datafusion_expr/logical_plan/enum.LogicalPlan.html [`service_grpc_flight::FlightService`]: https://github.com/influxdata/influxdb_iox/blob/74a48a8f63c2b8602adf3a52fdd49ee009ebfa0b/service_grpc_flight/src/lib.rs#L246-L406