# Ingester ⇔ Querier Query Protocol
This document describes the query protocol that the querier uses to request data from the ingesters.

The protocol is based on [Apache Flight]. However, only a single request type is supported: `DoGet`.

## Request (Querier ⇒ Ingester)

The `DoGet` ticket contains a [Protocol Buffer] message
`influxdata.iox.ingester.v1.IngesterQueryRequest` (see our `generated_types` crate). This message
contains:

- **namespace ID:** The catalog namespace ID of the query.
- **table ID:** The catalog table ID that we request.
- **columns:** List of columns that the querier wants. If the ingester does NOT know about a
  specified column, it may just ignore that column (i.e. the resulting data is the intersection of
  the request and the ingester data).
- **predicate:** Predicate for row-filtering on the ingester side.

The request does NOT contain a selection of partitions. The ingester must respond with all
partitions it knows for that specified namespace-table combination.
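
The column handling described above can be sketched as follows. This is a minimal illustration, not the generated Protocol Buffer type: `IngesterQueryRequestSketch` and `project_columns` are hypothetical names, and the predicate is treated as an opaque string although the real message carries a structured predicate.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class IngesterQueryRequestSketch:
    """Hypothetical stand-in for the request message; field names are assumptions."""

    namespace_id: int
    table_id: int
    columns: List[str]
    predicate: Optional[str] = None  # opaque here; the real predicate is structured


def project_columns(requested: List[str], known: Set[str]) -> List[str]:
    """Return the requested columns the ingester knows about, in request order."""
    return [column for column in requested if column in known]


request = IngesterQueryRequestSketch(
    namespace_id=1,
    table_id=42,
    columns=["time", "host", "does_not_exist"],
)
ingester_columns = {"time", "host", "cpu"}

# The unknown column is silently dropped instead of causing an error.
print(project_columns(request.columns, ingester_columns))  # ['time', 'host']
```

Note that the sketch has no partition field: as stated above, the request never selects partitions.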

## Response (Ingester ⇒ Querier)

The goal of the response is to stream the following ingester data hierarchy:

- For each partition **(A)**:
  - Persistence Information:
    - Sequence number of max. persisted parquet file
  - For each snapshot (contains _persisting_ data) **(B)**:
    - Record batches with following operations applied **(C)**:
      - selection (i.e. row filter via predicates)
      - projection (i.e. column filter)

This is mapped to the following stream of Flight messages:

- **A:** `None` message type with app metadata. The app metadata is a [Protocol Buffer] of
  `influxdata.iox.ingester.v1.IngesterQueryResponseMetadata`. This message contains:
  - partition id
  - Sequence number of max. persisted parquet file
- **B:** `Schema` message that announces the snapshot schema. No app metadata is attached. The
  snapshot belongs to the partition that was just announced. Transmitting a schema resets the
  dictionary information.
- **Between B and C:** `DictionaryBatch` messages that set the dictionary information for the next
  record batch.
- **C:** `RecordBatch` message that uses the last schema and the current dictionary state. The
  batch belongs to the snapshot that was just announced.

The protocol is stateful and therefore the order of the messages is important. A specific partition
and snapshot may only be announced once.

All other message types (at the time of writing these are `Tensor` and `SparseTensor`) are
unsupported.
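
Because the protocol is stateful, a consumer has to track the current partition, snapshot, and dictionary state while reading the stream. The following is a minimal sketch of such a decoder, assuming each message has already been reduced to a `(kind, payload)` tuple. The `Partition` and `Snapshot` classes and the `decode` function are hypothetical and do not reflect the querier's actual implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Assumed shape: (message kind, payload). Kinds mirror the Flight message types.
Message = Tuple[str, object]


@dataclass
class Snapshot:
    schema: object
    # Each entry pairs a record batch with the dictionary state it was sent under.
    batches: List[Tuple[object, tuple]] = field(default_factory=list)


@dataclass
class Partition:
    partition_id: int
    parquet_max_sequence_number: Optional[int]
    snapshots: List[Snapshot] = field(default_factory=list)


def decode(stream: List[Message]) -> List[Partition]:
    """Rebuild the partition/snapshot/batch hierarchy from an ordered stream."""
    partitions: List[Partition] = []
    seen_ids = set()
    dictionaries: List[object] = []
    for kind, payload in stream:
        if kind == "None":  # A: announces a partition via app metadata
            partition_id, max_seq = payload
            if partition_id in seen_ids:
                raise ValueError(f"partition {partition_id} announced twice")
            seen_ids.add(partition_id)
            partitions.append(Partition(partition_id, max_seq))
        elif kind == "Schema":  # B: announces a snapshot, resets dictionaries
            if not partitions:
                raise ValueError("Schema received before any partition")
            dictionaries = []
            partitions[-1].snapshots.append(Snapshot(payload))
        elif kind in ("DictionaryBatch", "RecordBatch"):
            if not partitions or not partitions[-1].snapshots:
                raise ValueError(f"{kind} received before any snapshot")
            if kind == "DictionaryBatch":
                dictionaries.append(payload)
            else:  # C: belongs to the snapshot that was just announced
                batch = (payload, tuple(dictionaries))
                partitions[-1].snapshots[-1].batches.append(batch)
        else:  # e.g. Tensor, SparseTensor
            raise ValueError(f"unsupported message type: {kind}")
    return partitions


stream = [
    ("None", (1, 10)),
    ("Schema", "s1"),
    ("DictionaryBatch", "d1"),
    ("RecordBatch", "b1"),
    ("Schema", "s2"),
    ("None", (4, None)),
    ("Schema", "s4"),
    ("RecordBatch", "b4"),
]
print([(p.partition_id, len(p.snapshots)) for p in decode(stream)])  # [(1, 2), (4, 1)]
```

Snapshots carry no identifier in the stream, so the sketch can only enforce the "announced once" rule for partitions.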

## Example

Imagine the following ingester state:

- partition P1:
  - max. persisted parquet file at `sequence_number=10`
  - snapshots C1 and C2
- partition P2:
  - max. persisted parquet file at `sequence_number=1`
  - snapshot C3
- partition P3:
  - no persisted parquet file
  - no snapshots (all deleted)
- partition P4:
  - no persisted parquet file
  - snapshot C4

This results in the following response stream:

1. `None` for P1:
   - `partition_id=1`
   - `parquet_max_sequence_number=10`
2. `Schema` for C1
3. zero, one, or multiple `RecordBatch`es for C1
4. `Schema` for C2
5. zero, one, or multiple `RecordBatch`es for C2
6. `None` for P2:
   - `partition_id=2`
   - `parquet_max_sequence_number=1`
7. `Schema` for C3
8. zero, one, or multiple `RecordBatch`es for C3
9. `None` for P4:
   - `partition_id=4`
   - `parquet_max_sequence_number=None`
10. `Schema` for C4
11. zero, one, or multiple `RecordBatch`es for C4

Note that P3 was skipped because there was no unpersisted data.
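
The mapping from the example state to the response stream can be reproduced with a short sketch. The `IngesterState` type and the `response_stream` function are hypothetical, the messages are rendered as plain strings, and `RecordBatch*` stands for zero, one, or multiple record batches.

```python
from typing import Dict, List, Optional, Tuple

# Assumed shape: partition id -> (max persisted sequence number, snapshot names)
IngesterState = Dict[int, Tuple[Optional[int], List[str]]]


def response_stream(state: IngesterState) -> List[str]:
    """List the message sequence an ingester in `state` would send."""
    messages: List[str] = []
    for partition_id, (max_seq, snapshots) in state.items():
        if not snapshots:
            continue  # no unpersisted data, so the partition is not announced
        messages.append(
            f"None(partition_id={partition_id}, "
            f"parquet_max_sequence_number={max_seq})"
        )
        for name in snapshots:
            messages.append(f"Schema({name})")
            messages.append(f"RecordBatch*({name})")
    return messages


state: IngesterState = {
    1: (10, ["C1", "C2"]),
    2: (1, ["C3"]),
    3: (None, []),
    4: (None, ["C4"]),
}

for position, message in enumerate(response_stream(state), start=1):
    print(position, message)
```

Running the sketch prints eleven entries, one per step of the list above, and none of them mentions partition 3.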

[Apache Flight]: https://arrow.apache.org/docs/Format/Flight.html
[Protocol Buffer]: https://developers.google.com/protocol-buffers