chore(clustered): Component scaling recommendations:

Add suggestions from @reidkaufmann
in https://github.com/influxdata/DAR/issues/472
pull/5885/head
Jason Stirnaman 2025-03-13 14:04:45 -05:00
parent 8491969abd
commit 89ae11464b
2 changed files with 67 additions and 42 deletions

@@ -466,19 +466,29 @@ helm upgrade \
### Router
The Router can be scaled both [vertically](#vertical-scaling) and
The [Router](/influxdb3/clustered/reference/internals/storage-engine/#router) can be scaled both [vertically](#vertical-scaling) and
[horizontally](#horizontal-scaling).
Horizontal scaling increases write throughput and is typically the most
- **Recommended**: Horizontal scaling increases write throughput and is typically the most
effective scaling strategy for the Router.
Vertical scaling (specifically increased CPU) improves the Router's ability to
- Vertical scaling (specifically increased CPU) improves the Router's ability to
parse incoming line protocol with lower latency.
#### Router latency
Latency of the Router's write endpoint is directly impacted by:
- Ingester latency: the Router calls the Ingester during a client write request
- Catalog latency during schema validation
### Ingester
The Ingester can be scaled both [vertically](#vertical-scaling) and
The [Ingester](/influxdb3/clustered/reference/internals/storage-engine/#ingester) can be scaled both [vertically](#vertical-scaling) and
[horizontally](#horizontal-scaling).
Vertical scaling increases write throughput and is typically the most effective
scaling strategy for the Ingester.
- **Recommended**: Vertical scaling is typically the most effective scaling strategy for the Ingester.
Compared to horizontal scaling, vertical scaling not only increases write throughput but also reduces query, Catalog, and compaction overhead and lowers Object store costs.
- Horizontal scaling can help distribute write load but comes with additional coordination overhead.
#### Ingester storage volume
@@ -541,50 +551,62 @@ ingesterStorage:
### Querier
The Querier can be scaled both [vertically](#vertical-scaling) and
The [Querier](/influxdb3/clustered/reference/internals/storage-engine/#querier) can be scaled both [vertically](#vertical-scaling) and
[horizontally](#horizontal-scaling).
Horizontal scaling increases query throughput to handle more concurrent queries.
Vertical scaling improves the Queriers ability to process computationally
intensive queries.
- **Recommended**: [Vertical scaling](#vertical-scaling) improves the Querier's ability to process concurrent or computationally
intensive queries, and increases the effective cache capacity.
- Horizontal scaling increases query throughput to handle more concurrent queries.
Consider horizontal scaling if vertical scaling doesn't adequately address
concurrency demands or reaches the hardware limits of your underlying nodes.
### Compactor
The Compactor can be scaled both [vertically](#vertical-scaling) and
[horizontally](#horizontal-scaling).
Because compaction is a compute-heavy process, vertical scaling (especially
increasing the available CPU) is the most effective scaling strategy for the
Compactor. Horizontal scaling increases compaction throughput, but not as
- **Recommended**: Maintain **1 Compactor pod** and use [vertical scaling](#vertical-scaling) (especially
increasing the available CPU) for the Compactor, because compaction is a compute-heavy process.
- Horizontal scaling increases compaction throughput, but not as
efficiently as vertical scaling.
### Garbage collector
The Garbage collector is not designed for distributed load and should _not_ be
scaled horizontally. It is a lightweight process that typically doesn't require
significant system resources. [Vertical scaling](#vertical-scaling) should only
be considered if you observe consistently high CPU usage or if the container
The [Garbage collector](/influxdb3/clustered/reference/internals/storage-engine/#garbage-collector) is a lightweight process that typically doesn't require
significant system resources.
- Don't horizontally scale the Garbage collector; it isn't designed for distributed load.
- Consider [vertical scaling](#vertical-scaling) only if you observe consistently high CPU usage or if the container
regularly runs out of memory.
### Catalog store
The Catalog store is a PostgreSQL-compatible database that persistently stores metadata.
Scaling strategies depend on your chosen PostgreSQL implementation.
All support [vertical scaling](#vertical-scaling), and most support
[horizontal scaling](#horizontal-scaling) for redundancy and failover.
The [Catalog store](/influxdb3/clustered/reference/internals/storage-engine/#catalog-store) is a PostgreSQL-compatible database that stores critical metadata for your InfluxDB cluster.
An underprovisioned Catalog store can cause write outages and system-wide performance issues.
- Scaling strategies depend on your specific PostgreSQL implementation.
- All PostgreSQL implementations support [vertical scaling](#vertical-scaling).
- Most implementations support [horizontal scaling](#horizontal-scaling) for improved redundancy and failover.
### Catalog service
The Catalog service should maintain exactly
3 replicas for optimal redundancy.
Additional replicas are discouraged; favor vertical scaling instead if performance improvements are needed.
The [Catalog service](/influxdb3/clustered/reference/internals/storage-engine/#catalog-service) (the `iox-shared-catalog` StatefulSet) caches
and manages access to the Catalog store.
- **Recommended**: Maintain **exactly 3 replicas** of the Catalog service for optimal redundancy. Additional replicas are discouraged.
- If performance improvements are needed, use [vertical scaling](#vertical-scaling).
> [!Note]
> #### Managing Catalog components
>
> The [Catalog service](/influxdb3/clustered/reference/internals/storage-engine/#catalog-service) is managed through the
> `AppInstance` resource, while the [Catalog store](/influxdb3/clustered/reference/internals/storage-engine/#catalog-store)
> is managed separately according to your PostgreSQL implementation.
### Object store
Scaling strategies available for the Object store depend on the underlying
object storage services used to run the object store. Most support
The [Object store](/influxdb3/clustered/reference/internals/storage-engine/#object-store)
contains time series data in Parquet format.
Scaling strategies depend on the underlying object storage services used.
Most services support
[horizontal scaling](#horizontal-scaling) for redundancy, failover, and
increased capacity.
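The fixed replica recommendations above (one Compactor pod, exactly three Catalog service replicas, no horizontal scaling for the Garbage collector) can be expressed as a quick sanity check for a planned deployment. The following is a minimal Python sketch, not part of InfluxDB Clustered: the component keys (`compactor`, `catalog-service`, `garbage-collector`) and the `check_replicas` helper are hypothetical, and the rules encode only the recommendations stated in this section.

```python
from typing import Dict, List

# Hypothetical component keys; rename them to match your deployment.
# Each value is the replica count this section recommends.
RECOMMENDED_REPLICAS: Dict[str, int] = {
    "compactor": 1,          # Maintain 1 Compactor pod and scale vertically
    "catalog-service": 3,    # Exactly 3 replicas for redundancy
    "garbage-collector": 1,  # Not designed for distributed load
}


def check_replicas(plan: Dict[str, int]) -> List[str]:
    """Return a warning for each component whose replica count differs from the recommendation.

    Components without a fixed recommendation (for example, the Router and
    Ingester, which can be scaled horizontally) are not checked.
    """
    warnings = []
    for component, recommended in RECOMMENDED_REPLICAS.items():
        replicas = plan.get(component)
        if replicas is not None and replicas != recommended:
            warnings.append(f"{component}: {replicas} replicas (recommended: {recommended})")
    return warnings


if __name__ == "__main__":
    plan = {"router": 4, "ingester": 3, "compactor": 2, "catalog-service": 3}
    for warning in check_replicas(plan):
        print(warning)
```

For the sample plan, only the Compactor is flagged; the Router and Ingester counts are left alone because this section gives no fixed replica count for them.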

@@ -50,8 +50,13 @@ queries, and is optimized to reduce storage cost.
The Router (also known as the Ingest Router) parses incoming line
protocol and then routes it to [Ingesters](#ingester).
To ensure write durability, the Router replicates data to two or more of the
available Ingesters.
The Router processes incoming write requests through the following steps:
- Queries the [Catalog](#catalog) to determine persistence locations and verify schema compatibility.
- Validates syntax and schema compatibility for each data point in the request,
and either accepts or [rejects points](/influxdb3/clustered/write-data/troubleshoot/#troubleshoot-rejected-points).
- Returns a [response](/influxdb3/clustered/write-data/troubleshoot/) to the client.
- Replicates data to two or more available Ingesters for write durability.
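The Router's per-point validation and replication can be illustrated with a small Python model. This is a sketch under simplifying assumptions, not the Router's implementation: points are reduced to `(measurement, field, value)` tuples, the Catalog schema is a plain dict of field types, and `Ingester` and `route_write` are hypothetical names. It doesn't model line protocol parsing, Catalog round trips, or the order in which the Router responds and replicates.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Simplified point: (measurement, field name, value). Real line protocol also
# carries tags and a timestamp; they're omitted here for brevity.
Point = Tuple[str, str, object]


@dataclass
class Ingester:
    """Hypothetical stand-in for an Ingester that buffers replicated points."""
    name: str
    buffer: List[Point] = field(default_factory=list)

    def write(self, points: List[Point]) -> None:
        self.buffer.extend(points)


def route_write(
    points: List[Point],
    schema: Dict[Tuple[str, str], type],
    ingesters: List[Ingester],
    replication: int = 2,
) -> Dict[str, object]:
    """Validate points against a Catalog-like schema and replicate accepted points."""
    if replication < 2 or len(ingesters) < replication:
        raise ValueError("need at least two Ingesters for durable writes")

    accepted: List[Point] = []
    rejected: List[Point] = []
    for point in points:
        measurement, field_name, value = point
        expected = schema.get((measurement, field_name))
        # Fields not yet in the schema are accepted (this sketch doesn't record
        # them); existing fields must keep their type.
        if expected is None or isinstance(value, expected):
            accepted.append(point)
        else:
            rejected.append(point)

    # Replicate accepted points to `replication` Ingesters.
    for ingester in ingesters[:replication]:
        ingester.write(accepted)

    return {"accepted": len(accepted), "rejected": rejected}
```

Returning the rejected points, rather than failing the whole request, mirrors the per-point accept-or-reject behavior described in the list.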
### Ingester
@@ -59,11 +64,6 @@ The Ingester processes line protocol submitted in write requests and persists
time series data to the [Object store](#object-store).
In this process, the Ingester does the following:
- Queries the [Catalog](#catalog) to identify where data should be persisted and
to ensure the schema of the line protocol is compatible with the
[schema](/influxdb3/clustered/reference/glossary/#schema) of persisted data.
- Accepts or [rejects](/influxdb3/clustered/write-data/troubleshoot/#troubleshoot-rejected-points)
points in the write request and generates a [response](/influxdb3/clustered/write-data/troubleshoot/).
- Processes line protocol and persists time series data to the
[Object store](#object-store) in Apache Parquet format. Each Parquet file
contains data for one _partition_, a logical grouping of data.
@@ -93,11 +93,12 @@ At query time, the querier:
3. Queries the [Catalog service](#catalog-service) to retrieve [Catalog store](#catalog-store)
information about partitions in the [Object store](#object-store)
that contain the queried data.
4. Reads partition Parquet files that contain the queried data and scans each
4. Retrieves any needed Parquet files (not already cached) from the Object store.
5. Reads partition Parquet files that contain the queried data and scans each
row to filter data that matches predicates in the query plan.
5. Performs any additional operations (for example: deduplicating, merging, and sorting)
specified in the query plan.
6. Returns the query result to the client.
6. Performs any additional operations (for example: deduplicating, merging, and sorting)
specified in the query plan.
7. Returns the query result to the client.
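The cache-aware read path described above can be illustrated with a small in-memory Python model. It assumes a hypothetical LRU `ParquetCache` keyed by file path and treats each "Parquet file" as a list of row dicts. It isn't how the Querier is implemented, but it shows why a larger effective cache (for example, from vertical scaling) reduces Object store reads.

```python
from collections import OrderedDict
from typing import Callable, Dict, List

Row = Dict[str, object]


class ParquetCache:
    """Hypothetical LRU cache of Parquet file contents, keyed by file path."""

    def __init__(self, fetch: Callable[[str], List[Row]], capacity: int):
        self._fetch = fetch          # Reads a file from the Object store
        self._capacity = capacity    # Maximum number of cached files
        self._files = OrderedDict()  # path -> rows, least recently used first
        self.object_store_reads = 0  # Number of cache misses

    def get(self, path: str) -> List[Row]:
        if path in self._files:
            self._files.move_to_end(path)  # Mark as most recently used
            return self._files[path]
        rows = self._fetch(path)
        self.object_store_reads += 1
        self._files[path] = rows
        if len(self._files) > self._capacity:
            self._files.popitem(last=False)  # Evict the least recently used file
        return rows


def run_query(
    cache: ParquetCache, paths: List[str], predicate: Callable[[Row], bool]
) -> List[Row]:
    """Scan partition files, filter rows, deduplicate, and sort by time."""
    # Retrieve each file (from the cache when possible) and keep matching rows.
    matches = [row for path in paths for row in cache.get(path) if predicate(row)]
    # Deduplicate on (series, time); in this sketch, rows from later files win.
    latest = {(row["series"], row["time"]): row for row in matches}
    # Sort by time before returning the result to the client.
    return sorted(latest.values(), key=lambda row: row["time"])
```

With `capacity=2`, repeating a query over the same two files causes no additional Object store reads; with `capacity=1`, every repeat re-fetches both files.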
### Catalog
@@ -105,6 +106,8 @@ InfluxDB's catalog system consists of two distinct components: the [Catalog stor
and the [Catalog service](#catalog-service).
> [!Note]
> #### Managing Catalog components
>
> The Catalog service is managed through the `AppInstance` resource, while the Catalog store
> is managed separately according to your PostgreSQL implementation.
@@ -127,10 +130,10 @@ and manages access to the Catalog store.
### Object store
The Object store contains time series data in [Apache Parquet](https://parquet.apache.org/) format.
Each Parquet file contains data for a single partition.
By default, InfluxDB partitions tables by day, but you can
[customize the partitioning strategy](/influxdb3/clustered/admin/custom-partitions/).
Data in each Parquet file is sorted, encoded, and compressed.
A partition may contain multiple Parquet files, which are subject to compaction.
By default, InfluxDB partitions tables by day, but you can
[customize the partitioning strategy](/influxdb3/clustered/admin/custom-partitions/).
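Day-based partitioning can be illustrated with a minimal Python sketch that derives a `YYYY-MM-DD` key from each row's timestamp and groups rows by table and key. It assumes timestamps are nanoseconds since the Unix epoch and that days are UTC days; the key format and the `partition_rows` helper are illustrative assumptions, and a custom partitioning strategy would derive keys differently.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, List, Tuple

# A row: (table name, timestamp in nanoseconds since the Unix epoch, values)
Row = Tuple[str, int, Dict[str, object]]


def day_partition_key(timestamp_ns: int) -> str:
    """Return a YYYY-MM-DD key for the UTC day containing the timestamp."""
    seconds = timestamp_ns // 1_000_000_000
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%Y-%m-%d")


def partition_rows(rows: List[Row]) -> Dict[Tuple[str, str], List[Row]]:
    """Group rows by (table, day); each group corresponds to one partition."""
    partitions: Dict[Tuple[str, str], List[Row]] = defaultdict(list)
    for row in rows:
        table, timestamp_ns, _ = row
        partitions[(table, day_partition_key(timestamp_ns))].append(row)
    return dict(partitions)
```

Because each group maps to one partition, a single write that spans midnight UTC lands in two partitions, and each partition can later accumulate several Parquet files for the Compactor to merge.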
### Compactor