chore(clustered): Component scaling recommendations:

Add suggestions from @reidkaufmann in https://github.com/influxdata/DAR/issues/472
2025-03-13 14:04:45 -05:00 · 89ae11464b
parent 8491969abd
commit 89ae11464b
2 changed files with 67 additions and 42 deletions
--- a/content/influxdb3/clustered/admin/scale-cluster.md
+++ b/content/influxdb3/clustered/admin/scale-cluster.md
@@ -466,19 +466,29 @@ helm upgrade \

 ### Router

-The Router can be scaled both [vertically](#vertical-scaling) and
+The [Router](/influxdb3/clustered/reference/internals/storage-engine/#router) can be scaled both [vertically](#vertical-scaling) and
 [horizontally](#horizontal-scaling).
-Horizontal scaling increases write throughput and is typically the most
+
+- **Recommended**: Horizontal scaling increases write throughput and is typically the most
 effective scaling strategy for the Router.
-Vertical scaling (specifically increased CPU) improves the Router's ability to
+- Vertical scaling (specifically increased CPU) improves the Router's ability to
 parse incoming line protocol with lower latency.

+#### Router latency
+
+Latency of the Router’s write endpoint is directly impacted by:
+
+- Ingester latency--the Router calls the Ingester during a client write request
+- Catalog latency during schema validation
+
 ### Ingester

-The Ingester can be scaled both [vertically](#vertical-scaling) and
+The [Ingester](/influxdb3/clustered/reference/internals/storage-engine/#ingester) can be scaled both [vertically](#vertical-scaling) and
 [horizontally](#horizontal-scaling).
-Vertical scaling increases write throughput and is typically the most effective
-scaling strategy for the Ingester.
+
+- **Recommended**: Vertical scaling is typically the most effective scaling strategy for the Ingester.
+Compared to horizontal scaling, vertical scaling not only increases write throughput but also lessens query, catalog, and compaction overheads as well as Object store costs.
+- Horizontal scaling can help distribute write load but comes with additional coordination overhead.

 #### Ingester storage volume

@@ -541,50 +551,62 @@ ingesterStorage:

 ### Querier

-The Querier can be scaled both [vertically](#vertical-scaling) and
+The [Querier](/influxdb3/clustered/reference/internals/storage-engine/#querier) can be scaled both [vertically](#vertical-scaling) and
 [horizontally](#horizontal-scaling).
-Horizontal scaling increases query throughput to handle more concurrent queries.
-Vertical scaling improves the Querier’s ability to process computationally
-intensive queries.
+
+- **Recommended**: [Vertical scaling](#vertical-scaling) improves the Querier's ability to process concurrent or computationally 
+intensive queries, and increases the effective cache capacity.
+- Horizontal scaling increases query throughput to handle more concurrent queries. 
+Consider horizontal scaling if vertical scaling doesn't adequately address
+concurrency demands or reaches the hardware limits of your underlying nodes.

 ### Compactor

-The Compactor can be scaled both [vertically](#vertical-scaling) and
-[horizontally](#horizontal-scaling).
-Because compaction is a compute-heavy process, vertical scaling (especially
-increasing the available CPU) is the most effective scaling strategy for the
-Compactor. Horizontal scaling increases compaction throughput, but not as
+- **Recommended**: Maintain **1 Compactor pod** and, because compaction is a compute-heavy process, use [vertical scaling](#vertical-scaling) (especially
+increasing the available CPU) for the Compactor.
+- Horizontal scaling increases compaction throughput, but not as
 efficiently as vertical scaling.

 ### Garbage collector

-The Garbage collector is not designed for distributed load and should _not_ be
-scaled horizontally. It is a lightweight process that typically doesn't require
-significant system resources. [Vertical scaling](#vertical-scaling) should only
-be considered if you observe consistently high CPU usage or if the container
+The [Garbage collector](/influxdb3/clustered/reference/internals/storage-engine/#garbage-collector) is a lightweight process that typically doesn't require
+significant system resources. 
+
+- Don't horizontally scale the Garbage collector; it isn't designed for distributed load.
+- Consider [vertical scaling](#vertical-scaling) only if you observe consistently high CPU usage or if the container
 regularly runs out of memory.

 ### Catalog store

-The Catalog store is a PostgreSQL-compatible database that persistently stores metadata. 
-Scaling strategies depend on your chosen PostgreSQL implementation.
-All support [vertical scaling](#vertical-scaling), and most support
-[horizontal scaling](#horizontal-scaling) for redundancy and failover.
+The [Catalog store](/influxdb3/clustered/reference/internals/storage-engine/#catalog-store) is a PostgreSQL-compatible database that stores critical metadata for your InfluxDB cluster.
+An underprovisioned Catalog store can cause write outages and system-wide performance issues.
+
+- Scaling strategies depend on your specific PostgreSQL implementation
+- All PostgreSQL implementations support [vertical scaling](#vertical-scaling)
+- Most implementations support [horizontal scaling](#horizontal-scaling) for improved redundancy and failover
+

 ### Catalog service

-The Catalog service should maintain exactly 
-3 replicas for optimal redundancy.
-Additional replicas are discouraged; favor vertical scaling instead if performance improvements are needed.
+The [Catalog service](/influxdb3/clustered/reference/internals/storage-engine/#catalog-service) (`iox-shared-catalog` StatefulSet) caches 
+and manages access to the Catalog store.
+
+- **Recommended**: Maintain **exactly 3 replicas** of the Catalog service for optimal redundancy. Additional replicas are discouraged.
+- If performance improvements are needed, use [vertical scaling](#vertical-scaling).

 > [!Note]
+> #### Managing Catalog components
+> 
 > The [Catalog service](/influxdb3/clustered/reference/internals/storage-engine/#catalog-service) is managed through the
 > `AppInstance` resource, while the [Catalog store](/influxdb3/clustered/reference/internals/storage-engine/#catalog-store) 
 > is managed separately according to your PostgreSQL implementation.

 ### Object store

-Scaling strategies available for the Object store depend on the underlying
-object storage services used to run the object store. Most support
+The [Object store](/influxdb3/clustered/reference/internals/storage-engine/#object-store)
+contains time series data in Parquet format.
+
+Scaling strategies depend on the underlying object storage services used.
+Most services support
 [horizontal scaling](#horizontal-scaling) for redundancy, failover, and
 increased capacity.
--- a/content/influxdb3/clustered/reference/internals/storage-engine.md
+++ b/content/influxdb3/clustered/reference/internals/storage-engine.md
@@ -50,8 +50,13 @@ queries, and is optimized to reduce storage cost.

 The Router (also known as the Ingest Router) parses incoming line
 protocol and then routes it to [Ingesters](#ingester).
-To ensure write durability, the Router replicates data to two or more of the
-available Ingesters.
+The Router processes incoming write requests through the following steps:
+
+- Queries the [Catalog](#catalog) to determine persistence locations and verify schema compatibility
+- Validates syntax and schema compatibility for each data point in the request,
+and either accepts or [rejects points](/influxdb3/clustered/write-data/troubleshoot/#troubleshoot-rejected-points)
+- Returns a [response](/influxdb3/clustered/write-data/troubleshoot/) to the client
+- Replicates data to two or more available Ingesters for write durability

 ### Ingester

@@ -59,11 +64,6 @@ The Ingester processes line protocol submitted in write requests and persists
 time series data to the [Object store](#object-store).
 In this process, the Ingester does the following:

-- Queries the [Catalog](#catalog) to identify where data should be persisted and
-  to ensure the schema of the line protocol is compatible with the
-  [schema](/influxdb3/clustered/reference/glossary/#schema) of persisted data.
-- Accepts or [rejects](/influxdb3/clustered/write-data/troubleshoot/#troubleshoot-rejected-points)
-  points in the write request and generates a [response](/influxdb3/clustered/write-data/troubleshoot/).
 - Processes line protocol and persists time series data to the
  [Object store](#object-store) in Apache Parquet format. Each Parquet file
  represents a _partition_--a logical grouping of data.
@@ -93,11 +93,12 @@ At query time, the querier:
 3.  Queries the [Catalog service](#catalog-service) to retrieve [Catalog store](#catalog-store)
    information about partitions in the [Object store](#object-store)
    that contain the queried data.
-4.  Reads partition Parquet files that contain the queried data and scans each
+4.  Retrieves any needed Parquet files (not already cached) from the Object store.
+5.  Reads partition Parquet files that contain the queried data and scans each
    row to filter data that matches predicates in the query plan.
-5.  Performs any additional operations (for example: deduplicating, merging, and sorting)
-    specified in the query plan.
-6.  Returns the query result to the client.
+6.  Performs any additional operations (for example: deduplicating, merging, and sorting)
+    specified in the query plan. 
+7.  Returns the query result to the client.

 ### Catalog

@@ -105,6 +106,8 @@ InfluxDB's catalog system consists of two distinct components: the [Catalog stor
 and the [Catalog service](#catalog-service).

 > [!Note]
+> #### Managing Catalog components
+> 
 > The Catalog service is managed through the `AppInstance` resource, while the Catalog store 
 > is managed separately according to your PostgreSQL implementation.

@@ -127,10 +130,10 @@ and manages access to the Catalog store.
 ### Object store

 The Object store contains time series data in [Apache Parquet](https://parquet.apache.org/) format.
-Each Parquet file represents a partition.
-By default, InfluxDB partitions tables by day, but you can
-[customize the partitioning strategy](/influxdb3/clustered/admin/custom-partitions/).
 Data in each Parquet file is sorted, encoded, and compressed.
+A partition may contain multiple Parquet files, which are subject to compaction.
+By default, InfluxDB partitions tables by day, but you can
+[customize the partitioning strategy](/influxdb3/clustered/admin/custom-partitions/).

 ### Compactor