Merge pull request #5885 from influxdata/jts/dar-472-catalog-terminology

fix(clustered): Closes https://github.com/influxdata/DAR/issues/472. …
pull/5894/head^2
Jason Stirnaman 2025-03-14 17:37:43 -05:00 committed by GitHub
commit 271b0a2d89
GPG Key ID: B5690EEEBB952194
9 changed files with 144 additions and 155 deletions


@ -12,21 +12,21 @@ weight: 105
influxdb3/clustered/tags: [backup, restore]
---
InfluxDB Clustered automatically stores snapshots of the InfluxDB Catalog store that
you can use to restore your cluster to a previous state. The snapshotting
functionality is optional and is disabled by default.
Enable snapshots to ensure you can recover
in case of emergency.
With InfluxDB Clustered snapshots enabled, each hour, InfluxDB uses the `pg_dump`
utility included with the InfluxDB Garbage collector to export an SQL blob or
“snapshot” from the InfluxDB Catalog store to the Object store.
The Catalog store is a PostgreSQL-compatible relational database that stores metadata
for your time series data, such as schema data types, Parquet file locations, and more.
The Catalog store snapshots act as recovery points for your InfluxDB cluster that
reference all Parquet files that existed in the Object store at the time of the
snapshot. When a snapshot is restored to the Catalog store, the Compactor
“[soft deletes](#soft-delete)” any Parquet files not listed in the snapshot.
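For intuition, the hourly export resembles a manual `pg_dump` run against the Catalog store. The following is only a sketch: the DSN and output file name are hypothetical, and in practice the Garbage collector performs this export for you.

```shell
# Sketch only: approximates what the hourly snapshot export does.
# The DSN and output file name are placeholders, not real cluster values.
CATALOG_DSN="postgres://influxdb:password@catalog-host:5432/influxdb"  # hypothetical

pg_dump "$CATALOG_DSN" \
  --format=custom \
  --file="catalog-snapshot-$(date -u +%Y%m%dT%H%M%SZ).dump"
```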
> [!Note]
@ -34,7 +34,7 @@ snapshot. When a snapshot is restored to the Catalog, the Compactor
>
> For example, if you have Parquet files A, B, C, and D, and you restore to a
> snapshot that includes B and C, but not A and D, then A and D are soft-deleted, but remain in object
> storage until they are no longer referenced in any Catalog store snapshot.
- [Soft delete](#soft-delete)
- [Hard delete](#hard-delete)
- [Recovery Point Objective (RPO)](#recovery-point-objective-rpo)
@ -75,8 +75,8 @@ The InfluxDB Clustered snapshot strategy RPO allows for the following maximum da
## Recovery Time Objective (RTO)
RTO is the maximum amount of downtime allowed for an InfluxDB cluster after a failure.
RTO varies depending on the size of your Catalog store, network speeds
between the client machine and the Catalog store, cluster load, the status
of your underlying hosting provider, and other factors.
## Data written just before a snapshot may not be present after restoring
@ -94,14 +94,14 @@ present after restoring to that snapshot.
### Automate object synchronization to an external S3-compatible bucket
Syncing objects to an external S3-compatible bucket ensures an up-to-date backup
in case your Object store becomes unavailable. Recovery point snapshots only
back up the InfluxDB Catalog store. If data referenced in a Catalog store snapshot does not
exist in the Object store, the recovery process does not restore the missing data.
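As a sketch, a scheduled one-way sync with the AWS CLI might look like the following. The bucket names are hypothetical, and other S3-compatible providers have equivalent tooling.

```shell
# Hypothetical bucket names; substitute your own.
PRIMARY_BUCKET="s3://influxdb-primary-object-store"
BACKUP_BUCKET="s3://influxdb-backup-object-store"

# One-way copy of new and changed objects. --delete is deliberately
# omitted so objects removed from the primary store remain in the backup.
aws s3 sync "$PRIMARY_BUCKET" "$BACKUP_BUCKET"
```

Run the sync on a schedule (for example, with cron) that matches your recovery objectives.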
### Enable short-term object versioning
If your object storage provider supports it, consider enabling short-term
object versioning on your Object store--for example, 1-2 days to protect against errant writes or deleted objects.
With object versioning enabled, as objects are updated, the object store
retains distinct versions of each update that can be used to “rollback” newly
written or updated Parquet files to previous versions.
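On AWS S3, for example, enabling versioning and expiring noncurrent versions after two days might look like the following (the bucket name is hypothetical; other providers expose equivalent settings under different names).

```shell
BUCKET="influxdb-primary-object-store"  # hypothetical

# Turn on object versioning for the bucket.
aws s3api put-bucket-versioning \
  --bucket "$BUCKET" \
  --versioning-configuration Status=Enabled

# Expire noncurrent (superseded) object versions after 2 days.
aws s3api put-bucket-lifecycle-configuration \
  --bucket "$BUCKET" \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "expire-noncurrent-versions",
      "Status": "Enabled",
      "Filter": {},
      "NoncurrentVersionExpiration": {"NoncurrentDays": 2}
    }]
  }'
```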
@ -140,7 +140,7 @@ spec:
#### INFLUXDB_IOX_CREATE_CATALOG_BACKUP_DATA_SNAPSHOT_FILES
Enable hourly Catalog store snapshotting. The default is `'false'`. Set to `'true'`:
```yaml
INFLUXDB_IOX_CREATE_CATALOG_BACKUP_DATA_SNAPSHOT_FILES: 'true'
@ -217,22 +217,20 @@ written on or around the beginning of the next hour.
## Restore to a recovery point
Use the following process to restore your InfluxDB cluster to a recovery point
using Catalog store snapshots:
1. **Install prerequisites:**
- `kubectl` CLI for managing your Kubernetes deployment.
- `psql` CLI configured with your Data Source Name and credentials for interacting with the PostgreSQL-compatible Catalog store database.
- A client from your object storage provider for interacting with your InfluxDB cluster's Object store.
2. **Retrieve the recovery point snapshot from your object store.**
InfluxDB Clustered stores hourly and daily snapshots in the
`/catalog_backup_file_lists` path in object storage. Download the snapshot
that you would like to use as the recovery point. If your primary Object
store is unavailable, download the snapshot from your replicated Object store.
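With an S3-compatible object store, retrieving a snapshot might look like the following sketch. Only the `/catalog_backup_file_lists` path comes from the documentation above; the bucket and snapshot object names are placeholders.

```shell
BUCKET="s3://influxdb-primary-object-store"  # hypothetical

# List available hourly and daily recovery points.
aws s3 ls "$BUCKET/catalog_backup_file_lists/"

# Download the snapshot to use as the recovery point
# (replace the object name with one from the listing above).
aws s3 cp "$BUCKET/catalog_backup_file_lists/<snapshot-object>" ./snapshot
```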
> [!Important]
> When creating and storing a snapshot, the last artifact created is the


@ -22,19 +22,12 @@ resources available to each component.
- [Scaling strategies](#scaling-strategies)
- [Vertical scaling](#vertical-scaling)
- [Horizontal scaling](#horizontal-scaling)
- [Scale your cluster as a whole](#scale-your-cluster-as-a-whole)
- [Scale components in your cluster](#scale-components-in-your-cluster)
- [Horizontally scale a component](#horizontally-scale-a-component)
- [Vertically scale a component](#vertically-scale-a-component)
- [Apply your changes](#apply-your-changes)
- [Recommended scaling strategies per component](#recommended-scaling-strategies-per-component)
## Scaling strategies
@ -59,6 +52,14 @@ throughput a system can manage, but also provides additional redundancy and fail
{{< html-diagram/scaling-strategy "horizontal" >}}
## Scale your cluster as a whole
Scaling your entire InfluxDB Cluster is done by scaling your Kubernetes cluster
and is managed outside of InfluxDB. The process of scaling your entire Kubernetes
cluster depends on your underlying Kubernetes provider. You can also use
[Kubernetes autoscaling](https://kubernetes.io/docs/concepts/cluster-administration/cluster-autoscaling/)
to automatically scale your cluster as needed.
## Scale components in your cluster
The following components of your InfluxDB cluster are scaled by modifying
@ -69,6 +70,7 @@ properties in your `AppInstance` resource:
- Compactor
- Router
- Garbage collector
- Catalog service
> [!Note]
> #### Scale your Catalog and Object store
@ -448,14 +450,6 @@ helm upgrade \
{{% /code-tab-content %}}
{{< /code-tabs-wrapper >}}
## Recommended scaling strategies per component
- [Router](#router)
@ -463,24 +457,35 @@ to automatically scale your cluster as needed.
- [Querier](#querier)
- [Compactor](#compactor)
- [Garbage collector](#garbage-collector)
- [Catalog store](#catalog-store)
- [Catalog service](#catalog-service)
- [Object store](#object-store)
### Router
The [Router](/influxdb3/clustered/reference/internals/storage-engine/#router) can be scaled both [vertically](#vertical-scaling) and
[horizontally](#horizontal-scaling).
- **Recommended**: Horizontal scaling increases write throughput and is typically the most
effective scaling strategy for the Router.
- Vertical scaling (specifically increased CPU) improves the Router's ability to
parse incoming line protocol with lower latency.
#### Router latency
Latency of the Router's write endpoint is directly impacted by:
- Ingester latency--the Router calls the Ingester during a client write request
- Catalog latency during schema validation
### Ingester
The [Ingester](/influxdb3/clustered/reference/internals/storage-engine/#ingester) can be scaled both [vertically](#vertical-scaling) and
[horizontally](#horizontal-scaling).
- **Recommended**: Vertical scaling is typically the most effective scaling strategy for the Ingester.
Compared to horizontal scaling, vertical scaling not only increases write throughput but also lessens query, catalog, and compaction overheads as well as Object store costs.
- Horizontal scaling can help distribute write load but comes with additional coordination overhead.
#### Ingester storage volume
@ -543,37 +548,62 @@ ingesterStorage:
### Querier
The [Querier](/influxdb3/clustered/reference/internals/storage-engine/#querier) can be scaled both [vertically](#vertical-scaling) and
[horizontally](#horizontal-scaling).
- **Recommended**: [Vertical scaling](#vertical-scaling) improves the Querier's ability to process concurrent or computationally
intensive queries, and increases the effective cache capacity.
- Horizontal scaling increases query throughput to handle more concurrent queries.
Consider horizontal scaling if vertical scaling doesn't adequately address
concurrency demands or reaches the hardware limits of your underlying nodes.
### Compactor
The Compactor can be scaled both [vertically](#vertical-scaling) and
[horizontally](#horizontal-scaling).
- **Recommended**: Maintain **1 Compactor pod** and use [vertical scaling](#vertical-scaling) (especially
increasing the available CPU) for the Compactor.
- Because compaction is a compute-heavy process, horizontal scaling increases compaction throughput, but not as
efficiently as vertical scaling.
### Garbage collector
The [Garbage collector](/influxdb3/clustered/reference/internals/storage-engine/#garbage-collector) is a lightweight process that typically doesn't require
significant system resources.
- Don't horizontally scale the Garbage collector; it isn't designed for distributed load.
- Consider [vertical scaling](#vertical-scaling) only if you observe consistently high CPU usage or if the container
regularly runs out of memory.
### Catalog store
The [Catalog store](/influxdb3/clustered/reference/internals/storage-engine/#catalog-store) is a PostgreSQL-compatible database that stores critical metadata for your InfluxDB cluster.
An underprovisioned Catalog store can cause write outages and system-wide performance issues.
- Scaling strategies depend on your specific PostgreSQL implementation
- All PostgreSQL implementations support [vertical scaling](#vertical-scaling)
- Most implementations support [horizontal scaling](#horizontal-scaling) for improved redundancy and failover
### Catalog service
The [Catalog service](/influxdb3/clustered/reference/internals/storage-engine/#catalog-service) (`iox-shared-catalog` statefulset) caches
and manages access to the Catalog store.
- **Recommended**: Maintain **exactly 3 replicas** of the Catalog service for optimal redundancy. Additional replicas are discouraged.
- If performance improvements are needed, use [vertical scaling](#vertical-scaling).
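To confirm the replica count, a check like the following should work, assuming the `iox-shared-catalog` statefulset name from above and a hypothetical `influxdb` namespace.

```shell
# Expect READY 3/3 for the Catalog service.
# The namespace is a placeholder; use the one your cluster runs in.
kubectl get statefulset iox-shared-catalog --namespace influxdb
```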
> [!Note]
> #### Managing Catalog components
>
> The [Catalog service](/influxdb3/clustered/reference/internals/storage-engine/#catalog-service) is managed through the
> `AppInstance` resource, while the [Catalog store](/influxdb3/clustered/reference/internals/storage-engine/#catalog-store)
> is managed separately according to your PostgreSQL implementation.
### Object store
The [Object store](/influxdb3/clustered/reference/internals/storage-engine/#object-store)
contains time series data in Parquet format.
Scaling strategies depend on the underlying object storage services used.
Most services support
[horizontal scaling](#horizontal-scaling) for redundancy, failover, and
increased capacity.


@ -62,13 +62,13 @@ Updating your InfluxDB cluster is as simple as re-applying your app-instance wit
The word safely here means being able to redeploy your cluster while still being able to use the tokens you've created, and being able to write to and query the database you've previously created.
All of the important state in InfluxDB 3 lives in the Catalog store (the Postgres equivalent database) and the Object Store (the S3 compatible store). These should be treated with the utmost care.
If a full redeploy of your cluster needs to happen, the namespace containing the InfluxDB instance can be deleted **_as long as your Catalog store and Object Store are not in this namespace_**. Then, the InfluxDB AppInstance can be redeployed. It is possible the operator may need to be removed and reinstalled. In that case, deleting the namespace that the operator is deployed into and redeploying is acceptable.
### Backing up your data
The Catalog store and Object store contain all of the important state for InfluxDB 3. They should be the primary focus of backups. Following the industry standard best practices for your chosen Catalog store implementation and Object Store implementation should provide sufficient backups. In our Cloud products, we do daily backups of our Catalog, in addition to automatic snapshots, and we preserve our Object Store files for 100 days after they have been soft-deleted.
### Recovering your data


@ -17,7 +17,7 @@ following:
- Ingress to your cluster
- Connection to your Object store
- Connection to your Catalog store (PostgreSQL-compatible) database
> [!Note]
> If using self-signed certs,
@ -176,8 +176,8 @@ objectStore:
Refer to your PostgreSQL-compatible database provider's documentation for
installing TLS certificates and ensuring secure connections.
If currently using an insecure connection to your Catalog store database, update your
Catalog store data source name (DSN) to **remove the `sslmode=disable` query parameter**:
{{% code-callout "\?sslmode=disable" "magenta delete" %}}
```txt


@ -99,7 +99,7 @@ following sizing for {{% product-name %}} components:
{{% tab-content %}}
<!--------------------------------- BEGIN AWS --------------------------------->
- **Catalog store (PostgreSQL-compatible database) (x1):**
- _[See below](#postgresql-compatible-database-requirements)_
- **Ingesters and Routers (x3):**
- EC2 m6i.2xlarge (8 CPU, 32 GB RAM)
@ -116,7 +116,7 @@ following sizing for {{% product-name %}} components:
{{% tab-content %}}
<!--------------------------------- BEGIN GCP --------------------------------->
- **Catalog store (PostgreSQL-compatible database) (x1):**
- _[See below](#postgresql-compatible-database-requirements)_
- **Ingesters and Routers (x3):**
- GCE c2-standard-8 (8 CPU, 32 GB RAM)
@ -133,7 +133,7 @@ following sizing for {{% product-name %}} components:
{{% tab-content %}}
<!-------------------------------- BEGIN Azure -------------------------------->
- **Catalog store (PostgreSQL-compatible database) (x1):**
- _[See below](#postgresql-compatible-database-requirements)_
- **Ingesters and Routers (x3):**
- Standard_D8s_v3 (8 CPU, 32 GB RAM)
@ -150,7 +150,7 @@ following sizing for {{% product-name %}} components:
{{% tab-content %}}
<!------------------------------- BEGIN ON-PREM ------------------------------->
- **Catalog store (PostgreSQL-compatible database) (x1):**
- CPU: 4-8 cores
- RAM: 16-32 GB
- **Ingesters and Routers (x3):**


@ -77,8 +77,8 @@ including the following:
- CPU and memory resources set on each type of InfluxDB pod
- The number of pods in each InfluxDB StatefulSet and Deployment
- The type of object store used and how it is hosted
- How the Catalog store (PostgreSQL-compatible database) is hosted
- Indicate if either the Object store or the Catalog store is shared by more than one InfluxDB
Clustered product
- If so, describe the network-level topology of your setup


@ -50,17 +50,13 @@ queries, and is optimized to reduce storage cost.
The Router (also known as the Ingest Router) parses incoming line
protocol and then routes it to [Ingesters](#ingester).
The Router processes incoming write requests through the following steps:
- Queries the [Catalog](#catalog) to determine persistence locations and verify schema compatibility
- Validates syntax and schema compatibility for each data point in the request,
and either accepts or [rejects points](/influxdb3/clustered/write-data/troubleshoot/#troubleshoot-rejected-points)
- Returns a [response](/influxdb3/clustered/write-data/troubleshoot/) to the client
- Replicates data to two or more available Ingesters for write durability
### Ingester
@ -68,11 +64,6 @@ The Ingester processes line protocol submitted in write requests and persists
time series data to the [Object store](#object-store).
In this process, the Ingester does the following:
- Processes line protocol and persists time series data to the
[Object store](#object-store) in Apache Parquet format. Each Parquet file
represents a _partition_--a logical grouping of data.
@ -82,13 +73,6 @@ In this process, the Ingester does the following:
- Maintains a short-term [write-ahead log (WAL)](/influxdb3/clustered/reference/internals/durability/)
to prevent data loss in case of a service interruption.
### Querier
The Querier handles query requests and returns query results.
@ -106,55 +90,50 @@ At query time, the querier:
- include recently written, [yet-to-be-persisted](/influxdb3/clustered/reference/internals/durability/#data-ingest)
data in query results
3. Queries the [Catalog service](#catalog-service) to retrieve [Catalog store](#catalog-store)
information about partitions in the [Object store](#object-store)
that contain the queried data.
4. Retrieves any needed Parquet files (not already cached) from the Object store.
5. Reads partition Parquet files that contain the queried data and scans each
row to filter data that matches predicates in the query plan.
6. Performs any additional operations (for example: deduplicating, merging, and sorting)
specified in the query plan.
7. Returns the query result to the client.
### Catalog
InfluxDB's catalog system consists of two distinct components: the [Catalog store](#catalog-store)
and the [Catalog service](#catalog-service).
> [!Note]
> #### Managing Catalog components
>
> The Catalog service is managed through the `AppInstance` resource, while the Catalog store
> is managed separately according to your PostgreSQL implementation.
#### Catalog store
The Catalog store is a PostgreSQL-compatible relational database that stores metadata
related to your time series data including schema information and physical
locations of partitions in the [Object store](#object-store).
It fulfills the following roles:
- Provides information about the schema of written data.
- Tells the [Ingester](#ingester) what partitions to persist data to.
- Tells the [Querier](#querier) what partitions contain the queried data.
#### Catalog service
The Catalog service (`iox-shared-catalog` statefulset) is an IOx component that caches
and manages access to the Catalog store.
### Object store
The Object store contains time series data in [Apache Parquet](https://parquet.apache.org/) format.
Each Parquet file represents a partition.
Data in each Parquet file is sorted, encoded, and compressed.
A partition may contain multiple Parquet files, which are subject to compaction.
By default, InfluxDB partitions tables by day, but you can
[customize the partitioning strategy](/influxdb3/clustered/admin/custom-partitions/).
### Compactor
@ -162,26 +141,8 @@ The Compactor processes and compresses partitions in the [Object store](#object-
to continually optimize storage.
It then updates the [Catalog](#catalog) with locations of compacted data.
### Garbage collector
The Garbage collector runs background jobs that evict expired or deleted data,
remove obsolete compaction files, and reclaim space in both the [Catalog](#catalog) and the
[Object store](#object-store).


@ -50,7 +50,7 @@ Prometheus CPU limit was set to an integer instead of a string.
#### Database Engine
- Upgrade DataFusion
- Add the ability to restore a cluster from a Catalog store snapshot.
---


@ -34,15 +34,15 @@ InfluxDB cluster.
## Tune garbage collection
Once data falls outside of a database's retention period, the garbage collection
service can remove all artifacts associated with the data from the Catalog store and Object store.
Tune the garbage collector cutoff period to ensure that data is removed in a timely manner.
Use the following environment variables to tune the garbage collector:
- `INFLUXDB_IOX_GC_OBJECTSTORE_CUTOFF`: the age at which Parquet files not
referenced in the Catalog store become eligible for deletion from Object storage.
The default is `30d`.
- `INFLUXDB_IOX_GC_PARQUETFILE_CUTOFF`: how long to retain rows in the Catalog store
that reference Parquet files marked for deletion. The default is `30d`.
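For example, tightening both cutoffs to reclaim space sooner might look like the following. The values are illustrative, and where these variables are set (for example, in your `AppInstance` resource) depends on your deployment.

```shell
# Illustrative only: shorter cutoffs reclaim space sooner but shrink
# the window in which older recovery points remain fully restorable.
export INFLUXDB_IOX_GC_OBJECTSTORE_CUTOFF=15d
export INFLUXDB_IOX_GC_PARQUETFILE_CUTOFF=15d
```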
These values tune how aggressive the garbage collector can be. A shorter duration
@ -68,7 +68,7 @@ Use the following scenarios as a guide for different use cases:
When only the most recent data is important and backups are not required, use a
very low cutoff point for the garbage collector.
Using a low value means that the garbage collection service will promptly delete
files from the Object store and remove associated rows from the Catalog store.
This results in a lean Catalog store with lower operational overhead and fewer files
in the Object store.
@ -101,8 +101,8 @@ Object store (provided by your object store provider), use a low cutoff point
for the garbage collector service. Your object versioning policy ensures expired
files are kept for the specified backup window time.
Object versioning maintains Parquet files in Object storage after data expires,
but allows the Catalog store to remove references to the Parquet files.
Non-current objects should be configured to be expired as soon as possible, but
retained long enough to satisfy your organization's backup policy.
If you cannot make use of object versioning policies but still require a backup
window, configure the garbage collector to retain Parquet files for as long as
your backup period requires.
This will likely result in higher operational costs as the Catalog maintains
This will likely result in higher operational costs as the Catalog store maintains
more references to associated Parquet files and the Parquet files persist for
longer in the Object store.