Add data durability documentation to Dedicated and Serverless (#5105)

* added data durability to dedicated

* Update content/influxdb/cloud-dedicated/reference/internals/durability.md

Co-authored-by: Jason Stirnaman <stirnamanj@gmail.com>

* ported data durability doc from dedicated to serverless

* fixed serverless durability frontmatter

---------

Co-authored-by: Jason Stirnaman <stirnamanj@gmail.com>
Scott Anderson 2023-08-24 15:01:42 -06:00 committed by GitHub
parent 75adee41c8
commit 27cddb7773
2 changed files with 176 additions and 0 deletions


@ -0,0 +1,88 @@
---
title: InfluxDB Cloud Dedicated data durability
description: >
  InfluxDB Cloud Dedicated replicates all time series data in the storage tier across
  multiple availability zones within a cloud region and automatically creates backups
  that can be used to restore data in the event of a node failure or data corruption.
weight: 102
menu:
  influxdb_cloud_dedicated:
    name: Data durability
    parent: InfluxDB internals
influxdb/cloud-dedicated/tags: [backups, internals]
related:
  - https://docs.aws.amazon.com/AmazonS3/latest/userguide/DataDurability.html, AWS S3 Data Durability
---
{{< cloud-name >}} replicates all time series data in the storage tier across
multiple availability zones within a cloud region and automatically creates backups
that can be used to restore data in the event of a node failure or data corruption.
## Data storage
In {{< cloud-name >}}, all measurements are stored in
[Apache Parquet](https://parquet.apache.org/) files that represent a
point-in-time snapshot of the data. The Parquet files are immutable and are
never replaced or modified. Parquet files are stored in object storage.
<span id="influxdb-catalog"></span>
The _InfluxDB catalog_ is a relational, PostgreSQL-compatible database that
contains references to all Parquet files in object storage and is used as an
index to find the appropriate Parquet files for a particular set of data.
### Data deletion
When data is deleted or when the retention period is reached for data within
a database, the associated Parquet files are marked as deleted _in the catalog_,
but the actual Parquet files are _not removed from object storage_.
All queries filter out data that has been marked as deleted.
Parquet files remain in object storage for approximately 100 days after the
youngest data in the Parquet file ages out of retention.
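
The timing described above can be sketched in a few lines of Python. This is
an illustration only, not InfluxDB code: the `CatalogEntry` class, its field
names, and the helper functions are hypothetical, and the 100-day value stands
in for the "approximately 100 days" stated above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumption: the text says "approximately 100 days"; the exact value is illustrative.
GRACE_PERIOD = timedelta(days=100)


@dataclass
class CatalogEntry:
    """Hypothetical catalog row that references one Parquet file."""
    path: str
    max_time: datetime                    # timestamp of the youngest data in the file
    deleted_at: Optional[datetime] = None  # set when the file is marked as deleted


def visible_to_queries(entry: CatalogEntry) -> bool:
    # Queries skip any file that the catalog marks as deleted.
    return entry.deleted_at is None


def removable_from_object_storage(
    entry: CatalogEntry, retention: timedelta, now: datetime
) -> bool:
    # The file stays in object storage until the youngest data has aged out
    # of retention and the grace period has also passed.
    aged_out_at = entry.max_time + retention
    return now >= aged_out_at + GRACE_PERIOD


retention = timedelta(days=30)
entry = CatalogEntry("example.parquet", datetime(2023, 1, 1, tzinfo=timezone.utc))
entry.deleted_at = entry.max_time + retention  # marked deleted in the catalog only
```

In this sketch, marking the entry as deleted hides it from queries immediately,
while the file itself only becomes eligible for removal after the grace period.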
## Data ingest
When data is written to {{< cloud-name >}}, the data is first written to a
write-ahead log (WAL) on locally attached storage on the ingester node before
the write request is acknowledged. After acknowledging the write request, the
ingester holds the data in memory temporarily and then writes the contents of
the WAL to Parquet files in object storage and updates the InfluxDB catalog to
reference the newly created Parquet files. If an ingester is gracefully shut
down (for example, during a new software deployment), it flushes the contents of
the WAL to the Parquet files before shutting down.
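
The following Python sketch illustrates the general write-ahead pattern
described above. It is a simplified model under stated assumptions, not the
InfluxDB implementation: the `Ingester` class and its methods are hypothetical,
JSON files in a local directory stand in for Parquet files in object storage,
and a Python list stands in for the InfluxDB catalog.

```python
import json
import os
import tempfile


class Ingester:
    """Hypothetical ingester that logs each write before acknowledging it."""

    def __init__(self, wal_path, object_store_dir, catalog):
        self.wal_path = wal_path
        self.object_store_dir = object_store_dir
        self.catalog = catalog  # stand-in for the catalog: a list of file paths
        self.buffer = []
        # On startup, reload any writes that were logged but not yet persisted.
        if os.path.exists(wal_path):
            with open(wal_path) as wal:
                self.buffer = [json.loads(line) for line in wal if line.strip()]

    def write(self, point):
        # Append to the WAL and force it to disk before acknowledging.
        with open(self.wal_path, "a") as wal:
            wal.write(json.dumps(point) + "\n")
            wal.flush()
            os.fsync(wal.fileno())
        self.buffer.append(point)  # held in memory until persisted
        return True  # acknowledgement

    def persist(self):
        # Write buffered data to "object storage", then reference it in the catalog.
        if not self.buffer:
            return None
        name = f"snapshot-{len(self.catalog):06d}.json"  # stand-in for a Parquet file
        path = os.path.join(self.object_store_dir, name)
        with open(path, "w") as snapshot:
            json.dump(self.buffer, snapshot)
        self.catalog.append(path)
        self.buffer = []
        # The WAL is removed only after its contents are persisted and cataloged.
        os.remove(self.wal_path)
        return path

    def shutdown(self):
        # A graceful shutdown flushes the WAL contents before exiting.
        return self.persist()
```

A replacement `Ingester` created with the same `wal_path` reloads unpersisted
writes in its constructor, which mirrors how a new ingester reads the WAL on
startup.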
## Backups
{{< cloud-name >}} implements the following data backup strategies:
- **Backup of WAL file**: The WAL file is written on locally attached storage.
If an ingester process fails, the new ingester simply reads the WAL file on
startup and continues normal operation. WAL files are maintained until their
contents have been written to the Parquet files in object storage.
For added protection, ingesters can be configured for write replication, where
each measurement is written to two different WAL files before acknowledging
the write.
- **Backup of Parquet files**: Parquet files are stored in object storage where
they are redundantly stored on multiple devices across a minimum of three
availability zones in a cloud region. Parquet files associated with each
database are kept in object storage for the duration of the database retention
period plus an additional time period (approximately 100 days).
- **Backup of catalog**: InfluxData keeps a transaction log of all recent updates
to the [InfluxDB catalog](#influxdb-catalog) and generates a daily backup of
the catalog. Backups are preserved for at least 100 days in object storage across a minimum
of three availability zones.
## Recovery
InfluxData can perform the following recovery operations:
- **Recovery after ingester failure**: If an ingester fails, a new ingester
starts and reads the recently ingested data from the WAL file.
- **Recovery of Parquet files**: {{< cloud-name >}} relies on the data
durability provided by the object storage service to recover Parquet files.
- **Recovery of the catalog**: InfluxData can restore the InfluxDB catalog to
the most recent daily backup of the catalog and then reapply any transactions
recorded in the transaction log after that backup.
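
As a rough illustration of catalog recovery, the sketch below restores a
backup and then replays logged transactions. Everything in it is an
assumption made for the example: the `Txn` record, the sequence numbers, the
`add` and `mark_deleted` operations, and the dictionary that stands in for the
catalog are hypothetical and do not describe the actual catalog schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Txn:
    """Hypothetical transaction log record."""
    seq: int   # monotonically increasing sequence number
    op: str    # "add" or "mark_deleted"
    path: str  # Parquet file the transaction refers to


def apply_txn(catalog, txn):
    if txn.op == "add":
        catalog[txn.path] = {"deleted": False}
    elif txn.op == "mark_deleted":
        catalog[txn.path]["deleted"] = True
    else:
        raise ValueError(f"unknown operation: {txn.op}")


def restore_catalog(backup, backup_seq, txn_log):
    # Start from a copy of the most recent backup.
    catalog = {path: dict(state) for path, state in backup.items()}
    # Reapply, in order, only the transactions recorded after the backup.
    for txn in sorted(txn_log, key=lambda t: t.seq):
        if txn.seq > backup_seq:
            apply_txn(catalog, txn)
    return catalog


backup = {"a.parquet": {"deleted": False}}  # backup taken after transaction 1
txn_log = [
    Txn(1, "add", "a.parquet"),
    Txn(2, "add", "b.parquet"),
    Txn(3, "mark_deleted", "a.parquet"),
]
restored = restore_catalog(backup, backup_seq=1, txn_log=txn_log)
```

Copying the backup before replaying keeps the backup itself unchanged, so the
same restore can be repeated if needed.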


@ -0,0 +1,88 @@
---
title: InfluxDB Cloud Serverless data durability
description: >
  InfluxDB Cloud Serverless replicates all time series data in the storage tier across
  multiple availability zones within a cloud region and automatically creates backups
  that can be used to restore data in the event of a node failure or data corruption.
weight: 102
menu:
  influxdb_cloud_serverless:
    name: Data durability
    parent: InfluxDB Cloud internals
influxdb/cloud-serverless/tags: [backups, internals]
related:
  - https://docs.aws.amazon.com/AmazonS3/latest/userguide/DataDurability.html, AWS S3 Data Durability
---
{{< cloud-name >}} replicates all time series data in the storage tier across
multiple availability zones within a cloud region and automatically creates backups
that can be used to restore data in the event of a node failure or data corruption.
## Data storage
In {{< cloud-name >}}, all measurements are stored in
[Apache Parquet](https://parquet.apache.org/) files that represent a
point-in-time snapshot of the data. The Parquet files are immutable and are
never replaced or modified. Parquet files are stored in object storage.
<span id="influxdb-catalog"></span>
The _InfluxDB catalog_ is a relational, PostgreSQL-compatible database that
contains references to all Parquet files in object storage and is used as an
index to find the appropriate Parquet files for a particular set of data.
### Data deletion
When data is deleted or when the retention period is reached for data within
a database, the associated Parquet files are marked as deleted _in the catalog_,
but the actual Parquet files are _not removed from object storage_.
All queries filter out data that has been marked as deleted.
Parquet files remain in object storage for approximately 100 days after the
youngest data in the Parquet file ages out of retention.
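
To make the catalog's role as an index more concrete, the following Python
sketch looks up the Parquet files relevant to a query and skips files marked
as deleted. It is a minimal illustration under stated assumptions: SQLite
stands in for the PostgreSQL-compatible catalog, and the `parquet_files`
table, its columns, and the `files_for_query` helper are hypothetical.

```python
import sqlite3

# Assumption: SQLite is only a stand-in for the relational catalog.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE parquet_files (
        path     TEXT PRIMARY KEY,   -- location of the file in object storage
        db       TEXT NOT NULL,      -- database the file belongs to
        min_time INTEGER NOT NULL,   -- oldest timestamp in the file
        max_time INTEGER NOT NULL,   -- youngest timestamp in the file
        deleted  INTEGER NOT NULL DEFAULT 0
    )
    """
)
conn.executemany(
    "INSERT INTO parquet_files VALUES (?, ?, ?, ?, ?)",
    [
        ("s/1.parquet", "sensors", 0, 100, 0),
        ("s/2.parquet", "sensors", 101, 200, 0),
        ("s/3.parquet", "sensors", 120, 180, 1),  # marked deleted, still stored
        ("o/1.parquet", "other", 0, 100, 0),
    ],
)


def files_for_query(conn, db, start, stop):
    """Return paths of non-deleted files whose time range overlaps [start, stop]."""
    rows = conn.execute(
        """
        SELECT path FROM parquet_files
        WHERE db = ? AND deleted = 0 AND max_time >= ? AND min_time <= ?
        ORDER BY path
        """,
        (db, start, stop),
    )
    return [row[0] for row in rows]


paths = files_for_query(conn, "sensors", 50, 150)
```

Because the deleted flag lives in the catalog, a file can be hidden from every
query without touching the object that remains in object storage.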
## Data ingest
When data is written to {{< cloud-name >}}, the data is first written to a
write-ahead log (WAL) on locally attached storage on the ingester node before
the write request is acknowledged. After acknowledging the write request, the
ingester holds the data in memory temporarily and then writes the contents of
the WAL to Parquet files in object storage and updates the InfluxDB catalog to
reference the newly created Parquet files. If an ingester is gracefully shut
down (for example, during a new software deployment), it flushes the contents of
the WAL to the Parquet files before shutting down.
## Backups
{{< cloud-name >}} implements the following data backup strategies:
- **Backup of WAL file**: The WAL file is written on locally attached storage.
If an ingester process fails, the new ingester simply reads the WAL file on
startup and continues normal operation. WAL files are maintained until their
contents have been written to the Parquet files in object storage.
For added protection, ingesters can be configured for write replication, where
each measurement is written to two different WAL files before acknowledging
the write.
- **Backup of Parquet files**: Parquet files are stored in object storage where
they are redundantly stored on multiple devices across a minimum of three
availability zones in a cloud region. Parquet files associated with each
database are kept in object storage for the duration of the database retention
period plus an additional time period (approximately 100 days).
- **Backup of catalog**: InfluxData keeps a transaction log of all recent updates
to the [InfluxDB catalog](#influxdb-catalog) and generates a daily backup of
the catalog. Backups are preserved for at least 100 days in object storage across a minimum
of three availability zones.
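
The write replication mentioned in the WAL strategy above can be sketched as
follows. This is a simplified, single-process illustration: in a real
deployment the two WAL files would presumably live on different ingesters,
and the `append_durably` and `replicated_write` functions are hypothetical
names chosen for this example.

```python
import json
import os
import tempfile


def append_durably(wal_path, point):
    # Append one record and force it to disk before returning.
    with open(wal_path, "a") as wal:
        wal.write(json.dumps(point) + "\n")
        wal.flush()
        os.fsync(wal.fileno())


def replicated_write(point, wal_paths):
    """Acknowledge a write only after it is in at least two WAL files."""
    if len(wal_paths) < 2:
        raise ValueError("write replication needs at least two WAL files")
    for wal_path in wal_paths:
        append_durably(wal_path, point)  # any failure here prevents the ack
    return True  # acknowledgement


wal_dir = tempfile.mkdtemp()
wals = [os.path.join(wal_dir, "wal-a.log"), os.path.join(wal_dir, "wal-b.log")]
acked = replicated_write({"measurement": "cpu", "value": 0.5}, wals)
```

If either append raises an exception, the function never returns the
acknowledgement, so a client would not treat the write as durable.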
## Recovery
InfluxData can perform the following recovery operations:
- **Recovery after ingester failure**: If an ingester fails, a new ingester
starts and reads the recently ingested data from the WAL file.
- **Recovery of Parquet files**: {{< cloud-name >}} relies on the data
durability provided by the object storage service to recover Parquet files.
- **Recovery of the catalog**: InfluxData can restore the InfluxDB catalog to
the most recent daily backup of the catalog and then reapply any transactions
recorded in the transaction log after that backup.