chore(v1): Cautions, risks, and mitigations for using truncate-shards with future data

- Closes influxdata/DAR/issues/534
- Contact Support for assistance
- Add risks and technical details to truncate-shard command
- Add cautions to rebalance guide
- Add planning guidance for future data in schema_and_data_layout

pull/6370/head^2
parent 2bc9e1736d
commit bba78ea40b
@@ -40,11 +40,20 @@ cluster, and they use the
[`influxd-ctl` tool](/enterprise_influxdb/v1/tools/influxd-ctl/) available on
all meta nodes.

{{% warn %}}
Before you begin, stop writing historical data to InfluxDB.
Historical data have timestamps that occur at anytime in the past.
Performing a rebalance while writing historical data can lead to data loss.
{{% /warn %}}
> [!Warning]
> #### Stop writing data before rebalancing
>
> Before you begin, stop writing historical data to InfluxDB.
> Historical data have timestamps that occur at any time in the past.
> Performing a rebalance while writing historical data can lead to data loss.

> [!Caution]
> #### Risks of rebalancing with future data
>
> Truncating shards that contain data with future timestamps (such as forecast or prediction data)
> can lead to overlapping shards and data duplication.
> For more information, see [`truncate-shards` and future data](/enterprise_influxdb/v1/tools/influxd-ctl/truncate-shards/#understand-the-risks-with-future-data)
> or [contact InfluxData support](https://support.influxdata.com).

## Rebalance Procedure 1: Rebalance a cluster to create space

@@ -67,6 +76,14 @@ Hot shards are shards that are currently receiving writes.
Performing any action on a hot shard can lead to data inconsistency within the
cluster which requires manual intervention from the user.

> [!Caution]
> #### Risks of rebalancing with future data
>
> Truncating shards that contain data with future timestamps (such as forecast or prediction data)
> can lead to overlapping shards and data duplication.
> For more information, see [`truncate-shards` and future data](/enterprise_influxdb/v1/tools/influxd-ctl/truncate-shards/#understand-the-risks-with-future-data)
> or [contact InfluxData support](https://support.influxdata.com).

To prevent data inconsistency, truncate hot shards before moving any shards
across data nodes.
The command below creates a new hot shard which is automatically distributed
@@ -298,6 +315,14 @@ Hot shards are shards that are currently receiving writes.
Performing any action on a hot shard can lead to data inconsistency within the
cluster which requires manual intervention from the user.

> [!Caution]
> #### Risks of rebalancing with future data
>
> Truncating shards that contain data with future timestamps (such as forecast or prediction data)
> can lead to overlapping shards and data duplication.
> For more information, see [`truncate-shards` and future data](/enterprise_influxdb/v1/tools/influxd-ctl/truncate-shards/#understand-the-risks-with-future-data)
> or [contact InfluxData support](https://support.influxdata.com).

To prevent data inconsistency, truncate hot shards before copying any shards
to the new data node.
The command below creates a new hot shard which is automatically distributed

@@ -16,6 +16,7 @@ We recommend the following design guidelines for most use cases:
- [Where to store data (tag or field)](#where-to-store-data-tag-or-field)
- [Avoid too many series](#avoid-too-many-series)
- [Use recommended naming conventions](#use-recommended-naming-conventions)
- [Writing data with future timestamps](#writing-data-with-future-timestamps)
- [Shard group duration management](#shard-group-duration-management)

## Where to store data (tag or field)
@@ -209,6 +210,38 @@ from(bucket:"<database>/<retention_policy>")
> SELECT mean("temp") FROM "weather_sensor" WHERE region = 'north'
```

## Writing data with future timestamps

When designing schemas for applications that write data with future timestamps (such as forecast data from machine learning models, predictions, or scheduled events), consider the following implications for InfluxDB Enterprise v1 cluster operations and data integrity.

### Understanding future data behavior

InfluxDB Enterprise v1 organizes shards into shard groups, each covering a fixed time range set by the retention policy's shard group duration.
When you write data with future timestamps, InfluxDB creates shard groups (and their shards) that cover those future time periods, even though the periods have not started yet.
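
To make the time-range behavior concrete, the following Bash sketch models how a point's timestamp maps to a shard group's time range. It is a simplified model, not InfluxDB source code: the function names `shard_group_for` and `is_future_group` are hypothetical, timestamps are assumed to be non-negative Unix seconds, and shard group boundaries are assumed to be multiples of the shard group duration counted from the Unix epoch (actual alignment may differ for some durations).

```bash
#!/usr/bin/env bash
# Simplified model of shard group time ranges (not InfluxDB source code).
# Assumptions: timestamps are non-negative Unix seconds, and shard group
# boundaries are multiples of the shard group duration since the Unix epoch.
# Function names are hypothetical.

# shard_group_for TIMESTAMP DURATION
# Prints "START END" for the shard group range [START, END) containing TIMESTAMP.
shard_group_for() {
  local ts=$1 duration=$2
  local start=$(( ts - ts % duration ))
  echo "$start $(( start + duration ))"
}

# is_future_group TIMESTAMP DURATION NOW
# Succeeds (exit status 0) if the shard group containing TIMESTAMP starts after NOW,
# that is, the write creates a shard group for a period that has not started yet.
is_future_group() {
  local ts=$1 duration=$2 now=$3
  local start
  read -r start _ <<< "$(shard_group_for "$ts" "$duration")"
  (( start > now ))
}

# Example: with 1-day shard groups, a forecast point 30 days after a
# hypothetical "now" lands in a shard group that lies entirely in the future.
day=86400
now=1722859200
forecast_ts=$(( now + 30 * day ))
shard_group_for "$forecast_ts" "$day"
if is_future_group "$forecast_ts" "$day" "$now"; then
  echo "point creates a future shard group"
fi
```

Under this model, a pipeline that keeps writing forecasts weeks ahead keeps creating shard groups whose whole range lies in the future, which is the situation the following caution describes.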

> [!Caution]
> #### Risks of rebalancing with future data
>
> Truncating shards that contain data with future timestamps (such as forecast or prediction data)
> can lead to overlapping shards and data duplication.
> For more information, see [`truncate-shards` and future data](/enterprise_influxdb/v1/tools/influxd-ctl/truncate-shards/#understand-the-risks-with-future-data)
> or [contact InfluxData support](https://support.influxdata.com).

### Use separate databases for future data

When planning for data that contains future timestamps, consider isolating it in dedicated databases to:

- Minimize impact on real-time data operations
- Allow targeted maintenance operations on current vs. future data
- Simplify backup and recovery strategies for different data types

```sql
-- Example: Separate databases for different data types
CREATE DATABASE "realtime_metrics"
CREATE DATABASE "ml_forecasts"
CREATE DATABASE "scheduled_predictions"
```
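
One way to apply this separation is to route each point to a database based on its timestamp. The Bash sketch below is a hypothetical helper (`target_database` is not part of InfluxDB); it reuses the `realtime_metrics` and `ml_forecasts` database names from the example above and assumes timestamps and the current time are Unix seconds.

```bash
#!/usr/bin/env bash
# Hypothetical routing helper: send points with future timestamps to a
# dedicated database so they stay out of the real-time database's shards.
# Assumes TIMESTAMP and NOW are Unix seconds.

# target_database TIMESTAMP NOW -> prints the database to write to
target_database() {
  local ts=$1 now=$2
  if (( ts > now )); then
    echo "ml_forecasts"
  else
    echo "realtime_metrics"
  fi
}

now=$(date +%s)
target_database "$(( now + 7 * 86400 ))" "$now"   # a point one week ahead
target_database "$now" "$now"                      # a point at the current time
```

You could then pass the result as the `db` query parameter of the InfluxDB 1.x `/write` HTTP endpoint, for example `curl -XPOST "http://localhost:8086/write?db=$(target_database "$ts" "$now")&precision=s" --data-binary "$line"`, where the host, port, `$ts`, and `$line` are placeholders.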

## Shard group duration management

### Shard group duration overview

@@ -17,6 +17,14 @@ The `influxd-ctl truncate-shards` command truncates all shards that are currentl
being written to (also known as "hot" shards) and creates new shards to write
new data to.

> [!Caution]
> #### Overlapping shards with forecast and future data
>
> Running `truncate-shards` on shards containing future timestamps can create
> overlapping shards with duplicate data points.
>
> [Understand the risks with future data](#understand-the-risks-with-future-data).

## Usage

```sh
@@ -40,3 +48,34 @@ _Also see [`influxd-ctl` global flags](/enterprise_influxdb/v1/tools/influxd-ctl
```bash
influxd-ctl truncate-shards -delay 3m
```

## Understand the risks with future data

> [!Important]
> If you need to rebalance shards that contain future data, contact [InfluxData support](https://www.influxdata.com/contact/) for assistance.

When you write data points with timestamps in the future (for example, forecast data from machine learning models),
the `truncate-shards` command behaves differently than it does for shards that contain only current data and can create overlapping shards with duplicate data points.
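
Before running `truncate-shards`, it can help to check whether a database holds any points with future timestamps. The following Bash sketch assumes the InfluxDB 1.x `influx` CLI is installed and can reach a data node (add connection and authentication flags such as `-host` and `-username` as your cluster requires); the function names `future_points_query` and `check_future_points` are hypothetical. The InfluxQL query counts field values newer than `now()` in every measurement, which can be expensive on large databases.

```bash
#!/usr/bin/env bash
# Sketch: look for points with future timestamps before truncating shards.
# Assumes the InfluxDB 1.x `influx` CLI is on PATH; function names are hypothetical.

# future_points_query -> prints an InfluxQL query that counts field values
# with timestamps later than now() in every measurement (/.*/).
future_points_query() {
  echo 'SELECT COUNT(*) FROM /.*/ WHERE time > now()'
}

# check_future_points DATABASE -> runs the query against DATABASE.
# Nonzero counts in the output indicate that the database holds future data.
check_future_points() {
  local db=$1
  influx -database "$db" -execute "$(future_points_query)"
}

# Example (requires a reachable cluster):
# check_future_points "ml_forecasts"
```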

### How truncate-shards normally works

For shards containing current data:

1. The command creates an artificial stop point in the shard at the truncation timestamp.
2. The command creates a new shard starting from the truncation point.

For example, truncating a one-week shard (Sunday to Saturday) on Wednesday at 2pm produces:

- Shard A: Sunday to the truncation point (Wednesday 2pm)
- Shard B: The truncation point (Wednesday 2pm) to Saturday

This works correctly because the meta nodes track the new shard boundaries and route queries appropriately.
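
The split described above can be sketched as a conceptual Bash model. This is not how `influxd-ctl` is implemented; the `split_at` function is hypothetical, and times are expressed as hours since Sunday 00:00 so the one-week example reads directly (Wednesday 2pm is 3 * 24 + 14 = 86 hours).

```bash
#!/usr/bin/env bash
# Conceptual model only (not the influxd-ctl implementation).
# Splits a shard's time range [START, END) at truncation time T.
# Times are hours since Sunday 00:00; a one-week shard is [0, 168).

# split_at START END T -> prints "A START T" and "B T END", one per line
split_at() {
  local start=$1 end=$2 t=$3
  if (( t <= start || t >= end )); then
    echo "truncation point $t is outside ($start, $end)" >&2
    return 1
  fi
  echo "A $start $t"   # Shard A: start of the shard to the truncation point
  echo "B $t $end"     # Shard B: truncation point to the end of the shard
}

# Truncate the one-week shard on Wednesday at 2pm (3 * 24 + 14 = 86 hours).
split_at 0 168 86
```

In this model, Shard A ends exactly where Shard B begins, so every timestamp belongs to exactly one shard and a query for any time can be routed without ambiguity.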

### The problem with future data

For shards containing future timestamps:

1. The truncation doesn't cleanly split the shard at a point in time.
2. Instead, it creates overlapping shards that cover the same time period.

For example, if you're writing September forecast data in August:

- Original shard: September 1-7
- After truncation:
  - Shard A: September 1-7 (with data up to truncation)
  - Shard B: September 1-7 (for new data after truncation)
- **Result**: Duplicate data points for the same timestamps
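
To contrast the two outcomes, the following Bash sketch checks whether two shard time ranges overlap. It is a conceptual illustration, not an `influxd-ctl` feature: `ranges_overlap` and `report_overlap` are hypothetical, ranges are half-open `[start, end)`, the current-data case uses hours since Sunday 00:00, and the future-data case uses days of September (September 1-7 as `[1, 8)`).

```bash
#!/usr/bin/env bash
# Conceptual check (not an influxd-ctl feature): do two half-open time
# ranges [S1, E1) and [S2, E2) overlap?

# ranges_overlap S1 E1 S2 E2 -> exit status 0 if the ranges overlap
ranges_overlap() {
  local s1=$1 e1=$2 s2=$3 e2=$4
  (( s1 < e2 && s2 < e1 ))
}

# report_overlap LABEL S1 E1 S2 E2 -> prints whether the two shards overlap
report_overlap() {
  if ranges_overlap "$2" "$3" "$4" "$5"; then
    echo "$1: overlap (the same timestamps can exist in both shards)"
  else
    echo "$1: no overlap"
  fi
}

# Current data: one-week shard truncated at Wednesday 2pm (hours since Sunday 00:00).
report_overlap "current data" 0 86 86 168
# Future data: both shards cover September 1-7 (days of the month, end exclusive).
report_overlap "future data" 1 8 1 8
```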