chore(v1): Cautions, risks, and mitigations for using truncate-shards with future data

- Closes influxdata/DAR/issues/534
- Contact Support for assistance
- Add risks and technical details to truncate-shards command
- Add cautions to rebalance guide
- Add planning guidance for future data in schema_and_data_layout
pull/6370/head^2
Jason Stirnaman 2025-09-08 17:04:56 -05:00 committed by Jason Stirnaman
parent 2bc9e1736d
commit bba78ea40b
3 changed files with 102 additions and 5 deletions


@ -40,11 +40,20 @@ cluster, and they use the
[`influxd-ctl` tool](/enterprise_influxdb/v1/tools/influxd-ctl/) available on
all meta nodes.
{{% warn %}}
Before you begin, stop writing historical data to InfluxDB.
Historical data have timestamps that occur at anytime in the past.
Performing a rebalance while writing historical data can lead to data loss.
{{% /warn %}}
> [!Warning]
> #### Stop writing data before rebalancing
>
> Before you begin, stop writing historical data to InfluxDB.
> Historical data have timestamps that occur at any time in the past.
> Performing a rebalance while writing historical data can lead to data loss.
> [!Caution]
> #### Risks of rebalancing with future data
>
> Truncating shards that contain data with future timestamps (such as forecast or prediction data)
> can lead to overlapping shards and data duplication.
> For more information, see [`truncate-shards` and future data](/enterprise_influxdb/v1/tools/influxd-ctl/truncate-shards/#understand-the-risks-with-future-data)
> or [contact InfluxData support](https://support.influxdata.com).
## Rebalance Procedure 1: Rebalance a cluster to create space
@ -67,6 +76,14 @@ Hot shards are shards that are currently receiving writes.
Performing any action on a hot shard can lead to data inconsistency within the
cluster which requires manual intervention from the user.
> [!Caution]
> #### Risks of rebalancing with future data
>
> Truncating shards that contain data with future timestamps (such as forecast or prediction data)
> can lead to overlapping shards and data duplication.
> For more information, see [`truncate-shards` and future data](/enterprise_influxdb/v1/tools/influxd-ctl/truncate-shards/#understand-the-risks-with-future-data)
> or [contact InfluxData support](https://support.influxdata.com).
To prevent data inconsistency, truncate hot shards before moving any shards
across data nodes.
The command below creates a new hot shard which is automatically distributed
@ -298,6 +315,14 @@ Hot shards are shards that are currently receiving writes.
Performing any action on a hot shard can lead to data inconsistency within the
cluster which requires manual intervention from the user.
> [!Caution]
> #### Risks of rebalancing with future data
>
> Truncating shards that contain data with future timestamps (such as forecast or prediction data)
> can lead to overlapping shards and data duplication.
> For more information, see [`truncate-shards` and future data](/enterprise_influxdb/v1/tools/influxd-ctl/truncate-shards/#understand-the-risks-with-future-data)
> or [contact InfluxData support](https://support.influxdata.com).
To prevent data inconsistency, truncate hot shards before copying any shards
to the new data node.
The command below creates a new hot shard which is automatically distributed


@ -16,6 +16,7 @@ We recommend the following design guidelines for most use cases:
- [Where to store data (tag or field)](#where-to-store-data-tag-or-field)
- [Avoid too many series](#avoid-too-many-series)
- [Use recommended naming conventions](#use-recommended-naming-conventions)
- [Writing data with future timestamps](#writing-data-with-future-timestamps)
- [Shard Group Duration Management](#shard-group-duration-management)
## Where to store data (tag or field)
@ -209,6 +210,38 @@ from(bucket:"<database>/<retention_policy>")
> SELECT mean("temp") FROM "weather_sensor" WHERE region = 'north'
```
## Writing data with future timestamps
When designing schemas for applications that write data with future timestamps--such as forecast data from machine learning models, predictions, or scheduled events--consider the following implications for InfluxDB Enterprise v1 cluster operations and data integrity.
### Understanding future data behavior
InfluxDB Enterprise v1 creates shards based on time ranges.
When you write data with future timestamps, InfluxDB creates shards that cover future time periods.
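The following is a minimal sketch of how a timestamp maps to a time-bounded window, which is why a point written today with a September timestamp ends up in a shard that covers September rather than today.
It assumes a one-day shard group duration and windows aligned to UTC midnight; `shard_group_window` is a hypothetical helper for illustration, not an InfluxDB API, and the real alignment logic may differ.

```python
from datetime import datetime, timedelta, timezone

# Assumption: windows are aligned to the Unix epoch in UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def shard_group_window(ts, duration):
    """Return the (start, end) window of length `duration` that contains `ts`.

    `end` is exclusive. This is an illustrative model only.
    """
    elapsed = ts - EPOCH
    start = EPOCH + (elapsed // duration) * duration
    return start, start + duration


one_day = timedelta(days=1)
written_at = datetime(2025, 8, 15, 14, 0, tzinfo=timezone.utc)
forecast_ts = datetime(2025, 9, 3, 9, 30, tzinfo=timezone.utc)

# A point with a current timestamp lands in today's window.
current_window = shard_group_window(written_at, one_day)
# A forecast point lands in a window that lies entirely in the future.
future_window = shard_group_window(forecast_ts, one_day)

print("current:", current_window[0].isoformat(), "to", current_window[1].isoformat())
print("future: ", future_window[0].isoformat(), "to", future_window[1].isoformat())
```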
> [!Caution]
> #### Risks of rebalancing with future data
>
> Truncating shards that contain data with future timestamps (such as forecast or prediction data)
> can lead to overlapping shards and data duplication.
> For more information, see [`truncate-shards` and future data](/enterprise_influxdb/v1/tools/influxd-ctl/truncate-shards/#understand-the-risks-with-future-data)
> or [contact InfluxData support](https://support.influxdata.com).
### Use separate databases for future data
When planning for data that contains future timestamps, consider isolating it in dedicated databases to:
- Minimize impact on real-time data operations
- Allow targeted maintenance operations on current vs. future data
- Simplify backup and recovery strategies for different data types
```sql
-- Example: Separate databases for different data types
CREATE DATABASE "realtime_metrics"
CREATE DATABASE "ml_forecasts"
CREATE DATABASE "scheduled_predictions"
```
## Shard group duration management
### Shard group duration overview


@ -17,6 +17,14 @@ The `influxd-ctl truncate-shards` command truncates all shards that are currentl
being written to (also known as "hot" shards) and creates new shards to write
new data to.
> [!Caution]
> #### Overlapping shards with forecast and future data
>
> Running `truncate-shards` on shards containing future timestamps can create
> overlapping shards with duplicate data points.
>
> [Understand the risks with future data](#understand-the-risks-with-future-data).
## Usage
```sh
@ -40,3 +48,34 @@ _Also see [`influxd-ctl` global flags](/enterprise_influxdb/v1/tools/influxd-ctl
```bash
influxd-ctl truncate-shards -delay 3m
```
## Understand the risks with future data
> [!Important]
> If you need to rebalance shards that contain future data, contact [InfluxData support](https://www.influxdata.com/contact/) for assistance.
When you write data points with timestamps in the future (for example, forecast data from machine learning models),
the `truncate-shards` command behaves differently and can cause data duplication issues.
### How truncate-shards normally works
For shards containing current data:
1. The command creates an artificial stop point in the shard at the truncation timestamp.
2. The command creates a new shard that starts at the truncation point.
3. Example: A one-week shard (Sunday to Saturday) becomes:
   - Shard A: Sunday to truncation point (Wednesday 2pm)
   - Shard B: Truncation point (Wednesday 2pm) to Saturday
This works correctly because the meta nodes understand the boundaries and route queries appropriately.
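The split described above can be modeled as cutting one time range into two adjacent ranges.
The sketch below is only an illustration of that idea under assumed dates (a week starting Sunday, 2025-08-10, truncated on Wednesday at 2pm UTC); `ShardRange` and `truncate` are hypothetical names and don't represent how `influxd-ctl` is implemented.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ShardRange:
    start: datetime
    end: datetime  # exclusive


def truncate(shard, truncated_at):
    """Split `shard` at `truncated_at` into two adjacent, non-overlapping ranges."""
    if not (shard.start < truncated_at < shard.end):
        raise ValueError("truncation point must fall inside the shard's time range")
    return ShardRange(shard.start, truncated_at), ShardRange(truncated_at, shard.end)


utc = timezone.utc
# One-week shard covering Sunday through Saturday (end is the following Sunday, exclusive).
week = ShardRange(datetime(2025, 8, 10, tzinfo=utc), datetime(2025, 8, 17, tzinfo=utc))
# Truncation happens "now": Wednesday at 2pm.
shard_a, shard_b = truncate(week, datetime(2025, 8, 13, 14, 0, tzinfo=utc))

print("Shard A:", shard_a.start.isoformat(), "to", shard_a.end.isoformat())
print("Shard B:", shard_b.start.isoformat(), "to", shard_b.end.isoformat())
```

Because Shard A ends exactly where Shard B begins, every timestamp in the week belongs to only one of the two ranges.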
### The problem with future data
For shards containing future timestamps:
1. The truncation doesn't cleanly split the shard at a point in time.
2. Instead, it creates overlapping shards that cover the same time period.
3. Example: If you're writing September forecast data in August:
   - Original shard: September 1-7
   - After truncation:
     - Shard A: September 1-7 (with data up to truncation)
     - Shard B: September 1-7 (for new data after truncation)
   - **Result**: Duplicate data points for the same timestamps
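To make the difference between the two cases concrete, the sketch below applies a generic interval-overlap check to the time ranges from the examples on this page.
The dates are assumptions chosen to match those examples, and `overlaps` is a hypothetical helper for illustration; it isn't something `influxd-ctl` provides.

```python
from datetime import datetime, timezone


def overlaps(a, b):
    """Return True if two (start, end) ranges share any time; `end` is exclusive."""
    return a[0] < b[1] and b[0] < a[1]


utc = timezone.utc

# Current data: truncation on Wednesday 2pm splits the week into adjacent ranges.
current_a = (datetime(2025, 8, 10, tzinfo=utc), datetime(2025, 8, 13, 14, tzinfo=utc))
current_b = (datetime(2025, 8, 13, 14, tzinfo=utc), datetime(2025, 8, 17, tzinfo=utc))

# Future data: the truncation happens in August, before the shard's time range
# begins, so both shards end up covering September 1-7.
future_a = (datetime(2025, 9, 1, tzinfo=utc), datetime(2025, 9, 8, tzinfo=utc))
future_b = (datetime(2025, 9, 1, tzinfo=utc), datetime(2025, 9, 8, tzinfo=utc))

current_overlap = overlaps(current_a, current_b)
future_overlap = overlaps(future_a, future_b)

print("current data shards overlap:", current_overlap)
print("future data shards overlap: ", future_overlap)
```

In this model, the adjacent ranges don't overlap, while the two September ranges do, so a point with a September timestamp could exist in both shards.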