diff --git a/content/enterprise_influxdb/v1/administration/manage/clusters/rebalance.md b/content/enterprise_influxdb/v1/administration/manage/clusters/rebalance.md
index db0b48532..b6652e831 100644
--- a/content/enterprise_influxdb/v1/administration/manage/clusters/rebalance.md
+++ b/content/enterprise_influxdb/v1/administration/manage/clusters/rebalance.md
@@ -40,11 +40,20 @@ cluster, and they use the
 [`influxd-ctl` tool](/enterprise_influxdb/v1/tools/influxd-ctl/)
 available on all meta nodes.
 
-{{% warn %}}
-Before you begin, stop writing historical data to InfluxDB.
-Historical data have timestamps that occur at anytime in the past.
-Performing a rebalance while writing historical data can lead to data loss.
-{{% /warn %}}
+> [!Warning]
+> #### Stop writing data before rebalancing
+>
+> Before you begin, stop writing historical data to InfluxDB.
+> Historical data have timestamps that occur at any time in the past.
+> Performing a rebalance while writing historical data can lead to data loss.
+
+> [!Caution]
+> #### Risks of rebalancing with future data
+>
+> Truncating shards that contain data with future timestamps (such as forecast or prediction data)
+> can lead to overlapping shards and data duplication.
+> For more information, see [`truncate-shards` and future data](/enterprise_influxdb/v1/tools/influxd-ctl/truncate-shards/#understand-the-risks-with-future-data)
+> or [contact InfluxData support](https://support.influxdata.com).
 
 ## Rebalance Procedure 1: Rebalance a cluster to create space
 
@@ -67,6 +76,14 @@ Hot shards are shards that are currently receiving writes.
 Performing any action on a hot shard can lead to data inconsistency within the
 cluster which requires manual intervention from the user.
 
+> [!Caution]
+> #### Risks of rebalancing with future data
+>
+> Truncating shards that contain data with future timestamps (such as forecast or prediction data)
+> can lead to overlapping shards and data duplication.
+> For more information, see [`truncate-shards` and future data](/enterprise_influxdb/v1/tools/influxd-ctl/truncate-shards/#understand-the-risks-with-future-data)
+> or [contact InfluxData support](https://support.influxdata.com).
+
 To prevent data inconsistency, truncate hot shards before moving any shards
 across data nodes.
 The command below creates a new hot shard which is automatically distributed
@@ -298,6 +315,14 @@ Hot shards are shards that are currently receiving writes.
 Performing any action on a hot shard can lead to data inconsistency within the
 cluster which requires manual intervention from the user.
 
+> [!Caution]
+> #### Risks of rebalancing with future data
+>
+> Truncating shards that contain data with future timestamps (such as forecast or prediction data)
+> can lead to overlapping shards and data duplication.
+> For more information, see [`truncate-shards` and future data](/enterprise_influxdb/v1/tools/influxd-ctl/truncate-shards/#understand-the-risks-with-future-data)
+> or [contact InfluxData support](https://support.influxdata.com).
+
 To prevent data inconsistency, truncate hot shards before copying any shards
 to the new data node.
 The command below creates a new hot shard which is automatically distributed
diff --git a/content/enterprise_influxdb/v1/concepts/schema_and_data_layout.md b/content/enterprise_influxdb/v1/concepts/schema_and_data_layout.md
index c60e8ccef..febf8c2dc 100644
--- a/content/enterprise_influxdb/v1/concepts/schema_and_data_layout.md
+++ b/content/enterprise_influxdb/v1/concepts/schema_and_data_layout.md
@@ -16,6 +16,7 @@ We recommend the following design guidelines for most use cases:
   - [Where to store data (tag or field)](#where-to-store-data-tag-or-field)
   - [Avoid too many series](#avoid-too-many-series)
   - [Use recommended naming conventions](#use-recommended-naming-conventions)
+  - [Writing data with future timestamps](#writing-data-with-future-timestamps)
   - [Shard Group Duration Management](#shard-group-duration-management)
 
 ## Where to store data (tag or field)
@@ -209,6 +210,38 @@ from(bucket:"/")
 > SELECT mean("temp") FROM "weather_sensor" WHERE region = 'north'
 ```
 
+## Writing data with future timestamps
+
+When designing schemas for applications that write data with future timestamps (such as forecast data from machine learning models, predictions, or scheduled events), consider the following implications for InfluxDB Enterprise v1 cluster operations and data integrity.
+
+### Understanding future data behavior
+
+InfluxDB Enterprise v1 creates shards based on time ranges.
+When you write data with future timestamps, InfluxDB creates shards that cover future time periods.
+
+> [!Caution]
+> #### Risks of rebalancing with future data
+>
+> Truncating shards that contain data with future timestamps (such as forecast or prediction data)
+> can lead to overlapping shards and data duplication.
+> For more information, see [`truncate-shards` and future data](/enterprise_influxdb/v1/tools/influxd-ctl/truncate-shards/#understand-the-risks-with-future-data)
+> or [contact InfluxData support](https://support.influxdata.com).
+
+### Use separate databases for future data
+
+When planning for data that contains future timestamps, consider isolating it in dedicated databases to:
+
+- Minimize impact on real-time data operations
+- Allow targeted maintenance operations on current vs. future data
+- Simplify backup and recovery strategies for different data types
+
+```sql
+-- Example: Separate databases for different data types
+CREATE DATABASE "realtime_metrics"
+CREATE DATABASE "ml_forecasts"
+CREATE DATABASE "scheduled_predictions"
+```
+
 ## Shard group duration management
 
 ### Shard group duration overview
diff --git a/content/enterprise_influxdb/v1/tools/influxd-ctl/truncate-shards.md b/content/enterprise_influxdb/v1/tools/influxd-ctl/truncate-shards.md
index f7dffef50..fce401ac2 100644
--- a/content/enterprise_influxdb/v1/tools/influxd-ctl/truncate-shards.md
+++ b/content/enterprise_influxdb/v1/tools/influxd-ctl/truncate-shards.md
@@ -17,6 +17,14 @@ The `influxd-ctl truncate-shards` command truncates all shards that are currentl
 being written to (also known as "hot" shards) and creates new shards to write
 new data to.
 
+> [!Caution]
+> #### Overlapping shards with forecast and future data
+>
+> Running `truncate-shards` on shards containing future timestamps can create
+> overlapping shards with duplicate data points.
+>
+> [Understand the risks with future data](#understand-the-risks-with-future-data).
+
 ## Usage
 
 ```sh
@@ -40,3 +48,34 @@ _Also see [`influxd-ctl` global flags](/enterprise_influxdb/v1/tools/influxd-ctl
 ```bash
 influxd-ctl truncate-shards -delay 3m
 ```
+
+## Understand the risks with future data
+
+> [!Important]
+> If you need to rebalance shards that contain future data, contact [InfluxData support](https://www.influxdata.com/contact/) for assistance.
+
+When you write data points with timestamps in the future (for example, forecast data from machine learning models),
+the `truncate-shards` command behaves differently and can cause data duplication issues.
+
+### How truncate-shards normally works
+
+For shards containing current data:
+1. The command creates an artificial stop point in the shard at the truncation timestamp
+2. It creates a new shard that starts at the truncation point
+3. Example: A one-week shard (Sunday to Saturday) becomes:
+   - Shard A: Sunday to truncation point (Wednesday 2pm)
+   - Shard B: Truncation point (Wednesday 2pm) to Saturday
+
+This works correctly because the meta nodes understand the boundaries and route queries appropriately.
+
+### The problem with future data
+
+For shards containing future timestamps:
+1. The truncation doesn't cleanly split the shard at a point in time
+2. Instead, it creates overlapping shards that cover the same time period
+3. Example: If you're writing September forecast data in August:
+   - Original shard: September 1-7
+   - After truncation:
+     - Shard A: September 1-7 (with data up to truncation)
+     - Shard B: September 1-7 (for new data after truncation)
+   - **Result**: Duplicate data points for the same timestamps
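
Outside the patch itself, the difference between the two cases in the last hunk can be illustrated with a minimal Python sketch. It is a conceptual model only, not InfluxDB code: the `ShardRange` type, the shard IDs, and the 2024 dates are hypothetical, and the half-open ranges (start inclusive, end exclusive) are an assumption made for the illustration. It shows that a clean split at a truncation point produces no overlapping ranges, while two shards covering the same week do overlap.

```python
from dataclasses import dataclass
from datetime import datetime
from itertools import combinations


@dataclass(frozen=True)
class ShardRange:
    """Hypothetical stand-in for a shard's time range (start inclusive, end exclusive)."""
    shard_id: int
    start: datetime
    end: datetime


def overlaps(a: ShardRange, b: ShardRange) -> bool:
    """Two half-open ranges overlap when each starts before the other ends."""
    return a.start < b.end and b.start < a.end


def find_overlaps(shards):
    """Return every pair of shard IDs whose time ranges overlap."""
    return [
        (a.shard_id, b.shard_id)
        for a, b in combinations(shards, 2)
        if overlaps(a, b)
    ]


# Normal case: a one-week shard (Sunday to Saturday) split at Wednesday 2pm.
normal = [
    ShardRange(1, datetime(2024, 8, 4), datetime(2024, 8, 7, 14)),
    ShardRange(2, datetime(2024, 8, 7, 14), datetime(2024, 8, 11)),
]

# Future-data case: both shards cover September 1-7.
future = [
    ShardRange(3, datetime(2024, 9, 1), datetime(2024, 9, 8)),
    ShardRange(4, datetime(2024, 9, 1), datetime(2024, 9, 8)),
]

print(find_overlaps(normal))  # []
print(find_overlaps(future))  # [(3, 4)]
```

A check like this only models the time ranges; it says nothing about how a real cluster stores or routes data, so the guidance in the patch to contact InfluxData support before rebalancing future data still applies.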