fix: add note to distributed about duplicate points on flush (#6330)

2025-08-22 11:08:48 -06:00 · 2025-08-22 11:08:48 -06:00 · d49d69ba26
parent 2494c90cb8
commit d49d69ba26
5 changed files with 66 additions and 264 deletions
--- a/content/influxdb3/cloud-dedicated/write-data/best-practices/optimize-writes.md
+++ b/content/influxdb3/cloud-dedicated/write-data/best-practices/optimize-writes.md
@ -416,9 +416,23 @@ The following example creates sample data for two series (the combination of mea

 ### Avoid sending duplicate data

-Use Telegraf and the [Dedup processor plugin](/telegraf/v1/plugins/#processor-dedup) to filter data whose field values are exact repetitions of previous values.
+When writing duplicate points (points with the same timestamp and tag set),
+InfluxDB deduplicates the data by creating a union of the duplicate points.
 Deduplicating your data can reduce your write payload size and resource usage.

+> [!Important]
+> #### Write ordering for duplicate points
+>
+> InfluxDB attempts to honor write ordering for duplicate points, with the most
+> recently written point taking precedence. However, when data is flushed from
+> the in-memory buffer to Parquet files—typically every 15 minutes, but
+> sometimes sooner—this ordering is not guaranteed if duplicate points are flushed
+> at the same time. As a result, the last written duplicate point may not always
+> be retained in storage.
+
+Use Telegraf and the [Dedup processor plugin](/telegraf/v1/plugins/#processor-dedup)
+to filter data whose field values are exact repetitions of previous values.
+
 The following example shows how to use Telegraf to remove points that repeat field values, and then write the data to InfluxDB:

 1.  In your terminal, enter the following command to create the sample data file and calculate the number of seconds between the earliest timestamp and _now_.
--- a/content/influxdb3/cloud-serverless/write-data/best-practices/optimize-writes.md
+++ b/content/influxdb3/cloud-serverless/write-data/best-practices/optimize-writes.md
@ -430,9 +430,23 @@ The following example creates sample data for two series (the combination of mea

 ### Avoid sending duplicate data

-Use Telegraf and the [Dedup processor plugin](/telegraf/v1/plugins/#processor-dedup) to filter data whose field values are exact repetitions of previous values.
+When writing duplicate points (points with the same timestamp and tag set),
+InfluxDB deduplicates the data by creating a union of the duplicate points.
 Deduplicating your data can reduce your write payload size and resource usage.

+> [!Important]
+> #### Write ordering for duplicate points
+>
+> InfluxDB attempts to honor write ordering for duplicate points, with the most
+> recently written point taking precedence. However, when data is flushed from
+> the in-memory buffer to Parquet files—typically every 15 minutes, but
+> sometimes sooner—this ordering is not guaranteed if duplicate points are flushed
+> at the same time. As a result, the last written duplicate point may not always
+> be retained in storage.
+
+Use Telegraf and the [Dedup processor plugin](/telegraf/v1/plugins/#processor-dedup)
+to filter data whose field values are exact repetitions of previous values.
+
 The following example shows how to use Telegraf to remove points that repeat field values, and then write the data to InfluxDB:

 1.  In your terminal, enter the following command to create the sample data file and calculate the number of seconds between the earliest timestamp and _now_.
--- a/content/influxdb3/clustered/reference/syntax/line-protocol.md
+++ b/content/influxdb3/clustered/reference/syntax/line-protocol.md
@ -2,7 +2,7 @@
 title: Line protocol reference
 description: >
  InfluxDB uses line protocol to write data points.
-  It is a text-based format that provides the measurement, tag set, field set, and timestamp of a data point.
+  It is a text-based format that provides the table, tag set, field set, and timestamp of a data point.
 menu:
  influxdb3_clustered:
    name: Line protocol
@ -11,261 +11,8 @@ weight: 102
 influxdb3/clustered/tags: [write, line protocol, syntax]
 related:
  - /influxdb3/clustered/write-data/
+source: /shared/v3-line-protocol.md
 ---

-InfluxDB uses line protocol to write data points.
-It is a text-based format that provides the measurement, tag set, field set, and timestamp of a data point.
-
- [Elements of line protocol](#elements-of-line-protocol)
- [Data types and format](#data-types-and-format)
- [Quotes](#quotes)
- [Special characters](#special-characters)
- [Comments](#comments)
- [Naming restrictions](#naming-restrictions)
- [Duplicate points](#duplicate-points)
-
-```js
-// Syntax
-<measurement>[,<tag_key>=<tag_value>[,<tag_key>=<tag_value>]] <field_key>=<field_value>[,<field_key>=<field_value>] [<timestamp>]
-
-// Example
-myMeasurement,tag1=value1,tag2=value2 fieldKey="fieldValue" 1556813561098000000
-```
-
-Lines separated by the newline character `\n` represent a single point
-in InfluxDB. Line protocol is whitespace sensitive.
-
-> [!Note]
-> Line protocol does not support the newline character `\n` in tag or field values.
-
-## Elements of line protocol
-
-{{< influxdb/line-protocol commas=false whitespace=false >}}
-
-### Measurement
-({{< req >}})
-The measurement name.
-InfluxDB accepts one measurement per point.
-_Measurement names are case-sensitive and subject to [naming restrictions](#naming-restrictions)._
-
-_**Data type:** [String](#string)_
-
-
-### Tag set
-_**Optional**_ –
-All tag key-value pairs for the point.
-Key-value relationships are denoted with the `=` operand.
-Multiple tag key-value pairs are comma-delimited.
-_Tag keys and tag values are case-sensitive.
-Tag keys are subject to [naming restrictions](#naming-restrictions).
-Tag values cannot be empty; instead, omit the tag from the tag set._
-
-_**Key data type:** [String](#string)_  
-_**Value data type:** [String](#string)_
-
-### Field set
-({{< req >}})
-All field key-value pairs for the point.
-Points must have at least one field.
-_Field keys and string values are case-sensitive.
-Field keys are subject to [naming restrictions](#naming-restrictions)._
-
-_**Key data type:** [String](#string)_  
-_**Value data type:** [Float](#float) | [Integer](#integer) | [UInteger](#uinteger) | [String](#string) | [Boolean](#boolean)_
-
-> [!Note]
-> _Always double quote string field values. More on quotes [below](#quotes)._
-> 
-> ```sh
-> measurementName fieldKey="field string value" 1556813561098000000
-> ```
-
-### Timestamp
-_**Optional**_ –
-The [unix timestamp](/influxdb/v2/reference/glossary/#unix-timestamp) for the data point.
-InfluxDB accepts one timestamp per point.
-If no timestamp is provided, InfluxDB uses the system time (UTC) of its host machine.
-
-_**Data type:** [Unix timestamp](#unix-timestamp)_
-
-> [!Note]
-> #### Important notes about timestamps
->
-> - To ensure a data point includes the time a metric is observed (not received by InfluxDB),
->   include the timestamp.
-> - If your timestamps are not in nanoseconds, specify the precision of your timestamps
->   when [writing the data to InfluxDB](/influxdb/v2/write-data/#timestamp-precision).
-
-### Whitespace
-Whitespace in line protocol determines how InfluxDB interprets the data point.
-The **first unescaped space** delimits the measurement and the tag set from the field set.
-The **second unescaped space** delimits the field set from the timestamp.
-
-{{< influxdb/line-protocol elements=false commas=false >}}
-
-## Data types and format
-
-### Float
-IEEE-754 64-bit floating-point numbers.
-Default numerical type.
-_InfluxDB supports scientific notation in float field values._
-
-##### Float field value examples
-```js
-myMeasurement fieldKey=1.0
-myMeasurement fieldKey=1
-myMeasurement fieldKey=-1.234456e+78
-```
-
-### Integer
-Signed 64-bit integers.
-Trailing `i` on the number specifies an integer.
-
-| Minimum integer         | Maximum integer        |
-| ---------------         | ---------------        |
-| `-9223372036854775808i` | `9223372036854775807i` |
-
-##### Integer field value examples
-```js
-myMeasurement fieldKey=1i
-myMeasurement fieldKey=12485903i
-myMeasurement fieldKey=-12485903i
-```
-
-### UInteger
-Unsigned 64-bit integers.
-Trailing `u` on the number specifies an unsigned integer.
-
-| Minimum uinteger | Maximum uinteger        |
-| ---------------- | ----------------        |
-| `0u`             | `18446744073709551615u` |
-
-##### UInteger field value examples
-```js
-myMeasurement fieldKey=1u
-myMeasurement fieldKey=12485903u
-```
-
-### String
-Plain text string.
-Length limit 64KB.
-
-##### String example
-```sh
-# String measurement name, field key, and field value
-myMeasurement fieldKey="this is a string"
-```
-
-### Boolean
-Stores `true` or `false` values.
-
-| Boolean value | Accepted syntax                     |
-|:-------------:|:---------------                     |
-| True          | `t`, `T`, `true`, `True`, `TRUE`    |
-| False         | `f`, `F`, `false`, `False`, `FALSE` |
-
-##### Boolean field value examples
-```js
-myMeasurement fieldKey=true
-myMeasurement fieldKey=false
-myMeasurement fieldKey=t
-myMeasurement fieldKey=f
-myMeasurement fieldKey=TRUE
-myMeasurement fieldKey=FALSE
-```
-
-> [!Note]
-> Do not quote boolean field values.
-> Quoted field values are interpreted as strings.
-
-### Unix timestamp
-Unix timestamp in a [specified precision](/influxdb/v2/reference/glossary/#unix-timestamp).
-Default precision is nanoseconds (`ns`).
-
-| Minimum timestamp      | Maximum timestamp     |
-| -----------------      | -----------------     |
-| `-9223372036854775806` | `9223372036854775806` |
-
-##### Unix timestamp example
-```js
-myMeasurementName fieldKey="fieldValue" 1556813561098000000
-```
-
-## Quotes
-Line protocol supports single and double quotes as described in the following table:
-
-| Element     | Double quotes                           | Single quotes                           |
-| :------     | :------------:                          |:-------------:                          |
-| Measurement | _Limited_ <sup class="required">*</sup> | _Limited_ <sup class="required">*</sup> |
-| Tag key     | _Limited_ <sup class="required">*</sup> | _Limited_ <sup class="required">*</sup> |
-| Tag value   | _Limited_ <sup class="required">*</sup> | _Limited_ <sup class="required">*</sup> |
-| Field key   | _Limited_ <sup class="required">*</sup> | _Limited_ <sup class="required">*</sup> |
-| Field value | **Strings only**                        | Never                                   |
-| Timestamp   | Never                                   | Never                                   |
-
-<sup class="required">\*</sup> _Line protocol accepts double and single quotes in
-measurement names, tag keys, tag values, and field keys, but interprets them as
-part of the name, key, or value._
-
-## Special Characters
-Line protocol supports special characters in [string elements](#string).
-In the following contexts, it requires escaping certain characters with a backslash (`\`):
-
-| Element     | Escape characters         |
-|:-------     |:-----------------         |
-| Measurement | Comma, Space              |
-| Tag key     | Comma, Equals Sign, Space |
-| Tag value   | Comma, Equals Sign, Space |
-| Field key   | Comma, Equals Sign, Space |
-| Field value | Double quote, Backslash   |
-
-You do not need to escape other special characters.
-
-##### Examples of special characters in line protocol
-```sh
-# Measurement name with spaces
-my\ Measurement fieldKey="string value"
-
-# Double quotes in a string field value
-myMeasurement fieldKey="\"string\" within a string"
-
-# Tag keys and values with spaces
-myMeasurement,tag\ Key1=tag\ Value1,tag\ Key2=tag\ Value2 fieldKey=100
-
-# Emojis
-myMeasurement,tagKey=🍭 fieldKey="Launch 🚀" 1556813561098000000
-```
-
-### Escaping backslashes
-Line protocol supports both literal backslashes and backslashes as an escape character.
-With two contiguous backslashes, the first is interpreted as an escape character.
-For example:
-
-| Backslashes | Interpreted as |
-|:-----------:|:-------------:|
-| `\`         | `\`           |
-| `\\`        | `\`           |
-| `\\\`       | `\\`          |
-| `\\\\`      | `\\`          |
-| `\\\\\`     | `\\\`         |
-| `\\\\\\`    | `\\\`         |
-
-## Comments
-Line protocol interprets `#` at the beginning of a line as a comment character
-and ignores all subsequent characters until the next newline `\n`.
-
-```sh
-# This is a comment
-myMeasurement fieldKey="string value" 1556813561098000000
-```
-
-## Naming restrictions
-Measurement names, tag keys, and field keys cannot begin with an underscore `_`.
-The `_` namespace is reserved for InfluxDB system use.
-
-## Duplicate points
-A point is uniquely identified by the measurement name, tag set, and timestamp.
-If you submit line protocol with the same measurement, tag set, and timestamp,
-but with a different field set, the field set becomes the union of the old
-field set and the new field set, where any conflicts favor the new field set.
-
+<!-- The content of this file is at 
+// SOURCE content/shared/v3-line-protocol.md-->
--- a/content/influxdb3/clustered/write-data/best-practices/optimize-writes.md
+++ b/content/influxdb3/clustered/write-data/best-practices/optimize-writes.md
@ -14,7 +14,8 @@ related:
  - /influxdb3/clustered/write-data/use-telegraf/
 ---

-Use these tips to optimize performance and system overhead when writing data to InfluxDB.
+Use these tips to optimize performance and system overhead when writing data to
+{{% product-name %}}.

 - [Batch writes](#batch-writes)
 - [Sort tags by key](#sort-tags-by-key)
@ -422,9 +423,23 @@ The following example creates sample data for two series (the combination of mea

 ### Avoid sending duplicate data

-Use Telegraf and the [Dedup processor plugin](/telegraf/v1/plugins/#processor-dedup) to filter data whose field values are exact repetitions of previous values.
+When writing duplicate points (points with the same timestamp and tag set),
+InfluxDB deduplicates the data by creating a union of the duplicate points.
 Deduplicating your data can reduce your write payload size and resource usage.

+> [!Important]
+> #### Write ordering for duplicate points
+>
+> InfluxDB attempts to honor write ordering for duplicate points, with the most
+> recently written point taking precedence. However, when data is flushed from
+> the in-memory buffer to Parquet files—typically every 15 minutes, but
+> sometimes sooner—this ordering is not guaranteed if duplicate points are flushed
+> at the same time. As a result, the last written duplicate point may not always
+> be retained in storage.
+
+Use Telegraf and the [Dedup processor plugin](/telegraf/v1/plugins/#processor-dedup)
+to filter data whose field values are exact repetitions of previous values.
+
 The following example shows how to use Telegraf to remove points that repeat field values, and then write the data to InfluxDB:

 1.  In your terminal, enter the following command to create the sample data file and calculate the number of seconds between the earliest timestamp and _now_.
--- a/content/shared/v3-line-protocol.md
+++ b/content/shared/v3-line-protocol.md
@ -44,7 +44,7 @@ _**Data type:** [String](#string)_

 ### Tag set

-_**Optional**_ –
+(_**Optional**_)
 All tag key-value pairs for the point.
 Key-value relationships are denoted with the `=` operand.
 Multiple tag key-value pairs are comma-delimited.
@ -75,8 +75,8 @@ _**Value data type:** [Float](#float) | [Integer](#integer) | [UInteger](#uinteg

 ### Timestamp

-_**Optional**_ –
-The [unix timestamp](/influxdb3/version/reference/glossary/#unix-timestamp) for the data point.
+(_**Optional**_)
+The [Unix timestamp](/influxdb3/version/reference/glossary/#unix-timestamp) for the data point.
 InfluxDB accepts one timestamp per point.
 If no timestamp is provided, InfluxDB uses the system time (UTC) of its host machine.

@ -282,3 +282,15 @@ A point is uniquely identified by the table name, tag set, and timestamp.
 If you submit line protocol with the same table, tag set, and timestamp,
 but with a different field set, the field set becomes the union of the old
 field set and the new field set, where any conflicts favor the new field set.
+
+{{% show-in "cloud-dedicated,clustered" %}}
+> [!Important]
+> #### Write ordering for duplicate points
+>
+> {{% product-name %}} attempts to honor write ordering for duplicate points,
+> with the most recently written point taking precedence. However, when data is
+> flushed from the in-memory buffer to Parquet files—typically every 15 minutes,
+> but sometimes sooner—this ordering is not guaranteed if duplicate points are
+> flushed at the same time. As a result, the last written duplicate point may
+> not always be retained in storage.
+{{% /show-in %}}