From d49d69ba2612485984df21984f4c9ed6d1367a41 Mon Sep 17 00:00:00 2001
From: Scott Anderson
Date: Fri, 22 Aug 2025 11:08:48 -0600
Subject: [PATCH] fix: add note to distributed about duplicate points on flush (#6330)

---
 .../best-practices/optimize-writes.md         |  16 +-
 .../best-practices/optimize-writes.md         |  16 +-
 .../reference/syntax/line-protocol.md         | 261 +-----------------
 .../best-practices/optimize-writes.md         |  19 +-
 content/shared/v3-line-protocol.md            |  18 +-
 5 files changed, 66 insertions(+), 264 deletions(-)

diff --git a/content/influxdb3/cloud-dedicated/write-data/best-practices/optimize-writes.md b/content/influxdb3/cloud-dedicated/write-data/best-practices/optimize-writes.md
index 71917fe4c..8f0de2ca7 100644
--- a/content/influxdb3/cloud-dedicated/write-data/best-practices/optimize-writes.md
+++ b/content/influxdb3/cloud-dedicated/write-data/best-practices/optimize-writes.md
@@ -416,9 +416,23 @@ The following example creates sample data for two series (the combination of mea
 
 ### Avoid sending duplicate data
 
-Use Telegraf and the [Dedup processor plugin](/telegraf/v1/plugins/#processor-dedup) to filter data whose field values are exact repetitions of previous values.
+When writing duplicate points (points with the same timestamp and tag set),
+InfluxDB deduplicates the data by creating a union of the duplicate points.
 Deduplicating your data can reduce your write payload size and resource usage.
 
+> [!Important]
+> #### Write ordering for duplicate points
+>
+> InfluxDB attempts to honor write ordering for duplicate points, with the most
+> recently written point taking precedence. However, when data is flushed from
+> the in-memory buffer to Parquet files—typically every 15 minutes, but
+> sometimes sooner—this ordering is not guaranteed if duplicate points are flushed
+> at the same time. As a result, the last written duplicate point may not always
+> be retained in storage.
+
+Use Telegraf and the [Dedup processor plugin](/telegraf/v1/plugins/#processor-dedup)
+to filter data whose field values are exact repetitions of previous values.
+
 The following example shows how to use Telegraf to remove points that repeat field values, and then write the data to InfluxDB:
 
 1. In your terminal, enter the following command to create the sample data file and calculate the number of seconds between the earliest timestamp and _now_.
diff --git a/content/influxdb3/cloud-serverless/write-data/best-practices/optimize-writes.md b/content/influxdb3/cloud-serverless/write-data/best-practices/optimize-writes.md
index bb4c9fe92..ed01029d2 100644
--- a/content/influxdb3/cloud-serverless/write-data/best-practices/optimize-writes.md
+++ b/content/influxdb3/cloud-serverless/write-data/best-practices/optimize-writes.md
@@ -430,9 +430,23 @@ The following example creates sample data for two series (the combination of mea
 
 ### Avoid sending duplicate data
 
-Use Telegraf and the [Dedup processor plugin](/telegraf/v1/plugins/#processor-dedup) to filter data whose field values are exact repetitions of previous values.
+When writing duplicate points (points with the same timestamp and tag set),
+InfluxDB deduplicates the data by creating a union of the duplicate points.
 Deduplicating your data can reduce your write payload size and resource usage.
 
+> [!Important]
+> #### Write ordering for duplicate points
+>
+> InfluxDB attempts to honor write ordering for duplicate points, with the most
+> recently written point taking precedence. However, when data is flushed from
+> the in-memory buffer to Parquet files—typically every 15 minutes, but
+> sometimes sooner—this ordering is not guaranteed if duplicate points are flushed
+> at the same time. As a result, the last written duplicate point may not always
+> be retained in storage.
+
+Use Telegraf and the [Dedup processor plugin](/telegraf/v1/plugins/#processor-dedup)
+to filter data whose field values are exact repetitions of previous values.
+
 The following example shows how to use Telegraf to remove points that repeat field values, and then write the data to InfluxDB:
 
 1. In your terminal, enter the following command to create the sample data file and calculate the number of seconds between the earliest timestamp and _now_.
diff --git a/content/influxdb3/clustered/reference/syntax/line-protocol.md b/content/influxdb3/clustered/reference/syntax/line-protocol.md
index 7947d9e93..87ff87707 100644
--- a/content/influxdb3/clustered/reference/syntax/line-protocol.md
+++ b/content/influxdb3/clustered/reference/syntax/line-protocol.md
@@ -2,7 +2,7 @@
 title: Line protocol reference
 description: >
   InfluxDB uses line protocol to write data points.
-  It is a text-based format that provides the measurement, tag set, field set, and timestamp of a data point.
+  It is a text-based format that provides the table, tag set, field set, and timestamp of a data point.
 menu:
   influxdb3_clustered:
     name: Line protocol
@@ -11,261 +11,8 @@ weight: 102
 influxdb3/clustered/tags: [write, line protocol, syntax]
 related:
   - /influxdb3/clustered/write-data/
+source: /shared/v3-line-protocol.md
 ---
 
-InfluxDB uses line protocol to write data points.
-It is a text-based format that provides the measurement, tag set, field set, and timestamp of a data point.
-
-- [Elements of line protocol](#elements-of-line-protocol)
-- [Data types and format](#data-types-and-format)
-- [Quotes](#quotes)
-- [Special characters](#special-characters)
-- [Comments](#comments)
-- [Naming restrictions](#naming-restrictions)
-- [Duplicate points](#duplicate-points)
-
-```js
-// Syntax
-<measurement>[,<tag_key>=<tag_value>[,<tag_key>=<tag_value>]] <field_key>=<field_value>[,<field_key>=<field_value>] [<timestamp>]
-
-// Example
-myMeasurement,tag1=value1,tag2=value2 fieldKey="fieldValue" 1556813561098000000
-```
-
-Lines separated by the newline character `\n` represent a single point
-in InfluxDB. Line protocol is whitespace sensitive.
-
-> [!Note]
-> Line protocol does not support the newline character `\n` in tag or field values.
-
-## Elements of line protocol
-
-{{< influxdb/line-protocol commas=false whitespace=false >}}
-
-### Measurement
-({{< req >}})
-The measurement name.
-InfluxDB accepts one measurement per point.
-_Measurement names are case-sensitive and subject to [naming restrictions](#naming-restrictions)._
-
-_**Data type:** [String](#string)_
-
-
-### Tag set
-_**Optional**_ –
-All tag key-value pairs for the point.
-Key-value relationships are denoted with the `=` operand.
-Multiple tag key-value pairs are comma-delimited.
-_Tag keys and tag values are case-sensitive.
-Tag keys are subject to [naming restrictions](#naming-restrictions).
-Tag values cannot be empty; instead, omit the tag from the tag set._
-
-_**Key data type:** [String](#string)_
-_**Value data type:** [String](#string)_
-
-### Field set
-({{< req >}})
-All field key-value pairs for the point.
-Points must have at least one field.
-_Field keys and string values are case-sensitive.
-Field keys are subject to [naming restrictions](#naming-restrictions)._
-
-_**Key data type:** [String](#string)_
-_**Value data type:** [Float](#float) | [Integer](#integer) | [UInteger](#uinteger) | [String](#string) | [Boolean](#boolean)_
-
-> [!Note]
-> _Always double quote string field values. More on quotes [below](#quotes)._
->
-> ```sh
-> measurementName fieldKey="field string value" 1556813561098000000
-> ```
-
-### Timestamp
-_**Optional**_ –
-The [unix timestamp](/influxdb/v2/reference/glossary/#unix-timestamp) for the data point.
-InfluxDB accepts one timestamp per point.
-If no timestamp is provided, InfluxDB uses the system time (UTC) of its host machine.
-
-_**Data type:** [Unix timestamp](#unix-timestamp)_
-
-> [!Note]
-> #### Important notes about timestamps
->
-> - To ensure a data point includes the time a metric is observed (not received by InfluxDB),
->   include the timestamp.
-> - If your timestamps are not in nanoseconds, specify the precision of your timestamps
->   when [writing the data to InfluxDB](/influxdb/v2/write-data/#timestamp-precision).
-
-### Whitespace
-Whitespace in line protocol determines how InfluxDB interprets the data point.
-The **first unescaped space** delimits the measurement and the tag set from the field set.
-The **second unescaped space** delimits the field set from the timestamp.
-
-{{< influxdb/line-protocol elements=false commas=false >}}
-
-## Data types and format
-
-### Float
-IEEE-754 64-bit floating-point numbers.
-Default numerical type.
-_InfluxDB supports scientific notation in float field values._
-
-##### Float field value examples
-```js
-myMeasurement fieldKey=1.0
-myMeasurement fieldKey=1
-myMeasurement fieldKey=-1.234456e+78
-```
-
-### Integer
-Signed 64-bit integers.
-Trailing `i` on the number specifies an integer.
-
-| Minimum integer | Maximum integer |
-| --------------- | --------------- |
-| `-9223372036854775808i` | `9223372036854775807i` |
-
-##### Integer field value examples
-```js
-myMeasurement fieldKey=1i
-myMeasurement fieldKey=12485903i
-myMeasurement fieldKey=-12485903i
-```
-
-### UInteger
-Unsigned 64-bit integers.
-Trailing `u` on the number specifies an unsigned integer.
-
-| Minimum uinteger | Maximum uinteger |
-| ---------------- | ---------------- |
-| `0u` | `18446744073709551615u` |
-
-##### UInteger field value examples
-```js
-myMeasurement fieldKey=1u
-myMeasurement fieldKey=12485903u
-```
-
-### String
-Plain text string.
-Length limit 64KB.
-
-##### String example
-```sh
-# String measurement name, field key, and field value
-myMeasurement fieldKey="this is a string"
-```
-
-### Boolean
-Stores `true` or `false` values.
-
-| Boolean value | Accepted syntax |
-|:-------------:|:--------------- |
-| True | `t`, `T`, `true`, `True`, `TRUE` |
-| False | `f`, `F`, `false`, `False`, `FALSE` |
-
-##### Boolean field value examples
-```js
-myMeasurement fieldKey=true
-myMeasurement fieldKey=false
-myMeasurement fieldKey=t
-myMeasurement fieldKey=f
-myMeasurement fieldKey=TRUE
-myMeasurement fieldKey=FALSE
-```
-
-> [!Note]
-> Do not quote boolean field values.
-> Quoted field values are interpreted as strings.
-
-### Unix timestamp
-Unix timestamp in a [specified precision](/influxdb/v2/reference/glossary/#unix-timestamp).
-Default precision is nanoseconds (`ns`).
-
-| Minimum timestamp | Maximum timestamp |
-| ----------------- | ----------------- |
-| `-9223372036854775806` | `9223372036854775806` |
-
-##### Unix timestamp example
-```js
-myMeasurementName fieldKey="fieldValue" 1556813561098000000
-```
-
-## Quotes
-Line protocol supports single and double quotes as described in the following table:
-
-| Element | Double quotes | Single quotes |
-| :------ | :------------: |:-------------: |
-| Measurement | _Limited_ * | _Limited_ * |
-| Tag key | _Limited_ * | _Limited_ * |
-| Tag value | _Limited_ * | _Limited_ * |
-| Field key | _Limited_ * | _Limited_ * |
-| Field value | **Strings only** | Never |
-| Timestamp | Never | Never |
-
-\* _Line protocol accepts double and single quotes in
-measurement names, tag keys, tag values, and field keys, but interprets them as
-part of the name, key, or value._
-
-## Special Characters
-Line protocol supports special characters in [string elements](#string).
-In the following contexts, it requires escaping certain characters with a backslash (`\`):
-
-| Element | Escape characters |
-|:------- |:----------------- |
-| Measurement | Comma, Space |
-| Tag key | Comma, Equals Sign, Space |
-| Tag value | Comma, Equals Sign, Space |
-| Field key | Comma, Equals Sign, Space |
-| Field value | Double quote, Backslash |
-
-You do not need to escape other special characters.
-
-##### Examples of special characters in line protocol
-```sh
-# Measurement name with spaces
-my\ Measurement fieldKey="string value"
-
-# Double quotes in a string field value
-myMeasurement fieldKey="\"string\" within a string"
-
-# Tag keys and values with spaces
-myMeasurement,tag\ Key1=tag\ Value1,tag\ Key2=tag\ Value2 fieldKey=100
-
-# Emojis
-myMeasurement,tagKey=🍭 fieldKey="Launch 🚀" 1556813561098000000
-```
-
-### Escaping backslashes
-Line protocol supports both literal backslashes and backslashes as an escape character.
-With two contiguous backslashes, the first is interpreted as an escape character.
-For example:
-
-| Backslashes | Interpreted as |
-|:-----------:|:-------------:|
-| `\` | `\` |
-| `\\` | `\` |
-| `\\\` | `\\` |
-| `\\\\` | `\\` |
-| `\\\\\` | `\\\` |
-| `\\\\\\` | `\\\` |
-
-## Comments
-Line protocol interprets `#` at the beginning of a line as a comment character
-and ignores all subsequent characters until the next newline `\n`.
-
-```sh
-# This is a comment
-myMeasurement fieldKey="string value" 1556813561098000000
-```
-
-## Naming restrictions
-Measurement names, tag keys, and field keys cannot begin with an underscore `_`.
-The `_` namespace is reserved for InfluxDB system use.
-
-## Duplicate points
-A point is uniquely identified by the measurement name, tag set, and timestamp.
-If you submit line protocol with the same measurement, tag set, and timestamp,
-but with a different field set, the field set becomes the union of the old
-field set and the new field set, where any conflicts favor the new field set.
-
+
diff --git a/content/influxdb3/clustered/write-data/best-practices/optimize-writes.md b/content/influxdb3/clustered/write-data/best-practices/optimize-writes.md
index 9e9dff460..b19502238 100644
--- a/content/influxdb3/clustered/write-data/best-practices/optimize-writes.md
+++ b/content/influxdb3/clustered/write-data/best-practices/optimize-writes.md
@@ -14,7 +14,8 @@ related:
   - /influxdb3/clustered/write-data/use-telegraf/
 ---
 
-Use these tips to optimize performance and system overhead when writing data to InfluxDB.
+Use these tips to optimize performance and system overhead when writing data to
+{{% product-name %}}.
 
 - [Batch writes](#batch-writes)
 - [Sort tags by key](#sort-tags-by-key)
@@ -422,9 +423,23 @@ The following example creates sample data for two series (the combination of mea
 
 ### Avoid sending duplicate data
 
-Use Telegraf and the [Dedup processor plugin](/telegraf/v1/plugins/#processor-dedup) to filter data whose field values are exact repetitions of previous values.
+When writing duplicate points (points with the same timestamp and tag set),
+InfluxDB deduplicates the data by creating a union of the duplicate points.
 Deduplicating your data can reduce your write payload size and resource usage.
 
+> [!Important]
+> #### Write ordering for duplicate points
+>
+> InfluxDB attempts to honor write ordering for duplicate points, with the most
+> recently written point taking precedence. However, when data is flushed from
+> the in-memory buffer to Parquet files—typically every 15 minutes, but
+> sometimes sooner—this ordering is not guaranteed if duplicate points are flushed
+> at the same time. As a result, the last written duplicate point may not always
+> be retained in storage.
+
+Use Telegraf and the [Dedup processor plugin](/telegraf/v1/plugins/#processor-dedup)
+to filter data whose field values are exact repetitions of previous values.
+
 The following example shows how to use Telegraf to remove points that repeat field values, and then write the data to InfluxDB:
 
 1. In your terminal, enter the following command to create the sample data file and calculate the number of seconds between the earliest timestamp and _now_.
diff --git a/content/shared/v3-line-protocol.md b/content/shared/v3-line-protocol.md
index 0f197053d..323fb8d26 100644
--- a/content/shared/v3-line-protocol.md
+++ b/content/shared/v3-line-protocol.md
@@ -44,7 +44,7 @@ _**Data type:** [String](#string)_
 
 
 ### Tag set
-_**Optional**_ –
+(_**Optional**_)
 All tag key-value pairs for the point.
 Key-value relationships are denoted with the `=` operand.
 Multiple tag key-value pairs are comma-delimited.
@@ -75,8 +75,8 @@ _**Value data type:** [Float](#float) | [Integer](#integer) | [UInteger](#uinteg
 
 
 ### Timestamp
-_**Optional**_ –
-The [unix timestamp](/influxdb3/version/reference/glossary/#unix-timestamp) for the data point.
+(_**Optional**_)
+The [Unix timestamp](/influxdb3/version/reference/glossary/#unix-timestamp) for the data point.
 InfluxDB accepts one timestamp per point.
 If no timestamp is provided, InfluxDB uses the system time (UTC) of its host machine.
 
@@ -282,3 +282,15 @@ A point is uniquely identified by the table name, tag set, and timestamp.
 If you submit line protocol with the same table, tag set, and timestamp,
 but with a different field set, the field set becomes the union of the old
 field set and the new field set, where any conflicts favor the new field set.
+
+{{% show-in "cloud-dedicated,clustered" %}}
+> [!Important]
+> #### Write ordering for duplicate points
+>
+> {{% product-name %}} attempts to honor write ordering for duplicate points,
+> with the most recently written point taking precedence. However, when data is
+> flushed from the in-memory buffer to Parquet files—typically every 15 minutes,
+> but sometimes sooner—this ordering is not guaranteed if duplicate points are
+> flushed at the same time. As a result, the last written duplicate point may
+> not always be retained in storage.
+{{% /show-in %}}
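
The duplicate-point behavior this patch documents can be illustrated with a small sketch. It is not InfluxDB's implementation: `merge_points`, the tuple layout, and the sample `home` data are hypothetical and exist only for illustration. It models the documented rule that points sharing a table, tag set, and timestamp are merged into a union of their field sets, with later writes winning on conflicting field keys, and it assumes the writes are applied strictly in order.

```python
from typing import Dict, List, Tuple, Union

FieldValue = Union[float, int, str, bool]
# A point is identified by (table, sorted tag pairs, timestamp).
PointKey = Tuple[str, Tuple[Tuple[str, str], ...], int]
# Hypothetical in-memory form of one written point.
Point = Tuple[str, Dict[str, str], Dict[str, FieldValue], int]


def merge_points(points: List[Point]) -> Dict[PointKey, Dict[str, FieldValue]]:
    """Merge duplicate points, assuming they are applied in write order.

    Field sets of duplicates are unioned; on a conflicting field key,
    the later write overwrites the earlier one.
    """
    merged: Dict[PointKey, Dict[str, FieldValue]] = {}
    for table, tags, fields, timestamp in points:
        key = (table, tuple(sorted(tags.items())), timestamp)
        merged.setdefault(key, {}).update(fields)
    return merged


# Two writes with the same table, tag set, and timestamp (illustrative data).
writes: List[Point] = [
    ("home", {"room": "Kitchen"}, {"temp": 21.0, "hum": 35.9}, 1556813561098000000),
    ("home", {"room": "Kitchen"}, {"temp": 22.5, "co": 0}, 1556813561098000000),
]
result = merge_points(writes)
```

Under the in-order assumption, the merged point keeps `hum` from the first write, gains `co` from the second, and takes `temp` from the second. The note added by this patch is a warning that this ordering is not guaranteed when both duplicates are flushed to Parquet at the same time, in which case the stored `temp` could be the value from either write.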