fix: add note to distributed about duplicate points on flush (#6330)

pull/6334/head link-checker-v1.2.3
Scott Anderson 2025-08-22 11:08:48 -06:00 committed by GitHub
parent 2494c90cb8
commit d49d69ba26
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
5 changed files with 66 additions and 264 deletions

@ -416,9 +416,23 @@ The following example creates sample data for two series (the combination of mea
### Avoid sending duplicate data
Use Telegraf and the [Dedup processor plugin](/telegraf/v1/plugins/#processor-dedup) to filter data whose field values are exact repetitions of previous values.
When writing duplicate points (points with the same measurement, tag set, and timestamp),
InfluxDB deduplicates the data by creating a union of the duplicate points' field sets.
Deduplicating your data can reduce your write payload size and resource usage.
> [!Important]
> #### Write ordering for duplicate points
>
> InfluxDB attempts to honor write ordering for duplicate points, with the most
> recently written point taking precedence. However, when data is flushed from
> the in-memory buffer to Parquet files—typically every 15 minutes, but
> sometimes sooner—this ordering is not guaranteed if duplicate points are flushed
> at the same time. As a result, the last written duplicate point may not always
> be retained in storage.
Use Telegraf and the [Dedup processor plugin](/telegraf/v1/plugins/#processor-dedup)
to filter data whose field values are exact repetitions of previous values.
The following example shows how to use Telegraf to remove points that repeat field values, and then write the data to InfluxDB:
1. In your terminal, enter the following command to create the sample data file and calculate the number of seconds between the earliest timestamp and _now_.

@ -430,9 +430,23 @@ The following example creates sample data for two series (the combination of mea
### Avoid sending duplicate data
Use Telegraf and the [Dedup processor plugin](/telegraf/v1/plugins/#processor-dedup) to filter data whose field values are exact repetitions of previous values.
When writing duplicate points (points with the same measurement, tag set, and timestamp),
InfluxDB deduplicates the data by creating a union of the duplicate points' field sets.
Deduplicating your data can reduce your write payload size and resource usage.
> [!Important]
> #### Write ordering for duplicate points
>
> InfluxDB attempts to honor write ordering for duplicate points, with the most
> recently written point taking precedence. However, when data is flushed from
> the in-memory buffer to Parquet files—typically every 15 minutes, but
> sometimes sooner—this ordering is not guaranteed if duplicate points are flushed
> at the same time. As a result, the last written duplicate point may not always
> be retained in storage.
Use Telegraf and the [Dedup processor plugin](/telegraf/v1/plugins/#processor-dedup)
to filter data whose field values are exact repetitions of previous values.
The following example shows how to use Telegraf to remove points that repeat field values, and then write the data to InfluxDB:
1. In your terminal, enter the following command to create the sample data file and calculate the number of seconds between the earliest timestamp and _now_.

@ -2,7 +2,7 @@
title: Line protocol reference
description: >
InfluxDB uses line protocol to write data points.
It is a text-based format that provides the measurement, tag set, field set, and timestamp of a data point.
It is a text-based format that provides the table, tag set, field set, and timestamp of a data point.
menu:
influxdb3_clustered:
name: Line protocol
@ -11,261 +11,8 @@ weight: 102
influxdb3/clustered/tags: [write, line protocol, syntax]
related:
- /influxdb3/clustered/write-data/
source: /shared/v3-line-protocol.md
---
InfluxDB uses line protocol to write data points.
It is a text-based format that provides the measurement, tag set, field set, and timestamp of a data point.
- [Elements of line protocol](#elements-of-line-protocol)
- [Data types and format](#data-types-and-format)
- [Quotes](#quotes)
- [Special characters](#special-characters)
- [Comments](#comments)
- [Naming restrictions](#naming-restrictions)
- [Duplicate points](#duplicate-points)
```js
// Syntax
<measurement>[,<tag_key>=<tag_value>[,<tag_key>=<tag_value>]] <field_key>=<field_value>[,<field_key>=<field_value>] [<timestamp>]
// Example
myMeasurement,tag1=value1,tag2=value2 fieldKey="fieldValue" 1556813561098000000
```
Each line, terminated by the newline character `\n`, represents a single point
in InfluxDB. Line protocol is whitespace sensitive.
> [!Note]
> Line protocol does not support the newline character `\n` in tag or field values.
## Elements of line protocol
{{< influxdb/line-protocol commas=false whitespace=false >}}
### Measurement
({{< req >}})
The measurement name.
InfluxDB accepts one measurement per point.
_Measurement names are case-sensitive and subject to [naming restrictions](#naming-restrictions)._
_**Data type:** [String](#string)_
### Tag set
_**Optional**_
All tag key-value pairs for the point.
Key-value relationships are denoted with the `=` operator.
Multiple tag key-value pairs are comma-delimited.
_Tag keys and tag values are case-sensitive.
Tag keys are subject to [naming restrictions](#naming-restrictions).
Tag values cannot be empty; instead, omit the tag from the tag set._
_**Key data type:** [String](#string)_
_**Value data type:** [String](#string)_
### Field set
({{< req >}})
All field key-value pairs for the point.
Points must have at least one field.
_Field keys and string values are case-sensitive.
Field keys are subject to [naming restrictions](#naming-restrictions)._
_**Key data type:** [String](#string)_
_**Value data type:** [Float](#float) | [Integer](#integer) | [UInteger](#uinteger) | [String](#string) | [Boolean](#boolean)_
> [!Note]
> _Always double quote string field values. More on quotes [below](#quotes)._
>
> ```sh
> measurementName fieldKey="field string value" 1556813561098000000
> ```
### Timestamp
_**Optional**_
The [unix timestamp](/influxdb/v2/reference/glossary/#unix-timestamp) for the data point.
InfluxDB accepts one timestamp per point.
If no timestamp is provided, InfluxDB uses the system time (UTC) of its host machine.
_**Data type:** [Unix timestamp](#unix-timestamp)_
> [!Note]
> #### Important notes about timestamps
>
> - To ensure a data point includes the time a metric is observed (not received by InfluxDB),
> include the timestamp.
> - If your timestamps are not in nanoseconds, specify the precision of your timestamps
> when [writing the data to InfluxDB](/influxdb/v2/write-data/#timestamp-precision).
### Whitespace
Whitespace in line protocol determines how InfluxDB interprets the data point.
The **first unescaped space** delimits the measurement and the tag set from the field set.
The **second unescaped space** delimits the field set from the timestamp.
{{< influxdb/line-protocol elements=false commas=false >}}
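The whitespace rule above can be sketched in Python. This is an illustrative splitter only, not InfluxDB's parser: the function name `split_line` is hypothetical, and the sketch assumes that spaces inside double-quoted string field values do not act as delimiters.

```python
def split_line(line: str) -> list[str]:
    """Split one line of line protocol on unescaped spaces.

    Returns up to three parts: measurement plus tag set, field set,
    and (optionally) timestamp. Illustrative only.
    """
    parts: list[str] = []
    current: list[str] = []
    in_quotes = False
    escaped = False
    for ch in line:
        if escaped:
            # The previous character was a backslash: keep this one as-is.
            current.append(ch)
            escaped = False
        elif ch == "\\":
            current.append(ch)
            escaped = True
        elif ch == '"' and parts:
            # Quotes only delimit strings in the field set (after the first
            # split); elsewhere they are part of the name, key, or value.
            current.append(ch)
            in_quotes = not in_quotes
        elif ch == " " and not in_quotes and len(parts) < 2:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


parts = split_line(
    'my\\ Measurement,tag1=value1 fieldKey="a b" 1556813561098000000'
)
print(parts)
```

The escaped space in `my\ Measurement` and the space inside the quoted string stay within their elements; only the two unescaped, unquoted spaces split the line.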
## Data types and format
### Float
IEEE-754 64-bit floating-point numbers.
Default numerical type.
_InfluxDB supports scientific notation in float field values._
##### Float field value examples
```js
myMeasurement fieldKey=1.0
myMeasurement fieldKey=1
myMeasurement fieldKey=-1.234456e+78
```
### Integer
Signed 64-bit integers.
Trailing `i` on the number specifies an integer.
| Minimum integer | Maximum integer |
| --------------- | --------------- |
| `-9223372036854775808i` | `9223372036854775807i` |
##### Integer field value examples
```js
myMeasurement fieldKey=1i
myMeasurement fieldKey=12485903i
myMeasurement fieldKey=-12485903i
```
### UInteger
Unsigned 64-bit integers.
Trailing `u` on the number specifies an unsigned integer.
| Minimum uinteger | Maximum uinteger |
| ---------------- | ---------------- |
| `0u` | `18446744073709551615u` |
##### UInteger field value examples
```js
myMeasurement fieldKey=1u
myMeasurement fieldKey=12485903u
```
### String
Plain text string.
Length limit 64KB.
##### String example
```sh
# String measurement name, field key, and field value
myMeasurement fieldKey="this is a string"
```
### Boolean
Stores `true` or `false` values.
| Boolean value | Accepted syntax |
|:-------------:|:--------------- |
| True | `t`, `T`, `true`, `True`, `TRUE` |
| False | `f`, `F`, `false`, `False`, `FALSE` |
##### Boolean field value examples
```js
myMeasurement fieldKey=true
myMeasurement fieldKey=false
myMeasurement fieldKey=t
myMeasurement fieldKey=f
myMeasurement fieldKey=TRUE
myMeasurement fieldKey=FALSE
```
> [!Note]
> Do not quote boolean field values.
> Quoted field values are interpreted as strings.
### Unix timestamp
Unix timestamp in a [specified precision](/influxdb/v2/reference/glossary/#unix-timestamp).
Default precision is nanoseconds (`ns`).
| Minimum timestamp | Maximum timestamp |
| ----------------- | ----------------- |
| `-9223372036854775806` | `9223372036854775806` |
##### Unix timestamp example
```js
myMeasurementName fieldKey="fieldValue" 1556813561098000000
```
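As a rough illustration of the data types in this section, the following Python sketch formats native values as line protocol field values. The function name `format_field_value` and its `unsigned` parameter are hypothetical and not part of any InfluxDB client library. The range checks use the limits from the tables above.

```python
INT_MIN, INT_MAX = -9223372036854775808, 9223372036854775807
UINT_MAX = 18446744073709551615


def format_field_value(value, unsigned: bool = False) -> str:
    """Format a Python value as a line protocol field value (sketch)."""
    # Check bool first: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        if unsigned:
            if not 0 <= value <= UINT_MAX:
                raise ValueError("value out of range for UInteger")
            return f"{value}u"
        if not INT_MIN <= value <= INT_MAX:
            raise ValueError("value out of range for Integer")
        return f"{value}i"
    if isinstance(value, float):
        # Floats are the default numeric type and take no suffix.
        # NaN and infinity are not handled in this sketch.
        return repr(value)
    if isinstance(value, str):
        # String field values are always double quoted; escape
        # backslashes and double quotes inside them.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    raise TypeError(f"unsupported field value type: {type(value).__name__}")


print(format_field_value(1.0))
print(format_field_value(12485903))
print(format_field_value(12485903, unsigned=True))
print(format_field_value(True))
print(format_field_value("this is a string"))
```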
## Quotes
Line protocol supports single and double quotes as described in the following table:
| Element | Double quotes | Single quotes |
| :------ | :------------: |:-------------: |
| Measurement | _Limited_ <sup class="required">*</sup> | _Limited_ <sup class="required">*</sup> |
| Tag key | _Limited_ <sup class="required">*</sup> | _Limited_ <sup class="required">*</sup> |
| Tag value | _Limited_ <sup class="required">*</sup> | _Limited_ <sup class="required">*</sup> |
| Field key | _Limited_ <sup class="required">*</sup> | _Limited_ <sup class="required">*</sup> |
| Field value | **Strings only** | Never |
| Timestamp | Never | Never |
<sup class="required">\*</sup> _Line protocol accepts double and single quotes in
measurement names, tag keys, tag values, and field keys, but interprets them as
part of the name, key, or value._
## Special characters
Line protocol supports special characters in [string elements](#string).
In the following contexts, it requires escaping certain characters with a backslash (`\`):
| Element | Escape characters |
|:------- |:----------------- |
| Measurement | Comma, Space |
| Tag key | Comma, Equals Sign, Space |
| Tag value | Comma, Equals Sign, Space |
| Field key | Comma, Equals Sign, Space |
| Field value | Double quote, Backslash |
You do not need to escape other special characters.
##### Examples of special characters in line protocol
```sh
# Measurement name with spaces
my\ Measurement fieldKey="string value"
# Double quotes in a string field value
myMeasurement fieldKey="\"string\" within a string"
# Tag keys and values with spaces
myMeasurement,tag\ Key1=tag\ Value1,tag\ Key2=tag\ Value2 fieldKey=100
# Emojis
myMeasurement,tagKey=🍭 fieldKey="Launch 🚀" 1556813561098000000
```
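The escaping rules in the table above can be sketched as small Python helpers. The helper names are hypothetical and only illustrate which characters each element escapes.

```python
def escape(text: str, chars: str) -> str:
    """Prefix every character of `text` found in `chars` with a backslash."""
    return "".join("\\" + ch if ch in chars else ch for ch in text)


def escape_measurement(name: str) -> str:
    # Measurement: comma and space.
    return escape(name, ", ")


def escape_key_or_tag_value(text: str) -> str:
    # Tag key, tag value, and field key: comma, equals sign, and space.
    return escape(text, ",= ")


def escape_string_field_value(text: str) -> str:
    # String field value: double quote and backslash.
    return escape(text, '"\\')


print(escape_measurement("my Measurement"))
print(escape_key_or_tag_value("tag Key1"))
print(escape_string_field_value('"string" within a string'))
```

The printed results correspond to the escaped forms used in the examples above, such as `my\ Measurement` and `tag\ Key1`.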
### Escaping backslashes
Line protocol supports both literal backslashes and backslashes as an escape character.
With two contiguous backslashes, the first is interpreted as an escape character.
For example:
| Backslashes | Interpreted as |
|:-----------:|:-------------:|
| `\` | `\` |
| `\\` | `\` |
| `\\\` | `\\` |
| `\\\\` | `\\` |
| `\\\\\` | `\\\` |
| `\\\\\\` | `\\\` |
## Comments
Line protocol interprets `#` at the beginning of a line as a comment character
and ignores all subsequent characters until the next newline `\n`.
```sh
# This is a comment
myMeasurement fieldKey="string value" 1556813561098000000
```
## Naming restrictions
Measurement names, tag keys, and field keys cannot begin with an underscore `_`.
The `_` namespace is reserved for InfluxDB system use.
## Duplicate points
A point is uniquely identified by the measurement name, tag set, and timestamp.
If you submit line protocol with the same measurement, tag set, and timestamp,
but with a different field set, the field set becomes the union of the old
field set and the new field set, where any conflicts favor the new field set.
<!-- The content of this file is at
// SOURCE content/shared/v3-line-protocol.md-->

@ -14,7 +14,8 @@ related:
- /influxdb3/clustered/write-data/use-telegraf/
---
Use these tips to optimize performance and system overhead when writing data to InfluxDB.
Use these tips to optimize performance and system overhead when writing data to
{{% product-name %}}.
- [Batch writes](#batch-writes)
- [Sort tags by key](#sort-tags-by-key)
@ -422,9 +423,23 @@ The following example creates sample data for two series (the combination of mea
### Avoid sending duplicate data
Use Telegraf and the [Dedup processor plugin](/telegraf/v1/plugins/#processor-dedup) to filter data whose field values are exact repetitions of previous values.
When writing duplicate points (points with the same measurement, tag set, and timestamp),
InfluxDB deduplicates the data by creating a union of the duplicate points' field sets.
Deduplicating your data can reduce your write payload size and resource usage.
> [!Important]
> #### Write ordering for duplicate points
>
> InfluxDB attempts to honor write ordering for duplicate points, with the most
> recently written point taking precedence. However, when data is flushed from
> the in-memory buffer to Parquet files—typically every 15 minutes, but
> sometimes sooner—this ordering is not guaranteed if duplicate points are flushed
> at the same time. As a result, the last written duplicate point may not always
> be retained in storage.
Use Telegraf and the [Dedup processor plugin](/telegraf/v1/plugins/#processor-dedup)
to filter data whose field values are exact repetitions of previous values.
The following example shows how to use Telegraf to remove points that repeat field values, and then write the data to InfluxDB:
1. In your terminal, enter the following command to create the sample data file and calculate the number of seconds between the earliest timestamp and _now_.

@ -44,7 +44,7 @@ _**Data type:** [String](#string)_
### Tag set
_**Optional**_
(_**Optional**_)
All tag key-value pairs for the point.
Key-value relationships are denoted with the `=` operator.
Multiple tag key-value pairs are comma-delimited.
@ -75,8 +75,8 @@ _**Value data type:** [Float](#float) | [Integer](#integer) | [UInteger](#uinteg
### Timestamp
_**Optional**_
The [unix timestamp](/influxdb3/version/reference/glossary/#unix-timestamp) for the data point.
(_**Optional**_)
The [Unix timestamp](/influxdb3/version/reference/glossary/#unix-timestamp) for the data point.
InfluxDB accepts one timestamp per point.
If no timestamp is provided, InfluxDB uses the system time (UTC) of its host machine.
@ -282,3 +282,15 @@ A point is uniquely identified by the table name, tag set, and timestamp.
If you submit line protocol with the same table, tag set, and timestamp,
but with a different field set, the field set becomes the union of the old
field set and the new field set, where any conflicts favor the new field set.
{{% show-in "cloud-dedicated,clustered" %}}
> [!Important]
> #### Write ordering for duplicate points
>
> {{% product-name %}} attempts to honor write ordering for duplicate points,
> with the most recently written point taking precedence. However, when data is
> flushed from the in-memory buffer to Parquet files—typically every 15 minutes,
> but sometimes sooner—this ordering is not guaranteed if duplicate points are
> flushed at the same time. As a result, the last written duplicate point may
> not always be retained in storage.
{{% /show-in %}}
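The union rule for duplicate points can be modeled in a few lines of Python. This is a client-side sketch of the documented semantics only, not a description of how InfluxDB stores or flushes data. The function name `merge_duplicates` and the tuple layout `(table, tags, timestamp, fields)` are assumptions made for illustration.

```python
def merge_duplicates(points):
    """Merge points that share table, tag set, and timestamp.

    `points` is an iterable of (table, tags, timestamp, fields) tuples in
    write order. Later field values replace earlier ones on conflict.
    """
    merged = {}
    for table, tags, timestamp, fields in points:
        # A point is identified by table name, tag set, and timestamp.
        key = (table, tuple(sorted(tags.items())), timestamp)
        merged.setdefault(key, {}).update(fields)
    return merged


points = [
    ("home", {"room": "Kitchen"}, 1000, {"temp": 21.0, "hum": 35.0}),
    ("home", {"room": "Kitchen"}, 1000, {"temp": 22.5, "co": 0}),
]
result = merge_duplicates(points)
print(result[("home", (("room", "Kitchen"),), 1000)])
# → {'temp': 22.5, 'hum': 35.0, 'co': 0}
```

Merging duplicates within a batch before writing may be one way to avoid depending on server-side write ordering, since each unique point is then written only once.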