From 8badf774928cbb3750ffdeb9d6479f0a82271a0e Mon Sep 17 00:00:00 2001 From: Scott Anderson Date: Fri, 31 May 2019 10:54:30 -0600 Subject: [PATCH 1/7] created write best practices section --- .../v2.0/write-data/best-practices/_index.md | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) create mode 100644 content/v2.0/write-data/best-practices/_index.md diff --git a/content/v2.0/write-data/best-practices/_index.md b/content/v2.0/write-data/best-practices/_index.md new file mode 100644 index 000000000..e5da4c009 --- /dev/null +++ b/content/v2.0/write-data/best-practices/_index.md @@ -0,0 +1,17 @@ +--- +title: Best practices for writing data +seotitle: Best practices for writing data to InfluxDB +description: > + Learn about the recommendations and best practices for writing data to InfluxDB. +weight: 105 +menu: + v2_0: + name: Best practices + identifier: write-best-practices + parent: Write data +--- + +The following articles walk through recommendations and best practices for writing +data to InfluxDB. 
+ +{{< children >}} From 55e3b9bdc3e76539deb7c4ae81dce19ad3c1ab4a Mon Sep 17 00:00:00 2001 From: Scott Anderson Date: Mon, 3 Jun 2019 16:28:00 -0600 Subject: [PATCH 2/7] handling duplicate data points --- assets/styles/layouts/article/_code.scss | 3 +- content/v2.0/reference/line-protocol.md | 8 +- .../best-practices/duplicate-points.md | 122 ++++++++++++++++++ 3 files changed, 128 insertions(+), 5 deletions(-) create mode 100644 content/v2.0/write-data/best-practices/duplicate-points.md diff --git a/assets/styles/layouts/article/_code.scss b/assets/styles/layouts/article/_code.scss index 259b81263..19cf05116 100644 --- a/assets/styles/layouts/article/_code.scss +++ b/assets/styles/layouts/article/_code.scss @@ -54,7 +54,8 @@ pre { overflow-y: hidden; code { padding: 0; - line-height: 1.4rem; + font-size: .95rem; + line-height: 1.5rem; } } diff --git a/content/v2.0/reference/line-protocol.md b/content/v2.0/reference/line-protocol.md index 9b1d25ae4..0dd970043 100644 --- a/content/v2.0/reference/line-protocol.md +++ b/content/v2.0/reference/line-protocol.md @@ -39,10 +39,10 @@ Line protocol does not support the newline character `\n` in tag or field values ## Elements of line protocol ``` -measurementName,tagKey=tagValue fieldKey="fieldValue" 1465839830100400200 ---------------- --------------- --------------------- ------------------- - | | | | - Measurement Tag set Field set Timestamp +measurementName,tagKey=tagValue fieldKey="fieldValue" 1465839830100400 +--------------- --------------- --------------------- ---------------- + | | | | + Measurement Tag set Field set Timestamp ``` ### Measurement diff --git a/content/v2.0/write-data/best-practices/duplicate-points.md b/content/v2.0/write-data/best-practices/duplicate-points.md new file mode 100644 index 000000000..4fa05f2e8 --- /dev/null +++ b/content/v2.0/write-data/best-practices/duplicate-points.md @@ -0,0 +1,122 @@ +--- +title: Handle duplicate data points +seotitle: Handle duplicate data points when writing 
to InfluxDB +description: > + placeholder +weight: 202 +menu: + v2_0: + name: Handle duplicate points + parent: write-best-practices +--- + + + +## Identifying unique data points +Data points are written to InfluxDB using [Line protocol](/v2.0/reference/line-protocol). +InfluxDB identifies unique data points by their measurement name, tag set, and timestamp. + +```txt +web,host=host2,region=us_west firstByte=15.0 1559260800000000000 +--- ------------------------- ------------------- + | | | +Measurement Tag set Timestamp +``` + +## How InfluxDB handles duplicate points +If a new point has the same measurement name, tag set, and timestamp as an +existing point, InfluxDB creates a union of the old and new field sets. +For any matching field keys, InfluxDB uses the field value of the new data point. +For example: + +```sh +# Old data point +web,host=host2,region=us_west firstByte=24.0,dnsLookup=7.0 1559260800000000000 + +# New data point +web,host=host2,region=us_west firstByte=15.0 1559260800000000000 +``` + +After you submit the new point, InfluxDB overwrites `firstByte` with the new field +value and leaves the field `dnsLookup` alone: + +{{% note %}} +The output of examples queries in this article has been modified to clearly show +the different approaches to handling duplicate data. +The +{{% /note %}} + +```sh +from(bucket: "example-bucket") + |> range(start: 2019-05-31T00:00:00Z, stop: 2019-05-31T12:00:00Z) + |> filter(fn: (r) => r._measurement == "web") + +Table: keys: [_measurement, host, region] + _time _measurement host region dnsLookup firstByte +-------------------- ------------ ----- ------- --------- --------- +2019-05-31T00:00:00Z web host2 us_west 7 15 +``` + +## Preserve duplicate points +In some cases, you may want to preserve both old and new values. 
+There are two strategies for preserving duplicate field keys in data points that share a measurement, tag set, and timestamp: + +- [Add an arbitrary tag](#add-an-arbitrary-tag) +- [Increment the timestamp](#increment-the-timestamp) + +### Add an arbitrary tag +Introduce an arbitrary tag to duplicate points to enforce the uniqueness of each point. +Because the tag sets are different, InfluxDB treats them as unique points. + +The following example introduces an arbitrary `uniq` tag to each data point: + +```sh +# Old point +web,host=host2,region=us_west,uniq=1 firstByte=24.0,dnsLookup=7.0 1559260800000000000 + +# New point +web,host=host2,region=us_west,uniq=2 firstByte=15.0 1559260800000000000 +``` + +After writing the new point to InfluxDB: + +```sh +from(bucket: "example-bucket") + |> range(start: 2019-05-31T00:00:00Z, stop: 2019-05-31T12:00:00Z) + |> filter(fn: (r) => r._measurement == "web") + +Table: keys: [_measurement, host, region, uniq] + _time _measurement host region uniq firstByte dnsLookup +-------------------- ------------ ----- ------- ---- --------- --------- +2019-05-31T00:00:00Z web host2 us_west 1 24 7 + +Table: keys: [_measurement, host, region, uniq] + _time _measurement host region uniq firstByte +-------------------- ------------ ----- ------- ---- --------- +2019-05-31T00:00:00Z web host2 us_west 2 15 +``` + +### Increment the timestamp +Increment the timestamp by a nanosecond to enforce the uniqueness of each point. 
+ +```sh +# Old data point +web,host=host2,region=us_west firstByte=24.0,dnsLookup=7.0 1559260800000000000 + +# New data point +web,host=host2,region=us_west firstByte=15.0 1559260800000000001 +``` + +After writing the new point to InfluxDB: + +```sh +from(bucket: "example-bucket") + |> range(start: 2019-05-31T00:00:00Z, stop: 2019-05-31T12:00:00Z) + |> filter(fn: (r) => r._measurement == "web") + +Table: keys: [_measurement, host, region] + _time _measurement host region firstByte dnsLookup +------------------------------ ------------ ----- ------- --------- --------- +2019-05-31T00:00:00.000000000Z web host2 us_west 24 7 +2019-05-31T00:00:00.000000001Z web host2 us_west 15 +``` From b707d15e6f602e453f444844d13af77c395a784f Mon Sep 17 00:00:00 2001 From: Scott Anderson Date: Mon, 3 Jun 2019 16:30:43 -0600 Subject: [PATCH 3/7] reverted epoch timestamp change in line protocol doc --- content/v2.0/reference/line-protocol.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/content/v2.0/reference/line-protocol.md b/content/v2.0/reference/line-protocol.md index 0dd970043..9b1d25ae4 100644 --- a/content/v2.0/reference/line-protocol.md +++ b/content/v2.0/reference/line-protocol.md @@ -39,10 +39,10 @@ Line protocol does not support the newline character `\n` in tag or field values ## Elements of line protocol ``` -measurementName,tagKey=tagValue fieldKey="fieldValue" 1465839830100400 ---------------- --------------- --------------------- ---------------- - | | | | - Measurement Tag set Field set Timestamp +measurementName,tagKey=tagValue fieldKey="fieldValue" 1465839830100400200 +--------------- --------------- --------------------- ------------------- + | | | | + Measurement Tag set Field set Timestamp ``` ### Measurement From 3392be60505053897a9ef1e972e6b84f14711fb6 Mon Sep 17 00:00:00 2001 From: Scott Anderson Date: Mon, 3 Jun 2019 16:40:03 -0600 Subject: [PATCH 4/7] added description and modified intro of duplicate data doc --- 
.../best-practices/duplicate-points.md | 23 +++++++++---------- 1 file changed, 11 insertions(+), 12 deletions(-) diff --git a/content/v2.0/write-data/best-practices/duplicate-points.md b/content/v2.0/write-data/best-practices/duplicate-points.md index 4fa05f2e8..3581185f0 100644 --- a/content/v2.0/write-data/best-practices/duplicate-points.md +++ b/content/v2.0/write-data/best-practices/duplicate-points.md @@ -2,19 +2,19 @@ title: Handle duplicate data points seotitle: Handle duplicate data points when writing to InfluxDB description: > - placeholder + InfluxDB identifies unique data points by their measurement, tag set, and timestamp. + This article discusses methods for preserving data from two points with a common + measurement, tag set, and timestamp but a different field set. weight: 202 menu: v2_0: name: Handle duplicate points parent: write-best-practices +v2.0/tags: [best practices, write] --- - - -## Identifying unique data points -Data points are written to InfluxDB using [Line protocol](/v2.0/reference/line-protocol). -InfluxDB identifies unique data points by their measurement name, tag set, and timestamp. +InfluxDB identifies unique data points by their measurement, tag set, and timestamp +(each a part of [Line protocol](/v2.0/reference/line-protocol) used to write data to InfluxDB). ```txt web,host=host2,region=us_west firstByte=15.0 1559260800000000000 @@ -40,12 +40,6 @@ web,host=host2,region=us_west firstByte=15.0 1559260800000000000 After you submit the new point, InfluxDB overwrites `firstByte` with the new field value and leaves the field `dnsLookup` alone: -{{% note %}} -The output of examples queries in this article has been modified to clearly show -the different approaches to handling duplicate data. 
-The -{{% /note %}} - ```sh from(bucket: "example-bucket") |> range(start: 2019-05-31T00:00:00Z, stop: 2019-05-31T12:00:00Z) @@ -120,3 +114,8 @@ Table: keys: [_measurement, host, region] 2019-05-31T00:00:00.000000000Z web host2 us_west 24 7 2019-05-31T00:00:00.000000001Z web host2 us_west 15 ``` + +{{% note %}} +The output of example queries in this article has been modified to clearly show +the different approaches and results for handling duplicate data. +{{% /note %}} From 63a8e4e764acf114d783174bbe5689a95a8fb9ad Mon Sep 17 00:00:00 2001 From: Scott Anderson Date: Tue, 4 Jun 2019 11:35:42 -0600 Subject: [PATCH 5/7] updated duplicate points doc to address PR feedback --- .../write-data/best-practices/duplicate-points.md | 14 ++++++-------- 1 file changed, 6 insertions(+), 8 deletions(-) diff --git a/content/v2.0/write-data/best-practices/duplicate-points.md b/content/v2.0/write-data/best-practices/duplicate-points.md index 3581185f0..76f202cce 100644 --- a/content/v2.0/write-data/best-practices/duplicate-points.md +++ b/content/v2.0/write-data/best-practices/duplicate-points.md @@ -24,9 +24,9 @@ Measurement Tag set Timestamp ``` ## How InfluxDB handles duplicate points -If a new point has the same measurement name, tag set, and timestamp as an -existing point, InfluxDB creates a union of the old and new field sets. -For any matching field keys, InfluxDB uses the field value of the new data point. +For points that have the same measurement name, tag set, and timestamp, +InfluxDB creates a union of the old and new field sets. +For any matching field keys, InfluxDB uses the field value of the new point. For example: ```sh @@ -52,17 +52,15 @@ Table: keys: [_measurement, host, region] ``` ## Preserve duplicate points -In some cases, you may want to preserve both old and new values. 
-There are two strategies for preserving duplicate field keys in data points that share a measurement, tag set, and timestamp: +To preserve both old and new field keys in duplicate points, use one of the following strategies: - [Add an arbitrary tag](#add-an-arbitrary-tag) - [Increment the timestamp](#increment-the-timestamp) ### Add an arbitrary tag -Introduce an arbitrary tag to duplicate points to enforce the uniqueness of each point. -Because the tag sets are different, InfluxDB treats them as unique points. +Add an arbitrary tag with unique values so InfluxDB reads the duplicate points as unique. -The following example introduces an arbitrary `uniq` tag to each data point: +For example, add a uniq tag to each data point: ```sh # Old point From 11b6287198e5b2e9d81b8f66d2d34f5fab612beb Mon Sep 17 00:00:00 2001 From: Scott Anderson Date: Mon, 1 Jul 2019 14:56:22 -0600 Subject: [PATCH 6/7] updated duplicate data points doc to address PR feedback --- .../best-practices/duplicate-points.md | 24 ++++++++++++++----- 1 file changed, 18 insertions(+), 6 deletions(-) diff --git a/content/v2.0/write-data/best-practices/duplicate-points.md b/content/v2.0/write-data/best-practices/duplicate-points.md index 76f202cce..83317fea1 100644 --- a/content/v2.0/write-data/best-practices/duplicate-points.md +++ b/content/v2.0/write-data/best-practices/duplicate-points.md @@ -23,22 +23,27 @@ web,host=host2,region=us_west firstByte=15.0 1559260800000000000 Measurement Tag set Timestamp ``` -## How InfluxDB handles duplicate points +## Duplicate data points For points that have the same measurement name, tag set, and timestamp, InfluxDB creates a union of the old and new field sets. For any matching field keys, InfluxDB uses the field value of the new point. 
For example: ```sh -# Old data point +# Existing data point web,host=host2,region=us_west firstByte=24.0,dnsLookup=7.0 1559260800000000000 # New data point web,host=host2,region=us_west firstByte=15.0 1559260800000000000 ``` -After you submit the new point, InfluxDB overwrites `firstByte` with the new field -value and leaves the field `dnsLookup` alone: +After you submit the new data point, InfluxDB overwrites `firstByte` with the new +field value and leaves the field `dnsLookup` alone: + +```sh +# Resulting data point +web,host=host2,region=us_west firstByte=15.0,dnsLookup=7.0 1559260800000000000 +``` ```sh from(bucket: "example-bucket") @@ -60,16 +65,23 @@ To preserve both old and new field keys in duplicate points, use one of the foll ### Add an arbitrary tag Add an arbitrary tag with unique values so InfluxDB reads the duplicate points as unique. -For example, add a uniq tag to each data point: +For example, add a `uniq` tag to each data point: ```sh -# Old point +# Existing point web,host=host2,region=us_west,uniq=1 firstByte=24.0,dnsLookup=7.0 1559260800000000000 # New point web,host=host2,region=us_west,uniq=2 firstByte=15.0 1559260800000000000 ``` +{{% note %}} +It is not necessary to retroactively add the unique tag to the existing data point. +Tag sets are evaluated as a whole. +The arbitrary `uniq` tag on the new point allows InfluxDB to recognize it as a unique point. +However, this causes the schema of the two points to differ and may lead to challenges when querying the data. 
+{{% /note %}} + After writing the new point to InfluxDB: ```sh From eb9fe063cde0a73418b1877272a4ad7f8998a343 Mon Sep 17 00:00:00 2001 From: Scott Anderson Date: Mon, 1 Jul 2019 15:34:48 -0600 Subject: [PATCH 7/7] more updates to the duplicate data points doc --- content/v2.0/write-data/best-practices/duplicate-points.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/v2.0/write-data/best-practices/duplicate-points.md b/content/v2.0/write-data/best-practices/duplicate-points.md index 83317fea1..6dcde2152 100644 --- a/content/v2.0/write-data/best-practices/duplicate-points.md +++ b/content/v2.0/write-data/best-practices/duplicate-points.md @@ -57,7 +57,7 @@ Table: keys: [_measurement, host, region] ``` ## Preserve duplicate points -To preserve both old and new field keys in duplicate points, use one of the following strategies: +To preserve both old and new field values in duplicate points, use one of the following strategies: - [Add an arbitrary tag](#add-an-arbitrary-tag) - [Increment the timestamp](#increment-the-timestamp)
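
Review note on the series as a whole: the behavior documented in `duplicate-points.md` (points identified by measurement, tag set, and timestamp; field sets merged as a union, with the new value winning on matching field keys) can be illustrated with a small model. The sketch below is only that: `PointStore`, its methods, and the `demo_*` helpers are hypothetical names made up for illustration, and the dictionary-based store says nothing about how InfluxDB actually stores data. It assumes nothing beyond the rules stated in the doc and reuses the doc's example values.

```python
from typing import Dict, FrozenSet, Tuple

# Hypothetical in-memory model of the rules described in the doc.
# A point's identity is (measurement, tag set, timestamp).
PointKey = Tuple[str, FrozenSet[Tuple[str, str]], int]


class PointStore:
    """Toy store: NOT InfluxDB's storage engine, just the merge rule."""

    def __init__(self) -> None:
        self._points: Dict[PointKey, Dict[str, float]] = {}

    @staticmethod
    def _key(measurement: str, tags: Dict[str, str], timestamp: int) -> PointKey:
        # Tag sets are compared as a whole, so order does not matter.
        return (measurement, frozenset(tags.items()), timestamp)

    def write(self, measurement: str, tags: Dict[str, str],
              fields: Dict[str, float], timestamp: int) -> None:
        key = self._key(measurement, tags, timestamp)
        # Union of old and new field sets; new values win on matching keys.
        self._points.setdefault(key, {}).update(fields)

    def fields(self, measurement: str, tags: Dict[str, str],
               timestamp: int) -> Dict[str, float]:
        return dict(self._points[self._key(measurement, tags, timestamp)])

    def __len__(self) -> int:
        return len(self._points)


TAGS = {"host": "host2", "region": "us_west"}
TS = 1559260800000000000  # nanosecond timestamp used in the doc's examples


def demo_overwrite() -> PointStore:
    # Same measurement, tag set, and timestamp: the points merge.
    store = PointStore()
    store.write("web", TAGS, {"firstByte": 24.0, "dnsLookup": 7.0}, TS)
    store.write("web", TAGS, {"firstByte": 15.0}, TS)
    return store


def demo_uniq_tag() -> PointStore:
    # An arbitrary `uniq` tag makes the tag sets differ: both points survive.
    store = PointStore()
    store.write("web", {**TAGS, "uniq": "1"},
                {"firstByte": 24.0, "dnsLookup": 7.0}, TS)
    store.write("web", {**TAGS, "uniq": "2"}, {"firstByte": 15.0}, TS)
    return store


def demo_increment_timestamp() -> PointStore:
    # Shifting the timestamp by one nanosecond also makes the point unique.
    store = PointStore()
    store.write("web", TAGS, {"firstByte": 24.0, "dnsLookup": 7.0}, TS)
    store.write("web", TAGS, {"firstByte": 15.0}, TS + 1)
    return store
```

In this model, `demo_overwrite()` ends with a single point whose `firstByte` is the new value and whose `dnsLookup` is untouched, while the other two helpers each end with two separate points, matching the query outputs shown in the doc.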