commit d31105ab9c
@@ -54,7 +54,8 @@ pre {
   overflow-y: hidden;
   code {
     padding: 0;
-    line-height: 1.4rem;
+    font-size: .95rem;
+    line-height: 1.5rem;
   }
 }
@@ -0,0 +1,17 @@
---
title: Best practices for writing data
seotitle: Best practices for writing data to InfluxDB
description: >
  Learn about the recommendations and best practices for writing data to InfluxDB.
weight: 105
menu:
  v2_0:
    name: Best practices
    identifier: write-best-practices
    parent: Write data
---

The following articles walk through recommendations and best practices for writing
data to InfluxDB.

{{< children >}}
@@ -0,0 +1,131 @@
---
title: Handle duplicate data points
seotitle: Handle duplicate data points when writing to InfluxDB
description: >
  InfluxDB identifies unique data points by their measurement, tag set, and timestamp.
  This article discusses methods for preserving data from two points with a common
  measurement, tag set, and timestamp but a different field set.
weight: 202
menu:
  v2_0:
    name: Handle duplicate points
    parent: write-best-practices
v2.0/tags: [best practices, write]
---

InfluxDB identifies unique data points by their measurement, tag set, and timestamp
(each a part of [Line protocol](/v2.0/reference/line-protocol) used to write data to InfluxDB).

```txt
web,host=host2,region=us_west firstByte=15.0 1559260800000000000
--- -------------------------                -------------------
 |              |                                     |
Measurement  Tag set                             Timestamp
```
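
The three parts above can be pulled apart mechanically. The following is a minimal sketch of such a parser (not part of any InfluxDB client library; it assumes no escaped commas, spaces, or string field values):

```python
def parse_line(line):
    """Split a simple line protocol string into its parts.

    Simplified sketch: ignores escaping rules and string fields.
    """
    ident, fields, timestamp = line.split(" ")
    measurement, *tag_pairs = ident.split(",")
    tags = dict(pair.split("=") for pair in tag_pairs)
    field_set = dict(pair.split("=") for pair in fields.split(","))
    return measurement, tags, field_set, int(timestamp)

measurement, tags, fields, ts = parse_line(
    "web,host=host2,region=us_west firstByte=15.0 1559260800000000000"
)
print(measurement)  # web
print(tags)         # {'host': 'host2', 'region': 'us_west'}
print(ts)           # 1559260800000000000
```

The `(measurement, tag set, timestamp)` triple is what makes two points "the same" from InfluxDB's perspective; the field set is not part of that identity.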

## Duplicate data points
For points that have the same measurement name, tag set, and timestamp,
InfluxDB creates a union of the old and new field sets.
For any matching field keys, InfluxDB uses the field value of the new point.
For example:

```sh
# Existing data point
web,host=host2,region=us_west firstByte=24.0,dnsLookup=7.0 1559260800000000000

# New data point
web,host=host2,region=us_west firstByte=15.0 1559260800000000000
```

After you submit the new data point, InfluxDB overwrites `firstByte` with the new
field value and leaves `dnsLookup` unchanged:

```sh
# Resulting data point
web,host=host2,region=us_west firstByte=15.0,dnsLookup=7.0 1559260800000000000
```

Querying this series returns the single merged point:

```sh
from(bucket: "example-bucket")
  |> range(start: 2019-05-31T00:00:00Z, stop: 2019-05-31T12:00:00Z)
  |> filter(fn: (r) => r._measurement == "web")

Table: keys: [_measurement, host, region]
_time                _measurement host  region  dnsLookup firstByte
-------------------- ------------ ----- ------- --------- ---------
2019-05-31T00:00:00Z web          host2 us_west 7         15
```
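
Conceptually, this overwrite behaves like a dictionary union keyed on measurement, tag set, and timestamp, with new field values winning on matching keys. The following is an illustrative model only, not InfluxDB's actual storage engine:

```python
# Conceptual sketch of duplicate-point merging; not InfluxDB internals.
storage = {}  # (measurement, tag set, timestamp) -> field set

def write_point(measurement, tags, fields, timestamp):
    key = (measurement, tuple(sorted(tags.items())), timestamp)
    # Union of old and new field sets; new values win on matching keys.
    merged = {**storage.get(key, {}), **fields}
    storage[key] = merged
    return merged

write_point("web", {"host": "host2", "region": "us_west"},
            {"firstByte": 24.0, "dnsLookup": 7.0}, 1559260800000000000)
result = write_point("web", {"host": "host2", "region": "us_west"},
                     {"firstByte": 15.0}, 1559260800000000000)
print(result)  # {'firstByte': 15.0, 'dnsLookup': 7.0}
```

Only one entry remains in `storage` after both writes, mirroring the single merged row returned by the query above.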

## Preserve duplicate points
To preserve both old and new field values in duplicate points, use one of the following strategies:

- [Add an arbitrary tag](#add-an-arbitrary-tag)
- [Increment the timestamp](#increment-the-timestamp)

### Add an arbitrary tag
Add an arbitrary tag with unique values so InfluxDB reads the duplicate points as unique.

For example, add a `uniq` tag to each data point:

```sh
# Existing point
web,host=host2,region=us_west,uniq=1 firstByte=24.0,dnsLookup=7.0 1559260800000000000

# New point
web,host=host2,region=us_west,uniq=2 firstByte=15.0 1559260800000000000
```

{{% note %}}
It is not necessary to retroactively add the unique tag to the existing data point.
Tag sets are evaluated as a whole.
The arbitrary `uniq` tag on the new point allows InfluxDB to recognize it as a unique point.
However, this causes the schema of the two points to differ and may lead to challenges when querying the data.
{{% /note %}}

After writing the new point to InfluxDB:

```sh
from(bucket: "example-bucket")
  |> range(start: 2019-05-31T00:00:00Z, stop: 2019-05-31T12:00:00Z)
  |> filter(fn: (r) => r._measurement == "web")

Table: keys: [_measurement, host, region, uniq]
_time                _measurement host  region  uniq firstByte dnsLookup
-------------------- ------------ ----- ------- ---- --------- ---------
2019-05-31T00:00:00Z web          host2 us_west 1    24        7

Table: keys: [_measurement, host, region, uniq]
_time                _measurement host  region  uniq firstByte
-------------------- ------------ ----- ------- ---- ---------
2019-05-31T00:00:00Z web          host2 us_west 2    15
```
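
If your write pipeline generates line protocol directly, the tag can be appended client-side before each write. A hypothetical helper sketching that idea (the `add_uniq_tag` function is illustrative, not part of any InfluxDB client, and assumes no escaped spaces in the measurement and tag section):

```python
from itertools import count

_counter = count(1)  # monotonically increasing tag values

def add_uniq_tag(line):
    """Insert a uniq=N tag into a simple line protocol string.

    Sketch only: assumes no escaped spaces before the field set.
    """
    ident, rest = line.split(" ", 1)
    return f"{ident},uniq={next(_counter)} {rest}"

print(add_uniq_tag("web,host=host2,region=us_west firstByte=24.0,dnsLookup=7.0 1559260800000000000"))
# web,host=host2,region=us_west,uniq=1 firstByte=24.0,dnsLookup=7.0 1559260800000000000
print(add_uniq_tag("web,host=host2,region=us_west firstByte=15.0 1559260800000000000"))
# web,host=host2,region=us_west,uniq=2 firstByte=15.0 1559260800000000000
```

Keep in mind that every distinct `uniq` value creates a new series, so an unbounded counter grows series cardinality with every write.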

### Increment the timestamp
Increment the timestamp by a nanosecond to enforce the uniqueness of each point.

```sh
# Old data point
web,host=host2,region=us_west firstByte=24.0,dnsLookup=7.0 1559260800000000000

# New data point
web,host=host2,region=us_west firstByte=15.0 1559260800000000001
```

After writing the new point to InfluxDB:

```sh
from(bucket: "example-bucket")
  |> range(start: 2019-05-31T00:00:00Z, stop: 2019-05-31T12:00:00Z)
  |> filter(fn: (r) => r._measurement == "web")

Table: keys: [_measurement, host, region]
_time                          _measurement host  region  firstByte dnsLookup
------------------------------ ------------ ----- ------- --------- ---------
2019-05-31T00:00:00.000000000Z web          host2 us_west 24        7
2019-05-31T00:00:00.000000001Z web          host2 us_west 15
```
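
Client-side, this strategy amounts to nudging a colliding timestamp forward by one nanosecond until the series-key/timestamp pair is unique. A sketch under those assumptions (`make_unique` is hypothetical, not a client-library function):

```python
# Illustrative helper: bump duplicate timestamps by one nanosecond each
# so every (measurement, tag set, timestamp) triple is unique.
def make_unique(points):
    seen = set()
    result = []
    for measurement, tags, fields, ts in points:
        key = (measurement, tuple(sorted(tags.items())), ts)
        while key in seen:
            ts += 1  # one nanosecond later
            key = (measurement, tuple(sorted(tags.items())), ts)
        seen.add(key)
        result.append((measurement, tags, fields, ts))
    return result

points = [
    ("web", {"host": "host2"}, {"firstByte": 24.0, "dnsLookup": 7.0}, 1559260800000000000),
    ("web", {"host": "host2"}, {"firstByte": 15.0}, 1559260800000000000),
]
for m, tags, fields, ts in make_unique(points):
    print(ts)
# 1559260800000000000
# 1559260800000000001
```

Unlike the arbitrary-tag approach, this keeps both points in the same series and the same schema, at the cost of slightly altering the recorded time.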

{{% note %}}
The output of example queries in this article has been modified to clearly show
the different approaches and results for handling duplicate data.
{{% /note %}}