edits from Scott

2020-11-18 09:13:41 -08:00 · 2020-11-18 09:13:41 -08:00 · eb0496529f
parent 0b5acfe2a7
commit eb0496529f
3 changed files with 70 additions and 67 deletions
--- a/content/influxdb/cloud/write-data/best-practices/resolve-high-cardinality.md
+++ b/content/influxdb/cloud/write-data/best-practices/resolve-high-cardinality.md
@ -1,7 +1,7 @@
 ---
 title: Resolve high series cardinality
 description: >
-  Reduce high series cardinality in InfluxDB. If reads and writes to InfluxDB have started to slow down, you may have high serires cardinality. Find the source of high cardinality and fix your schema to resolve high cardinality issues.
+  Reduce high series cardinality in InfluxDB. If reads and writes to InfluxDB have started to slow down, you may have high series cardinality. Find the source of high cardinality and fix your schema to resolve high cardinality issues.
 menu:
  influxdb_cloud:
    name: Resolve high cardinality
--- a/content/influxdb/v2.0/write-data/best-practices/resolve-high-cardinality.md
+++ b/content/influxdb/v2.0/write-data/best-practices/resolve-high-cardinality.md
@ -1,7 +1,7 @@
 ---
 title: Resolve high series cardinality
 description: >
-  Reduce high series cardinality in InfluxDB. If reads and writes to InfluxDB have started to slow down, you may have high cardinality. Find the source of high cardinality and fix your schema to resolve high cardinality issues.
+  Reduce high series cardinality in InfluxDB. If reads and writes to InfluxDB have started to slow down, you may have high cardinality. Find the source of high cardinality and adjust your schema to resolve high cardinality issues.
 menu:
  influxdb_2_0:
    name: Resolve high cardinality
@ -9,14 +9,12 @@ menu:
    parent: write-best-practices
 ---

-{{% note %}}
 If reads and writes to InfluxDB have started to slow down, high [series cardinality](/influxdb/v2.0/reference/glossary/#series-cardinality) (too many series) may be causing memory issues.
-{{% /note %}}

 To resolve high series cardinality, complete the following steps (for multiple buckets if applicable):

 1. [Review tags](#review-tags).
-2. [Fix your schema](#fix-your-schema).
+2. [Adjust your schema](#adjust-your-schema).

 ## Review tags

@ -29,58 +27,62 @@ Review your tags to ensure each tag **does not contain** unique values for most

 Look for the following common issues, which often cause many unique tag values:

- *Writing log messages to tags*. If a log message includes a unique timestamp, pointer value, or unique string, many unique tag values are created.
- *Writing timestamps to tags*. Typically done by accidentally in client code.
- *Tags initially set up with few unique values that grow over time.* For example, a user ID tag may work at a small startup, and begin to cause issues when the company grows to thousands of users.
+- **Writing log messages to tags**. If a log message includes a unique timestamp, pointer value, or unique string, many unique tag values are created.
+- **Writing timestamps to tags**. Typically done by accident in client code.
+- **Tags initially set up with few unique values that grow over time**. For example, a user ID tag may work at a small startup, but may begin to cause issues when the company grows to thousands of users.

 ### Count unique tag values

 The following example Flux query shows you which tags are contributing the most to cardinality. Look for tags with values orders of magnitude higher than others.

-  ```js
-  # Count unique values for each tag in a bucket
-  import "influxdata/influxdb/schema"
-  cardinalityByTag = (bucket) =>
+```js
+// Count unique values for each tag in a bucket
+import "influxdata/influxdb/schema"
+
+cardinalityByTag = (bucket) =>
  schema.tagKeys(bucket: bucket)
-  |> map(fn: (r) => ({
-  tag: r._value,
-  _value: if contains(set: ["_stop","_start"], value:r._value) then
-  0
-  else
-  (schema.tagValues(bucket: bucket, tag: r._value)
-  |> count()
-  |> findRecord(fn: (key) => true, idx: 0))._value
-  }))
-  |> group(columns:["tag"])
-  |> sum()
-  cardinalityByTag(bucket: "my-bucket")
-  ```
+    |> map(fn: (r) => ({
+      tag: r._value,
+      _value:
+        if contains(set: ["_stop","_start"], value:r._value) then 0
+        else (schema.tagValues(bucket: bucket, tag: r._value)
+          |> count()
+          |> findRecord(fn: (key) => true, idx: 0))._value
+    }))
+    |> group(columns:["tag"])
+    |> sum()
+
+cardinalityByTag(bucket: "example-bucket")
+```

 {{% note %}}
 If you're experiencing runaway cardinality, the query above may time out. If you experience a timeout, run the queries below one at a time.
 {{% /note %}}

-First, run the following query to generate a list of tags.
+1. Generate a list of tags:

-  ```js
-  # Generate a list of tags
-  import "influxdata/influxdb/schema"
-  schema.tagKeys(bucket: "my-bucket")
-  |> yield(name: "tags")
-  ```
+    ```js
+    // Generate a list of tags
+    import "influxdata/influxdb/schema"

-Next, run the following query to find tag values for each tag.
+    schema.tagKeys(bucket: "example-bucket")
+    ```

-  ```js
-  # For each tag, run the following query to find the tag values
-  import "influxdata/influxdb/schema"
-  schema.tagValues(bucket: "my-bucket", tag: "my-tag")
-  |> count()
-  ```
+2. Count unique tag values for each tag:
+
+    ```js
+    // Run the following for each tag to count the number of unique tag values
+    import "influxdata/influxdb/schema"
+
+    tag = "example-tag-key"
+
+    schema.tagValues(bucket: "example-bucket", tag: tag)
+      |> count()
+    ```

 These queries should help to identify the sources of high cardinality in each of your buckets. To determine which specific tags are growing, check the cardinality again after 24 hours to see if one or more tags have grown significantly.

-## Fix your schema
+## Adjust your schema

 Usually, resolving high cardinality is as simple as changing a tag with many unique values to a field. Review the following potential solutions for resolving high cardinality:

@ -95,19 +97,21 @@ Consider whether you need the data causing high cardinality. In some cases, you

 Tags are valuable for indexing, so during a query, the query engine doesn't need to scan every single record in a bucket. However, too many indexes may create performance problems. The trick is to create a middle ground between scanning and indexing.

-For example, if you often query for specific user IDs, and you have thousands of users. A simple query like this, where `userId` is a field, requires InfluxDB to scan every row in storage for the `userId`:
+For example, if you query for specific user IDs with thousands of users, a simple query like this, where `userId` is a field, requires InfluxDB to scan every row for the `userId`:

 ```js
-from(bucket: “my-bucket”)
-|> range(start: -7d)
-|> filter(fn: (r) => r.userId == “abcde”)
+from(bucket: "example-bucket")
+  |> range(start: -7d)
+  |> filter(fn: (r) => r._field == "userId" and r._value == "abcde")
 ```

-Now, if you include a tag that can be reasonably indexed in your schema, for example, if each of your users can be categorized by company, you can add a “companyTag” to reduce the number of rows scanned considerably, retrieving data more quickly:
+If you include a tag in your schema that can be reasonably indexed, such as a `company` tag, you can reduce the number of rows scanned and retrieve data more quickly:

-from(bucket: “my-bucket”)
-|> range(start: -7d)
-|> filter(fn: (r) => r.companyTag == “Acme”)
-|> filter(fn: (r) => r.userId == “abcde”)
+```js
+from(bucket: "example-bucket")
+  |> range(start: -7d)
+  |> filter(fn: (r) => r.company == "Acme")
+  |> filter(fn: (r) => r._field == "userId" and r._value == "abcde")
+```

 Consider tags that can be reasonably indexed to make your queries more performant. For more guidelines to consider, see [InfluxDB schema design](/influxdb/v2.0/write-data/best-practices/schema-design/).
--- a/content/influxdb/v2.0/write-data/best-practices/schema-design.md
+++ b/content/influxdb/v2.0/write-data/best-practices/schema-design.md
@ -1,7 +1,7 @@
 ---
 title: InfluxDB schema design
 description: >
-  Improve InfluxDB schema design and data layout. Store unique values in fields and other tips to reduce high cardinality in InfluxDB and make your data more performant.
+  Improve InfluxDB schema design and data layout to reduce high cardinality and make your data more performant.
 menu:
  influxdb_2_0:
    name: Schema design
@ -9,9 +9,9 @@ menu:
    parent: write-best-practices
 ---

-Each InfluxDB use case is unique and your [schema](/influxdb/v2.0/reference/glossary/#schema) design reflects that uniqueness. Discover a few design guidelines that we recommend for most use cases:
+Each InfluxDB use case is unique and your [schema](/influxdb/v2.0/reference/glossary/#schema) design reflects the uniqueness. We recommend the following design guidelines for most use cases:

- [Where to store data (tags or fields)](#where-to-store-data-tags-or-fields)
+- [Where to store data (tag or field)](#where-to-store-data-tag-or-field)
 - [Avoid too many series](#avoid-too-many-series)
 - [Use recommended naming conventions](#use-recommended-naming-conventions)
 <!-- - [Recommendations for managing shard group duration](#shard-group-duration-management)
@ -21,10 +21,10 @@ Each InfluxDB use case is unique and your [schema](/influxdb/v2.0/reference/glos
 Follow these guidelines to minimize high series cardinality and make your data more performant.
 {{% /note %}}

-## Where to store data (tags or fields)
+## Where to store data (tag or field)

 [Tags](/influxdb/v2.0/reference/glossary/#tag) are indexed and [fields](/influxdb/v2.0/reference/glossary/#field) are not.
-This means that queries on tags are more performant than queries on fields.
+This means that querying by tags is more performant than querying by fields.

 In general, your queries should guide what gets stored as a tag and what gets stored as a field:

@ -34,15 +34,13 @@ In general, your queries should guide what gets stored as a tag and what gets st

 ## Avoid too many series

-[Tags](/influxdb/v2.0/reference/glossary/#tag) containing highly variable information like UUIDs, hashes, and random strings lead to a large number of [series](/influxdb/v2.0/reference/glossary/#series) in the database, also known as high [series cardinality](/influxdb/v2.0/reference/glossary/#series-cardinality).
+[Tags](/influxdb/v2.0/reference/glossary/#tag) containing highly variable information like unique IDs, hashes, and random strings lead to a large number of [series](/influxdb/v2.0/reference/glossary/#series), also known as high [series cardinality](/influxdb/v2.0/reference/glossary/#series-cardinality).

 High series cardinality is a primary driver of high memory usage for many database workloads.
-When you write to InfluxDB, InfluxDB uses the measurements and tags to create indexes to speed up reads.
-
-However, when there are too many indexes created, both writes and reads may start to slow down. Therefore, if a system has memory constraints, consider storing high-cardinality data as a field rather than a tag.
+InfluxDB uses measurements and tags to create indexes and speed up reads. However, when too many indexes are created, both writes and reads may start to slow down. Therefore, if a system has memory constraints, consider storing high-cardinality data as a field rather than a tag.

 {{% note %}}
-If reads and writes to InfluxDB have started to slow down, you may already have high series cardinality (too many series). See how to [resolve high cardinality](/influxdb/v2.0/write-data/best-practices/resolve-high-cardinality/).
+If reads and writes to InfluxDB start to slow down, you may have high series cardinality (too many series). See how to [resolve high cardinality](/influxdb/v2.0/write-data/best-practices/resolve-high-cardinality/).
 {{% /note %}}

 ## Use recommended naming conventions
@ -57,9 +55,9 @@ Use the following conventions when naming your tag and field keys:
 ### Avoid keywords as tag or field names

 Avoiding keywords is not required, but it simplifies writing queries because you won't have to wrap tag or field names in double quotes.
-See [Flux](/influxdb/v2.0/reference/flux/language/lexical-elements/#keywords) keywords to avoid.
+See [Flux keywords](/influxdb/v2.0/reference/flux/language/lexical-elements/#keywords) to avoid.

-Also, if a tag or field name contains characters other than `[A-z,_]`, you must use [bracket notation](/influxdb/v2.0/query-data/get-started/syntax-basics/#records) in Flux.
+Also, if a tag or field name contains characters other than letters, digits, and underscores, you must use [bracket notation](/influxdb/v2.0/query-data/get-started/syntax-basics/#records) in Flux.

 ### Avoid the same name for a tag and a field

@ -99,17 +97,18 @@ Use Flux to calculate the average `temp` for blueberries in the `north` region:

 ```js
 // Schema 1 - Query for data encoded in the measurement name
-from(bucket:"<database>/<retention_policy>")
+from(bucket:"example-bucket")
  |> range(start:2016-08-30T00:00:00Z)
  |> filter(fn: (r) =>  r._measurement =~ /\.north$/ and r._field == "temp")
  |> mean()

 // Schema 2 - Query for data encoded in tags
-from(bucket:"<database>/<retention_policy>")
+from(bucket:"example-bucket")
  |> range(start:2016-08-30T00:00:00Z)
  |> filter(fn: (r) =>  r._measurement == "weather_sensor" and r.region == "north" and r._field == "temp")
  |> mean()
 ```
+
 In schema 1, encoding the `plot` and `region` in the measurement name makes the data more difficult to query.

 ### Avoid putting more than one piece of information in one tag
@ -127,7 +126,7 @@ weather_sensor,crop=blueberries,location=plot-1.north temp=50.1 1472515200000000
 weather_sensor,crop=blueberries,location=plot-2.midwest temp=49.8 1472515200000000000
 ```

-The schema 1 data encodes multiple separate parameters, the `plot` and `region` into a long tag value (`plot-1.north`).
+The schema 1 data encodes multiple parameters, the `plot` and `region`, into a long tag value (`plot-1.north`).
 Compare this to schema 2:

 ```
@ -137,7 +136,7 @@ weather_sensor,crop=blueberries,plot=1,region=north temp=50.1 147251520000000000
 weather_sensor,crop=blueberries,plot=2,region=midwest temp=49.8 1472515200000000000
 ```

-Schema 2 is preferable because using multiple tags, you don't need a regular expression.
+Schema 2 is preferable because, with multiple tags, you don't need a regular expression.

 #### Flux example to query schemas

@ -145,13 +144,13 @@ The following Flux examples show how to calculate the average `temp` for blueber

 ```js
 // Schema 1 - Query for multiple pieces of data encoded in a single tag
-from(bucket:"<database>/<retention_policy>")
+from(bucket:"example-bucket")
  |> range(start:2016-08-30T00:00:00Z)
  |> filter(fn: (r) =>  r._measurement == "weather_sensor" and r.location =~ /\.north$/ and r._field == "temp")
  |> mean()

 // Schema 2 - Query for data encoded in multiple tags
-from(bucket:"<database>/<retention_policy>")
+from(bucket:"example-bucket")
  |> range(start:2016-08-30T00:00:00Z)
  |> filter(fn: (r) =>  r._measurement == "weather_sensor" and r.region == "north" and r._field == "temp")
  |> mean()