From eb0496529fa3349e3ba0e02e5ab834619cda5290 Mon Sep 17 00:00:00 2001
From: Kelly
Date: Wed, 18 Nov 2020 09:13:41 -0800
Subject: [PATCH] edits from Scott

---
 .../resolve-high-cardinality.md               |   2 +-
 .../resolve-high-cardinality.md               | 100 +++++++++---------
 .../best-practices/schema-design.md           |  35 +++---
 3 files changed, 70 insertions(+), 67 deletions(-)

diff --git a/content/influxdb/cloud/write-data/best-practices/resolve-high-cardinality.md b/content/influxdb/cloud/write-data/best-practices/resolve-high-cardinality.md
index 7de502f02..efe0508f9 100644
--- a/content/influxdb/cloud/write-data/best-practices/resolve-high-cardinality.md
+++ b/content/influxdb/cloud/write-data/best-practices/resolve-high-cardinality.md
@@ -1,7 +1,7 @@
 ---
 title: Resolve high series cardinality
 description: >
-  Reduce high series cardinality in InfluxDB. If reads and writes to InfluxDB have started to slow down, you may have high serires cardinality. Find the source of high cardinality and fix your schema to resolve high cardinality issues.
+  Reduce high series cardinality in InfluxDB. If reads and writes to InfluxDB have started to slow down, you may have high series cardinality. Find the source of high cardinality and fix your schema to resolve high cardinality issues.
 menu:
   influxdb_cloud:
     name: Resolve high cardinality
diff --git a/content/influxdb/v2.0/write-data/best-practices/resolve-high-cardinality.md b/content/influxdb/v2.0/write-data/best-practices/resolve-high-cardinality.md
index 5d99e7aba..5bfb2ee57 100644
--- a/content/influxdb/v2.0/write-data/best-practices/resolve-high-cardinality.md
+++ b/content/influxdb/v2.0/write-data/best-practices/resolve-high-cardinality.md
@@ -1,7 +1,7 @@
 ---
 title: Resolve high series cardinality
 description: >
-  Reduce high series cardinality in InfluxDB. If reads and writes to InfluxDB have started to slow down, you may have high cardinality. Find the source of high cardinality and fix your schema to resolve high cardinality issues.
+  Reduce high series cardinality in InfluxDB. If reads and writes to InfluxDB have started to slow down, you may have high cardinality. Find the source of high cardinality and adjust your schema to resolve high cardinality issues.
 menu:
   influxdb_2_0:
     name: Resolve high cardinality
@@ -9,14 +9,12 @@ menu:
     parent: write-best-practices
 ---
 
-{{% note %}}
 If reads and writes to InfluxDB have started to slow down, high [series cardinality](/influxdb/v2.0/reference/glossary/#series-cardinality) (too many series) may be causing memory issues.
-{{% /note %}}
 
 To resolve high series cardinality, complete the following steps (for multiple buckets if applicable):
 
 1. [Review tags](#review-tags).
-2. [Fix your schema](#fix-your-schema).
+2. [Adjust your schema](#adjust-your-schema).
 
 ## Review tags
 
@@ -29,58 +27,62 @@ Review your tags to ensure each tag **does not contain** unique values for most
 
 Look for the following common issues, which often cause many unique tag values:
 
-- *Writing log messages to tags*. If a log message includes a unique timestamp, pointer value, or unique string, many unique tag values are created.
-- *Writing timestamps to tags*. Typically done by accidentally in client code.
-- *Tags initially set up with few unique values that grow over time.* For example, a user ID tag may work at a small startup, and begin to cause issues when the company grows to thousands of users.
+- **Writing log messages to tags**. If a log message includes a unique timestamp, pointer value, or unique string, many unique tag values are created.
+- **Writing timestamps to tags**. Typically done by accident in client code.
+- **Tags initially set up with few unique values that grow over time.** For example, a user ID tag may work at a small startup, but may begin to cause issues when the company grows to thousands of users.
 
 ### Count unique tag values
 
 The following example Flux query shows you which tags are contributing the most to cardinality. Look for tags with values orders of magnitude higher than others.
 
-  ```js
-  # Count unique values for each tag in a bucket
-  import "influxdata/influxdb/schema"
-  cardinalityByTag = (bucket) =>
+```js
+// Count unique values for each tag in a bucket
+import "influxdata/influxdb/schema"
+
+cardinalityByTag = (bucket) =>
     schema.tagKeys(bucket: bucket)
-      |> map(fn: (r) => ({
-        tag: r._value,
-        _value: if contains(set: ["_stop","_start"], value:r._value) then
-          0
-        else
-          (schema.tagValues(bucket: bucket, tag: r._value)
-            |> count()
-            |> findRecord(fn: (key) => true, idx: 0))._value
-      }))
-      |> group(columns:["tag"])
-      |> sum()
-  cardinalityByTag(bucket: "my-bucket")
-  ```
+        |> map(fn: (r) => ({
+            tag: r._value,
+            _value:
+                if contains(set: ["_stop","_start"], value:r._value) then 0
+                else (schema.tagValues(bucket: bucket, tag: r._value)
+                    |> count()
+                    |> findRecord(fn: (key) => true, idx: 0))._value
+        }))
+        |> group(columns:["tag"])
+        |> sum()
+
+cardinalityByTag(bucket: "example-bucket")
+```
 
 {{% note %}}
 If you're experiencing runaway cardinality, the query above may timeout. If you experience a timeout, run the queries below—one at a time.
 {{% /note %}}
 
-First, run the following query to generate a list of tags.
+1. Generate a list of tags:
 
-  ```js
-  # Generate a list of tags
-  import "influxdata/influxdb/schema"
-  schema.tagKeys(bucket: "my-bucket")
-  |> yield(name: "tags")
-  ```
+    ```js
+    // Generate a list of tags
+    import "influxdata/influxdb/schema"
 
-Next, run the following query to find tag values for each tag.
+    schema.tagKeys(bucket: "example-bucket")
+    ```
 
-  ```js
-  # For each tag, run the following query to find the tag values
-  import "influxdata/influxdb/schema"
-  schema.tagValues(bucket: "my-bucket", tag: "my-tag")
-  |> count()
-  ```
+2. Count unique tag values for each tag:
+
+    ```js
+    // Run the following for each tag to count the number of unique tag values
+    import "influxdata/influxdb/schema"
+
+    tag = "example-tag-key"
+
+    schema.tagValues(bucket: "my-bucket", tag: tag)
+      |> count()
+    ```
 
 These queries should help to identify the sources of high cardinality in each of your buckets. To determine which specific tags are growing, check the cardinality again after 24 hours to see if one or more tags have grown significantly.
 
-## Fix your schema
+## Adjust your schema
 
 Usually, resolving high cardinality is as simple as changing a tag with many unique values to a field. Review the following potential solutions for resolving high cardinality:
 
@@ -95,19 +97,21 @@ Consider whether you need the data causing high cardinality. In some cases, you
 
 Tags are valuable for indexing, so during a query, the query engine doesn't need to scan every single record in a bucket. However, too many indexes may create performance problems. The trick is to create a middle ground between scanning and indexing.
 
-For example, if you often query for specific user IDs, and you have thousands of users. A simple query like this, where `userId` is a field, requires InfluxDB to scan every row in storage for the `userId`:
+For example, if you query for specific user IDs with thousands of users, a simple query like this, where `userId` is a field, requires InfluxDB to scan every row for the `userId`:
 
 ```js
-from(bucket: “my-bucket”)
-|> range(start: -7d)
-|> filter(fn: (r) => r.userId == “abcde”)
+from(bucket: "example-bucket")
+  |> range(start: -7d)
+  |> filter(fn: (r) => r._field == "userId" and r._value == "abcde")
 ```
 
-Now, if you include a tag that can be reasonably indexed in your schema, for example, if each of your users can be categorized by company, you can add a “companyTag” to reduce the number of rows scanned considerably, retrieving data more quickly:
+If you include a tag in your schema that can be reasonably indexed, such as a `company` tag, you can reduce the number of rows scanned and retrieve data more quickly:
 
-from(bucket: “my-bucket”)
-|> range(start: -7d)
-|> filter(fn: (r) => r.companyTag == “Acme”)
-|> filter(fn: (r) => r.userId == “abcde”)
+```js
+from(bucket: "example-bucket")
+  |> range(start: -7d)
+  |> filter(fn: (r) => r.company == "Acme")
+  |> filter(fn: (r) => r._field == "userId" and r._value == "abcde")
+```
 
 Consider tags that can be reasonably indexed to make your queries more performant. For more guidelines to consider, see [InfluxDB schema design](/influxdb/v2.0/write-data/best-practices/schema-design/).
diff --git a/content/influxdb/v2.0/write-data/best-practices/schema-design.md b/content/influxdb/v2.0/write-data/best-practices/schema-design.md
index 327b7de20..504a7bb9f 100644
--- a/content/influxdb/v2.0/write-data/best-practices/schema-design.md
+++ b/content/influxdb/v2.0/write-data/best-practices/schema-design.md
@@ -1,7 +1,7 @@
 ---
 title: InfluxDB schema design
 description: >
-  Improve InfluxDB schema design and data layout. Store unique values in fields and other tips to reduce high cardinality in InfluxDB and make your data more performant.
+  Improve InfluxDB schema design and data layout to reduce high cardinality and make your data more performant.
 menu:
   influxdb_2_0:
     name: Schema design
@@ -9,9 +9,9 @@ menu:
     parent: write-best-practices
 ---
 
-Each InfluxDB use case is unique and your [schema](/influxdb/v2.0/reference/glossary/#schema) design reflects that uniqueness. Discover a few design guidelines that we recommend for most use cases:
+Each InfluxDB use case is unique and your [schema](/influxdb/v2.0/reference/glossary/#schema) design reflects the uniqueness. We recommend the following design guidelines for most use cases:
 
-- [Where to store data (tags or fields)](#where-to-store-data-tags-or-fields)
+- [Where to store data (tag or field)](#where-to-store-data-tags-or-fields)
 - [Avoid too many series](#avoid-too-many-series)
 - [Use recommended naming conventions](#use-recommended-naming-conventions)