Merge branch 'master' into hardware-sizing

pull/3389/head
lwandzura 2021-11-19 14:06:25 -06:00 committed by GitHub
commit acf6cf6d8d
34 changed files with 1218 additions and 785 deletions

.gitignore vendored

@@ -5,5 +5,6 @@ public
node_modules
*.log
/resources
.hugo_build.lock
/content/influxdb/*/api/*.html
/api-docs/redoc-static.html*


@@ -457,7 +457,7 @@ WHERE time > now() - 15m
To link to tabbed content, click on the tab and use the URL parameter shown.
It will have the form `?t=`, plus a string.
For example:
```
[Windows installation](/influxdb/v2.0/install/?t=Windows)
@@ -1034,6 +1034,29 @@ only render in the InfluxDB OSS documentation.
This is necessary to get the first sentence/paragraph to render correctly.
{{% /oss-only %}}
- {{% oss-only %}}This is a list item that will only render in InfluxDB OSS docs.{{% /oss-only %}}
- {{% oss-only %}}
This is a multi-paragraph list item that will only render in the InfluxDB OSS docs.
**Note:** Notice the shortcode is _inside_ the list item.
There must also be a blank line after the opening shortcode tag.
This is necessary to get the first sentence/paragraph to render correctly.
{{% /oss-only %}}
1. Step 1
2. {{% oss-only %}}This is a list item that will only render in InfluxDB OSS docs.{{% /oss-only %}}
3. {{% oss-only %}}
This is a list item that contains multiple paragraphs or nested list items and will only render in the InfluxDB OSS docs.
**Note:** Notice the shortcode is _inside_ the list item.
There must also be a blank line after the opening shortcode tag.
This is necessary to get the first sentence/paragraph to render correctly.
{{% /oss-only %}}
```
#### cloud-only
@@ -1052,6 +1075,29 @@ only render in the InfluxDB Cloud documentation.
This is necessary to get the first sentence/paragraph to render correctly.
{{% /cloud-only %}}
- {{% cloud-only %}}This is a list item that will only render in InfluxDB Cloud docs.{{% /cloud-only %}}
- {{% cloud-only %}}
This is a list item that contains multiple paragraphs or nested list items and will only render in the InfluxDB Cloud docs.
**Note:** Notice the shortcode is _inside_ the list item.
There must also be a blank line after the opening shortcode tag.
This is necessary to get the first sentence/paragraph to render correctly.
{{% /cloud-only %}}
1. Step 1
2. {{% cloud-only %}}This is a list item that will only render in InfluxDB Cloud docs.{{% /cloud-only %}}
3. {{% cloud-only %}}
This is a multi-paragraph list item that will only render in the InfluxDB Cloud docs.
**Note:** Notice the shortcode is _inside_ the list item.
There must also be a blank line after the opening shortcode tag.
This is necessary to get the first sentence/paragraph to render correctly.
{{% /cloud-only %}}
```
#### All-Caps


@@ -5981,7 +5981,7 @@ components:
For more information and examples, see the following:
- [`/authorizations`](#tag/Authorizations) endpoint.
- [Use tokens in API requests](https://docs.influxdata.com/influxdb/cloud/api-guide/api_intro/#authentication).
- [Authorize API requests](https://docs.influxdata.com/influxdb/cloud/api-guide/api_intro/#authentication).
- [Manage API tokens](https://docs.influxdata.com/influxdb/cloud/security/tokens).
- [Assign a token to a specific user](https://docs.influxdata.com/influxdb/cloud/security/tokens/create-token).
scheme: token
@@ -9625,14 +9625,13 @@ paths:
headers:
Content-Encoding:
description: >-
The Content-Encoding entity header is used to compress the
media-type. When present, its value indicates which encodings
were applied to the entity-body
Lists any encodings (usually compression algorithms) that have
been applied to the response payload.
schema:
default: identity
description: >-
Specifies that the response in the body is encoded with gzip
or not encoded with identity.
The content coding. `gzip` for compressed data or `identity`
for unmodified, uncompressed data.
enum:
- gzip
- identity
@@ -9709,16 +9708,15 @@ paths:
parameters:
- $ref: '#/components/parameters/TraceSpan'
- description: >-
The Accept-Encoding request HTTP header advertises which content
encoding, usually a compression algorithm, the client is able to
understand.
Indicates the content encoding (usually a compression algorithm)
that the client can understand.
in: header
name: Accept-Encoding
schema:
default: identity
description: >-
Specifies that the query response in the body should be encoded
with gzip or not encoded with identity.
The content coding. Use `gzip` for compressed data or `identity`
for unmodified, uncompressed data.
enum:
- gzip
- identity
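The two codings above behave as follows on the client side. This is a minimal Python sketch, not tied to any InfluxDB client library; `decode_body` is a hypothetical helper for illustration:

```python
import gzip

def decode_body(body: bytes, content_encoding: str = "identity") -> str:
    """Decode a response payload according to its Content-Encoding header."""
    if content_encoding == "gzip":
        return gzip.decompress(body).decode("utf-8")
    # "identity" means no transformation was applied to the payload.
    return body.decode("utf-8")

# Round-trip: a gzip-encoded CSV body decodes back to the original text.
csv_rows = "_time,_value\n2021-01-01T00:00:00Z,50.1"
decoded = decode_body(gzip.compress(csv_rows.encode("utf-8")), "gzip")
```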
@@ -9777,14 +9775,13 @@ paths:
headers:
Content-Encoding:
description: >-
The Content-Encoding entity header is used to compress the
media-type. When present, its value indicates which encodings
were applied to the entity-body
Lists any encodings (usually compression algorithms) that have
been applied to the response payload.
schema:
default: identity
description: >-
Specifies that the response in the body is encoded with gzip
or not encoded with identity.
description: >
The content coding: `gzip` for compressed data or `identity`
for unmodified, uncompressed data.
enum:
- gzip
- identity
@@ -12130,21 +12127,29 @@ paths:
format.
For more information and examples, see [Write data with the InfluxDB
For more information and examples, see the following:
- [Write data with the InfluxDB
API](https://docs.influxdata.com/influxdb/cloud/write-data/developer-tools/api).
- [Optimize writes to
InfluxDB](https://docs.influxdata.com/influxdb/cloud/write-data/best-practices/optimize-writes/).
operationId: PostWrite
parameters:
- $ref: '#/components/parameters/TraceSpan'
- description: >-
When present, the header value tells the database that compression
is applied to the line protocol in the request body.
- description: >
The value tells InfluxDB what compression is applied to the line
protocol in the request payload.
To make an API request with a GZIP payload, send `Content-Encoding:
gzip` as a request header.
in: header
name: Content-Encoding
schema:
default: identity
description: >-
The header value specifies that the line protocol in the request
body is encoded with gzip or not encoded with identity.
The content coding. Use `gzip` for compressed data or `identity`
for unmodified, uncompressed data.
enum:
- gzip
- identity
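The write-side behavior described above can be sketched in Python. This is an illustrative helper (`prepare_write_payload` is not part of any InfluxDB client): it gzips the line protocol and sets `Content-Encoding` accordingly, falling back to the default `identity` coding when compression is off:

```python
import gzip

# Sample line protocol points to write.
line_protocol = (
    "weather_sensor,crop=blueberries,plot=1,region=north temp=50.1 1472515200000000000\n"
    "weather_sensor,crop=blueberries,plot=2,region=midwest temp=49.8 1472515200000000000"
)

def prepare_write_payload(lines: str, compress: bool = True):
    """Return (body, headers) for a write request.

    With compress=True the body is gzipped and the Content-Encoding
    header tells the server how to decode it; otherwise the default
    `identity` coding (no transformation) applies.
    """
    if compress:
        body = gzip.compress(lines.encode("utf-8"))
        encoding = "gzip"
    else:
        body = lines.encode("utf-8")
        encoding = "identity"
    headers = {
        "Content-Type": "text/plain; charset=utf-8",
        "Content-Encoding": encoding,
    }
    return body, headers

body, headers = prepare_write_payload(line_protocol)
```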
@@ -12370,7 +12375,7 @@ tags:
For more information and examples, see the following:
- [Use tokens in API requests](https://docs.influxdata.com/influxdb/cloud/api-guide/api_intro/#authentication).
- [Authorize API requests](https://docs.influxdata.com/influxdb/cloud/api-guide/api_intro/#authentication).
- [Manage API tokens](https://docs.influxdata.com/influxdb/cloud/security/tokens).
- [Assign a token to a specific user](https://docs.influxdata.com/influxdb/cloud/security/tokens/create-token).
name: Authorizations


@@ -5926,7 +5926,7 @@ components:
For more information and examples, see the following:
- [`/authorizations`](#tag/Authorizations) endpoint.
- [Use tokens in API requests](https://docs.influxdata.com/influxdb/v2.1/api-guide/api_intro/#authentication).
- [Authorize API requests](https://docs.influxdata.com/influxdb/v2.1/api-guide/api_intro/#authentication).
- [Manage API tokens](https://docs.influxdata.com/influxdb/v2.1/security/tokens).
- [Assign a token to a specific user](https://docs.influxdata.com/influxdb/v2.1/security/tokens/create-token).
scheme: token
@@ -6120,16 +6120,15 @@ paths:
parameters:
- $ref: '#/components/parameters/TraceSpan'
- description: >-
The Accept-Encoding request HTTP header advertises which content
encoding, usually a compression algorithm, the client is able to
understand.
Indicates the content encoding (usually a compression algorithm)
that the client can understand.
in: header
name: Accept-Encoding
schema:
default: identity
description: >-
Specifies that the query response in the body should be encoded
with gzip or not encoded with identity.
The content coding. Use `gzip` for compressed data or `identity`
for unmodified, uncompressed data.
enum:
- gzip
- identity
@@ -6144,14 +6143,13 @@ paths:
headers:
Content-Encoding:
description: >-
The Content-Encoding entity header is used to compress the
media-type. When present, its value indicates which encodings
were applied to the entity-body
Lists any encodings (usually compression algorithms) that have
been applied to the response payload.
schema:
default: identity
description: >-
Specifies that the response in the body is encoded with gzip
or not encoded with identity.
description: >
The content coding: `gzip` for compressed data or `identity`
for unmodified, uncompressed data.
enum:
- gzip
- identity
@@ -6168,16 +6166,15 @@ paths:
parameters:
- $ref: '#/components/parameters/TraceSpan'
- description: >-
The Accept-Encoding request HTTP header advertises which content
encoding, usually a compression algorithm, the client is able to
understand.
Indicates the content encoding (usually a compression algorithm)
that the client can understand.
in: header
name: Accept-Encoding
schema:
default: identity
description: >-
Specifies that the query response in the body should be encoded
with gzip or not encoded with identity.
The content coding. Use `gzip` for compressed data or `identity`
for unmodified, uncompressed data.
enum:
- gzip
- identity
@@ -6206,14 +6203,13 @@ paths:
headers:
Content-Encoding:
description: >-
The Content-Encoding entity header is used to compress the
media-type. When present, its value indicates which encodings
were applied to the entity-body
Lists any encodings (usually compression algorithms) that have
been applied to the response payload.
schema:
default: identity
description: >-
Specifies that the response in the body is encoded with gzip
or not encoded with identity.
description: >
The content coding: `gzip` for compressed data or `identity`
for unmodified, uncompressed data.
enum:
- gzip
- identity
@@ -9605,16 +9601,15 @@ paths:
parameters:
- $ref: '#/components/parameters/TraceSpan'
- description: >-
The Accept-Encoding request HTTP header advertises which content
encoding, usually a compression algorithm, the client is able to
understand.
Indicates the content encoding (usually a compression algorithm)
that the client can understand.
in: header
name: Accept-Encoding
schema:
default: identity
description: >-
Specifies that the query response in the body should be encoded
with gzip or not encoded with identity.
The content coding. Use `gzip` for compressed data or `identity`
for unmodified, uncompressed data.
enum:
- gzip
- identity
@@ -9673,14 +9668,13 @@ paths:
headers:
Content-Encoding:
description: >-
The Content-Encoding entity header is used to compress the
media-type. When present, its value indicates which encodings
were applied to the entity-body
Lists any encodings (usually compression algorithms) that have
been applied to the response payload.
schema:
default: identity
description: >-
Specifies that the response in the body is encoded with gzip
or not encoded with identity.
description: >
The content coding: `gzip` for compressed data or `identity`
for unmodified, uncompressed data.
enum:
- gzip
- identity
@@ -9951,16 +9945,19 @@ paths:
operationId: PostRestoreKV
parameters:
- $ref: '#/components/parameters/TraceSpan'
- description: >-
When present, its value indicates to the database that compression
is applied to the line-protocol body.
- description: >
The value tells InfluxDB what compression is applied to the line
protocol in the request payload.
To make an API request with a GZIP payload, send `Content-Encoding:
gzip` as a request header.
in: header
name: Content-Encoding
schema:
default: identity
description: >-
Specifies that the line protocol in the body is encoded with gzip
or not encoded with identity.
The content coding. Use `gzip` for compressed data or `identity`
for unmodified, uncompressed data.
enum:
- gzip
- identity
@@ -10006,9 +10003,12 @@ paths:
operationId: PostRestoreShardId
parameters:
- $ref: '#/components/parameters/TraceSpan'
- description: >-
When present, its value indicates to the database that compression
is applied to the line-protocol body.
- description: >
The value tells InfluxDB what compression is applied to the line
protocol in the request payload.
To make an API request with a GZIP payload, send `Content-Encoding:
gzip` as a request header.
in: header
name: Content-Encoding
schema:
@@ -10055,9 +10055,12 @@ paths:
operationId: PostRestoreSQL
parameters:
- $ref: '#/components/parameters/TraceSpan'
- description: >-
When present, its value indicates to the database that compression
is applied to the line-protocol body.
- description: >
The value tells InfluxDB what compression is applied to the line
protocol in the request payload.
To make an API request with a GZIP payload, send `Content-Encoding:
gzip` as a request header.
in: header
name: Content-Encoding
schema:
@@ -12733,14 +12736,22 @@ paths:
format.
For more information and examples, see [Write data with the InfluxDB
For more information and examples, see the following:
- [Write data with the InfluxDB
API](https://docs.influxdata.com/influxdb/v2.1/write-data/developer-tools/api).
- [Optimize writes to
InfluxDB](https://docs.influxdata.com/influxdb/v2.1/write-data/best-practices/optimize-writes/).
operationId: PostWrite
parameters:
- $ref: '#/components/parameters/TraceSpan'
- description: >-
When present, the header value tells the database that compression
is applied to the line protocol in the request body.
- description: >
The value tells InfluxDB what compression is applied to the line
protocol in the request payload.
To make an API request with a GZIP payload, send `Content-Encoding:
gzip` as a request header.
in: header
name: Content-Encoding
schema:
@@ -12953,7 +12964,7 @@ tags:
For more information and examples, see the following:
- [Use tokens in API requests](https://docs.influxdata.com/influxdb/v2.1/api-guide/api_intro/#authentication).
- [Authorize API requests](https://docs.influxdata.com/influxdb/v2.1/api-guide/api_intro/#authentication).
- [Manage API tokens](https://docs.influxdata.com/influxdb/v2.1/security/tokens).
- [Assign a token to a specific user](https://docs.influxdata.com/influxdb/v2.1/security/tokens/create-token).
name: Authorizations


@@ -9,136 +9,160 @@ menu:
parent: Concepts
---
Every InfluxDB use case is special and your [schema](/enterprise_influxdb/v1.9/concepts/glossary/#schema) will reflect that uniqueness.
There are, however, general guidelines to follow and pitfalls to avoid when designing your schema.
Each InfluxDB use case is unique and your [schema](/enterprise_influxdb/v1.9/concepts/glossary/#schema) reflects that uniqueness.
In general, a schema designed for querying leads to simpler and more performant queries.
We recommend the following design guidelines for most use cases:
<table style="width:100%">
<tr>
<td><a href="#general-recommendations">General Recommendations</a></td>
<td><a href="#encouraged-schema-design">Encouraged Schema Design</a></td>
<td><a href="#discouraged-schema-design">Discouraged Schema Design</a></td>
<td><a href="#shard-group-duration-management">Shard Group Duration Management</a></td>
</tr>
</table>
- [Where to store data (tag or field)](#where-to-store-data-tag-or-field)
- [Avoid too many series](#avoid-too-many-series)
- [Use recommended naming conventions](#use-recommended-naming-conventions)
- [Shard Group Duration Management](#shard-group-duration-management)
## General recommendations
## Where to store data (tag or field)
### Encouraged schema design
Your queries should guide what data you store in [tags](/enterprise_influxdb/v1.9/concepts/glossary/#tag) and what you store in [fields](/enterprise_influxdb/v1.9/concepts/glossary/#field):
We recommend that you:
- Store commonly-queried and grouping ([`group()`](/flux/v0.x/stdlib/universe/group) or [`GROUP BY`](/enterprise_influxdb/v1.9/query_language/explore-data/#group-by-tags)) metadata in tags.
- Store data in fields if each data point contains a different value.
- Store numeric values as fields ([tag values](/enterprise_influxdb/v1.9/concepts/glossary/#tag-value) only support string values).
- [Encode meta data in tags](#encode-meta-data-in-tags)
- [Avoid using keywords as tag or field names](#avoid-using-keywords-as-tag-or-field-names)
## Avoid too many series
#### Encode meta data in tags
InfluxDB indexes the following data elements to speed up reads:
[Tags](/enterprise_influxdb/v1.9/concepts/glossary/#tag) are indexed and [fields](/enterprise_influxdb/v1.9/concepts/glossary/#field) are not indexed.
This means that queries on tags are more performant than those on fields.
- [measurement](/enterprise_influxdb/v1.9/concepts/glossary/#measurement)
- [tags](/enterprise_influxdb/v1.9/concepts/glossary/#tag)
In general, your queries should guide what gets stored as a tag and what gets stored as a field:
[Tag values](/enterprise_influxdb/v1.9/concepts/glossary/#tag-value) are indexed and [field values](/enterprise_influxdb/v1.9/concepts/glossary/#field-value) are not.
This means that querying by tags is more performant than querying by fields.
However, when too many indexes are created, both writes and reads may start to slow down.
- Store commonly-queried meta data in tags
- Store data in tags if you plan to use them with the InfluxQL `GROUP BY` clause
- Store data in fields if you plan to use them with an [InfluxQL](/enterprise_influxdb/v1.9/query_language/functions/) function
- Store numeric values as fields ([tag values](/enterprise_influxdb/v1.9/concepts/glossary/#tag-value) only support string values)
Each unique set of indexed data elements forms a [series key](/enterprise_influxdb/v1.9/concepts/glossary/#series-key).
[Tags](/enterprise_influxdb/v1.9/concepts/glossary/#tag) containing highly variable information like unique IDs, hashes, and random strings lead to a large number of [series](/enterprise_influxdb/v1.9/concepts/glossary/#series), also known as high [series cardinality](/enterprise_influxdb/v1.9/concepts/glossary/#series-cardinality).
High series cardinality is a primary driver of high memory usage for many database workloads.
Therefore, to reduce memory consumption, consider storing high-cardinality values in field values rather than in tags or field keys.
#### Avoid using keywords as tag or field names
{{% note %}}
Not required, but avoiding keywords simplifies writing queries because you won't have to wrap tag or field names in double quotes.
See [InfluxQL](https://github.com/influxdata/influxql/blob/master/README.md#keywords) and [Flux](https://github.com/influxdata/flux/blob/master/docs/SPEC.md#keywords) keywords to avoid.
If reads and writes to InfluxDB start to slow down, you may have high series cardinality (too many series).
See [how to find and reduce high series cardinality](/enterprise_influxdb/v1.9/troubleshooting/frequently-asked-questions/#why-does-series-cardinality-matter).
Also, if a tag or field name contains characters other than `[A-z,_]`, you must wrap it in double quotes in InfluxQL or use [bracket notation](/{{< latest "influxdb" "v2" >}}/query-data/get-started/syntax-basics/#records) in Flux.
{{% /note %}}
### Discouraged schema design
## Use recommended naming conventions
We recommend that you:
Use the following conventions when naming your tag and field keys:
- [Avoid too many series](#avoid-too-many-series)
- [Avoid the same name for a tag and a field](#avoid-the-same-name-for-a-tag-and-a-field)
- [Avoid encoding data in measurement names](#avoid-encoding-data-in-measurement-names)
- [Avoid putting more than one piece of information in one tag](#avoid-putting-more-than-one-piece-of-information-in-one-tag)
- [Avoid reserved keywords in tag and field keys](#avoid-reserved-keywords-in-tag-and-field-keys)
- [Avoid the same tag and field name](#avoid-the-same-name-for-a-tag-and-a-field)
- [Avoid encoding data in measurements and keys](#avoid-encoding-data-in-measurements-and-keys)
- [Avoid more than one piece of information in one tag](#avoid-putting-more-than-one-piece-of-information-in-one-tag)
#### Avoid too many series
### Avoid reserved keywords in tag and field keys
[Tags](/enterprise_influxdb/v1.9/concepts/glossary/#tag) containing highly variable information like UUIDs, hashes, and random strings lead to a large number of [series](/enterprise_influxdb/v1.9/concepts/glossary/#series) in the database, also known as high series cardinality. High series cardinality is a primary driver of high memory usage for many database workloads.
Not required, but avoiding the use of reserved keywords in your tag keys and field keys simplifies writing queries because you won't have to wrap your keys in double quotes.
See [InfluxQL](https://github.com/influxdata/influxql/blob/master/README.md#keywords) and [Flux keywords](/{{< latest "flux" >}}/spec/lexical-elements/#keywords) to avoid.
<!-- See [Hardware sizing guidelines](/enterprise_influxdb/v1.9/reference/hardware_sizing/) for [series cardinality](/enterprise_influxdb/v1.9/concepts/glossary/#series-cardinality) recommendations based on your hardware. -->
Also, if a tag key or field key contains characters other than `[A-z,_]`, you must wrap it in double quotes in InfluxQL or use [bracket notation](/{{< latest "flux" >}}/data-types/composite/record/#bracket-notation) in Flux.
If the system has memory constraints, consider storing high-cardinality data as a field rather than a tag. For more information, see [series cardinality](/enterprise_influxdb/v1.9/concepts/glossary/#series-cardinality).
<!-- When adding back hardware sizing guidelines, update lines 65-67 -->
#### Avoid the same name for a tag and a field
### Avoid the same name for a tag and a field
Avoid using the same name for a tag and field key.
This often results in unexpected behavior when querying data.
If you inadvertently add the same name for a tag and field key, see
If you inadvertently add the same name for a tag and a field, see
[Frequently asked questions](/enterprise_influxdb/v1.9/troubleshooting/frequently-asked-questions/#tag-and-field-key-with-the-same-name)
for information about how to query the data predictably and how to fix the issue.
#### Avoid encoding data in measurement names
### Avoid encoding data in measurements and keys
InfluxDB queries merge data that falls within the same [measurement](/enterprise_influxdb/v1.9/concepts/glossary/#measurement); it's better to differentiate data with [tags](/enterprise_influxdb/v1.9/concepts/glossary/#tag) than with detailed measurement names. If you encode data in a measurement name, you must use a regular expression to query the data, making some queries more complicated or impossible.
Store data in [tag values](/enterprise_influxdb/v1.9/concepts/glossary/#tag-value) or [field values](/enterprise_influxdb/v1.9/concepts/glossary/#field-value), not in [tag keys](/enterprise_influxdb/v1.9/concepts/glossary/#tag-key), [field keys](/enterprise_influxdb/v1.9/concepts/glossary/#field-key), or [measurements](/enterprise_influxdb/v1.9/concepts/glossary/#measurement). If you design your schema to store data in tag and field values,
your queries will be easier to write and more efficient.
_Example:_
In addition, you'll keep cardinality low by not creating measurements and keys as you write data.
To learn more about the performance impact of high series cardinality, see [how to find and reduce high series cardinality](/enterprise_influxdb/v1.9/troubleshooting/frequently-asked-questions/#why-does-series-cardinality-matter).
Consider the following schema represented by line protocol.
#### Compare schemas
Compare the following valid schemas represented by line protocol.
**Recommended**: the following schema stores metadata in separate `crop`, `plot`, and `region` tags. The `temp` field contains variable numeric data.
##### {id="good-measurements-schema"}
```
Schema 1 - Data encoded in the measurement name
-------------
blueberries.plot-1.north temp=50.1 1472515200000000000
blueberries.plot-2.midwest temp=49.8 1472515200000000000
```
The long measurement names (`blueberries.plot-1.north`) with no tags are similar to Graphite metrics.
Encoding the `plot` and `region` in the measurement name makes the data more difficult to query.
For example, calculating the average temperature of both plots 1 and 2 is not possible with schema 1.
Compare this to schema 2:
```
Schema 2 - Data encoded in tags
Good Measurements schema - Data encoded in tags (recommended)
-------------
weather_sensor,crop=blueberries,plot=1,region=north temp=50.1 1472515200000000000
weather_sensor,crop=blueberries,plot=2,region=midwest temp=49.8 1472515200000000000
```
Use Flux or InfluxQL to calculate the average `temp` for blueberries in the `north` region:
**Not recommended**: the following schema stores multiple attributes (`crop`, `plot` and `region`) concatenated (`blueberries.plot-1.north`) within the measurement, similar to Graphite metrics.
##### Flux
##### {id="bad-measurements-schema"}
```
Bad Measurements schema - Data encoded in the measurement (not recommended)
-------------
blueberries.plot-1.north temp=50.1 1472515200000000000
blueberries.plot-2.midwest temp=49.8 1472515200000000000
```
**Not recommended**: the following schema stores multiple attributes (`crop`, `plot` and `region`) concatenated (`blueberries.plot-1.north`) within the field key.
##### {id="bad-keys-schema"}
```
Bad Keys schema - Data encoded in field keys (not recommended)
-------------
weather_sensor blueberries.plot-1.north.temp=50.1 1472515200000000000
weather_sensor blueberries.plot-2.midwest.temp=49.8 1472515200000000000
```
#### Compare queries
Compare the following queries of the [_Good Measurements_](#good-measurements-schema) and [_Bad Measurements_](#bad-measurements-schema) schemas.
The [Flux](/{{< latest "flux" >}}/) queries calculate the average `temp` for blueberries in the `north` region.
**Easy to query**: [_Good Measurements_](#good-measurements-schema) data is easily filtered by `region` tag values, as in the following example.
```js
// Schema 1 - Query for data encoded in the measurement name
from(bucket:"<database>/<retention_policy>")
|> range(start:2016-08-30T00:00:00Z)
|> filter(fn: (r) => r._measurement =~ /\.north$/ and r._field == "temp")
|> mean()
// Schema 2 - Query for data encoded in tags
from(bucket:"<database>/<retention_policy>")
// Query *Good Measurements*, data stored in separate tags (recommended)
from(bucket: "<database>/<retention_policy>")
|> range(start:2016-08-30T00:00:00Z)
|> filter(fn: (r) => r._measurement == "weather_sensor" and r.region == "north" and r._field == "temp")
|> mean()
```
##### InfluxQL
**Difficult to query**: [_Bad Measurements_](#bad-measurements-schema) requires regular expressions to extract `plot` and `region` from the measurement, as in the following example.
```js
// Query *Bad Measurements*, data encoded in the measurement (not recommended)
from(bucket: "<database>/<retention_policy>")
|> range(start:2016-08-30T00:00:00Z)
|> filter(fn: (r) => r._measurement =~ /\.north$/ and r._field == "temp")
|> mean()
```
Complex measurements make some queries impossible. For example, calculating the average temperature of both plots is not possible with the [_Bad Measurements_](#bad-measurements-schema) schema.
##### InfluxQL example to query schemas
```
# Schema 1 - Query for data encoded in the measurement name
# Query *Bad Measurements*, data encoded in the measurement (not recommended)
> SELECT mean("temp") FROM /\.north$/
# Schema 2 - Query for data encoded in tags
# Query *Good Measurements*, data stored in separate tag values (recommended)
> SELECT mean("temp") FROM "weather_sensor" WHERE "region" = 'north'
```
### Avoid putting more than one piece of information in one tag
Splitting a single tag with multiple pieces into separate tags simplifies your queries and reduces the need for regular expressions.
Splitting a single tag with multiple pieces into separate tags simplifies your queries and improves performance by
reducing the need for regular expressions.
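The split described above can be sketched in Python; `split_location` is a hypothetical helper (the `location` tag name and `plot-1.north` format are assumptions for illustration) that turns one combined tag into separate `plot` and `region` tags before writing:

```python
def split_location(tags):
    """Split a combined "location" tag like "plot-1.north"
    into separate plot and region tags."""
    tags = dict(tags)  # don't mutate the caller's dict
    plot, region = tags.pop("location").split(".")
    tags["plot"] = plot.split("-", 1)[1]  # "plot-1" -> "1"
    tags["region"] = region
    return tags

split_location({"crop": "blueberries", "location": "plot-1.north"})
# → {"crop": "blueberries", "plot": "1", "region": "north"}
```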
Consider the following schema represented by line protocol.
#### Example line protocol schemas
```
Schema 1 - Multiple data encoded in a single tag
-------------
@@ -159,7 +183,7 @@ weather_sensor,crop=blueberries,plot=2,region=midwest temp=49.8 1472515200000000
Use Flux or InfluxQL to calculate the average `temp` for blueberries in the `north` region.
Schema 2 is preferable because, with multiple tags, you don't need a regular expression.
##### Flux
#### Flux example to query schemas
```js
// Schema 1 - Query for multiple data encoded in a single tag
@@ -175,7 +199,7 @@ from(bucket:"<database>/<retention_policy>")
|> mean()
```
##### InfluxQL
#### InfluxQL example to query schemas
```
# Schema 1 - Query for multiple data encoded in a single tag


@@ -2,7 +2,7 @@
title: Example post
description: This is just an example post to show the format of new 2.0 posts
weight: 1
# draft: true
draft: true
related:
- /influxdb/v2.0/write-data/
- /influxdb/v2.0/write-data/quick-start


@@ -44,7 +44,7 @@ Input data.
Default is piped-forward data ([`<-`](/flux/v0.x/spec/expressions/#pipe-expressions)).
## Output tables
For each input table with `n` rows, `derivative()` outputs a table with `n - 1` rows.
For each input table with `n` rows, `increase()` outputs a table with `n - 1` rows.
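The `n` rows in, `n - 1` rows out behavior can be sketched in Python. This mimics the documented Flux `increase()` semantics (cumulative non-negative deltas, treating a drop as a counter reset); it is an illustrative model, not the Flux implementation:

```python
def increase(values):
    """Mimic Flux increase(): cumulative sum of non-negative deltas.

    For n input values, returns n - 1 output values; a drop in
    value is treated as a counter reset, not a decrease.
    """
    out, total = [], 0
    for prev, curr in zip(values, values[1:]):
        if curr >= prev:
            total += curr - prev
        else:
            total += curr  # counter reset: count up from zero
        out.append(total)
    return out

increase([1, 2, 5, 3])  # → [1, 4, 7]  (4 rows in, 3 rows out)
```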
## Examples


@@ -93,14 +93,14 @@ to illustrate how `join()` transforms data.
import "generate"
t1 = generate.from(count: 4, fn: (n) => n + 1, start: 2021-01-01T00:00:00Z, stop: 2021-01-05T00:00:00Z)
|> set(key: "tag", value: "foo")
|> set(key: "tag", value: "foo")
t2 = generate.from(count: 4, fn: (n) => n * -1, start: 2021-01-01T00:00:00Z, stop: 2021-01-05T00:00:00Z)
|> set(key: "tag", value: "foo")
|> set(key: "tag", value: "foo")
join(
tables: {t1: t1, t2: t2},
on: ["_time", "tag"]
tables: {t1: t1, t2: t2},
on: ["_time", "tag"],
)
```
@@ -146,22 +146,22 @@ joined with Flux.
```js
data_1 = from(bucket:"example-bucket")
|> range(start:-15m)
|> filter(fn: (r) =>
r._measurement == "cpu" and
r._field == "usage_system"
)
|> range(start:-15m)
|> filter(fn: (r) =>
r._measurement == "cpu" and
r._field == "usage_system"
)
data_2 = from(bucket:"example-bucket")
|> range(start:-15m)
|> filter(fn: (r) =>
r._measurement == "mem" and
r._field == "used_percent"
)
|> range(start:-15m)
|> filter(fn: (r) =>
r._measurement == "mem" and
r._field == "used_percent"
)
join(
tables: {d1: data_1, d2: data_2},
on: ["_time", "host"]
tables: {d1: data_1, d2: data_2},
on: ["_time", "host"],
)
```
@@ -205,8 +205,8 @@ are illustrated below:
#### join() output
```js
join(
tables: {t1: t1, t2: t2}
on: ["_time", "tag"]
tables: {t1: t1, t2: t2},
on: ["_time", "tag"],
)
```


@@ -46,12 +46,12 @@ to illustrate how `union()` transforms data.
import "generate"
t1 = generate.from(count: 4, fn: (n) => n + 1, start: 2021-01-01T00:00:00Z, stop: 2021-01-05T00:00:00Z)
|> set(key: "tag", value: "foo")
|> group(columns: ["tag"])
|> set(key: "tag", value: "foo")
|> group(columns: ["tag"])
t2 = generate.from(count: 4, fn: (n) => n * -1, start: 2021-01-01T00:00:00Z, stop: 2021-01-05T00:00:00Z)
|> set(key: "tag", value: "bar")
|> group(columns: ["tag"])
|> set(key: "tag", value: "bar")
|> group(columns: ["tag"])
union(tables: [t1, t2])
```
@@ -130,12 +130,12 @@ A single stream of tables
import "generate"
t1 = generate.from(count: 4, fn: (n) => n + 1, start: 2021-01-01T00:00:00Z, stop: 2021-01-05T00:00:00Z)
|> set(key: "tag", value: "foo")
|> group()
|> set(key: "tag", value: "foo")
|> group()
t2 = generate.from(count: 4, fn: (n) => n * -1, start: 2021-01-01T00:00:00Z, stop: 2021-01-05T00:00:00Z)
|> set(key: "tag", value: "bar")
|> group()
|> set(key: "tag", value: "bar")
|> group()
union(tables: [t1, t2])
```
@@ -248,8 +248,8 @@ union(tables: [t1, t2])
#### join() output
```js
join(
tables: {t1: t1, t2: t2}
on: ["_time", "tag"]
tables: {t1: t1, t2: t2},
on: ["_time", "tag"],
)
```

- [Create a bucket schema](#create-a-bucket-schema)
- [Update a bucket schema](#update-a-bucket-schema)
- [Troubleshoot write errors](#troubleshoot-write-errors)
## Create a bucket schema
Use the `influx` CLI to set the schema-type and measurement schemas for your bucket:
1. Create a bucket with the `schema-type` flag set to `explicit`.
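As a rough sketch, the workflow above uses the `influx` CLI like this (the bucket, measurement, and file names are placeholders):

```sh
# Create a bucket that only accepts writes matching explicit measurement schemas
influx bucket create \
  --name my_explicit_bucket \
  --schema-type explicit

# Register a measurement schema from an NDJSON columns file
influx bucket-schema create \
  --bucket my_explicit_bucket \
  --name sensor \
  --columns-file sensor.ndjson
```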
#### Write valid schemas
To ensure your schema is valid, review [InfluxDB data elements](/influxdb/cloud/reference/key-concepts/data-elements/).
Follow these rules when creating your schema columns file:
1. Use valid measurement and column names that:
- Are unique within the schema
- Are 1 to 128 characters long
2. Include a column with the [`timestamp`](/influxdb/cloud/reference/key-concepts/data-elements/#timestamp) type.
3. Include at least one column with the [`field`](/influxdb/cloud/reference/key-concepts/data-elements/#fields) type (without a field, there is no time-series data), as in the following example:
**Valid**: a schema with [`timestamp`](/influxdb/cloud/reference/key-concepts/data-elements/#timestamp) and [`field`](/influxdb/cloud/reference/key-concepts/data-elements/#fields) columns.
```json
[
{"name":"time","type":"timestamp"},
{"name":"fsWrite","type":"field","dataType":"float"}
]
```
**Not valid**: a schema without a `field` column.
```json
[
  {"name":"time","type":"timestamp"},
  {"name":"host","type":"tag"}
]
```
### Troubleshoot create errors
#### Failed to create measurement
If you attempt to `create` a schema for an existing measurement name, InfluxDB rejects the new schema and returns the following error:
```sh
Error: failed to create measurement: 422 Unprocessable Entity
```
## Update a bucket schema
Use the [`influx bucket-schema update` command](/influxdb/cloud/reference/cli/influx/bucket-schema/update) to add new columns to a schema. You cannot modify or delete columns in bucket schemas.
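For example, a sketch of adding columns to an existing `sensor` schema from an updated columns file (names below are placeholders):

```sh
influx bucket-schema update \
  --bucket my_explicit_bucket \
  --name sensor \
  --columns-file sensor.ndjson
```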
## Troubleshoot write errors
InfluxDB returns an error for the following reasons:
- data in the write request doesn't conform to a defined schema.
- data in the write request doesn't have a schema defined for the bucket.
- data in the write request has invalid syntax.
In the following example, the *cpu* measurement has an incorrect `usage_user` [data type](/influxdb/cloud/reference/glossary/#data-type):
```sh
influx write -b my_explicit_bucket 'cpu,host=myHost usage_user="1001" 1556896326'
```
The following error occurs:
```sh
Error: failed to write data:
unable to parse 'cpu,host=myHost usage_user="1001" 1556896326':
schema: field type for field "usage_user" not permitted by schema; got String but expected Float
```
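For comparison, the same write succeeds when `usage_user` is sent as a float value (the values here are illustrative):

```sh
influx write -b my_explicit_bucket 'cpu,host=myHost usage_user=1001 1556896326'
```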
To resolve failures and partial writes, see how to [troubleshoot writes](/influxdb/cloud/write-data/troubleshoot/).

- **Bird migration sample data**: Explore, visualize, and monitor the latitude and longitude of bird migration patterns.
- **NOAA NDBC sample data**: Explore, visualize, and monitor NDBC's observations from their buoys. This data observes air temperature, wind speed, and more from specific locations.
- **NOAA water sample data**: Explore, visualize, and monitor temperature, water level, pH, and quality from specific locations.
- **USGS Earthquake data**: Explore, visualize, and monitor earthquake monitoring data. This data includes alerts, cdi, quarry blast, magnitude, and more.
2. Do one of the following to download sample data:
- [Add sample data with community template](#add-sample-data-with-community-templates)
- [Add sample data using the InfluxDB UI](#add-sample-data)
{{< nav-icon "settings" >}}
2. Paste the [Sample Data community template URL](https://github.com/influxdata/community-templates/blob/master/sample-data/sample-data.yml) in the **resource manifest file** field and click the **{{< caps >}}Lookup Template{{< /caps >}}** button.
#### Sample Data community template URL

- /{{< latest "flux" >}}/stdlib/influxdata/influxdb/monitor/notify/
---
{{% duplicate-oss %}}

to set up the connection configurations for both your InfluxDB Cloud instance and
your InfluxDB 2.x instance.
Include the following flags for each configuration:
- **-\-config-name**:
Unique name for the connection configuration.
The examples below use `cloud` and `oss` respectively.
- **-\-host-url**:
[InfluxDB Cloud region URL](/influxdb/cloud/reference/regions/) or
[InfluxDB 2.x URL](/{{< latest "influxdb" >}}/reference/urls/).
- **-\-org**:
InfluxDB organization name.
The default organization name in InfluxDB Cloud is the email address associated with your account.
- **-\-token**: API token to use to connect to InfluxDB.
- **InfluxDB Cloud**: Provide an **All-Access** token.
- **InfluxDB OSS 2.x**: Provide an [Operator token](/{{< latest "influxdb" >}}/security/tokens/#operator-token).
##### Create an InfluxDB Cloud connection configuration
---
title: InfluxDB schema design
description: >
Design your schema for simpler and more performant queries.
menu:
influxdb_cloud:
name: Schema design

list_title: Troubleshoot issues writing data
weight: 105
description: >
Troubleshoot issues writing data. Find response codes for failed writes. Discover how writes fail, from exceeding rate or payload limits, to syntax errors and schema conflicts. Find suggestions to fix failures.
menu:
influxdb_cloud:
name: Troubleshoot issues
parent: Write data
influxdb/cloud/tags: [write, line protocol, errors]
related:
- /influxdb/cloud/api/#tag/Write, InfluxDB API /write endpoint
- /influxdb/cloud/reference/internals
Learn how to handle and recover from errors when writing to InfluxDB.
- [Discover common failure scenarios](#discover-common-failure-scenarios)
- [Review HTTP status codes](#review-http-status-codes)
- [Troubleshoot failures](#troubleshoot-failures)
- [Troubleshoot rejected points](#troubleshoot-rejected-points)
## Discover common failure scenarios
Write requests made to InfluxDB may fail for a number of reasons.
Common failure scenarios that return an HTTP `4xx` or `5xx` error status code include the following:
- API token was invalid. See how to [manage API tokens](/influxdb/cloud/security/tokens/).
- Exceeded a rate limit.
- Payload size was too large.
- Client or server reached a timeout threshold.
- Data was not formatted correctly. See how to [find parsing errors](#find-parsing-errors).
- Data did not conform to the [explicit bucket schema](/influxdb/cloud/organizations/buckets/bucket-schema/).
  See how to resolve [explicit schema rejections](#resolve-explicit-schema-rejections).
To find the causes of a specific error, [review HTTP status codes](#review-http-status-codes).
### Troubleshoot partial writes
Writes may fail partially or completely even though InfluxDB returns an HTTP `2xx` status code for a valid request.
For example, a partial write may occur when InfluxDB writes all points that conform to the bucket schema, but rejects points that have the wrong data type in a field.
To resolve partial writes and rejected points, see [troubleshoot failures](#troubleshoot-failures).
## Review HTTP status codes
InfluxDB uses conventional HTTP status codes to indicate the success or failure of a request.
Write requests return the following status codes:
- `204` **Success**: InfluxDB validated the request data format and accepted the data for writing to the bucket.
{{% note %}}
`204` doesn't indicate a successful write operation because writes are asynchronous. If some of your data did not write to the bucket, see how to [check for rejected points](#review-rejected-points).
{{% /note %}}
- `400` **Bad request**: InfluxDB rejected some or all of the request data.
`code` and `message` in the response body provide details about the problem.
For more information, see how to [troubleshoot rejected points](#troubleshoot-rejected-points).
- `401` **Unauthorized**: May indicate one of the following:
- [`Authorization: Token` header](/influxdb/cloud/api-guide/api_intro/#authentication) is missing or malformed.
- [API token](/influxdb/cloud/api-guide/api_intro/#authentication) value is missing from the header.
- API token does not have sufficient permissions to write to the organization and bucket.
For more information about token types and permissions, see [Manage API tokens](/influxdb/cloud/security/tokens/)
- `404` **Not found**: A requested resource (e.g. an organization or bucket) was not found.
The response body contains the requested resource type, e.g. "organization", and resource name.
- `413` **Request entity too large**: The write request payload exceeded the size limit (**50 MB *uncompressed*** data or **250 MB *decompressed***).
- `429` **Too many requests**: API token is temporarily over quota. The `Retry-After` header describes when to try the write request again.
- `500` **Internal server error**: Default HTTP status for an error.
- `503` **Service unavailable**: Server is temporarily unavailable to accept writes. The `Retry-After` header describes when to try the write again.
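To see these status codes in practice, you can post a test point directly to the `/api/v2/write` endpoint and inspect the response; the URL, organization, bucket, and token below are placeholders:

```sh
curl --include --request POST \
  "https://cloud2.influxdata.com/api/v2/write?org=example-org&bucket=example-bucket&precision=s" \
  --header "Authorization: Token API_TOKEN" \
  --data-binary 'home,room=kitchen temp=23.1 1641024000'
```

The `--include` flag prints the response status line and headers, including `Retry-After` when it is present.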
If you notice data is missing in your bucket, do the following:
- Check the `message` property in the response body for details about the error, e.g. `partial write error` indicates [rejected points](#troubleshoot-rejected-points).
- Check for [rejected points](#troubleshoot-rejected-points) in your organization's `_monitoring` bucket.
- Verify all lines contain valid syntax, e.g. [line protocol](/influxdb/cloud/reference/syntax/line-protocol/) or [CSV](/influxdb/cloud/reference/syntax/annotated-csv/).
- Verify the data types match the [series](/influxdb/cloud/reference/key-concepts/data-elements/#series) or [bucket schema](/influxdb/cloud/organizations/buckets/bucket-schema/).
- Verify the timestamps match the [precision parameter](/influxdb/cloud/write-data/#timestamp-precision).
- Minimize payload size and network errors by [optimizing writes](/influxdb/cloud/write-data/best-practices/optimize-writes/).
## Troubleshoot rejected points
InfluxDB may have rejected points even if the HTTP request returned "Success".
InfluxDB logs rejected data points and associated errors to your organization's `_monitoring` bucket.
- [Review rejected points](#review-rejected-points)
- [Find parsing errors](#find-parsing-errors)
- [Find data type conflicts and schema rejections](#find-data-type-conflicts-and-schema-rejections)
- [Resolve data type conflicts](#resolve-data-type-conflicts)
- [Resolve explicit schema rejections](#resolve-explicit-schema-rejections)
### Review rejected points
To get a log of rejected points, query the [`rejected_points` measurement](/influxdb/cloud/reference/internals/system-buckets/#_monitoring-bucket-schema) in your organization's `_monitoring` bucket.
To more quickly locate `rejected_points`, keep the following in mind:
- If your line protocol batch contains single lines with multiple [fields](/influxdb/cloud/reference/syntax/line-protocol/#field-set), InfluxDB logs an entry for each point (each unique field) that is rejected.
- Each entry contains a `reason` tag that describes why the point was rejected.
- Entries for [data type conflicts and schema rejections](#find-data-type-conflicts-and-schema-rejections) have a `count` field value of `1`.
- Entries for [parsing errors](#find-parsing-errors) contain an `error` field (and don't contain a `count` field).
#### rejected_points schema
| Name | Value |
|:------ |:----- |
| `_measurement`| `rejected_points` |
| `_field` | [`count`](#find-data-type-conflicts-and-schema-rejections) or [`error`](#find-parsing-errors) |
| `_value` | [`1`](#find-data-type-conflicts-and-schema-rejections) or [error details](#find-parsing-errors) |
| `bucket` | ID of the bucket that rejected the point |
| `measurement` | Measurement name of the point |
| `field` | Name of the field that caused the rejection |
| `reason` | Brief description of the problem. See specific reasons in [data type conflicts and schema rejections](#find-data-type-conflicts-and-schema-rejections). |
| `gotType` | Received [field](/influxdb/cloud/reference/key-concepts/data-elements/#field-value) type: `Boolean`, `Float`, `Integer`, `String`, or `UnsignedInteger` |
| `wantType` | Expected [field](/influxdb/cloud/reference/key-concepts/data-elements/#field-value) type: `Boolean`, `Float`, `Integer`, `String`, or `UnsignedInteger` |
| `<timestamp>` | Time the rejected point was logged |
#### Find parsing errors
If InfluxDB can't parse a line (e.g. due to syntax problems), the response `message` might not provide details.
To find parsing error details, query `rejected_points` entries that contain the `error` field (instead of the `count` field).
```js
from(bucket: "_monitoring")
|> range(start: -1h)
|> filter(fn: (r) => r._measurement == "rejected_points")
|> filter(fn: (r) => r._field == "error")
```
#### Find data type conflicts and schema rejections
To find `rejected_points` caused by [data type conflicts](#resolve-data-type-conflicts) or [schema rejections](#resolve-explicit-schema-rejections),
query for the `count` field.
```js
from(bucket: "_monitoring")
|> range(start: -1h)
|> filter(fn: (r) => r._measurement == "rejected_points")
|> filter(fn: (r) => r._field == "count")
```
### Resolve data type conflicts
When you write to a bucket that has the `implicit` schema type, InfluxDB compares new points to points that have the same [series](/influxdb/cloud/reference/key-concepts/data-elements/#series).
If a point has a field with a different data type than the series, InfluxDB rejects the point and logs a `rejected_points` entry.
The `rejected_points` entry contains one of the following reasons:
| Reason | Meaning |
|:------ |:------- |
| `type conflict in batch write` | The **batch** contains another point with the same series, but one of the fields has a different value type. |
| `type conflict with existing data` | The **bucket** contains another point with the same series, but one of the fields has a different value type. |
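For example, the following hypothetical batch triggers `type conflict in batch write`: both lines belong to the same series, but `temp` is a float in the first point and a string in the second, so the second point is rejected:

```
airSensors,sensorId=TLM0201 temp=72.1 1637014074
airSensors,sensorId=TLM0201 temp="72.3" 1637014075
```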
### Resolve explicit schema rejections
Buckets with the [`explicit` schema type](/influxdb/cloud/organizations/buckets/bucket-schema/) use explicit bucket schemas.
When you write to a bucket that uses explicit bucket schemas,
InfluxDB rejects points that don't conform to one of the configured schemas.
Learn how to interpret `rejected_points` entries for explicit bucket schemas.
- [Detect a measurement mismatch](#detect-a-measurement-mismatch)
- [Detect a field type mismatch](#detect-a-field-type-mismatch)
##### Detect a measurement mismatch
InfluxDB rejects a point if the [measurement](/influxdb/cloud/reference/key-concepts/data-elements/#measurement) doesn't match the **name** of a [bucket schema](/influxdb/cloud/organizations/buckets/bucket-schema/).
The `rejected_points` entry contains the following `reason` tag value:
| Reason | Meaning |
|:------ |:------- |
| `measurement not allowed by schema` | The **bucket** is configured to use explicit schemas and none of the schemas matches the **measurement** of the point. |
Consider the following [line protocol](/influxdb/cloud/reference/syntax/line-protocol) data.
```
airSensors,sensorId=TLM0201 temperature=73.97,humidity=35.23,co=0.48 1637014074
```
The line has an `airSensors` measurement and three fields (`temperature`, `humidity`, and `co`).
If you try to write this data to a bucket that has the [`explicit` schema type](/influxdb/cloud/organizations/buckets/bucket-schema/) and doesn't have an `airSensors` schema, the `/api/v2/write` InfluxDB API returns an error and the following data:
```json
{
"code": "invalid",
"message": "3 out of 3 points rejected (check rejected_points in your _monitoring bucket for further information)"
}
```
InfluxDB logs three `rejected_points` entries, one for each field.
| _measurement | _field | _value | field | measurement | reason |
|:----------------|:-------|:-------|:------------|:------------|:----------------------------------|
| rejected_points | count | 1 | humidity | airSensors | measurement not allowed by schema |
| rejected_points | count | 1 | co | airSensors | measurement not allowed by schema |
| rejected_points | count | 1 | temperature | airSensors | measurement not allowed by schema |
##### Detect a field type mismatch
InfluxDB rejects a point if the [measurement](/influxdb/cloud/reference/key-concepts/data-elements/#measurement) matches the **name** of a bucket schema and the field data types don't match.
The `rejected_points` entry contains the following reason:
| Reason | Meaning |
|:------------------------------------|:-----------------------------------------------------------------------------------------------------|
| `field type mismatch with schema` | The point has the same measurement as a configured schema and they have different field value types. |
Consider a bucket that has the following `airSensors` [`explicit bucket schema`](/influxdb/cloud/organizations/buckets/bucket-schema/):
```json
{
"name": "airSensors",
"columns": [
{
"name": "time",
"type": "timestamp"
},
{
"name": "sensorId",
"type": "tag"
},
{
"name": "temperature",
"type": "field",
"dataType": "float"
},
{
"name": "humidity",
"type": "field",
"dataType": "float"
},
{
"name": "co",
"type": "field",
"dataType": "float"
}
]
}
```
The following [line protocol](/influxdb/cloud/reference/syntax/line-protocol/) data has an `airSensors` measurement, a `sensorId` tag, and three fields (`temperature`, `humidity`, and `co`).
```
airSensors,sensorId=L1 temperature=90.5,humidity=70.0,co=0.2 1637014074
airSensors,sensorId=L1 temperature="90.5",humidity=70.0,co=0.2 1637014074
```
In the example data, one of the points has a `temperature` field value with the _string_ data type.
However, the `airSensors` schema requires `temperature` to have the _float_ data type.
If you try to write the example data to the `airSensors` bucket schema,
InfluxDB returns a `400` error and a message that describes the result:
```json
{
"code": "invalid",
"message": "partial write error (5 accepted): 1 out of 6 points rejected (check rejected_points in your _monitoring bucket for further information)"
}
```
InfluxDB logs the following `rejected_points` entry to the `_monitoring` bucket:
| _measurement | _field | _value | bucket | field | gotType | measurement | reason | wantType |
|:------------------|:-------|:-------|:-------------------|:--------------|:---------|:------------|:----------------------------------|:---------|
| rejected_points | count | 1 | a7d5558b880a93da | temperature | String | airSensors | field type mismatch with schema | Float |

---
title: InfluxDB schema design and data layout
description: >
Improve InfluxDB schema design and data layout to reduce high cardinality and make your data more performant.
menu:
influxdb_1_8:
name: Schema design and data layout
parent: Concepts
---
Each InfluxDB use case is unique and your [schema](/influxdb/v1.8/concepts/glossary/#schema) reflects that uniqueness.
In general, a schema designed for querying leads to simpler and more performant queries.
We recommend the following design guidelines for most use cases:
- [Where to store data (tag or field)](#where-to-store-data-tag-or-field)
- [Avoid too many series](#avoid-too-many-series)
- [Use recommended naming conventions](#use-recommended-naming-conventions)
- [Shard Group Duration Management](#shard-group-duration-management)
## Where to store data (tag or field)
Your queries should guide what data you store in [tags](/influxdb/v1.8/concepts/glossary/#tag) and what you store in [fields](/influxdb/v1.8/concepts/glossary/#field):
- Store commonly-queried and grouping ([`group()`](/flux/v0.x/stdlib/universe/group) or [`GROUP BY`](/influxdb/v1.8/query_language/explore-data/#group-by-tags)) metadata in tags.
- Store data in fields if each data point contains a different value.
- Store numeric values as fields ([tag values](/influxdb/v1.8/concepts/glossary/#tag-value) only support string values).
## Avoid too many series
InfluxDB indexes the following data elements to speed up reads:
- [measurement](/influxdb/v1.8/concepts/glossary/#measurement)
- [tags](/influxdb/v1.8/concepts/glossary/#tag)
[Tag values](/influxdb/v1.8/concepts/glossary/#tag-value) are indexed and [field values](/influxdb/v1.8/concepts/glossary/#field-value) are not.
This means that querying by tags is more performant than querying by fields.
However, when too many indexes are created, both writes and reads may start to slow down.
Each unique set of indexed data elements forms a [series key](/influxdb/v1.8/concepts/glossary/#series-key).
[Tags](/influxdb/v1.8/concepts/glossary/#tag) containing highly variable information like unique IDs, hashes, and random strings lead to a large number of [series](/influxdb/v1.8/concepts/glossary/#series), also known as high [series cardinality](/influxdb/v1.8/concepts/glossary/#series-cardinality).
High series cardinality is a primary driver of high memory usage for many database workloads.
Therefore, to reduce memory consumption, consider storing high-cardinality values in field values rather than in tags or field keys.
{{% note %}}
If reads and writes to InfluxDB start to slow down, you may have high series cardinality (too many series).
See [how to find and reduce series high cardinality](/influxdb/v1.8/troubleshooting/frequently-asked-questions/#why-does-series-cardinality-matter).
{{% /note %}}
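If you suspect high cardinality, InfluxQL can report it directly, for example:

```sql
-- Estimated series cardinality for the current database
SHOW SERIES CARDINALITY

-- Exact (more expensive) series count
SHOW SERIES EXACT CARDINALITY
```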
## Use recommended naming conventions
Use the following conventions when naming your tag and field keys:
- [Avoid reserved keywords in tag and field keys](#avoid-reserved-keywords-in-tag-and-field-keys)
- [Avoid the same tag and field name](#avoid-the-same-name-for-a-tag-and-a-field)
- [Avoid encoding data in measurements and keys](#avoid-encoding-data-in-measurements-and-keys)
- [Avoid more than one piece of information in one tag](#avoid-putting-more-than-one-piece-of-information-in-one-tag)
### Avoid reserved keywords in tag and field keys
Not required, but avoiding the use of reserved keywords in your tag and field keys simplifies writing queries because you won't have to wrap your keys in double quotes.
See [InfluxQL](https://github.com/influxdata/influxql/blob/master/README.md#keywords) and [Flux keywords](/{{< latest "flux" >}}/spec/lexical-elements/#keywords) to avoid.
Also, if a tag or field key contains characters other than `[A-z,_]`, you must wrap it in double quotes in InfluxQL or use [bracket notation](/{{< latest "flux" >}}/data-types/composite/record/#bracket-notation) in Flux.
### Avoid the same name for a tag and a field
Avoid using the same name for a tag and field key.
This often results in unexpected behavior when querying data.
If you inadvertently add the same name for a tag and a field, see
[Frequently asked questions](/influxdb/v1.8/troubleshooting/frequently-asked-questions/#tag-and-field-key-with-the-same-name)
for information about how to query the data predictably and how to fix the issue.
### Avoid encoding data in measurements and keys
InfluxDB queries merge data that falls within the same [measurement](/influxdb/v1.8/concepts/glossary/#measurement); it's better to differentiate data with [tags](/influxdb/v1.8/concepts/glossary/#tag) than with detailed measurement names. If you encode data in a measurement name, you must use a regular expression to query the data, making some queries more complicated or impossible.
Store data in [tag values](/influxdb/v1.8/concepts/glossary/#tag-value) or [field values](/influxdb/v1.8/concepts/glossary/#field-value), not in [tag keys](/influxdb/v1.8/concepts/glossary/#tag-key), [field keys](/influxdb/v1.8/concepts/glossary/#field-key), or [measurements](/influxdb/v1.8/concepts/glossary/#measurement). If you design your schema to store data in tag and field values,
your queries will be easier to write and more efficient.
In addition, you'll keep cardinality low by not creating measurements and keys as you write data.
To learn more about the performance impact of high series cardinality, see [how to find and reduce high series cardinality](/influxdb/v1.8/troubleshooting/frequently-asked-questions/#why-does-series-cardinality-matter).
#### Compare schemas
Compare the following valid schemas represented by line protocol.
**Recommended**: the following schema stores metadata in separate `crop`, `plot`, and `region` tags. The `temp` field contains variable numeric data.
##### {id="good-measurements-schema"}
```
Good Measurements schema - Data encoded in tags (recommended)
-------------
weather_sensor,crop=blueberries,plot=1,region=north temp=50.1 1472515200000000000
weather_sensor,crop=blueberries,plot=2,region=midwest temp=49.8 1472515200000000000
```
**Not recommended**: the following schema stores multiple attributes (`crop`, `plot` and `region`) concatenated (`blueberries.plot-1.north`) within the measurement, similar to Graphite metrics.
##### {id="bad-measurements-schema"}
```
Bad Measurements schema - Data encoded in the measurement (not recommended)
-------------
blueberries.plot-1.north temp=50.1 1472515200000000000
blueberries.plot-2.midwest temp=49.8 1472515200000000000
```
**Not recommended**: the following schema stores multiple attributes (`crop`, `plot` and `region`) concatenated (`blueberries.plot-1.north`) within the field key.
##### {id="bad-keys-schema"}
```
Bad Keys schema - Data encoded in field keys (not recommended)
-------------
weather_sensor blueberries.plot-1.north.temp=50.1 1472515200000000000
weather_sensor blueberries.plot-2.midwest.temp=49.8 1472515200000000000
```
#### Compare queries
Compare the following queries of the [_Good Measurements_](#good-measurements-schema) and [_Bad Measurements_](#bad-measurements-schema) schemas.
The [Flux](/{{< latest "flux" >}}/) queries calculate the average `temp` for blueberries in the `north` region.
**Easy to query**: [_Good Measurements_](#good-measurements-schema) data is easily filtered by `region` tag values, as in the following example.
```js
// Query *Good Measurements*, data stored in separate tag values (recommended)
from(bucket: "<database>/<retention_policy>")
|> range(start:2016-08-30T00:00:00Z)
|> filter(fn: (r) => r._measurement == "weather_sensor" and r.region == "north" and r._field == "temp")
|> mean()
```
**Difficult to query**: [_Bad Measurements_](#bad-measurements-schema) requires regular expressions to extract `plot` and `region` from the measurement, as in the following example.
```js
// Query *Bad Measurements*, data encoded in the measurement (not recommended)
from(bucket: "<database>/<retention_policy>")
|> range(start:2016-08-30T00:00:00Z)
|> filter(fn: (r) => r._measurement =~ /\.north$/ and r._field == "temp")
|> mean()
```
Complex measurements make some queries impossible. For example, calculating the average temperature of both plots is not possible with the [_Bad Measurements_](#bad-measurements-schema) schema.
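For comparison, here is a sketch of that cross-plot calculation with the [_Good Measurements_](#good-measurements-schema) schema; an empty `group()` merges both plots into a single table before `mean()`:

```js
// Average temp across both plots (possible with the Good Measurements schema)
from(bucket: "<database>/<retention_policy>")
  |> range(start: 2016-08-30T00:00:00Z)
  |> filter(fn: (r) => r._measurement == "weather_sensor" and r._field == "temp")
  |> group()
  |> mean()
```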
##### InfluxQL example to query schemas
```
# Query *Bad Measurements*, data encoded in the measurement (not recommended)
> SELECT mean("temp") FROM /\.north$/

# Query *Good Measurements*, data stored in separate tag values (recommended)
> SELECT mean("temp") FROM "weather_sensor" WHERE "region" = 'north'
```
### Avoid putting more than one piece of information in one tag
Splitting a single tag with multiple pieces into separate tags simplifies your queries and improves performance by reducing the need for regular expressions.
#### Example line protocol schemas

Consider the following schema represented by line protocol.
```
Schema 1 - Multiple data encoded in a single tag
-------------
weather_sensor,crop=blueberries,location=plot-1.north temp=50.1 1472515200000000000
weather_sensor,crop=blueberries,location=plot-2.midwest temp=49.8 1472515200000000000

Schema 2 - Data encoded in multiple tags
-------------
weather_sensor,crop=blueberries,plot=1,region=north temp=50.1 1472515200000000000
weather_sensor,crop=blueberries,plot=2,region=midwest temp=49.8 1472515200000000000
```
Use Flux or InfluxQL to calculate the average `temp` for blueberries in the `north` region.
Schema 2 is preferable because, with multiple tags, you don't need a regular expression.
#### Flux example to query schemas
```js
// Schema 1 - Query for multiple data encoded in a single tag
from(bucket:"<database>/<retention_policy>")
|> range(start:2016-08-30T00:00:00Z)
|> filter(fn: (r) => r._measurement == "weather_sensor" and r.location =~ /\.north$/ and r._field == "temp")
|> mean()

// Schema 2 - Query for data encoded in multiple tags
from(bucket:"<database>/<retention_policy>")
|> range(start:2016-08-30T00:00:00Z)
|> filter(fn: (r) => r._measurement == "weather_sensor" and r.region == "north" and r._field == "temp")
|> mean()
```
#### InfluxQL example to query schemas
```
# Schema 1 - Query for multiple data encoded in a single tag
> SELECT mean("temp") FROM "weather_sensor" WHERE "location" =~ /\.north$/

# Schema 2 - Query for data encoded in multiple tags
> SELECT mean("temp") FROM "weather_sensor" WHERE "region" = 'north'
```
---
title: Restore data
seotitle: Restore data in InfluxDB
description: >
  Use the `influx restore` command to restore backup data and metadata from InfluxDB.
menu:
  influxdb_2_0:
    parent: Back up & restore data
---

A tuple of named values represented using a record type.
Regular expressions (regex or regexp) are patterns used to match character combinations in strings.
### rejected point
A data point that InfluxDB could not write to the target bucket.
InfluxDB logs information about rejected points to the `_monitoring` system bucket.
See how to [review rejected writes](/influxdb/v2.0/write-data/troubleshoot/#review-rejected-writes) for more information.
### retention period
The duration of time that a bucket retains data.
Points with timestamps older than their bucket's retention period are dropped.
Related entries: [point](#point), [unix timestamp](#unix-timestamp), [RFC3339 timestamp](#rfc3339-timestamp)
### token
Tokens (or API tokens) verify user and organization permissions in InfluxDB.
There are different types of API tokens:
{{% oss-only %}}
- **Operator token:** grants full read and write access to all resources in **all organizations in InfluxDB OSS 2.x**. _InfluxDB Cloud does not support Operator tokens._
- **All-Access token:** grants full read and write access to all resources in an organization.
- **Read/Write token:** grants read or write access to specific resources in an organization.
{{% /oss-only %}}
{{% cloud-only %}}
- **All-Access token:** grants full read and write access to all resources in an organization.
- **Read/Write token:** grants read or write access to specific resources in an organization.
{{% /cloud-only %}}
Related entries: [Create a token](/influxdb/v2.0/security/tokens/create-token/).
### tracing

```sh
influx auth create \
--all-access
```
{{% oss-only %}}
#### Create an Operator token
Create an Operator token to grant permissions to all resources in all organizations.
```sh
influx auth create \
--org my-org \
--operator
```
{{% /oss-only %}}
#### Create a token with specified read permissions

To filter tokens by user, include `userID` as a query parameter in your request.
```sh
{{% get-shared-text "api/v2.0/auth/oss/tokens-view-filter.sh" %}}
```
{{% oss-only %}}
[***Operator tokens***](/{{< latest "influxdb" >}}/security/tokens/#operator-token) have access to all organizations' authorizations.
To filter authorizations by organization when using an operator token, include an `org` or `orgID` query parameter in your request.
{{% /oss-only %}}
See the [`/authorizations` endpoint documentation](/influxdb/v2.0/api/#tag/Authorizations) for more information about available parameters.

If reads and writes to InfluxDB have started to slow down, high [series cardinality](/influxdb/v2.0/reference/glossary/#series-cardinality) (too many series) may be causing memory issues.
Take steps to understand and resolve high series cardinality.
1. [Learn the causes of high cardinality](#learn-the-causes-of-high-series-cardinality)
2. [Measure series cardinality](#measure-series-cardinality)
3. [Resolve high cardinality](#resolve-high-cardinality)
## Learn the causes of high series cardinality
{{% oss-only %}}
InfluxDB indexes the following data elements to speed up reads:
- [measurement](/influxdb/v2.0/reference/glossary/#measurement)
- [tags](/influxdb/v2.0/reference/glossary/#tag)
{{% /oss-only %}}
{{% cloud-only %}}
InfluxDB indexes the following data elements to speed up reads:
- [measurement](/influxdb/v2.0/reference/glossary/#measurement)
- [tags](/influxdb/v2.0/reference/glossary/#tag)
- [field keys](/influxdb/cloud/reference/glossary/#field-key)
{{% /cloud-only %}}
Each unique set of indexed data elements forms a [series key](/influxdb/v2.0/reference/glossary/#series-key).
[Tags](/influxdb/v2.0/reference/glossary/#tag) containing highly variable information like unique IDs, hashes, and random strings lead to a large number of [series](/influxdb/v2.0/reference/glossary/#series), also known as high [series cardinality](/influxdb/v2.0/reference/glossary/#series-cardinality).
High series cardinality is a primary driver of high memory usage for many database workloads.
## Measure series cardinality
Use the following to measure series cardinality of your buckets:
- [`influxdb.cardinality()`](/{{< latest "flux" >}}/stdlib/influxdata/influxdb/cardinality): Flux function that returns the number of unique [series keys](/influxdb/v2.0/reference/glossary/#series) in your data.
- [`SHOW SERIES CARDINALITY`](/influxdb/v2.0/query_language/spec/#show-series-cardinality): InfluxQL command that returns the number of unique [series keys](/influxdb/v2.0/reference/glossary/#series) in your data.
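For example, the following sketch uses `influxdb.cardinality()` to return the series cardinality of a hypothetical `example-bucket` over the last 30 days:

```js
import "influxdata/influxdb"

influxdb.cardinality(bucket: "example-bucket", start: -30d)
```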
## Resolve high cardinality
To resolve high series cardinality, complete the following steps (for multiple buckets if applicable):
1. [Review tags](#review-tags).
2. [Improve your schema](#improve-your-schema).
3. [Delete high cardinality data](#delete-data-to-reduce-high-cardinality).
## Review tags
```js
cardinalityByTag(bucket: "example-bucket")
|> count()
```
These queries should help identify the sources of high cardinality in each of your buckets. To determine which specific tags are growing, check the cardinality again after 24 hours to see if one or more tags have grown significantly.
## Improve your schema
To minimize cardinality in the future, design your schema for easy and performant querying.
Review [best practices for schema design](/influxdb/v2.0/write-data/best-practices/schema-design/).
## Delete data to reduce high cardinality
Consider whether you need the data that is causing high cardinality.
If you no longer need this data, you can [delete the whole bucket](/influxdb/v2.0/organizations/buckets/delete-bucket/) or [delete a range of data](/influxdb/v2.0/write-data/delete-data/).

---
title: InfluxDB schema design
description: >
  Design your schema for simpler and more performant queries.
menu:
  influxdb_2_0:
    name: Schema design
    parent: write-best-practices
---
Design your [schema](/influxdb/v2.0/reference/glossary/#schema) for simpler and more performant queries.
Follow design guidelines to make your schema easy to query.
Learn how these guidelines lead to more performant queries.
<!-- - [Recommendations for managing shard group duration](#shard-group-duration-management)
-->
- [Design to query](#design-to-query)
- [Keep measurements and keys simple](#keep-measurements-and-keys-simple)
- [Use tags and fields](#use-tags-and-fields)
- [Use fields for unique and numeric data](#use-fields-for-unique-and-numeric-data)
- [Use tags to improve query performance](#use-tags-to-improve-query-performance)
- [Keep tags simple](#keep-tags-simple)
{{% note %}}
Good schema design can prevent high series cardinality, resulting in better performing queries. If you notice data reads and writes slowing down or want to learn how cardinality affects performance, see how to [resolve high cardinality](/influxdb/v2.0/write-data/best-practices/resolve-high-cardinality/).
{{% /note %}}
## Design to query
The schemas below demonstrate [measurements](/influxdb/v2.0/reference/glossary/#measurement), [tag keys](/influxdb/v2.0/reference/glossary/#tag-key), and [field keys](/influxdb/v2.0/reference/glossary/#field-key) that are easy to query.
| measurement | tag key | tag key | field key | field key |
|----------------------|-----------|---------|-----------|-------------|
| airSensor | sensorId | station | humidity | temperature |
| waterQualitySensor | sensorId | station | pH | temperature |
The `airSensor` and `waterQualitySensor` schemas illustrate the following guidelines:
- Each measurement is a simple name that describes a schema.
- Keys [don't repeat within a schema](#avoid-duplicate-names-for-tags-and-fields).
- Keys [don't use reserved keywords or special characters](#avoid-keywords-and-special-characters-in-keys).
- Tags (`sensorId` and `station`) [store metadata common across many data points](#use-tags-to-improve-query-performance).
- Fields (`humidity`, `pH`, and `temperature`) [store numeric data](#use-fields-for-unique-and-numeric-data).
- Fields [store unique or highly variable](#use-fields-for-unique-and-numeric-data) data.
- Measurements and keys [don't contain data](#keep-measurements-and-keys-simple); tag values and field values will store data.
{{% note %}}
If reads and writes to InfluxDB start to slow down, you may have high series cardinality (too many series). See how to [resolve high cardinality](/influxdb/v2.0/write-data/best-practices/resolve-high-cardinality/).
{{% /note %}}
The following points (formatted as line protocol) use the `airSensor` and `waterQualitySensor` schemas:
```
airSensor,sensorId=A0100,station=Harbor humidity=35.0658,temperature=21.667 1636729543000000000
waterQualitySensor,sensorId=W0101,station=Harbor pH=6.1,temperature=16.103 1472515200000000000
```
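As a sketch, a query against this schema stays simple: filter on the `station` tag and the `humidity` field (bucket name is a placeholder):

```js
// Mean humidity reported by air sensors at the Harbor station
from(bucket: "example-bucket")
  |> range(start: -1d)
  |> filter(fn: (r) => r._measurement == "airSensor" and r.station == "Harbor" and r._field == "humidity")
  |> mean()
```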
### Keep measurements and keys simple
Store data in [tag values](/influxdb/v2.0/reference/glossary/#tag-value) or [field values](/influxdb/v2.0/reference/glossary/#field-value), not in [tag keys](/influxdb/v2.0/reference/glossary/#tag-key), [field keys](/influxdb/v2.0/reference/glossary/#field-key), or [measurements](/influxdb/v2.0/reference/glossary/#measurement).
If you design your schema to store data in tag and field values,
your queries will be easier to write and more efficient.
{{% oss-only %}}
In addition, you'll keep cardinality low by not creating measurements and keys as you write data.
To learn more about the performance impact of high series cardinality, see how to [resolve high cardinality](/influxdb/v2.0/write-data/best-practices/resolve-high-cardinality/).
{{% /oss-only %}}
#### Compare schemas
Compare the following valid schemas represented by line protocol.
**Recommended**: the following schema stores metadata in separate `crop`, `plot`, and `region` tags. The `temp` field contains variable numeric data.
##### {id="good-measurements-schema"}
```
Good Measurements schema - Data encoded in tags (recommended)
-------------
weather_sensor,crop=blueberries,plot=1,region=north temp=50.1 1472515200000000000
weather_sensor,crop=blueberries,plot=2,region=midwest temp=49.8 1472515200000000000
```
**Not recommended**: the following schema stores multiple attributes (`crop`, `plot` and `region`) concatenated (`blueberries.plot-1.north`) within the measurement, similar to Graphite metrics.
##### {id="bad-measurements-schema"}
```
Bad Measurements schema - Data encoded in the measurement (not recommended)
-------------
blueberries.plot-1.north temp=50.1 1472515200000000000
blueberries.plot-2.midwest temp=49.8 1472515200000000000
```
**Not recommended**: the following schema stores multiple attributes (`crop`, `plot` and `region`) concatenated (`blueberries.plot-1.north`) within the field key.
##### {id="bad-keys-schema"}
```
Bad Keys schema - Data encoded in field keys (not recommended)
-------------
weather_sensor blueberries.plot-1.north.temp=50.1 1472515200000000000
weather_sensor blueberries.plot-2.midwest.temp=49.8 1472515200000000000
```
#### Compare queries
Compare the following queries of the [_Good Measurements_](#good-measurements-schema) and [_Bad Measurements_](#bad-measurements-schema) schemas.
The [Flux](/{{< latest "flux" >}}/) queries calculate the average `temp` for blueberries in the `north` region.
**Easy to query**: [_Good Measurements_](#good-measurements-schema) data is easily filtered by `region` tag values, as in the following example.
```js
// Query *Good Measurements*, data stored in separate tags (recommended)
from(bucket:"example-bucket")
|> range(start:2016-08-30T00:00:00Z)
|> filter(fn: (r) => r._measurement == "weather_sensor" and r.region == "north" and r._field == "temp")
|> mean()
```
**Difficult to query**: [_Bad Measurements_](#bad-measurements-schema) requires regular expressions to extract `plot` and `region` from the measurement, as in the following example.
```js
// Query *Bad Measurements*, data encoded in the measurement (not recommended)
from(bucket:"example-bucket")
|> range(start:2016-08-30T00:00:00Z)
|> filter(fn: (r) => r._measurement =~ /\.north$/ and r._field == "temp")
|> mean()
```
Complex measurements make some queries impossible. For example, calculating the average temperature of both plots is not possible with the [_Bad Measurements_](#bad-measurements-schema) schema.
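By contrast, a sketch of that cross-plot average with the [_Good Measurements_](#good-measurements-schema) schema; an empty `group()` merges both plots into a single table before `mean()`:

```js
// Average temp across both plots (possible with the Good Measurements schema)
from(bucket:"example-bucket")
  |> range(start:2016-08-30T00:00:00Z)
  |> filter(fn: (r) => r._measurement == "weather_sensor" and r._field == "temp")
  |> group()
  |> mean()
```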
#### Keep keys simple
In addition to keeping your keys free of data, follow these additional guidelines to make them easier to query:
- [Avoid keywords and special characters](#avoid-keywords-and-special-characters-in-keys)
- [Avoid duplicate names for tags and fields](#avoid-duplicate-names-for-tags-and-fields)
##### Avoid keywords and special characters in keys
To simplify query writing, don't include reserved keywords or special characters in tag and field keys.
If you use [Flux keywords](/{{< latest "flux" >}}/spec/lexical-elements/#keywords) in keys,
then you'll have to wrap the keys in double quotes.
If you use non-alphanumeric characters in keys, then you'll have to use [bracket notation](/{{< latest "flux" >}}/data-types/composite/record/#bracket-notation) in [Flux](/{{< latest "flux" >}}/).
##### Avoid duplicate names for tags and fields
Avoid using the same name for a [tag key](/influxdb/v2.0/reference/glossary/#tag-key) and a [field key](/influxdb/v2.0/reference/glossary/#field-key) within the same schema.
Your query results may be unpredictable if you have a tag and a field with the same name.
{{% cloud-only %}}
{{% note %}}
Use [explicit bucket schemas]() to enforce unique tag and field keys within a schema.
{{% /note %}}
{{% /cloud-only %}}
## Use tags and fields
[Tag values](/influxdb/v2.0/reference/glossary/#tag-value) are indexed and [field values](/influxdb/v2.0/reference/glossary/#field-value) aren't.
This means that querying tags is more performant than querying fields.
Your queries should guide what you store in tags and what you store in fields.
### Use fields for unique and numeric data
- Store unique or frequently changing values as field values.
- Store numeric values as field values. ([Tag values](/influxdb/v2.0/reference/glossary/#tag-value) only store strings.)
### Use tags to improve query performance
- Store values as [tag values](/influxdb/v2.0/reference/glossary/#tag-value) if the values are used in [filter()](/{{< latest "flux" >}}/universe/filter/) or [group()](/{{< latest "flux" >}}/universe/group/) functions.
- Store values as tag values if the values are shared across multiple data points, i.e. metadata about the field.
Because InfluxDB indexes tags, the query engine doesn't need to scan every record in a bucket to locate a tag value.
For example, consider a bucket that stores data about thousands of users. With `userId` stored in a [field](/influxdb/v2.0/reference/glossary/#field), a query for user `abcde` requires InfluxDB to scan `userId` in every row.
```js
from(bucket: "example-bucket")
|> range(start: -7d)
|> filter(fn: (r) => r._field == "userId" and r._value == "abcde")
```
To retrieve data more quickly, filter on a tag to reduce the number of rows scanned.
The tag should store data that can be reasonably indexed.
The following query filters by the `company` tag to reduce the number of rows scanned for `userId`.
```js
from(bucket: "example-bucket")
|> range(start: -7d)
|> filter(fn: (r) => r.company == "Acme")
|> filter(fn: (r) => r._field == "userId" and r._value == "abcde")
```
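Tags also make aggregations cheap to express. As a sketch against the same hypothetical schema, grouping on the indexed `company` tag computes a per-company aggregate without scanning field values:

```js
// Count rows per company by grouping on the indexed tag
from(bucket: "example-bucket")
  |> range(start: -7d)
  |> filter(fn: (r) => r._field == "userId")
  |> group(columns: ["company"])
  |> count()
```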
### Keep tags simple
Use one tag for each data attribute.
If your source data contains multiple data attributes in a single parameter,
split each attribute into its own tag.
When each tag represents one attribute (not multiple concatenated attributes) of your data,
you'll reduce the need for regular expressions in your queries.
Without regular expressions, your queries will be easier to write and more performant.
#### Compare schemas
Compare the following valid schemas represented by line protocol.
**Recommended**: the following schema splits location data into `plot` and `region` tags.
##### {id="good-tags-schema"}
```
Good Tags schema - Data encoded in multiple tags
-------------
weather_sensor,crop=blueberries,plot=1,region=north temp=50.1 1472515200000000000
weather_sensor,crop=blueberries,plot=2,region=midwest temp=49.8 1472515200000000000
```
**Not recommended**: the following schema stores multiple attributes (`plot` and `region`) concatenated within the `location` tag value (`plot-1.north`).
##### {id="bad-tags-schema"}
```
Bad Tags schema - Multiple data encoded in a single tag
-------------
weather_sensor,crop=blueberries,location=plot-1.north temp=50.1 1472515200000000000
weather_sensor,crop=blueberries,location=plot-2.midwest temp=49.8 1472515200000000000
```
#### Compare queries
Compare queries of the [_Good Tags_](#good-tags-schema) and [_Bad Tags_](#bad-tags-schema) schemas.
The [Flux](/{{< latest "flux" >}}/) queries calculate the average `temp` for blueberries in the `north` region.
**Easy to query**: [_Good Tags_](#good-tags-schema) data is easily filtered by `region` tag values, as in the following example.
```js
// Query *Good Tags* schema, data encoded in multiple tags
from(bucket:"example-bucket")
|> range(start:2016-08-30T00:00:00Z)
|> filter(fn: (r) => r._measurement == "weather_sensor" and r.region == "north" and r._field == "temp")
|> mean()
```
**Difficult to query**: [_Bad Tags_](#bad-tags-schema) requires regular expressions to parse the complex `location` values, as in the following example.

```js
// Query *Bad Tags* schema, multiple data encoded in a single tag
from(bucket:"example-bucket")
    |> range(start:2016-08-30T00:00:00Z)
    |> filter(fn: (r) => r._measurement == "weather_sensor" and r.location =~ /\.north$/ and r._field == "temp")
    |> mean()
```

<!--
## Shard group duration management

InfluxDB stores data in shard groups.
Shard groups are organized by [buckets](/influxdb/v2.0/reference/glossary/#bucket) and store data with timestamps that fall within a specific time interval called the [shard duration](/influxdb/v1.8/concepts/glossary/#shard-duration).

If no shard group duration is provided, the shard group duration is determined by the RP [duration](/influxdb/v1.8/concepts/glossary/#duration) at the time the RP is created. The default values are:

| RP Duration | Shard Group Duration |
|---|---|
| < 2 days | 1 hour |
| >= 2 days and <= 6 months | 1 day |
| > 6 months | 7 days |

The shard group duration is also configurable per RP.
To configure the shard group duration, see [Retention Policy Management](/influxdb/v1.8/query_language/manage-database/#retention-policy-management).

### Shard group duration tradeoffs

Determining the optimal shard group duration requires finding the balance between:

- Better overall performance with longer shards
- Flexibility provided by shorter shards

#### Long shard group duration

Longer shard group durations let InfluxDB store more data in the same logical location.
This reduces data duplication, improves compression efficiency, and improves query speed in some cases.

#### Short shard group duration

Shorter shard group durations allow the system to more efficiently drop data and record incremental backups.
When InfluxDB enforces an RP, it drops entire shard groups, not individual data points, even if the points are older than the RP duration.
A shard group is only removed once the shard group's *end time* is older than the RP duration.

For example, if your RP has a duration of one day, InfluxDB drops an hour's worth of data every hour and always has 25 shard groups: one for each hour in the day, plus an extra shard group that is partially expiring but isn't removed until the whole shard group is older than 24 hours.

>**Note:** A special use case to consider: filtering queries on schema data (such as tags, series, measurements) by time. For example, if you want to filter schema data within a one hour interval, you must set the shard group duration to 1h. For more information, see [filter schema data by time](/influxdb/v1.8/query_language/explore-schema/#filter-meta-queries-by-time).

### Shard group duration recommendations

The default shard group durations work well for most cases. However, high-throughput or long-running instances benefit from longer shard group durations.
Here are some recommendations for longer shard group durations:

| RP Duration | Shard Group Duration |
|---|---|
| <= 1 day | 6 hours |
| > 1 day and <= 7 days | 1 day |
| > 7 days and <= 3 months | 7 days |
| > 3 months | 30 days |
| infinite | 52 weeks or longer |

> **Note:** `INF` (infinite) is not a [valid shard group duration](/influxdb/v1.8/query_language/manage-database/#retention-policy-management).
In extreme cases where data covers decades and will never be deleted, a long shard group duration like `1040w` (20 years) is perfectly valid.

Other factors to consider before setting shard group duration:

* Shard groups should be twice as long as the longest time range of the most frequent queries.
* Each shard group should contain more than 100,000 [points](/influxdb/v1.8/concepts/glossary/#point).
* Each shard group should contain more than 1,000 points per [series](/influxdb/v1.8/concepts/glossary/#series).

#### Shard group duration for backfilling

Bulk insertion of historical data covering a large time range in the past creates a large number of shards at once.
The concurrent access and overhead of writing to hundreds or thousands of shards can quickly lead to slow performance and memory exhaustion.
When writing historical data, consider how your ingest rate limits, data volume, and existing schema affect performance and memory.
-->
@ -3,7 +3,7 @@ title: Troubleshoot issues writing data
seotitle: Troubleshoot issues writing data
list_title: Troubleshoot issues writing data
weight: 106
description: >
Troubleshoot issues writing data. Find response codes for failed writes. Discover how writes fail, from exceeding rate or payload limits, to syntax errors and schema conflicts. Find suggestions to fix failures.
menu:
influxdb_2_0:
name: Troubleshoot issues
@ -16,33 +16,38 @@ related:
---
Learn how to handle and recover from errors when writing to InfluxDB.
- [Discover common failure scenarios](#discover-common-failure-scenarios)
- [Review HTTP status codes](#review-http-status-codes)
- [Troubleshoot failures](#troubleshoot-failures)
## Discover common failure scenarios
Write requests made to InfluxDB may fail for a number of reasons.
Common failure scenarios that return an HTTP `4xx` or `5xx` error status code include the following:
- Request exceeded a rate limit.
- API token was invalid. See how to [manage API tokens](/influxdb/v2.0/security/tokens/).
- Client or server reached a timeout threshold.
- Size of the data payload was too large.
- Data was not formatted correctly.
To find the causes of a specific error, [review HTTP status codes](#review-http-status-codes).
### Troubleshoot partial writes
Writes may fail partially or completely even though InfluxDB returns an HTTP `2xx` status code for a valid request.
For example, a partial write may occur when InfluxDB writes all points that conform to the bucket schema, but rejects points that have the wrong data type in a field.
To resolve partial writes and rejected points, see [troubleshoot failures](#troubleshoot-failures).
## Review HTTP status codes
InfluxDB uses conventional HTTP status codes to indicate the success or failure of a request.
Write requests return the following status codes:
- `204` **Success**: InfluxDB validated the request data format and accepted the data for writing to the bucket.
{{% note %}}
`204` doesn't indicate a successful write operation since writes are asynchronous.
If some of your data did not write to the bucket, see how to [troubleshoot rejected points](#troubleshoot-rejected-points).
{{% /note %}}
- `400` **Bad request**: The [line protocol](/influxdb/v2.0/reference/syntax/line-protocol/) data in the request was malformed.
The response body contains the first malformed line in the data. All request data was rejected and not written.
@ -51,7 +56,7 @@ Write requests return the following status codes:
- [API token](/influxdb/v2.0/api-guide/api_intro/#authentication) value is missing from the header.
- API token does not have sufficient permissions to write to the organization and the bucket. For more information about token types and permissions, see [Manage API tokens](/influxdb/v2.0/security/tokens/)
- `404` **Not found**: A requested resource (e.g. an organization or bucket) was not found. The response body contains the requested resource type, e.g. "organization", and resource name.
- `413` **Request entity too large**: All request data was rejected and not written. InfluxDB OSS only returns this error if the [Go (golang) `ioutil.ReadAll()`](https://pkg.go.dev/io/ioutil#ReadAll) function raises an error.
- `500` **Internal server error**: Default HTTP status for an error.
- `503` **Service unavailable**: Server is temporarily unavailable to accept writes. The `Retry-After` header describes when to try the write again.
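Because `503` is explicitly retryable, clients can wait out the `Retry-After` delay and resend. A minimal Python sketch of that loop (the `send` callable stands in for your actual write request, and the retry policy here is an assumption, not something InfluxDB prescribes):

```python
import time

def write_with_retry(send, line_protocol, max_attempts=3):
    """Retry a write while the server returns 503, honoring Retry-After."""
    for _ in range(max_attempts):
        status, headers = send(line_protocol)
        if status != 503:
            return status  # 204 success, or a non-retryable error
        # Sleep for the server-suggested delay (default to 1s if absent).
        time.sleep(float(headers.get("Retry-After", 1)))
    return status

# Simulate a server that is briefly unavailable, then accepts the write.
responses = iter([(503, {"Retry-After": "0"}), (204, {})])
status = write_with_retry(lambda lp: next(responses), "m f=1")
print(status)
```

A non-retryable status such as `400` is returned immediately, so the caller can inspect the response body for the first malformed line instead of retrying.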
@ -66,7 +71,7 @@ If you notice data is missing in your bucket, do the following:
- Verify the data types match other data points with the same series.
For example, did you attempt to write `string` data to an `int` field?
- Verify the timestamps match the [precision parameter](/influxdb/v2.0/write-data/#timestamp-precision).
- Minimize payload size and network errors by [optimizing writes](/influxdb/v2.0/write-data/best-practices/optimize-writes/)
### Troubleshoot rejected points
@ -2,7 +2,7 @@
title: Restore data
seotitle: Restore data in InfluxDB
description: >
Use the `influx restore` command to restore backup data and metadata from InfluxDB.
menu:
influxdb_2_1:
parent: Back up & restore data
@ -259,9 +259,9 @@ Related entries: [bucket](#bucket)
### data type
A data type is defined by the values it can take, the programming language used, or the operations that can be performed on it.

InfluxDB supports the following data types:
| Data type | Alias/annotation |
| :--------------- | :----------------- |
| string | |
@ -798,6 +798,11 @@ A tuple of named values represented using a record type.
Regular expressions (regex or regexp) are patterns used to match character combinations in strings.
### rejected points
In a batch of data, points that InfluxDB couldn't write to a bucket.
Field type conflicts are a common cause of rejected points.
### retention period
The duration of time that a bucket retains data.
Points with timestamps older than their bucket's retention period are dropped.
@ -1092,13 +1097,22 @@ Related entries: [point](#point), [unix timestamp](#unix-timestamp), [RFC3339 ti
### token
Tokens (or API tokens) verify user and organization permissions in InfluxDB.
There are different types of API tokens:
{{% oss-only %}}
- **Operator token:** grants full read and write access to all resources in **all organizations in InfluxDB OSS 2.x**. _InfluxDB Cloud does not support Operator tokens._
- **All-Access token:** grants full read and write access to all resources in an organization.
- **Read/Write token:** grants read or write access to specific resources in an organization.
{{% /oss-only %}}
{{% cloud-only %}}
- **All-Access token:** grants full read and write access to all resources in an organization.
- **Read/Write token:** grants read or write access to specific resources in an organization.
{{% /cloud-only %}}
Related entries: [Create a token](/influxdb/v2.1/security/tokens/create-token/).
### tracing
@ -70,6 +70,23 @@ The `_monitoring` system bucket stores InfluxDB data used to
- **\_source_timestamp:** original timestamp of the queried data
- **\_status_timestamp:** timestamp when the status (`_level`) was evaluated
- _other fields inherited from queried data_
- {{% cloud-only %}}
**rejected_points** _(measurement)_
- **tags:**
- **bucket:** ID of the bucket targeted in the write request
- **reason:** brief description of why InfluxDB rejected the point
- **field:** field name of the point (present if the point contained a field)
- **measurement:** measurement of the point (present if the point contained a measurement)
- **gotType:** InfluxDB field type in the point (present if type mismatch)
- **wantType:** InfluxDB field type in the bucket schema (present if type mismatch)
- **fields:**
- **_field:** `count` (for data type and schema conflicts) or `error` (for parsing errors)
- **_value:** `1` if `_field: "count"` or error details if `_field: "error"`
- **timestamp:** time the rejected point was logged
{{% /cloud-only %}}
## \_tasks system bucket
The `_tasks` system bucket stores data related to [InfluxDB task](/influxdb/v2.1/process-data/) executions.
@ -58,6 +58,8 @@ influx auth create \
--all-access
```
{{% oss-only %}}
#### Create an Operator token
Create an Operator token to grant permissions to all resources in all organizations.
@ -68,6 +70,8 @@ influx auth create \
--operator
```
{{% /oss-only %}}
#### Create a token with specified read permissions
```sh
@ -71,7 +71,11 @@ To filter tokens by user, include `userID` as a query parameter in your request.
{{% get-shared-text "api/v2.0/auth/oss/tokens-view-filter.sh" %}}
```
{{% oss-only %}}
[***Operator tokens***](/{{% latest "influxdb" %}}/security/tokens/#operator-token) have access to all organizations' authorizations.
To filter authorizations by organization when using an operator token, include an `org` or `orgID` query parameter in your request.
{{% /oss-only %}}
See the [`/authorizations` endpoint documentation](/influxdb/v2.1/api/#tag/Authorizations) for more information about available parameters.
@ -10,11 +10,48 @@ menu:
---
If reads and writes to InfluxDB have started to slow down, high [series cardinality](/influxdb/v2.1/reference/glossary/#series-cardinality) (too many series) may be causing memory issues.
Take steps to understand and resolve high series cardinality.
1. [Learn the causes of high cardinality](#learn-the-causes-of-high-series-cardinality)
2. [Measure series cardinality](#measure-series-cardinality)
3. [Resolve high cardinality](#resolve-high-cardinality)
## Learn the causes of high series cardinality
{{% oss-only %}}
InfluxDB indexes the following data elements to speed up reads:
- [measurement](/influxdb/v2.1/reference/glossary/#measurement)
- [tags](/influxdb/v2.1/reference/glossary/#tag)
{{% /oss-only %}}
{{% cloud-only %}}
InfluxDB indexes the following data elements to speed up reads:
- [measurement](/influxdb/v2.1/reference/glossary/#measurement)
- [tags](/influxdb/v2.1/reference/glossary/#tag)
- [field keys](/influxdb/cloud/reference/glossary/#field-key)
{{% /cloud-only %}}
Each unique set of indexed data elements forms a [series key](/influxdb/v2.1/reference/glossary/#series-key).
[Tags](/influxdb/v2.1/reference/glossary/#tag) containing highly variable information like unique IDs, hashes, and random strings lead to a large number of [series](/influxdb/v2.1/reference/glossary/#series), also known as high [series cardinality](/influxdb/v2.1/reference/glossary/#series-cardinality).
High series cardinality is a primary driver of high memory usage for many database workloads.
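Series cardinality is just the count of unique series keys. The following Python sketch is a toy illustration (not how InfluxDB computes cardinality internally) showing how a highly variable tag, such as a unique request ID, makes every point its own series:

```python
def series_cardinality(points):
    """Count unique series keys: (measurement, sorted tag set)."""
    keys = {(m, tuple(sorted(tags.items()))) for m, tags in points}
    return len(keys)

# Two stations reporting repeatedly: tag values repeat, so cardinality stays low.
low = [("weather", {"station": s}) for s in ["harbor", "dock"] for _ in range(100)]

# A unique ID stored as a tag: every point creates a new series.
high = [("weather", {"requestId": str(i)}) for i in range(200)]

print(series_cardinality(low), series_cardinality(high))
```

Two hundred points per schema, but the second schema produces one hundred times as many series keys, each of which InfluxDB must index.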
## Measure series cardinality
Use the following to measure series cardinality of your buckets:
- [`influxdb.cardinality()`](/{{< latest "flux" >}}/stdlib/influxdata/influxdb/cardinality): Flux function that returns the number of unique [series keys](/influxdb/v2.1/reference/glossary/#series) in your data.
- [`SHOW SERIES CARDINALITY`](/influxdb/v2.1/query_language/spec/#show-series-cardinality): InfluxQL command that returns the number of unique [series keys](/influxdb/v2.1/reference/glossary/#series) in your data.
## Resolve high cardinality
To resolve high series cardinality, complete the following steps (for multiple buckets if applicable):
1. [Review tags](#review-tags).
2. [Improve your schema](#improve-your-schema).
3. [Delete high cardinality data](#delete-data-to-reduce-high-cardinality).
## Review tags
@ -80,38 +117,14 @@ cardinalityByTag(bucket: "example-bucket")
|> count()
```
These queries should help identify the sources of high cardinality in each of your buckets. To determine which specific tags are growing, check the cardinality again after 24 hours to see if one or more tags have grown significantly.
## Improve your schema

To minimize cardinality in the future, design your schema for easy and performant querying.
Review [best practices for schema design](/influxdb/v2.1/write-data/best-practices/schema-design/).

### Delete data to reduce high cardinality

Consider whether you need the data causing high cardinality. In some cases, you may decide you no longer need this data, in which case you may choose to [delete the whole bucket](/influxdb/v2.1/organizations/buckets/delete-bucket/) or [delete a range of data](/influxdb/v2.1/write-data/delete-data/).
### Design schema for read performance
Tags are valuable for indexing, so during a query, the query engine doesn't need to scan every single record in a bucket. However, too many indexes may create performance problems. The trick is to create a middle ground between scanning and indexing.
For example, if you query for specific user IDs with thousands of users, a simple query like this, where `userId` is a field, requires InfluxDB to scan every row for the `userId`:
```js
from(bucket: "example-bucket")
|> range(start: -7d)
|> filter(fn: (r) => r._field == "userId" and r._value == "abcde")
```
If you include a tag in your schema that can be reasonably indexed, such as a `company` tag, you can reduce the number of rows scanned and retrieve data more quickly:
```js
from(bucket: "example-bucket")
|> range(start: -7d)
|> filter(fn: (r) => r.company == "Acme")
|> filter(fn: (r) => r._field == "userId" and r._value == "abcde")
```
Consider tags that can be reasonably indexed to make your queries more performant. For more guidelines to consider, see [InfluxDB schema design](/influxdb/v2.1/write-data/best-practices/schema-design/).
@ -1,7 +1,7 @@
---
title: InfluxDB schema design
description: >
Design your schema for simpler and more performant queries.
menu:
influxdb_2_1:
name: Schema design
@ -9,220 +9,238 @@ menu:
parent: write-best-practices
---
Design your [schema](/influxdb/v2.1/reference/glossary/#schema) for simpler and more performant queries.
Follow design guidelines to make your schema easy to query.
Learn how these guidelines lead to more performant queries.
- [Design to query](#design-to-query)
- [Keep measurements and keys simple](#keep-measurements-and-keys-simple)
- [Use tags and fields](#use-tags-and-fields)
- [Use fields for unique and numeric data](#use-fields-for-unique-and-numeric-data)
- [Use tags to improve query performance](#use-tags-to-improve-query-performance)
- [Keep tags simple](#keep-tags-simple)
{{% note %}}
Good schema design can prevent high series cardinality, resulting in better performing queries. If you notice data reads and writes slowing down or want to learn how cardinality affects performance, see how to [resolve high cardinality](/influxdb/v2.1/write-data/best-practices/resolve-high-cardinality/).
{{% /note %}}
## Design to query
The schemas below demonstrate [measurements](/influxdb/v2.1/reference/glossary/#measurement), [tag keys](/influxdb/v2.1/reference/glossary/#tag-key), and [field keys](/influxdb/v2.1/reference/glossary/#field-key) that are easy to query.
| measurement | tag key | tag key | field key | field key |
|----------------------|-----------|---------|-----------|-------------|
| airSensor | sensorId | station | humidity | temperature |
| waterQualitySensor | sensorId | station | pH | temperature |
The `airSensor` and `waterQualitySensor` schemas illustrate the following guidelines:
- Each measurement is a simple name that describes a schema.
- Keys [don't repeat within a schema](#avoid-duplicate-names-for-tags-and-fields).
- Keys [don't use reserved keywords or special characters](#avoid-keywords-and-special-characters-in-keys).
- Tags (`sensorId` and `station`) [store metadata common across many data points](#use-tags-to-improve-query-performance).
- Fields (`humidity`, `pH`, and `temperature`) [store numeric data](#use-fields-for-unique-and-numeric-data).
- Fields [store unique or highly variable](#use-fields-for-unique-and-numeric-data) data.
- Measurements and keys [don't contain data](#keep-measurements-and-keys-simple); tag values and field values will store data.
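The duplicate-key guideline is easy to check mechanically. A toy Python sketch (not an InfluxDB tool) that flags keys reused as both a tag key and a field key in a schema:

```python
def duplicate_keys(tag_keys, field_keys):
    """Return keys that appear as both a tag key and a field key."""
    return sorted(set(tag_keys) & set(field_keys))

# The airSensor schema above: tag and field keys don't overlap.
air_sensor = {"tags": ["sensorId", "station"], "fields": ["humidity", "temperature"]}
print(duplicate_keys(air_sensor["tags"], air_sensor["fields"]))
```

An empty result means the schema follows the guideline; any key returned would produce unpredictable query results if written.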
The following points (formatted as line protocol) use the `airSensor` and `waterQualitySensor` schemas:
```
airSensor,sensorId=A0100,station=Harbor humidity=35.0658,temperature=21.667 1636729543000000000
waterQualitySensor,sensorId=W0101,station=Harbor pH=6.1,temperature=16.103 1472515200000000000
```
### Keep measurements and keys simple
Store data in [tag values](/influxdb/v2.1/reference/glossary/#tag-value) or [field values](/influxdb/v2.1/reference/glossary/#field-value), not in [tag keys](/influxdb/v2.1/reference/glossary/#tag-key), [field keys](/influxdb/v2.1/reference/glossary/#field-key), or [measurements](/influxdb/v2.1/reference/glossary/#measurement). If you design your schema to store data in tag and field values,
your queries will be easier to write and more efficient.
{{% oss-only %}}
In addition, you'll keep cardinality low by not creating measurements and keys as you write data.
To learn more about the performance impact of high series cardinality, see how to [resolve high cardinality](/influxdb/v2.1/write-data/best-practices/resolve-high-cardinality/).
{{% /oss-only %}}
#### Compare schemas
Compare the following valid schemas represented by line protocol.
**Recommended**: the following schema stores metadata in separate `crop`, `plot`, and `region` tags. The `temp` field contains variable numeric data.
##### {id="good-measurements-schema"}
```
Good Measurements schema - Data encoded in tags (recommended)
-------------
weather_sensor,crop=blueberries,plot=1,region=north temp=50.1 1472515200000000000
weather_sensor,crop=blueberries,plot=2,region=midwest temp=49.8 1472515200000000000
```
**Not recommended**: the following schema stores multiple attributes (`crop`, `plot` and `region`) concatenated (`blueberries.plot-1.north`) within the measurement, similar to Graphite metrics.
##### {id="bad-measurements-schema"}
```
Bad Measurements schema - Data encoded in the measurement (not recommended)
-------------
blueberries.plot-1.north temp=50.1 1472515200000000000
blueberries.plot-2.midwest temp=49.8 1472515200000000000
```
The long measurement names (`blueberries.plot-1.north`) with no tags are similar to Graphite metrics.
Encoding the `plot` and `region` in the measurement name makes the data more difficult to query.
**Not recommended**: the following schema stores multiple attributes (`crop`, `plot` and `region`) concatenated (`blueberries.plot-1.north`) within the field key.
##### {id="bad-keys-schema"}
```
Bad Keys schema - Data encoded in field keys (not recommended)
-------------
weather_sensor blueberries.plot-1.north.temp=50.1 1472515200000000000
weather_sensor blueberries.plot-2.midwest.temp=49.8 1472515200000000000
```
#### Compare queries
Compare the following queries of the [_Good Measurements_](#good-measurements-schema) and [_Bad Measurements_](#bad-measurements-schema) schemas.
The [Flux](/{{< latest "flux" >}}/) queries calculate the average `temp` for blueberries in the `north` region.
**Easy to query**: [_Good Measurements_](#good-measurements-schema) data is easily filtered by `region` tag values, as in the following example.
```js
// Query *Good Measurements*, data stored in separate tags (recommended)
from(bucket:"example-bucket")
|> range(start:2016-08-30T00:00:00Z)
|> filter(fn: (r) => r._measurement == "weather_sensor" and r.region == "north" and r._field == "temp")
|> mean()
```
**Difficult to query**: [_Bad Measurements_](#bad-measurements-schema) requires regular expressions to extract `plot` and `region` from the measurement, as in the following example.
```js
// Query *Bad Measurements*, data encoded in the measurement (not recommended)
from(bucket:"example-bucket")
|> range(start:2016-08-30T00:00:00Z)
|> filter(fn: (r) => r._measurement =~ /\.north$/ and r._field == "temp")
|> mean()
```
Complex measurements make some queries impossible. For example, calculating the average temperature of both plots is not possible with the [_Bad Measurements_](#bad-measurements-schema) schema.
#### Keep keys simple
In addition to keeping your keys free of data, follow these additional guidelines to make them easier to query:
- [Avoid keywords and special characters](#avoid-keywords-and-special-characters-in-keys)
- [Avoid duplicate names for tags and fields](#avoid-duplicate-names-for-tags-and-fields)
##### Avoid keywords and special characters in keys
To simplify query writing, don't include reserved keywords or special characters in tag and field keys.
If you use [Flux keywords](/{{< latest "flux" >}}/spec/lexical-elements/#keywords) in keys,
then you'll have to wrap the keys in double quotes.
If you use non-alphanumeric characters in keys, then you'll have to use [bracket notation](/{{< latest "flux" >}}/data-types/composite/record/#bracket-notation) in [Flux](/{{< latest "flux" >}}/).
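Special characters also complicate the write path: line protocol requires escaping commas, equals signs, and spaces in tag and field keys. A Python sketch of that escaping rule (the `escape_key` helper is hypothetical, written from the line protocol syntax rules):

```python
def escape_key(key):
    """Escape characters that line protocol requires escaping in tag/field keys."""
    return key.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")

print(escape_key("temp"))      # unchanged: nothing to escape
print(escape_key("air temp"))  # the space must be backslash-escaped
```

A key like `air temp` works, but every producer and query must get the escaping right; a key like `airTemp` avoids the problem entirely.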
##### Avoid duplicate names for tags and fields
Avoid using the same name for a [tag key](/influxdb/v2.1/reference/glossary/#tag-key) and a [field key](/influxdb/v2.1/reference/glossary/#field-key) within the same schema.
Your query results may be unpredictable if you have a tag and a field with the same name.
{{% cloud-only %}}
{{% note %}}
Use [explicit bucket schemas]() to enforce unique tag and field keys within a schema.
{{% /note %}}
{{% /cloud-only %}}
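To illustrate why duplicate names are a problem, consider a hypothetical schema (for illustration only) where `temp` is both a tag key and a field key. In query results, the `temp` column holds the tag value (a string), while the numeric field value is only available as `_value`:

```js
// Hypothetical schema for illustration: "temp" is both a tag key and a field key.
// After this query, r.temp refers to the *tag* value (a string such as "ambient"),
// not the numeric field value, which lives in r._value.
from(bucket: "example-bucket")
|> range(start: -1h)
|> filter(fn: (r) => r._field == "temp")
```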
## Use tags and fields
[Tag values](/influxdb/v2.1/reference/glossary/#tag-value) are indexed and [field values](/influxdb/v2.1/reference/glossary/#field-value) aren't.
This means that querying tags is more performant than querying fields.
Your queries should guide what you store in tags and what you store in fields.
### Use fields for unique and numeric data
- Store unique or frequently changing values as field values.
- Store numeric values as field values. ([Tag values](/influxdb/v2.1/reference/glossary/#tag-value) only store strings.)
### Use tags to improve query performance
- Store values as tag values if they can be reasonably indexed.
- Store values as [tag values](/influxdb/v2.1/reference/glossary/#tag-value) if the values are used in [filter()](/{{< latest "flux" >}}/universe/filter/) or [group()](/{{< latest "flux" >}}/universe/group/) functions.
- Store values as tag values if the values are shared across multiple data points, i.e. metadata about the field.
Because InfluxDB indexes tags, the query engine doesn't need to scan every record in a bucket to locate a tag value.
For example, consider a bucket that stores data about thousands of users. With `userId` stored in a [field](/influxdb/v2.1/reference/glossary/#field), a query for user `abcde` requires InfluxDB to scan `userId` in every row.
```js
from(bucket: "example-bucket")
|> range(start: -7d)
|> filter(fn: (r) => r._field == "userId" and r._value == "abcde")
```
To retrieve data more quickly, filter on a tag to reduce the number of rows scanned.
The tag should store data that can be reasonably indexed.
The following query filters by the `company` tag to reduce the number of rows scanned for `userId`.
```js
from(bucket: "example-bucket")
|> range(start: -7d)
|> filter(fn: (r) => r.company == "Acme")
|> filter(fn: (r) => r._field == "userId" and r._value == "abcde")
```
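Indexed tags also pair well with [group()](/{{< latest "flux" >}}/universe/group/). For example, the following sketch (using the same hypothetical bucket) counts `userId` values per `company`:

```js
// Group rows by the "company" tag, then count userId values per company
from(bucket: "example-bucket")
|> range(start: -7d)
|> filter(fn: (r) => r._field == "userId")
|> group(columns: ["company"])
|> count()
```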
### Keep tags simple
Use one tag for each data attribute.
If your source data contains multiple data attributes in a single parameter,
split each attribute into its own tag.
When each tag represents one attribute (not multiple concatenated attributes) of your data,
you'll reduce the need for regular expressions in your queries.
Without regular expressions, your queries will be easier to write and more performant.
#### Compare schemas
Compare the following valid schemas represented by line protocol.
**Recommended**: the following schema splits location data into `plot` and `region` tags.
##### {id="good-tags-schema"}
```
Good Tags schema - Data encoded in multiple tags
-------------
weather_sensor,crop=blueberries,plot=1,region=north temp=50.1 1472515200000000000
weather_sensor,crop=blueberries,plot=2,region=midwest temp=49.8 1472515200000000000
```
**Not recommended**: the following schema stores multiple attributes (`plot` and `region`) concatenated within the `location` tag value (`plot-1.north`).
##### {id="bad-tags-schema"}
```
Bad Tags schema - Multiple data encoded in a single tag
-------------
weather_sensor,crop=blueberries,location=plot-1.north temp=50.1 1472515200000000000
weather_sensor,crop=blueberries,location=plot-2.midwest temp=49.8 1472515200000000000
```
#### Compare queries

Compare the following queries of the [_Good Tags_](#good-tags-schema) and [_Bad Tags_](#bad-tags-schema) schemas.
The [Flux](/{{< latest "flux" >}}/) queries calculate the average `temp` for blueberries in the `north` region.
The _Good Tags_ schema is preferable because, with multiple tags, you don't need a regular expression.
**Easy to query**: [_Good Tags_](#good-tags-schema) data is easily filtered by `region` tag values, as in the following example.
```js
// Query *Good Tags* schema, data encoded in multiple tags
from(bucket:"example-bucket")
|> range(start:2016-08-30T00:00:00Z)
|> filter(fn: (r) => r._measurement == "weather_sensor" and r.region == "north" and r._field == "temp")
|> mean()
```
In the [_Bad Tags_](#bad-tags-schema) schema, encoding `plot` and `region` in a single tag makes the data more difficult to query.
<!--
## Shard group duration management
InfluxDB stores data in shard groups.
Shard groups are organized by [buckets](/influxdb/v2.1/reference/glossary/#bucket) and store data with timestamps that fall within a specific time interval called the [shard duration](/influxdb/v1.8/concepts/glossary/#shard-duration).
If no shard group duration is provided, the shard group duration is determined by the RP [duration](/influxdb/v1.8/concepts/glossary/#duration) at the time the RP is created. The default values are:
| RP Duration | Shard Group Duration |
|---|---|
| < 2 days | 1 hour |
| >= 2 days and <= 6 months | 1 day |
| > 6 months | 7 days |
The shard group duration is also configurable per RP.
To configure the shard group duration, see [Retention Policy Management](/influxdb/v1.8/query_language/manage-database/#retention-policy-management).
### Shard group duration tradeoffs
Determining the optimal shard group duration requires finding the balance between:
- Better overall performance with longer shards
- Flexibility provided by shorter shards
#### Long shard group duration
Longer shard group durations let InfluxDB store more data in the same logical location.
This reduces data duplication, improves compression efficiency, and improves query speed in some cases.
#### Short shard group duration
Shorter shard group durations allow the system to more efficiently drop data and record incremental backups.
When InfluxDB enforces an RP it drops entire shard groups, not individual data points, even if the points are older than the RP duration.
A shard group will only be removed once a shard group's duration *end time* is older than the RP duration.
For example, if your RP has a duration of one day, InfluxDB drops an hour's worth of data every hour and always has 25 shard groups: one for each hour in the day, plus an extra shard group that is partially expiring but isn't removed until the whole shard group is older than 24 hours.
>**Note:** A special use case to consider: filtering queries on schema data (such as tags, series, measurements) by time. For example, if you want to filter schema data within a one hour interval, you must set the shard group duration to 1h. For more information, see [filter schema data by time](/influxdb/v1.8/query_language/explore-schema/#filter-meta-queries-by-time).
### Shard group duration recommendations
The default shard group durations work well for most cases. However, high-throughput or long-running instances will benefit from using longer shard group durations.
Here are some recommendations for longer shard group durations:
| RP Duration | Shard Group Duration |
|---|---|
| <= 1 day | 6 hours |
| > 1 day and <= 7 days | 1 day |
| > 7 days and <= 3 months | 7 days |
| > 3 months | 30 days |
| infinite | 52 weeks or longer |
> **Note:** `INF` (infinite) is not a [valid shard group duration](/influxdb/v1.8/query_language/manage-database/#retention-policy-management).
In extreme cases where data covers decades and will never be deleted, a long shard group duration like `1040w` (20 years) is perfectly valid.
Other factors to consider before setting shard group duration:
* Shard groups should be twice as long as the longest time range of the most frequent queries
* Shard groups should each contain more than 100,000 [points](/influxdb/v1.8/concepts/glossary/#point) per shard group
* Shard groups should each contain more than 1,000 points per [series](/influxdb/v1.8/concepts/glossary/#series)
#### Shard group duration for backfilling
Bulk insertion of historical data covering a large time range in the past creates a large number of shards at once.
The concurrent access and overhead of writing to hundreds or thousands of shards can quickly lead to slow performance and memory exhaustion.
When writing historical data, consider how your ingest rate limits, data volume, and existing data schema affect performance and memory.
-->
**Difficult to query**: [_Bad Tags_](#bad-tags-schema) requires regular expressions to parse the complex `location` values, as in the following example.

```js
// Query *Bad Tags* schema, multiple data encoded in a single tag
from(bucket:"example-bucket")
|> range(start:2016-08-30T00:00:00Z)
|> filter(fn: (r) => r._measurement == "weather_sensor" and r.location =~ /\.north$/ and r._field == "temp")
|> mean()
```
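With separate tags, aggregations that the concatenated tag makes awkward stay simple. For example, the following sketch averages `temp` across all plots and regions with no regular expression:

```js
// Average temp across every plot and region (Good Tags schema)
from(bucket:"example-bucket")
|> range(start:2016-08-30T00:00:00Z)
|> filter(fn: (r) => r._measurement == "weather_sensor" and r._field == "temp")
|> group()
|> mean()
```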
@ -16,33 +16,38 @@ related:
---
Learn how to handle and recover from errors when writing to InfluxDB.
- [Discover common failure scenarios](#discover-common-failure-scenarios)
- [Review HTTP status codes](#review-http-status-codes)
- [Troubleshoot failures](#troubleshoot-failures)
## Discover common failure scenarios
Write requests made to InfluxDB may fail for a number of reasons.
Common failure scenarios that return an HTTP `4xx` or `5xx` error status code include the following:
- Request exceeded a rate limit.
- API token was invalid. See how to [manage API tokens](/influxdb/v2.1/security/tokens/).
- Client or server reached a timeout threshold.
- Size of the data payload was too large.
- Data was not formatted correctly.
To find the causes of a specific error, [review HTTP status codes](#review-http-status-codes).
### Troubleshoot partial writes
Writes may fail partially or completely even though InfluxDB returns an HTTP `2xx` status code for a valid request.
For example, a partial write may occur when InfluxDB writes all points that conform to the bucket schema, but rejects points that have the wrong data type in a field.
To resolve partial writes and rejected points, see [troubleshoot failures](#troubleshoot-failures).
## Review HTTP status codes
InfluxDB uses conventional HTTP status codes to indicate the success or failure of a request.
Write requests return the following status codes:
- `204` **Success**: InfluxDB validated the request data format and accepted the data for writing to the bucket.
{{% note %}}
`204` doesn't indicate a successful write operation since writes are asynchronous.
If some of your data did not write to the bucket, see how to [troubleshoot rejected points](#troubleshoot-rejected-points).
{{% /note %}}
- `400` **Bad request**: The [line protocol](/influxdb/v2.1/reference/syntax/line-protocol/) data in the request was malformed.
The response body contains the first malformed line in the data. All request data was rejected and not written.
@ -51,7 +56,7 @@ Write requests return the following status codes:
- [API token](/influxdb/v2.1/api-guide/api_intro/#authentication) value is missing from the header.
- API token does not have sufficient permissions to write to the organization and the bucket. For more information about token types and permissions, see [Manage API tokens](/influxdb/v2.1/security/tokens/).
- `404` **Not found**: A requested resource (e.g. an organization or bucket) was not found. The response body contains the requested resource type, e.g. "organization", and resource name.
- `413` **Request entity too large**: All request data was rejected and not written. InfluxDB OSS only returns this error if the [Go (golang) `ioutil.ReadAll()`](https://pkg.go.dev/io/ioutil#ReadAll) function raises an error.
- `500` **Internal server error**: Default HTTP status for an error.
- `503` **Service unavailable**: Server is temporarily unavailable to accept writes. The `Retry-After` header describes when to try the write again.
@ -66,7 +71,7 @@ If you notice data is missing in your bucket, do the following:
- Verify the data types match other data points with the same series.
For example, did you attempt to write `string` data to an `int` field?
- Verify the timestamps match the [precision parameter](/influxdb/v2.1/write-data/#timestamp-precision).
- Minimize payload size and network errors by [optimizing writes](/influxdb/v2.1/write-data/best-practices/optimize-writes/)
### Troubleshoot rejected points
@ -9,6 +9,29 @@ menu:
weight: 10
parent: About the project
---
## v1.20.4 [2021-11-17]
- Update `BurntSushi/toml` from 0.3.1 to 0.4.1.
- Update `gosnmp` module from 1.32 to 1.33.
- Update `go.opentelemetry.io/otel` from v0.23.0 to v0.24.0.
- Fix plugin linters.
### Input plugin updates
- Cisco Model-Driven Telemetry (`cisco_telemetry_mdt`): Move to new protobuf library.
- InfluxDB (`influxdb`): Update input schema docs.
- Intel RDT (`intel_rdt`): Correct the timezone to use local timezone by default instead of UTC from metrics gathered from the `pqos` tool.
- IPMI Sensor (`ipmi`): Redact passwords in log files to maintain security.
- Modbus (`modbus`): Do not build on OpenBSD.
- MySQL (`mysql`):
- Fix type conversion follow-up.
- Correctly set the default paths.
- NVIDIA SMI (`nvidia_smi`): Correctly set the default paths.
- Proxmox (`proxmox`): Parse the column types of the server status.
- SQL Server (`sqlserver`): Add elastic pool in supported versions.
### Output plugin updates
- Loki (`loki`): Include the metric name as a label for improved query performance and metric filtering.
## v1.20.3 [2021-10-28]
- Update Go to 1.17.2.
@ -35,7 +35,7 @@ telegraf:
versions: [v1.9, v1.10, v1.11, v1.12, v1.13, v1.14, v1.15, v1.16, v1.17, v1.18, v1.19, v1.20]
latest: v1.20
latest_patches:
"1.20": 4
"1.19": 3
"1.18": 3
"1.17": 3
@ -5,9 +5,9 @@
{{ $influxdbCloud := and (eq $product "influxdb") (eq $version "cloud") }}
{{ if $influxdbOSS }}
{{ .Content | replaceRE `(?Us)(<li>\s*<(?:div|span) class=\'cloud\-only\'>.*<\/(?:div|span)><\!\-\- close \-\-\>\s*</li>)` "" | replaceRE `(?Us)(<(?:div|span) class=\'cloud\-only\'>.*<\/(?:div|span)><\!\-\- close \-\-\>)` "" | safeHTML}}
{{ else if $influxdbCloud }}
{{ .Content | replaceRE `(?Us)(<li>\s*<(?:div|span) class=\'oss\-only\'>.*<\/(?:div|span)><\!\-\- close \-\-\>\s*</li>)` "" | replaceRE `(?Us)(<(?:div|span) class=\'oss\-only\'>.*<\/(?:div|span)><\!\-\- close \-\-\>)` "" | safeHTML}}
{{ else }}
{{ .Content }}
{{ end }}
{{ end }}