From 5649770e060137546c98dd9b699364b516ff1eb6 Mon Sep 17 00:00:00 2001 From: Jason Stirnaman Date: Wed, 6 Dec 2023 16:25:37 -0600 Subject: [PATCH] Fix-5226 v2 and v3 series inaccuracy (#5264) * fix(v2): series keys in OSS closes #5226 - Fix v2 series keys definitions - Remove field key from OSS description - Update glossary and schema-design. * fix(v3): rejected points reason closes Fix series definitions and reasons for rejected points #5226 - Replaces series with partition in reasons for rejected points. --- .../write-data/troubleshoot.md | 13 +- .../write-data/troubleshoot.md | 9 +- .../clustered/write-data/troubleshoot.md | 13 +- content/influxdb/v2/reference/glossary.md | 23 ++- .../reference/key-concepts/data-elements.md | 157 +++++++++--------- .../best-practices/schema-design.md | 3 +- 6 files changed, 99 insertions(+), 119 deletions(-) diff --git a/content/influxdb/cloud-dedicated/write-data/troubleshoot.md b/content/influxdb/cloud-dedicated/write-data/troubleshoot.md index b27dd1674..9de94dd99 100644 --- a/content/influxdb/cloud-dedicated/write-data/troubleshoot.md +++ b/content/influxdb/cloud-dedicated/write-data/troubleshoot.md @@ -19,16 +19,12 @@ related: Learn how to avoid unexpected results and recover from errors when writing to {{% product-name %}}. - - - [Handle write responses](#handle-write-responses) - [Review HTTP status codes](#review-http-status-codes) - [Troubleshoot failures](#troubleshoot-failures) - [Troubleshoot rejected points](#troubleshoot-rejected-points) - - -## Handle `write` responses +## Handle write responses In {{% product-name %}}, writes are synchronous. After InfluxDB validates the request and ingests the data, it sends a _success_ response (HTTP `204` status code) as an acknowledgement that the data is written and queryable. 
@@ -66,9 +62,6 @@ If you notice data is missing in your database, do the following: ## Troubleshoot rejected points -InfluxDB rejects points for the following reasons: +InfluxDB rejects points that fall within the same partition (default partitioning is measurement and day) as existing bucket data and have a different data type for an existing field. -- The **batch** contains another point with the same series, but one of the fields has a different value type. -- The **bucket** contains another point with the same series, but one of the fields has a different value type. - -Check for [field data type](/influxdb/cloud-dedicated/reference/syntax/line-protocol/#data-types-and-format) differences between the missing data point and other points that have the same [series](/influxdb/cloud-dedicated/reference/glossary/#series)--for example, did you attempt to write `string` data to an `int` field? +Check for [field data type](/influxdb/cloud-dedicated/reference/syntax/line-protocol/#data-types-and-format) differences between the rejected data point and points within the same database and partition--for example, did you attempt to write `string` data to an `int` field? diff --git a/content/influxdb/cloud-serverless/write-data/troubleshoot.md b/content/influxdb/cloud-serverless/write-data/troubleshoot.md index 5bb78afec..f6425927b 100644 --- a/content/influxdb/cloud-serverless/write-data/troubleshoot.md +++ b/content/influxdb/cloud-serverless/write-data/troubleshoot.md @@ -26,7 +26,7 @@ Learn how to avoid unexpected results and recover from errors when writing to {{ -## Handle `write` responses +## Handle write responses In {{% product-name %}}, writes are synchronous. After InfluxDB validates the request and ingests the data, it sends a _success_ response (HTTP `204` status code) as an acknowledgement that the data is written and queryable. 
@@ -68,9 +68,6 @@ If you notice data is missing in your database, do the following: ## Troubleshoot rejected points -InfluxDB rejects points for the following reasons: +InfluxDB rejects points that fall within the same partition (measurement and day) as existing bucket data and have a different data type for an existing field. -- The **batch** contains another point with the same series, but one of the fields has a different value type. -- The **bucket** contains another point with the same series, but one of the fields has a different value type. - -Check for [field data type](/influxdb/cloud-serverless/reference/syntax/line-protocol/#data-types-and-format) differences between the missing data point and other points that have the same [series](/influxdb/cloud-serverless/reference/glossary/#series)--for example, did you attempt to write `string` data to an `int` field? +Check for [field data type](/influxdb/cloud-serverless/reference/syntax/line-protocol/#data-types-and-format) differences between the rejected data point and points in the bucket that have the same measurement and day--for example, did you attempt to write `string` data to an `int` field? diff --git a/content/influxdb/clustered/write-data/troubleshoot.md b/content/influxdb/clustered/write-data/troubleshoot.md index c280250a5..1c7c9b5f5 100644 --- a/content/influxdb/clustered/write-data/troubleshoot.md +++ b/content/influxdb/clustered/write-data/troubleshoot.md @@ -19,16 +19,12 @@ related: Learn how to avoid unexpected results and recover from errors when writing to {{% product-name %}}. - - - [Handle write responses](#handle-write-responses) - [Review HTTP status codes](#review-http-status-codes) - [Troubleshoot failures](#troubleshoot-failures) - [Troubleshoot rejected points](#troubleshoot-rejected-points) - - -## Handle `write` responses +## Handle write responses In {{% product-name %}}, writes are synchronous. 
After InfluxDB validates the request and ingests the data, it sends a _success_ response (HTTP `204` status code) as an acknowledgement that the data is written and queryable. @@ -66,9 +62,6 @@ If you notice data is missing in your database, do the following: ## Troubleshoot rejected points -InfluxDB rejects points for the following reasons: +InfluxDB rejects points that fall within the same partition (default partitioning is measurement and day) as existing bucket data and have a different data type for an existing field. -- The **batch** contains another point with the same series, but one of the fields has a different value type. -- The **bucket** contains another point with the same series, but one of the fields has a different value type. - -Check for [field data type](/influxdb/clustered/reference/syntax/line-protocol/#data-types-and-format) differences between the missing data point and other points that have the same [series](/influxdb/clustered/reference/glossary/#series)--for example, did you attempt to write `string` data to an `int` field? +Check for [field data type](/influxdb/clustered/reference/syntax/line-protocol/#data-types-and-format) differences between the rejected data point and points within the same database and partition--for example, did you attempt to write `string` data to an `int` field? diff --git a/content/influxdb/v2/reference/glossary.md b/content/influxdb/v2/reference/glossary.md index cb24403db..3aadbccfa 100644 --- a/content/influxdb/v2/reference/glossary.md +++ b/content/influxdb/v2/reference/glossary.md @@ -902,6 +902,7 @@ InfluxDB scrapes data from specified targets at regular intervals and writes the Data can be scraped from any accessible endpoint that provides data in the [Prometheus exposition format](https://prometheus.io/docs/instrumenting/exposition_formats/). ### secret + Secrets are key-value pairs that contain information you want to control access to, such as API keys, passwords, or certificates. 
### selector @@ -913,11 +914,10 @@ Related entries: [aggregate](#aggregate), [function](#function), [transformation ### series -A collection of data in the InfluxDB data structure that share a common -{{% cloud-only %}}**measurement**, **tag set**, and **field key**.{{% /cloud-only %}} -{{% oss-only %}}**measurement** and **tag set**.{{% /oss-only %}} +A collection of timestamps and field values that share a common series key +({{% cloud-only %}}measurement, tag set, and field key{{% /cloud-only %}}{{% oss-only %}}measurement and tag set{{% /oss-only %}}). -Related entries: [field set](#field-set), [measurement](#measurement), [tag set](#tag-set) +Related entries: [field set](#field-set), [measurement](#measurement), [series key](#series-key), [tag set](#tag-set) ### series cardinality @@ -925,8 +925,8 @@ The number of unique measurement, tag set, and field key combinations in an Infl For example, assume that an InfluxDB bucket has one measurement. The single measurement has two tag keys: `email` and `status`. -If there are three different `email`s, and each email address is associated with two -different `status`es, the series cardinality for the measurement is 6 +If the data contains three different `email` values, and each email address is associated with two +different `status` values, the series cardinality for the measurement is `6` (3 × 2 = 6): | email | status | @@ -938,12 +938,11 @@ different `status`es, the series cardinality for the measurement is 6 | cliff@influxdata.com | start | | cliff@influxdata.com | finish | -In some cases, performing this multiplication may overestimate series cardinality -because of the presence of dependent tags. -Dependent tags are scoped by another tag and do not increase series cardinality. -If we add the tag `firstname` to the example above, the series cardinality -would not be 18 (3 × 2 × 3 = 18). 
-The series cardinality would remain unchanged at 6, as `firstname` is already scoped by the `email` tag: +In some cases, this calculation may overestimate series cardinality +because of the presence of _dependent tags_--tags scoped by another tag. +Dependent tags do not increase series cardinality. +Adding the tag `firstname` to the preceding example would not increase the series cardinality to `18` (3 × 2 × 3 = 18). +The series cardinality would remain unchanged at `6`, as `firstname` is already scoped by the `email` tag: | email | status | firstname | | :------------------- | :----- | :-------- | diff --git a/content/influxdb/v2/reference/key-concepts/data-elements.md b/content/influxdb/v2/reference/key-concepts/data-elements.md index d707341c6..4b9ecf383 100644 --- a/content/influxdb/v2/reference/key-concepts/data-elements.md +++ b/content/influxdb/v2/reference/key-concepts/data-elements.md @@ -10,25 +10,29 @@ menu: influxdb/v2/tags: [key concepts, schema] related: - /resources/videos/data-model-building-blocks/ + - /influxdb/v2/write-data/best-practices/resolve-high-cardinality/ + - /influxdb/v2/write-data/best-practices/schema-design/ --- InfluxDB {{< current-version >}} includes the following data elements: -- [timestamp](#timestamp) -- [field key](#field-key) -- [field value](#field-value) -- [field set](#field-set) -- [tag key](#tag-key) -- [tag value](#tag-value) -- [tag set](#tag-set) -- [measurement](#measurement) -- [series](#series) -- [point](#point) -- [bucket](#bucket) -- [bucket schema](#bucket-schema) -- [organization](#organization) +- [Timestamp](#timestamp) +- [Measurement](#measurement) +- [Fields](#fields) + - [Field key](#field-key) + - [Field value](#field-value) + - [Field set](#field-set) +- [Tags](#tags) + - [Tag key](#tag-key) + - [Tag value](#tag-value) + - [Tag set](#tag-set) +- [Series](#series) +- [Point](#point) +- [Bucket](#bucket) +- [Organization](#organization) -The sample data below is used to illustrate data elements 
concepts. + +The following sample data represents time series records stored in InfluxDB and is used to illustrate data elements concepts. _Hover over highlighted terms to get acquainted with InfluxDB terminology and layout._ @@ -55,7 +59,7 @@ A field includes a field key stored in the `_field` column and a field value sto ### Field key -A field key is a string that represents the name of the field. In the sample data above, `bees` and `ants` are field keys. +A field key is a string that represents the name of the field. In the preceding [sample data](#sample-data), `bees` and `ants` are field keys. ### Field value @@ -63,19 +67,25 @@ A field value represents the value of an associated field. Field values can be s ### Field set -A field set is a collection of field key-value pairs associated with a timestamp. The sample data includes the following field sets: - -```bash +A field set is a collection of field key-value pairs associated with a timestamp. The [sample data](#sample-data) includes the following field sets: +```text census bees=23i,ants=30i 1566086400000000000 census bees=28i,ants=32i 1566086760000000000 ----------------- Field set - ``` {{% note %}} -**Fields aren't indexed:** Fields are required in InfluxDB data and are not indexed. Queries that filter field values must scan all field values to match query conditions. As a result, queries on tags > are more performant than queries on fields. **Store commonly queried metadata in tags.** + +#### Fields aren't indexed + +Fields are required in InfluxDB data and are not indexed. +Queries that filter field values must scan all field values to match query conditions. +As a result, queries on tags are more performant than queries on fields. + +See how to [use tags and fields](/influxdb/v2/write-data/best-practices/schema-design/#use-tags-and-fields) to make your schema easier to query. 
+ {{% /note %}} ## Tags @@ -106,72 +116,61 @@ location = portland, scientist = mullen ``` {{% note %}} -**Tags are indexed:** Tags are optional. You don't need tags in your data structure, but it's typically a good idea to include tags. -Because tags are indexed, queries on tags are faster than queries on fields. This makes tags ideal for storing commonly queried metadata. + +#### Tags are indexed + +Tags are optional. +You don't need tags in your data structure, but it's typically a good idea to include them. +Because InfluxDB indexes tags, the query engine doesn’t need to scan every record in a bucket to locate a tag value. +See how to [use tags to improve query performance](/influxdb/v2/write-data/best-practices/schema-design/#use-tags-to-improve-query-performance). + {{% /note %}} -{{% note %}} -Tags containing highly variable information like UUIDs, hashes, and random strings will lead to a large number of unique series in the database, known as **high series cardinality**. High series cardinality is a primary driver of high memory usage for many database workloads. See [series cardinality](/influxdb/v2/reference/glossary/#series-cardinality) for more information. -{{% /note %}} +### Why your schema matters - -#### Why your schema matters - -If most of your queries focus on values in the fields, for example, a query to find when 23 bees were counted: - -```js -from(bucket: "bucket-name") - |> range(start: 2019-08-17T00:00:00Z, stop: 2019-08-19T00:00:00Z) - |> filter(fn: (r) => r._field == "bees" and r._value == 23) -``` - -InfluxDB scans every field value in the dataset for `bees` before the query returns a response. 
If our sample `census` data grew to millions of rows, to optimize your query, you could rearrange your [schema](/influxdb/v2/reference/glossary/#schema) so the fields (`bees` and `ants`) becomes tags and the tags (`location` and `scientist`) become fields: - -| _time | _measurement | {{< tooltip "Tag key" "bees" >}} | _field | _value | -|:------------------- |:------------ |:------- |:-- |:------ | -| 2019-08-18T00:00:00Z | census | 23 | location | klamath | -| 2019-08-18T00:00:00Z | census | 23 | scientist | anderson | -| 2019-08-18T00:06:00Z | census | {{< tooltip "Tag value" "28" >}} | {{< tooltip "Field key" "location" >}} | {{< tooltip "Field value" "klamath" >}} | -| 2019-08-18T00:06:00Z | census | 28 | scientist | anderson | - -| _time | _measurement | {{< tooltip "Tag key" "ants" >}} | _field | _value | -|:------------------- |:------------ |:------- |:-- |:------ | -| 2019-08-18T00:00:00Z | census | 30 | location | portland | -| 2019-08-18T00:00:00Z | census | 30 | scientist | mullen | -| 2019-08-18T00:06:00Z | census | {{< tooltip "Tag value" "32" >}} | {{< tooltip "Field key" "location" >}} | {{< tooltip "Field value" "portland" >}} | -| 2019-08-18T00:06:00Z | census | 32 | scientist | mullen | - -Now that `bees` and `ants` are tags, InfluxDB doesn't have to scan all `_field` and `_value` columns. This makes your queries faster. - -## Bucket schema - -In InfluxDB Cloud, a bucket with the `explicit` schema-type requires an explicit -schema for each measurement. -Measurements contain tags, fields, and timestamps. -An explicit schema constrains the shape of data that can be written to that measurement. 
- -The following schema constrains `census` data: - -name | type | data_type -|:------- |:---------------|:-------------------- -time | timestamp | -location | tag | string -scientist | tag | string -ants | field | integer -bees | field | integer +How you structure measurements, fields, and tags in your data can make queries easier to write and more performant. +Good [schema design](/influxdb/v2/write-data/best-practices/schema-design) can prevent [high series cardinality](/influxdb/v2/write-data/best-practices/resolve-high-cardinality/), resulting in better performing queries. ## Series -Now that you're familiar with measurements, field sets, and tag sets, it's time to discuss series keys and series. A **series key** is a collection of points that share a measurement, tag set, and field key. For example, the [sample data](#sample-data) includes two unique series keys: +Now that you're familiar with measurements, field sets, and tag sets, it's time to discuss series keys and series. + +{{% oss-only %}} +In {{% product-name %}}, a **series key** is a unique combination of measurement and tag set. + +For example, the [sample data](#sample-data) includes two unique series keys: + +| _measurement | tag set | +|:------------- |:------------------------------- | +| census | {{< tooltip "Tag set" "location=klamath,scientist=anderson" >}} | +| census | location=portland,scientist=mullen | + +A **series** includes timestamps and field values for a given series key. +From the sample data, here's a **series key** and the corresponding **series**: + +```text +# series key +census,location=klamath,scientist=anderson + +# series +2019-08-18T00:00:00Z 23 +2019-08-18T00:06:00Z 28 +``` + +{{% /oss-only %}} +{{% cloud-only %}} +In {{% product-name %}}, a **series key** is a unique combination of measurement, tag set, and field key. 
+ +For example, the [sample data](#sample-data) includes two unique series keys: | _measurement | tag set | _field | |:------------- |:------------------------------- |:------ | | census | {{< tooltip "Tag set" "location=klamath,scientist=anderson" >}} | {{< tooltip "Field key" "bees" >}} | | census | location=portland,scientist=mullen | ants | -A **series** includes timestamps and field values for a given series key. From the sample data, here's a **series key** and the corresponding **series**: +A **series** includes timestamps and field values for a given series key--for example, the following is a **series key** and the corresponding **series** from the sample data: -```bash +```text # series key census,location=klamath,scientist=anderson bees @@ -180,11 +179,13 @@ census,location=klamath,scientist=anderson bees 2019-08-18T00:06:00Z 28 ``` -Understanding the concept of a series is essential when designing your [schema](/influxdb/v2/reference/glossary/#schema) and working with your data in InfluxDB. +{{% /cloud-only %}} + +Understanding the concept of a series is essential when [designing your schema](/influxdb/v2/write-data/best-practices/schema-design/) and working with your data in InfluxDB. ## Point -A **point** includes the series key, a field value, and a timestamp. For example, a single point from the [sample data](#sample-data) looks like this: +A **point** includes the series key, a field value, and a timestamp--for example, a single point from the [sample data](#sample-data): `2019-08-18T00:00:00Z census ants 30 portland mullen` @@ -196,11 +197,7 @@ All InfluxDB data is stored in a bucket. A **bucket** combines the concept of a An InfluxDB **organization** is a workspace for a group of [users](/influxdb/v2/admin/users/). All [dashboards](/influxdb/v2/visualize-data/dashboards/), [tasks](/influxdb/v2/process-data/), buckets, and users belong to an organization. 
For more information about organizations, see [Manage organizations](/influxdb/v2/admin/organizations/). -If you're just starting out, we recommend taking a look at the following guides: - -- [Get started](/influxdb/v2/get-started) -- [Write data](/influxdb/v2/write-data) -- [Query data](/influxdb/v2/query-data) +If you're new to using InfluxDB, see how to [get started](/influxdb/v2/get-started) writing and querying data. For an overview of how these elements interconnect within InfluxDB's data model, watch the following video: diff --git a/content/influxdb/v2/write-data/best-practices/schema-design.md b/content/influxdb/v2/write-data/best-practices/schema-design.md index 2db1b679e..f39044aed 100644 --- a/content/influxdb/v2/write-data/best-practices/schema-design.md +++ b/content/influxdb/v2/write-data/best-practices/schema-design.md @@ -8,10 +8,11 @@ menu: weight: 201 parent: write-best-practices related: + - /influxdb/v2/reference/key-concepts/data-elements/ - /resources/videos/data-model-building-blocks/ --- -Design your [schema](/influxdb/v2/reference/glossary/#schema) for simpler and more performant queries. +Design your [schema](/influxdb/v2/reference/key-concepts/data-elements/) for simpler and more performant queries. Follow design guidelines to make your schema easy to query. Learn how these guidelines lead to more performant queries.
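
As a rough illustration of the series key definitions this patch corrects (not part of the patch itself), the following Python sketch derives series keys from line-protocol records modeled on the `census` sample data. The `series_keys` helper and its `include_field_key` flag are hypothetical names. The parsing is deliberately simplified and assumes no escaped spaces or commas and a consistent tag order. The tag sets attached to each sample line are an assumption based on the series shown in `data-elements.md`.

```python
# Sample records modeled on the census sample data (tag placement is assumed).
SAMPLE = [
    "census,location=klamath,scientist=anderson bees=23i 1566086400000000000",
    "census,location=portland,scientist=mullen ants=30i 1566086400000000000",
    "census,location=klamath,scientist=anderson bees=28i 1566086760000000000",
    "census,location=portland,scientist=mullen ants=32i 1566086760000000000",
]


def series_keys(lines, include_field_key):
    """Return the set of unique series keys found in line-protocol records.

    include_field_key=False: key is measurement + tag set (the OSS definition
    in the patch).
    include_field_key=True: key is measurement + tag set + field key (the
    Cloud definition in the patch).
    """
    keys = set()
    for line in lines:
        # Simplified parse: "<measurement>,<tag set> <field set> <timestamp>"
        measurement_and_tags, field_set, _timestamp = line.split(" ")
        if include_field_key:
            # One series key per field key in the field set.
            for pair in field_set.split(","):
                field_key = pair.split("=", 1)[0]
                keys.add(f"{measurement_and_tags} {field_key}")
        else:
            keys.add(measurement_and_tags)
    return keys


oss_keys = series_keys(SAMPLE, include_field_key=False)
cloud_keys = series_keys(SAMPLE, include_field_key=True)
print(sorted(oss_keys))
print(sorted(cloud_keys))
```

Under these assumptions, both definitions yield two series keys for the sample, which agrees with the "two unique series keys" stated in each variant of the updated Series section. The counts would differ as soon as one measurement and tag set carried more than one field key.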