* fix(cloud-iox): typo
* fix(cloud-iox): fix typos and cleanup SQL descriptions
* fix(cloud-iox): schema design corrections (closes #4851):
  - Update IOx schema design best practice with feedback from @pauldix:
    - Timestamp
    - Primary key and complexity
    - Wide schema correction
    - Sparse schema clarification
  - Remove explicit bucket schemas
  - "nearly infinite" was replaced in an earlier commit.
  - Add timestamp references in Glossary.
* Update content/influxdb/cloud-iox/write-data/best-practices/schema-design.md

Co-authored-by: Scott Anderson <sanderson@users.noreply.github.com>

---------

Co-authored-by: Scott Anderson <sanderson@users.noreply.github.com>

pull/4889/head^2
parent 114ef5085a
commit 6d13675739
@@ -66,10 +66,10 @@ to your InfluxDB Cloud bucket before running the example queries.

 ### Query data within time boundaries

-- Use the `SELECT` clause to specify what tags and fields to return.
+- Use the `SELECT` clause to specify what columns (tags and fields) to return.
 To return all tags and fields, use the wildcard alias (`*`).
-- Specify the measurement to query in the `FROM` clause.
-- Specify time boundaries in the `WHERE` clause.
+- In the `FROM` clause, specify the table (measurement) to query.
+- In the `WHERE` clause, specify time boundaries and other conditions for filtering.
 Include time-based predicates that compare the value of the `time` column to a timestamp.
 Use the `AND` logical operator to chain multiple predicates together.

@@ -110,8 +110,8 @@ WHERE

 {{% expand "Query with absolute time boundaries" %}}

-To query data from absolute time boundaries, compare the value of the `time column
-to a timestamp literals.
+To query data from absolute time boundaries, compare the value of the `time` column
+to a timestamp literal.
 Use the `AND` logical operator to chain together multiple predicates and define
 both start and stop boundaries for the query.

@@ -132,11 +132,11 @@ WHERE

 ### Query data without time boundaries

-To query data without time boundaries, do not include any time-based predicates
+To query data without time boundaries, don't include any time-based predicates
 in your `WHERE` clause.

 {{% warn %}}
-Querying data _without time bounds_ can return an unexpected amount of data.
+Querying data _without time bounds_ can return a large number of rows.
 The query may take a long time to complete and results may be truncated.
 {{% /warn %}}

@@ -146,8 +146,8 @@ SELECT * FROM home

 ### Query specific fields and tags

-To query specific fields, include them in the `SELECT` clause.
-If querying multiple fields or tags, comma-delimit each.
+To specify columns (fields, tags, or calculations) you want to retrieve, list them in the `SELECT` clause.
+Use a comma to separate column names.
 If the field or tag keys include special characters or spaces or are case-sensitive,
 wrap the key in _double-quotes_.


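The query guidance in this file's diff (name the columns in `SELECT`, the table in `FROM`, chain time-based predicates with `AND`, and double-quote identifiers that contain spaces or special characters) can be tried locally. The following is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in SQL engine. The `home` table, its `"Room Name"`, `temp`, and `hum` columns, and the ISO 8601 text timestamps are hypothetical, and SQLite's time handling differs from InfluxDB's.

```python
import sqlite3

# In-memory SQLite database standing in for an InfluxDB bucket (assumption:
# SQLite is only a stand-in; InfluxDB's SQL engine and time types differ).
conn = sqlite3.connect(":memory:")

# Hypothetical "home" table: a time column, one tag column, two field columns.
# "Room Name" contains a space, so it must be wrapped in double quotes.
conn.execute('CREATE TABLE home (time TEXT, "Room Name" TEXT, temp REAL, hum REAL)')
conn.executemany(
    "INSERT INTO home VALUES (?, ?, ?, ?)",
    [
        ("2022-01-01T08:00:00Z", "Kitchen", 21.0, 35.9),
        ("2022-01-01T09:00:00Z", "Kitchen", 23.0, 36.2),
        ("2022-01-01T10:00:00Z", "Kitchen", 22.7, 36.1),
        ("2022-01-01T09:00:00Z", "Living Room", 21.4, 35.9),
    ],
)

# Absolute time boundaries: two time-based predicates chained with AND,
# plus one more condition. Same-format ISO 8601 strings compare correctly as text.
query = """
SELECT time, "Room Name", temp
FROM home
WHERE time >= '2022-01-01T08:00:00Z'
  AND time < '2022-01-01T10:00:00Z'
  AND "Room Name" = 'Kitchen'
ORDER BY time
"""
result = conn.execute(query).fetchall()
print(result)

# No time-based predicates: every row in the table is returned.
all_rows = conn.execute("SELECT * FROM home").fetchall()
print(len(all_rows))  # 4
```

The stop boundary uses `<` rather than `<=`, so the 10:00 row is excluded and back-to-back windows don't return the same row twice.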
@@ -2,7 +2,7 @@
 title: Explore your schema with SQL
 description: >
 When working with InfluxDB's implementation of SQL, a **bucket** is equivalent
-to a databases, a **measurement** is structured as a table, and **time**,
+to a database, a **measurement** is structured as a table, and **time**,
 **fields**, and **tags** are structured as columns.
 menu:
 influxdb_cloud_iox:

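The description in this file maps InfluxDB concepts onto SQL: a bucket acts as a database, a measurement as a table, and time, tags, and fields as columns. As a rough illustration, the sketch below turns one line protocol point into a table name and a row of columns. `line_to_row` is a hypothetical, simplified parser, not an InfluxDB API. It assumes no escaped characters, no spaces inside values, a timestamp on every line, and only float, `i`-suffixed integer, or double-quoted string field values.

```python
from __future__ import annotations


def line_to_row(line: str) -> tuple[str, dict]:
    """Map one simplified line protocol point to (table name, row of columns).

    Hypothetical helper. Assumes no escaped characters, no spaces inside
    values, a timestamp on every line, and float, integer (`i` suffix), or
    double-quoted string field values.
    """
    series, field_set, timestamp = line.split(" ")
    measurement, *tag_pairs = series.split(",")

    # The measurement becomes the table; time, tags, and fields become columns.
    row: dict = {"time": int(timestamp)}
    for pair in tag_pairs:
        key, value = pair.split("=", 1)
        row[key] = value  # tag values are always strings
    for pair in field_set.split(","):
        key, value = pair.split("=", 1)
        if value.startswith('"') and value.endswith('"'):
            row[key] = value[1:-1]  # string field
        elif value.endswith("i"):
            row[key] = int(value[:-1])  # integer field
        else:
            row[key] = float(value)  # float field
    return measurement, row


table, row = line_to_row(
    'home,room=Kitchen temp=21.0,hum=35.9,co=0i,status="ok" 1641024000000000000'
)
print(table)  # home
print(row)  # {'time': 1641024000000000000, 'room': 'Kitchen', 'temp': 21.0, 'hum': 35.9, 'co': 0, 'status': 'ok'}
```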
@@ -13,20 +13,23 @@ menu:
 Use the following guidelines to design your [schema](/influxdb/cloud-iox/reference/glossary/#schema)
 for simpler and more performant queries.

+<!-- TOC -->
+
 - [InfluxDB data structure](#influxdb-data-structure)
+- [Primary keys](#primary-keys)
 - [Tags versus fields](#tags-versus-fields)
 - [Schema restrictions](#schema-restrictions)
 - [Do not use duplicate names for tags and fields](#do-not-use-duplicate-names-for-tags-and-fields)
 - [Measurements can contain up to 200 columns](#measurements-can-contain-up-to-200-columns)
 - [Design for performance](#design-for-performance)
 - [Avoid wide schemas](#avoid-wide-schemas)
+- [Avoid too many tags](#avoid-too-many-tags)
 - [Avoid sparse schemas](#avoid-sparse-schemas)
+- [Writing individual fields with different timestamps](#writing-individual-fields-with-different-timestamps)
 - [Measurement schemas should be homogenous](#measurement-schemas-should-be-homogenous)
 - [Design for query simplicity](#design-for-query-simplicity)
 - [Keep measurement names, tag keys, and field keys simple](#keep-measurement-names-tag-keys-and-field-keys-simple)
 - [Avoid keywords and special characters](#avoid-keywords-and-special-characters)
-- [Use explicit bucket schemas to enforce schema](#use-explicit-bucket-schemas-to-enforce-schema)
----

 ## InfluxDB data structure

@@ -35,17 +38,25 @@ A bucket can contain multiple measurements. Measurements contain multiple
 tags and fields.

 - **Bucket**: Named location where time series data is stored.
+In the InfluxDB SQL implementation, a bucket is synonymous with a _database_.
 A bucket can contain multiple _measurements_.
 - **Measurement**: Logical grouping for time series data.
+In the InfluxDB SQL implementation, a measurement is synonymous with a _table_.
 All _points_ in a given measurement should have the same _tags_.
 A measurement contains multiple _tags_ and _fields_.
-- **Tags**: Key-value pairs that provide metadata for each point--for example,
-something to identify the source or context of the data like host,
+- **Tags**: Key-value pairs that store metadata string values for each point--for example,
+a value that identifies or differentiates the data source or context--for example, host,
 location, station, etc.
-- **Fields**: Key-value pairs with values that change over time--for example,
+- **Fields**: Key-value pairs that store data for each point--for example,
 temperature, pressure, stock price, etc.
 - **Timestamp**: Timestamp associated with the data.
 When stored on disk and queried, all data is ordered by time.
+In InfluxDB, a timestamp is a nanosecond-scale [unix timestamp](#unix-timestamp) in UTC.
+
+### Primary keys
+
+In time series data, the primary key for a row of data is typically a combination of timestamp and other attributes that uniquely identify each data point.
+In InfluxDB, the primary key for a row is the combination of the point's timestamp and _tag set_ - the collection of [tag keys](/influxdb/cloud-iox/reference/glossary/#tag-key) and [tag values](/influxdb/cloud-iox/reference/glossary/#tag-value) on the point.

 ### Tags versus fields

@@ -54,7 +65,7 @@ tag and what should be a field?" The following guidelines should help answer tha
 question as you design your schema.

 - Use tags to store identifying information about the source or context of the data.
-- Use fields to store values that change over time.
+- Use fields to store measured values.
 - Tag values can only be strings.
 - Field values can be any of the following data types:
   - Integer
@@ -64,9 +75,9 @@ question as you design your schema.
   - Boolean

 {{% note %}}
-If coming from a version of InfluxDB backed by the TSM storage engine, **tag value**
-cardinality no longer affects the overall performance of your database.
 The InfluxDB IOx engine supports infinite tag value and series cardinality.
+Unlike InfluxDB backed by the TSM storage engine, **tag value**
+cardinality doesn't affect the overall performance of your database.
 {{% /note %}}

 ---
@@ -81,11 +92,6 @@ measurement on disk.
 If you attempt to write a measurement that contains tags or fields with the same name,
 the write fails due to a column conflict.

-{{% note %}}
-Use [explicit bucket schemas](/influxdb/cloud-iox/admin/buckets/manage-explicit-bucket-schemas/) to enforce unique tag and
-field keys within a schema.
-{{% /note %}}
-
 ### Measurements can contain up to 200 columns

 A measurement can contain **up to 200 columns**. Each row requires a time column,
@@ -106,30 +112,55 @@ The following guidelines help to optimize query performance:
 - [Avoid sparse schemas](#avoid-sparse-schemas)
 - [Measurement schemas should be homogenous](#measurement-schemas-should-be-homogenous)

+
 ### Avoid wide schemas

 A wide schema is one with many tags and fields and corresponding columns for each.
-At query time, InfluxDB evaluates each row in the queried measurement to
-determine what rows to return. The "wider" the measurement (more columns), the
-less performant queries are against that measurement.
-To ensure queries stay performant, the InfluxDB IOx storage engine has a
+With the InfluxDB IOx storage engine, wide schemas don't impact query execution performance.
+Because IOx is a columnar database, it executes queries only against columns selected in the query.
+
+Although a wide schema won't affect query performance, it can lead to the following:
+
+- More resources required for persisting and compacting data during ingestion.
+- Decreased sorting performance due to complex primary keys with [too many tags](#avoid-too-many-tags).
+
+The InfluxDB IOx storage engine has a
 [limit of 200 columns per measurement](#measurements-can-contain-up-to-200-columns).

 To avoid a wide schema, limit the number of tags and fields stored in a measurement.
 If you need to store more than 199 total tags and fields, consider segmenting
 your fields into a separate measurement.

+#### Avoid too many tags
+
+In InfluxDB, the primary key for a row is the combination of the point's timestamp and _tag set_ - the collection of [tag keys](/influxdb/cloud-iox/reference/glossary/#tag-key) and [tag values](/influxdb/cloud-iox/reference/glossary/#tag-value) on the point.
+A point that contains more tags has a more complex primary key, which could impact sorting performance if you sort using all parts of the key.
+
 ### Avoid sparse schemas

 A sparse schema is one where, for many rows, columns contain null values.
-These generally stem from [non-homogenous measurement schemas](#measurement-schemas-should-be-homogenous)
-or individual fields for a tag set being reported at separate times.
+
+These generally stem from the following:
+- [non-homogenous measurement schemas](#measurement-schemas-should-be-homogenous)
+- [writing individual fields with different timestamps]()
+
 Sparse schemas require the InfluxDB query engine to evaluate many
 null columns, adding unnecessary overhead to storing and querying data.

 _For an example of a sparse schema,
 [view the non-homogenous schema example below](#view-example-of-a-sparse-non-homogenous-schema)._

+#### Writing individual fields with different timestamps
+
+Reporting fields at different times with different timestamps creates distinct rows that contain null values--for example:
+
+You report `fieldA` with `tagset`, and then report `field B` with the same `tagset`, but with a different timestamp.
+The result is two rows: one row has a _null_ value for **field A** and the other has a _null_ value for **field B**.
+
+In contrast, if you report fields at different times while using the same tagset and timestamp, the existing row is updated.
+This requires slightly more resources at ingestion time, but then gets resolved at persistence time or compaction time
+and avoids a sparse schema.
+
 ### Measurement schemas should be homogenous

 Data stored within a measurement should be "homogenous," meaning each row should
@@ -368,9 +399,3 @@ iox.from(bucket: "example-bucket")

 {{% /code-tab-content %}}
 {{< /code-tabs-wrapper >}}
-
-## Use explicit bucket schemas to enforce schema
-
-By default, buckets have an `implicit` **schema-type** and a schema that conforms to your data.
-To require measurements to have specific columns and data types and prevent non-conforming write requests,
-use [`explicit` buckets and explicit bucket schemas](/influxdb/cloud-iox/admin/buckets/manage-explicit-bucket-schemas/).

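The schema-design changes in this file define the primary key as the point's timestamp plus its tag set. They also explain that fields written with the same tag set but different timestamps produce sparse rows, while the same tag set and timestamp update the existing row. The sketch below models that rule with a plain Python dictionary keyed by `(timestamp, tag set)`. `write` and `as_rows` are hypothetical helpers for illustration only. They say nothing about how the IOx engine actually stores, sorts, or merges data.

```python
from __future__ import annotations


def write(table: dict, timestamp: int, tags: dict, fields: dict) -> None:
    """Upsert a point into a toy table keyed by (timestamp, tag set).

    Hypothetical model of the documented primary key; not IOx internals.
    """
    key = (timestamp, frozenset(tags.items()))
    # A new primary key creates a row; an existing key updates that row.
    row = table.setdefault(key, {"time": timestamp, **tags})
    row.update(fields)


def as_rows(table: dict, columns: list[str]) -> list[tuple]:
    """Return rows ordered by time; a column a row lacks is rendered as None (null)."""
    ordered = sorted(table.items(), key=lambda item: item[0][0])
    return [tuple(row.get(col) for col in columns) for _, row in ordered]


columns = ["time", "host", "fieldA", "fieldB"]

# Same tag set, different timestamps: two sparse rows, each with a null.
sparse: dict = {}
write(sparse, 100, {"host": "h1"}, {"fieldA": 1.0})
write(sparse, 200, {"host": "h1"}, {"fieldB": 2.0})
print(as_rows(sparse, columns))  # [(100, 'h1', 1.0, None), (200, 'h1', None, 2.0)]

# Same tag set and timestamp: the second write updates the existing row.
dense: dict = {}
write(dense, 100, {"host": "h1"}, {"fieldA": 1.0})
write(dense, 100, {"host": "h1"}, {"fieldB": 2.0})
print(as_rows(dense, columns))  # [(100, 'h1', 1.0, 2.0)]
```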
@@ -673,6 +673,8 @@ Related entries: [check](#check), [notification endpoint](#notification-endpoint

 The local server's nanosecond timestamp.

+Related entries: [timestamp](#timestamp)
+
 ### null

 A data type that represents a missing or unknown value.
@@ -776,7 +778,7 @@ For example, if the precision is set to `ms`, the nanosecond epoch timestamp `14
 Telegraf output plugins do not alter the timestamp further.
 The precision setting is ignored for service input plugins.

-Related entries: [aggregator plugin](#aggregator-plugin), [input plugin](#input-plugin), [output plugin](#output-plugin), [processor plugin](#processor-plugin), [service input plugin](#service-input-plugin)
+Related entries: [aggregator plugin](#aggregator-plugin), [input plugin](#input-plugin), [output plugin](#output-plugin), [processor plugin](#processor-plugin), [service input plugin](#service-input-plugin), [timestamp](#timestamp)

 ### predicate expression

@@ -1139,12 +1141,12 @@ Irregular time series data changes at non-constant intervals.
 ### timestamp

 The date and time associated with a point.
-Time in InfluxDB is in UTC.
+In InfluxDB, a timestamp is a nanosecond-scale [unix timestamp](#unix-timestamp) in UTC.

 To specify time when writing data, see [Elements of line protocol](/influxdb/v2.7/reference/syntax/line-protocol/#elements-of-line-protocol).
 To specify time when querying data, see [Query InfluxDB with Flux](/influxdb/v2.7/query-data/get-started/query-influxdb/#2-specify-a-time-range).

-Related entries: [point](#point), [unix timestamp](#unix-timestamp), [RFC3339 timestamp](#rfc3339-timestamp)
+Related entries: [point](#point), [precision](#precision), [RFC3339 timestamp](#rfc3339-timestamp), [unix timestamp](#unix-timestamp),

 ### token

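The updated glossary entry defines a timestamp as a nanosecond-scale unix timestamp in UTC. The following minimal sketch converts between RFC3339 strings and unix nanoseconds with Python's standard `datetime` module. It assumes RFC3339 input with a trailing `Z` and no more than microsecond precision, because `datetime` cannot represent nanoseconds. The helper names are hypothetical.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def rfc3339_to_unix_ns(value: str) -> int:
    """Convert an RFC3339 string such as '2022-01-01T08:00:00Z' to unix nanoseconds.

    Assumes a trailing 'Z' and at most microsecond precision (datetime
    cannot hold nanoseconds).
    """
    dt = datetime.fromisoformat(value.replace("Z", "+00:00"))
    delta = dt.astimezone(timezone.utc) - EPOCH
    # Integer arithmetic only, so no floating-point rounding is introduced.
    seconds = delta.days * 86_400 + delta.seconds
    return seconds * 1_000_000_000 + delta.microseconds * 1_000


def unix_ns_to_rfc3339(ns: int) -> str:
    """Format unix nanoseconds as an RFC3339 UTC string with nine fractional digits."""
    seconds, remainder = divmod(ns, 1_000_000_000)
    dt = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return f"{dt:%Y-%m-%dT%H:%M:%S}.{remainder:09d}Z"


ns = rfc3339_to_unix_ns("2022-01-01T08:00:00Z")
print(ns)  # 1641024000000000000
print(unix_ns_to_rfc3339(ns))  # 2022-01-01T08:00:00.000000000Z
```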
|