fix(v3): Update Cloud Dedicated and Clustered column limit to 1000 and clarify the potential impact of wide schemas.

pull/5573/head
Jason Stirnaman 2024-08-20 14:21:39 -05:00
parent 78b99259bc
commit 7f8bde4abf
11 changed files with 380 additions and 138 deletions

@ -11,6 +11,14 @@ menu:
parent: Administer InfluxDB Cloud
weight: 101
influxdb/cloud-dedicated/tags: [databases]
related:
- /influxdb/cloud-dedicated/write-data/best-practices/schema-design/
- /influxdb/cloud-dedicated/reference/cli/influxctl/
alt_links:
cloud: /influxdb/cloud/admin/buckets/
cloud_serverless: /influxdb/cloud-serverless/admin/buckets/
clustered: /influxdb/clustered/admin/databases/
oss: /influxdb/v2/admin/buckets/
---
An InfluxDB database is a named location where time series data is stored.
@ -19,11 +27,13 @@ Each InfluxDB database has a [retention period](#retention-periods).
{{% note %}}
**If coming from InfluxDB v1**, the concepts of databases and retention policies
have been combined into a single concept--database. Retention policies are no
longer part of the InfluxDB data model. However, InfluxDB Cloud Dedicated does
longer part of the InfluxDB data model.
However, {{% product-name %}} does
support InfluxQL, which requires databases and retention policies.
See [InfluxQL DBRP naming convention](/influxdb/cloud-dedicated/admin/databases/create/#influxql-dbrp-naming-convention).
**If coming from InfluxDB v2 or InfluxDB Cloud**, _database_ and _bucket_ are synonymous.
**If coming from InfluxDB v2, InfluxDB Cloud (TSM), or InfluxDB Cloud Serverless**,
_database_ and _bucket_ are synonymous.
{{% /note %}}
## Retention periods
@ -40,9 +50,10 @@ never be removed by the retention enforcement service.
## Table and column limits
In {{< product-name >}}, table (measurement) and column limits can be
customized when [creating](#create-a-database) or
[updating a database](#update-a-database).
You can customize [table (measurement) limits](#table-limit) and
[table column limits](#column-limit) when you
[create](#create-a-database) or
[update a database](#update-a-database) in {{< product-name >}}.
### Table limit
@ -72,7 +83,7 @@ data by measurement and time range and stores each partition as a Parquet
file in your cluster's object store. By increasing the number of measurements
(tables) you can store in your database, you also increase the potential for
more `PUT` requests into your object store as InfluxDB creates more partitions.
Each `PUT` request incurs a monetary cost and will increase the operating cost of
Each `PUT` request incurs a monetary cost and increases the operating cost of
your cluster.
{{% /expand %}}
@ -89,22 +100,33 @@ operating cost of your cluster.
### Column limit
**Default maximum number of columns**: 250
**Default maximum number of columns**: 1000
A table can contain **up to 1000 columns**.
Each row must include a time column, with the remaining columns representing
tags and fields.
As a result, a table can have one time column and up to 999 field and tag columns.
When creating or updating a database, you can configure the table column limit to be
lower than 1000, based on your requirements.
After you update the column limit for a database, the limit applies to newly
created tables; it doesn't override the column limit for existing tables.
If you attempt to write to a table and exceed the column limit, the write
request fails and InfluxDB returns an error.
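The column arithmetic above can be sketched as a client-side check before writing. This is a minimal illustration, not how InfluxDB enforces the limit; `column_count` and `fits_column_limit` are hypothetical helper names, and the default of 1000 is taken from the limit described above.

```python
DEFAULT_MAX_COLUMNS = 1000  # default column limit described above


def column_count(tag_keys, field_keys):
    """Count table columns: one time column plus one per distinct tag or field key."""
    return 1 + len(set(tag_keys) | set(field_keys))


def fits_column_limit(tag_keys, field_keys, max_columns=DEFAULT_MAX_COLUMNS):
    """Return True if the schema stays within the configured column limit."""
    return column_count(tag_keys, field_keys) <= max_columns


tags = ["room", "sensor_id"]
fields = [f"f{i}" for i in range(997)]

print(column_count(tags, fields))                  # 1 time + 2 tags + 997 fields = 1000
print(fits_column_limit(tags, fields))             # True: exactly at the limit
print(fits_column_limit(tags, fields + ["extra"])) # False: one column over
```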
Time, fields, and tags are each represented by a column in a table.
Increasing your column limit affects your {{% product-name omit=" Clustered" %}}
cluster in the following ways:
{{< expand-wrapper >}}
{{% expand "May adversely affect query performance" %}}
{{% expand "May adversely affect system performance" %}}
At query time, the InfluxDB query engine identifies what table contains the queried
data and then evaluates each row in the table to match the conditions of the query.
The more columns that are in each row, the longer it takes to evaluate each row.
Through performance testing, InfluxData has identified 250 columns as the
threshold where query performance may be affected
(depending on the shape of and data types in your schema).
InfluxData identified 1000 columns as the safe limit for maintaining system
performance and stability.
Exceeding this threshold can result in
[wide schemas](/influxdb/cloud-dedicated/write-data/best-practices/schema-design/#avoid-wide-schemas),
which can negatively impact performance and resource use,
depending on the shape of your schema and data types in the schema.
{{% /expand %}}
{{< /expand-wrapper >}}

@ -15,6 +15,7 @@ related:
- /influxdb/cloud-dedicated/query-data/execute-queries/analyze-query-plan/
aliases:
- /influxdb/cloud-dedicated/query-data/execute-queries/optimize-queries/
- /influxdb/cloud-dedicated/query-data/execute-queries/analyze-query-plan/
---
Optimize SQL and InfluxQL queries to improve performance and reduce their memory and compute (CPU) requirements.
@ -22,6 +23,7 @@ Learn how to use observability tools to analyze query execution and view metrics
- [Why is my query slow?](#why-is-my-query-slow)
- [Strategies for improving query performance](#strategies-for-improving-query-performance)
- [Query only the data you need](#query-only-the-data-you-need)
- [Analyze and troubleshoot queries](#analyze-and-troubleshoot-queries)
## Why is my query slow?
@ -29,7 +31,7 @@ Learn how to use observability tools to analyze query execution and view metrics
Query performance depends on time range and complexity.
If a query is slower than you expect, it might be due to the following reasons:
- It queries data from a large time range.
- It queries data from a large time range.
- It includes intensive operations, such as querying many string values or `ORDER BY` sorting or re-sorting large amounts of data.
## Strategies for improving query performance
@ -37,9 +39,7 @@ If a query is slower than you expect, it might be due to the following reasons:
The following design strategies generally improve query performance and resource use:
- Follow [schema design best practices](/influxdb/cloud-dedicated/write-data/best-practices/schema-design/) to make querying easier and more performant.
- Query only the data you need--for example, include a [`WHERE` clause](/influxdb/cloud-dedicated/reference/sql/where/) that filters data by a time range.
InfluxDB v3 stores data in a Parquet file for each measurement and day, and retrieves files from the Object store to answer a query.
The smaller the time range in your query, the fewer files InfluxDB needs to retrieve from the Object store.
- [Query only the data you need](#query-only-the-data-you-need).
- [Downsample data](/influxdb/cloud-dedicated/process-data/downsample/) to reduce the amount of data you need to query.
Some bottlenecks may be out of your control and are the result of a suboptimal execution plan, such as:
@ -52,9 +52,39 @@ Some bottlenecks may be out of your control and are the result of a suboptimal e
{{% note %}}
#### Analyze query plans to view metrics and recognize bottlenecks
To view runtime metrics for a query, such as the number of files scanned, use the [`EXPLAIN ANALYZE` keywords](/influxdb/cloud-dedicated/reference/sql/explain/#explain-analyze) and learn how to [analyze a query plan](/influxdb/cloud-dedicated/query-data/troubleshoot-and-optimize/analyze-query-plan/).
To view runtime metrics for a query, such as the number of files scanned, use
the [`EXPLAIN ANALYZE` keywords](/influxdb/cloud-dedicated/reference/sql/explain/#explain-analyze)
and learn how to [analyze a query plan](/influxdb/cloud-dedicated/query-data/troubleshoot-and-optimize/analyze-query-plan/).
{{% /note %}}
### Query only the data you need
#### Include a WHERE clause
InfluxDB v3 stores data in a Parquet file for each measurement and day, and
retrieves files from the Object store to answer a query.
To reduce the number of files that a query needs to retrieve from the Object store,
include a [`WHERE` clause](/influxdb/cloud-dedicated/reference/sql/where/) that
filters data by a time range.
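To see why a narrower time range helps, the following sketch counts the calendar days that a query's time range overlaps. It assumes the day-based partitioning described above and treats each day as one unit of work per measurement, which is a simplification; `days_touched` is a hypothetical helper, not part of any InfluxDB client.

```python
from datetime import datetime, timedelta, timezone


def days_touched(start, stop):
    """Return the UTC calendar days that the half-open range [start, stop) overlaps."""
    if stop <= start:
        return []
    first = start.astimezone(timezone.utc).date()
    # Step back slightly so an exclusive stop at midnight doesn't count the next day.
    last = (stop.astimezone(timezone.utc) - timedelta(microseconds=1)).date()
    return [first + timedelta(days=n) for n in range((last - first).days + 1)]


now = datetime(2024, 8, 20, 12, 0, tzinfo=timezone.utc)

print(len(days_touched(now - timedelta(hours=6), now)))  # 1
print(len(days_touched(now - timedelta(days=30), now)))  # 31
```

Under these assumptions, the six-hour query overlaps a single day of data, while the 30-day query overlaps 31.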
#### SELECT only columns you need
Because InfluxDB v3 is a columnar database, it only processes the columns
selected in a query, which can mitigate the query performance impact of
[wide schemas](/influxdb/cloud-dedicated/write-data/best-practices/schema-design/#avoid-wide-schemas).
However, a non-specific query that retrieves a large number of columns from a
wide schema can be slower and less efficient than a more targeted
query--for example, consider the following queries:
- `SELECT time,a,b,c`
- `SELECT *`
If the table contains 10 columns, the difference in performance between the
two queries is minimal.
In a table with over 1000 columns, the `SELECT *` query is slower and
less efficient.
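One way to keep queries targeted is to build them from an explicit column list and a bounded time range. The following is a minimal sketch; `build_query` and the `home` table are hypothetical, and the string interpolation is for illustration only (it does no escaping, so don't pass it untrusted input).

```python
def build_query(table, columns, start, stop):
    """Build a SQL query that names its columns and bounds the time range."""
    if not columns:
        raise ValueError("name the columns you need instead of using SELECT *")
    # Double-quote identifiers so that names are treated as column names.
    column_list = ", ".join(f'"{c}"' for c in columns)
    return (
        f'SELECT {column_list} FROM "{table}" '
        f"WHERE time >= '{start}' AND time < '{stop}'"
    )


query = build_query(
    "home",
    ["time", "a", "b", "c"],
    "2024-08-20T00:00:00Z",
    "2024-08-21T00:00:00Z",
)
print(query)
```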
## Analyze and troubleshoot queries
Use the following tools to analyze and troubleshoot queries and find performance bottlenecks:

@ -104,7 +104,7 @@ influxctl database create [flags] <DATABASE_NAME>
| :--- | :---------------------- | :--------------------------------------------------------------------------------------------------------------------------------------- |
| | `--retention-period` | Database retention period (default is `0s`, infinite) |
| | `--max-tables` | Maximum tables per database (default is 500, `0` uses default) |
| | `--max-columns` | Maximum columns per table (default is 250, `0` uses default) |
| | `--max-columns` | Maximum columns per table (default is 1000, `0` uses default) |
| | `--template-tag` | Tag to add to partition template (can include multiple of this flag) |
| | `--template-tag-bucket` | Tag and number of buckets to partition tag values into separated by a comma--for example: `tag1,100` (can include multiple of this flag) |
| | `--template-timeformat` | Timestamp format for partition template (default is `%Y-%m-%d`) |
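
As a rough sketch of how the limit flags listed above combine, the following helper assembles an `influxctl database create` argument list without running it. The helper name, the `mydb` database name, and the `30d` retention value are assumptions for illustration; only the flag names come from the table.

```python
import shlex


def database_create_command(name, retention_period=None, max_tables=None, max_columns=None):
    """Assemble (but don't run) an `influxctl database create` command."""
    args = ["influxctl", "database", "create"]
    if retention_period is not None:
        args += ["--retention-period", retention_period]
    if max_tables is not None:
        args += ["--max-tables", str(max_tables)]
    if max_columns is not None:
        args += ["--max-columns", str(max_columns)]
    args.append(name)  # the database name follows the flags
    return args


command = database_create_command("mydb", retention_period="30d", max_columns=500)
print(shlex.join(command))
```

Omitting a limit leaves the flag off the command so that the default applies.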

@ -8,6 +8,10 @@ menu:
name: Schema design
weight: 201
parent: write-best-practices
related:
- /influxdb/cloud-dedicated/admin/databases/
- /influxdb/cloud-dedicated/reference/cli/influxctl/
- /influxdb/cloud-dedicated/query-data/troubleshoot-and-optimize/
---
Use the following guidelines to design your [schema](/influxdb/cloud-dedicated/reference/glossary/#schema)
@ -18,7 +22,7 @@ for simpler and more performant queries.
- [Tags versus fields](#tags-versus-fields)
- [Schema restrictions](#schema-restrictions)
- [Do not use duplicate names for tags and fields](#do-not-use-duplicate-names-for-tags-and-fields)
- [Tables can contain up to 250 columns](#tables-can-contain-up-to-250-columns)
- [Maximum number of columns per table](#maximum-number-of-columns-per-table)
- [Design for performance](#design-for-performance)
- [Avoid wide schemas](#avoid-wide-schemas)
- [Avoid sparse schemas](#avoid-sparse-schemas)
@ -37,10 +41,13 @@ Tables contain multiple tags and fields.
<!-- vale InfluxDataDocs.v3Schema = NO -->
- **Database**: A named location where time series data is stored.
In {{% product-name %}}, _database_ is synonymous with _bucket_ in InfluxDB Cloud Serverless and InfluxDB TSM implementations.
In {{% product-name %}}, _database_ is synonymous with _bucket_ in InfluxDB
Cloud Serverless and InfluxDB TSM implementations.
A database can contain multiple _tables_.
- **Table**: A logical grouping for time series data.
In {{% product-name %}}, _table_ is synonymous with _measurement_ in InfluxDB Cloud Serverless and InfluxDB TSM implementations.
In {{% product-name %}}, _table_ is synonymous with _measurement_ in
InfluxDB Cloud Serverless and InfluxDB TSM implementations.
All _points_ in a given table should have the same _tags_.
A table contains multiple _tags_ and _fields_.
- **Tags**: Key-value pairs that store metadata string values for each point--for example,
@ -52,7 +59,9 @@ Tables contain multiple tags and fields.
Field values may be null, but at least one field value is not null on any given row.
- **Timestamp**: Timestamp associated with the data.
When stored on disk and queried, all data is ordered by time.
In InfluxDB, a timestamp is a nanosecond-scale [Unix timestamp](/influxdb/cloud-dedicated/reference/glossary/#unix-timestamp) in UTC.
In InfluxDB, a timestamp is a nanosecond-scale
[Unix timestamp](/influxdb/cloud-dedicated/reference/glossary/#unix-timestamp)
in UTC.
A timestamp is never null.
{{% note %}}
@ -91,8 +100,9 @@ question as you design your schema.
- String
- Boolean
{{% product-name %}} doesn't index tag values or field values.
Tag keys, field keys, and other metadata are indexed to optimize performance.
{{% product-name %}} indexes tag keys, field keys, and other metadata
to optimize performance.
It doesn't index tag values or field values.
{{% note %}}
The InfluxDB v3 storage engine supports infinite tag value and series cardinality.
@ -106,26 +116,37 @@ cardinality doesn't affect the overall performance of your database.
### Do not use duplicate names for tags and fields
Tags and fields within the same table can't be named the same.
All tags and fields are stored as unique columns in a table representing the
table on disk.
Use unique names for tags and fields within the same table.
{{% product-name %}} stores tags and fields as unique columns in a table that
represents the table on disk.
If you attempt to write a table that contains tags or fields with the same name,
the write fails due to a column conflict.
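A client can catch this kind of conflict before sending a write. The following sketch compares the tag keys and field keys of a single point; `find_name_conflicts` is a hypothetical helper and the key names are made up for illustration.

```python
def find_name_conflicts(tag_keys, field_keys):
    """Return the names used as both a tag key and a field key, sorted."""
    return sorted(set(tag_keys) & set(field_keys))


conflicts = find_name_conflicts(["room", "temp"], ["temp", "hum"])
print(conflicts)  # ['temp']
```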
### Tables can contain up to 250 columns
### Maximum number of columns per table
A table can contain **up to 250 columns**. Each row requires a time column,
but the rest represent tags and fields stored in the table.
Therefore, a table can contain one time column and 249 total field and tag columns.
If you attempt to write to a table and exceed the 250 column limit, the
write request fails and InfluxDB returns an error.
A table has a [maximum number of columns](/influxdb/cloud-dedicated/admin/databases/#column-limit).
Each row must include a time column.
As a result, a table can have the following:
- a time column
- field and tag columns up to the configured maximum.
If you attempt to write to a table and exceed the column limit, then the write
request fails and InfluxDB returns an error.
InfluxData identified 1000 columns as the safe limit for maintaining system
performance and stability.
Exceeding this threshold can result in
[wide schemas](#avoid-wide-schemas), which can negatively impact performance
and resource use, depending on the shape and data types in your schema.
---
## Design for performance
How you structure your schema within a table can affect the overall
performance of queries against that table.
How you structure your schema within a table can affect resource use and
the performance of queries against that table.
The following guidelines help to optimize query performance:
- [Avoid wide schemas](#avoid-wide-schemas)
@ -135,26 +156,45 @@ The following guidelines help to optimize query performance:
### Avoid wide schemas
A wide schema is one with many tags and fields and corresponding columns for each.
With the InfluxDB v3 storage engine, wide schemas don't impact query execution performance.
Because InfluxDB v3 is a columnar database, it executes queries only against columns selected in the query.
A wide schema refers to a schema with a large number of columns (tags and fields).
Although a wide schema won't affect query performance, it can lead to the following:
Wide schemas can lead to the following issues:
- More resources required for persisting and compacting data during ingestion.
- Decreased sorting performance due to complex primary keys with [too many tags](#avoid-too-many-tags).
- Increased resource usage for persisting and compacting data during ingestion.
- Reduced sorting performance due to complex primary keys with [too many tags](#avoid-too-many-tags).
- Reduced query performance when [using non-specific queries](#avoid-non-specific-queries).
The InfluxDB v3 storage engine has a
[limit of 250 columns per table](#tables-can-contain-up-to-250-columns).
To prevent wide schema issues, limit the number of tags and fields stored in a table.
If you need to store more than the [maximum number of columns](/influxdb/cloud-dedicated/admin/databases/),
consider segmenting your fields into separate tables.
To avoid a wide schema, limit the number of tags and fields stored in a table.
If you need to store more than 249 total tags and fields, consider segmenting
your fields into a separate table.
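Segmenting fields can be sketched as splitting the field keys into groups, where each group, together with the time column and the shared tags, fits in one table. `segment_fields` is a hypothetical helper, and the default of 1000 is the column limit described for this product; in practice, you might pass a much smaller `max_columns` to keep each table narrow.

```python
def segment_fields(field_keys, tag_keys, max_columns=1000):
    """Split field keys into groups that each fit in one table with the given tags."""
    # Reserve one column for time and one for each tag repeated in every table.
    per_table = max_columns - 1 - len(tag_keys)
    if per_table < 1:
        raise ValueError("the time and tag columns already fill the column limit")
    return [field_keys[i:i + per_table] for i in range(0, len(field_keys), per_table)]


fields = [f"f{i}" for i in range(2500)]
groups = segment_fields(fields, tag_keys=["room", "sensor_id"])
print([len(g) for g in groups])  # [997, 997, 506]
```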
#### Avoid non-specific queries
Because InfluxDB v3 is a columnar database, it only processes the columns
selected in a query, which can mitigate the query performance impact of wide schemas.
If you [query only the data that you need](/influxdb/cloud-dedicated/query-data/troubleshoot-and-optimize/optimize-queries/#strategies-for-improving-query-performance),
then a wide schema might not impact query performance.
However, a non-specific query that retrieves a large number of columns from a
wide schema
is slower and less efficient than a more targeted query--for example, consider
the following queries:
- `SELECT time,a,b,c`
- `SELECT *`
If the table contains 10 columns, the difference in performance between the
two queries is minimal.
In a table with over 1000 columns, the `SELECT *` query is slower and
less efficient.
#### Avoid too many tags
In InfluxDB, the primary key for a row is the combination of the point's timestamp and _tag set_ - the collection of [tag keys](/influxdb/cloud-dedicated/reference/glossary/#tag-key) and [tag values](/influxdb/cloud-dedicated/reference/glossary/#tag-value) on the point.
A point that contains more tags has a more complex primary key, which could impact sorting performance if you sort using all parts of the key.
In InfluxDB, the primary key for a row is the combination of the point's
timestamp and _tag set_ - the collection of [tag keys](/influxdb/cloud-dedicated/reference/glossary/#tag-key)
and [tag values](/influxdb/cloud-dedicated/reference/glossary/#tag-value) on the point.
A point that contains more tags has a more complex primary key, which could
impact sorting performance if you sort using all parts of the key.
### Avoid sparse schemas
@ -275,7 +315,8 @@ Without regular expressions, your queries will be easier to write and more perfo
#### Not recommended {.orange}
For example, consider the following [line protocol](/influxdb/cloud-dedicated/reference/syntax/line-protocol/) that embeds multiple attributes (location, model, and ID) into a `sensor` tag value:
For example, consider the following [line protocol](/influxdb/cloud-dedicated/reference/syntax/line-protocol/)
that embeds multiple attributes (location, model, and ID) into a `sensor` tag value:
```text
home,sensor=loc-kitchen.model-A612.id-1726ZA temp=72.1

@ -12,8 +12,12 @@ weight: 105
influxdb/cloud-serverless/tags: [buckets]
aliases:
- /influxdb/cloud-serverless/organizations/buckets/
- /influxdb/cloud-serverless/admin/databases/
alt_links:
cloud: /influxdb/cloud/admin/buckets/
cloud_dedicated: /influxdb/cloud-dedicated/admin/databases/
clustered: /influxdb/clustered/admin/databases/
oss: /influxdb/v2/admin/buckets/
---
A **bucket** is a named location where time series data is stored.
@ -30,6 +34,8 @@ support InfluxQL and the InfluxDB v1 API `/write` and `/query` endpoints, which
See how to [map v1 databases and retention policies to buckets](/influxdb/cloud-serverless/guides/api-compatibility/v1/#map-v1-databases-and-retention-policies-to-buckets).
**If coming from InfluxDB v2 or InfluxDB Cloud**, _buckets_ are functionally equivalent.
**If coming from InfluxDB Cloud Dedicated or InfluxDB Clustered**, _database_ and _bucket_ are synonymous.
{{% /note %}}
## Retention period

@ -36,11 +36,9 @@ If a query is slower than you expect, it might be due to the following reasons:
The following design strategies generally improve query performance and resource use:
- Follow [schema design best practices](/influxdb/cloud-serverless/write-data/best-practices/schema-design/) to make querying easier and more performant.
- Query only the data you need--for example, include a [`WHERE` clause](/influxdb/cloud-serverless/reference/sql/where/) that filters data by a time range.
InfluxDB v3 stores data in a Parquet file for each measurement and day, and retrieves files from the Object store to answer a query.
The smaller the time range in your query, the fewer files InfluxDB needs to retrieve from the Object store.
- [Downsample data](/influxdb/cloud-serverless/process-data/downsample/) to reduce the amount of data you need to query.
- Follow [schema design best practices](/influxdb/cloud-serverless/write-data/best-practices/schema-design/) to make querying easier and more performant.
- [Query only the data you need](#query-only-the-data-you-need).
- [Downsample data](/influxdb/cloud-serverless/process-data/downsample/) to reduce the amount of data you need to query.
Some bottlenecks may be out of your control and are the result of a suboptimal execution plan, such as:
@ -55,6 +53,34 @@ Some bottlenecks may be out of your control and are the result of a suboptimal e
To view runtime metrics for a query, such as the number of files scanned, use the [`EXPLAIN ANALYZE` keywords](/influxdb/cloud-serverless/reference/sql/explain/#explain-analyze) and learn how to [analyze a query plan](/influxdb/cloud-serverless/query-data/troubleshoot-and-optimize/analyze-query-plan/).
{{% /note %}}
### Query only the data you need
#### Include a WHERE clause
InfluxDB v3 stores data in a Parquet file for each measurement and day, and
retrieves files from the Object store to answer a query.
To reduce the number of files that a query needs to retrieve from the Object store,
include a [`WHERE` clause](/influxdb/cloud-serverless/reference/sql/where/) that
filters data by a time range.
#### SELECT only columns you need
Because InfluxDB v3 is a columnar database, it only processes the columns
selected in a query, which can mitigate the query performance impact of
[wide schemas](/influxdb/cloud-serverless/write-data/best-practices/schema-design/#avoid-wide-schemas).
However, a non-specific query that retrieves a large number of columns from a
wide schema can be slower and less efficient than a more targeted
query--for example, consider the following queries:
- `SELECT time,a,b,c`
- `SELECT *`
If the table contains 10 columns, the difference in performance between the
two queries is minimal.
In a table with over 1000 columns, the `SELECT *` query is slower and
less efficient.
## Analyze and troubleshoot queries
Use the following tools to analyze and troubleshoot queries and find performance bottlenecks:

@ -8,25 +8,25 @@ menu:
name: Schema design
weight: 201
parent: write-best-practices
related:
- /influxdb/cloud-serverless/admin/buckets/
- /influxdb/cloud-serverless/query-data/troubleshoot-and-optimize/
---
Use the following guidelines to design your [schema](/influxdb/cloud-serverless/reference/glossary/#schema)
for simpler and more performant queries.
<!-- TOC -->
- [InfluxDB data structure](#influxdb-data-structure)
- [Primary keys](#primary-keys)
- [Tags versus fields](#tags-versus-fields)
- [Schema restrictions](#schema-restrictions)
- [Do not use duplicate names for tags and fields](#do-not-use-duplicate-names-for-tags-and-fields)
- [Measurements can contain up to 200 columns](#measurements-can-contain-up-to-200-columns)
- [Maximum number of columns per measurement](#maximum-number-of-columns-per-measurement)
- [Design for performance](#design-for-performance)
- [Avoid wide schemas](#avoid-wide-schemas)
- [Avoid too many tags](#avoid-too-many-tags)
- [Avoid sparse schemas](#avoid-sparse-schemas)
- [Writing individual fields with different timestamps](#writing-individual-fields-with-different-timestamps)
- [Measurement schemas should be homogenous](#measurement-schemas-should-be-homogenous)
- [Use the best data type for your data](#use-the-best-data-type-for-your-data)
- [Design for query simplicity](#design-for-query-simplicity)
- [Keep measurement names, tags, and fields simple](#keep-measurement-names-tags-and-fields-simple)
- [Avoid keywords and special characters](#avoid-keywords-and-special-characters)
@ -55,7 +55,7 @@ tags and fields.
Field values may be null, but at least one field value is not null on any given row.
- **Timestamp**: Timestamp associated with the data.
When stored on disk and queried, all data is ordered by time.
In InfluxDB, a timestamp is a nanosecond-scale [unix timestamp](#unix-timestamp) in UTC.
In InfluxDB, a timestamp is a nanosecond-scale [Unix timestamp](#unix-timestamp) in UTC.
A timestamp is never null.
### Primary keys
@ -80,13 +80,14 @@ question as you design your schema.
- String
- Boolean
{{% product-name %}} doesn't index tag values or field values.
Tag keys, field keys, and other metadata are indexed to optimize performance.
{{% product-name %}} indexes tag keys, field keys, and other metadata
to optimize performance.
It doesn't index tag values or field values.
{{% note %}}
The InfluxDB v3 storage engine supports infinite tag value and series cardinality.
Unlike InfluxDB backed by the TSM storage engine, **tag value**
cardinality doesn't affect the overall performance of your database.
cardinality doesn't affect the overall performance of your bucket.
{{% /note %}}
---
@ -95,19 +96,23 @@ cardinality doesn't affect the overall performance of your database.
### Do not use duplicate names for tags and fields
Tags and fields within the same measurement can't be named the same.
All tags and fields are stored as unique columns in a table representing the
measurement on disk.
Use unique names for tags and fields within the same measurement.
{{% product-name %}} stores tags and fields as unique columns in a table that
represents the measurement on disk.
If you attempt to write a measurement that contains tags or fields with the same name,
the write fails due to a column conflict.
### Measurements can contain up to 200 columns
### Maximum number of columns per measurement
A measurement can contain **up to 200 columns**. Each row requires a time column,
but the rest represent tags and fields stored in the measurement.
Therefore, a measurement can contain one time column and 199 total field and tag columns.
If you attempt to write to a measurement and exceed the 200 column limit, the
write request fails and InfluxDB returns an error.
A measurement has a [maximum number of columns](/influxdb/cloud-serverless/admin/buckets/#column-limit).
Each row must include a time column.
As a result, a measurement can have the following:
- a time column
- field and tag columns up to the maximum number of columns.
If you attempt to write to a measurement and exceed the column limit, then the write
request fails and InfluxDB returns an error.
---
@ -124,21 +129,37 @@ The following guidelines help to optimize query performance:
### Avoid wide schemas
A wide schema is one with many tags and fields and corresponding columns for each.
With the InfluxDB v3 storage engine, wide schemas don't impact query execution performance.
Because InfluxDB v3 is a columnar database, it executes queries only against columns selected in the query.
A wide schema refers to a schema with a large number of columns (tags and fields).
Although a wide schema won't affect query performance, it can lead to the following:
Wide schemas can lead to the following issues:
- More resources required for persisting and compacting data during ingestion.
- Decreased sorting performance due to complex primary keys with [too many tags](#avoid-too-many-tags).
- Increased resource usage for persisting and compacting data during ingestion.
- Reduced sorting performance due to complex primary keys with [too many tags](#avoid-too-many-tags).
- Reduced query performance when [using non-specific queries](#avoid-non-specific-queries).
The InfluxDB v3 storage engine has a
[limit of 200 columns per measurement](#measurements-can-contain-up-to-200-columns).
To prevent wide schema issues, limit the number of tags and fields stored in a measurement.
If you need to store more than the [maximum number of columns](/influxdb/cloud-serverless/admin/buckets/),
consider segmenting your fields into separate measurements.
To avoid a wide schema, limit the number of tags and fields stored in a measurement.
If you need to store more than 199 total tags and fields, consider segmenting
your fields into a separate measurement.
#### Avoid non-specific queries
Because InfluxDB v3 is a columnar database, it only processes the columns
selected in a query, which can mitigate the query performance impact of wide schemas.
If you [query only the data that you need](/influxdb/cloud-serverless/query-data/troubleshoot-and-optimize/optimize-queries/#strategies-for-improving-query-performance),
then a wide schema might not impact query performance.
However, a non-specific query that retrieves a large number of columns from a
wide schema
is slower and less efficient than a more targeted query--for example, consider
the following queries:
- `SELECT time,a,b,c`
- `SELECT *`
If the measurement contains 10 columns, the difference in performance between the
two queries is minimal.
In a measurement with over 1000 columns, the `SELECT *` query is slower and
less efficient.
#### Avoid too many tags
@ -225,6 +246,12 @@ full of null values (also known as a _sparse schema_):
{{% /expand %}}
{{< /expand-wrapper >}}
### Use the best data type for your data
When writing data to a field, use the most appropriate [data type](/influxdb/cloud-serverless/reference/glossary/#data-type) for your data--write integers as integers, decimals as floats, and booleans as booleans.
A query against a field that stores integers outperforms a query against string data;
querying over many long string values can negatively affect performance.
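The following sketch shows one way to serialize Python values as typed line protocol field values, so that integers, floats, and booleans aren't written as strings. `format_field_value` and `format_fields` are hypothetical helpers, not part of an InfluxDB client library, and the sketch covers field values only (it doesn't escape measurement names, tag keys, or field keys).

```python
def format_field_value(value):
    """Serialize a Python value as a typed line protocol field value."""
    # Check bool first: bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # the trailing i marks an integer
    if isinstance(value, float):
        return repr(value)
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'  # strings are double-quoted
    raise TypeError(f"unsupported field type: {type(value).__name__}")


def format_fields(fields):
    """Join a dict of fields into a line protocol field set."""
    return ",".join(f"{key}={format_field_value(val)}" for key, val in fields.items())


line = "home,room=kitchen " + format_fields({"temp": 72.1, "count": 3, "open": True})
print(line)  # home,room=kitchen temp=72.1,count=3i,open=true
```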
## Design for query simplicity
Naming conventions for measurements, tag keys, and field keys can simplify or

@ -11,6 +11,14 @@ menu:
parent: Administer InfluxDB Clustered
weight: 103
influxdb/clustered/tags: [databases]
related:
- /influxdb/clustered/write-data/best-practices/schema-design/
- /influxdb/clustered/reference/cli/influxctl/
alt_links:
cloud: /influxdb/cloud/admin/buckets/
cloud_dedicated: /influxdb/cloud-dedicated/admin/databases/
cloud_serverless: /influxdb/cloud-serverless/admin/buckets/
oss: /influxdb/v2/admin/buckets/
---
An InfluxDB database is a named location where time series data is stored.
@ -19,7 +27,7 @@ Each InfluxDB database has a [retention period](#retention-periods).
{{% note %}}
**If coming from InfluxDB v1**, the concepts of databases and retention policies
have been combined into a single concept--database. Retention policies are no
longer part of the InfluxDB data model. However, InfluxDB Clustered does
longer part of the InfluxDB data model. However, {{% product-name %}} does
support InfluxQL, which requires databases and retention policies.
See [InfluxQL DBRP naming convention](/influxdb/clustered/admin/databases/create/#influxql-dbrp-naming-convention).
@ -41,9 +49,10 @@ never be removed by the retention enforcement service.
## Table and column limits
In {{< product-name >}}, table (measurement) and column limits can be
custom configured when [creating](#create-a-database) or
[updating a database](#update-a-database).
You can customize [table (measurement) limits](#table-limit) and
[table column limits](#column-limit) when you
[create](#create-a-database) or
[update a database](#update-a-database) in {{< product-name >}}.
### Table limit
@ -90,22 +99,33 @@ operating cost of your cluster.
### Column limit
**Default maximum number of columns**: 250
**Default maximum number of columns**: 1000
A table can contain **up to 1000 columns**.
Each row must include a time column, with the remaining columns representing
tags and fields.
As a result, a table can have one time column and up to 999 field and tag columns.
When creating or updating a database, you can configure the column limit to be
lower than 1000, based on your requirements.
After you update the column limit for a database, the limit applies to newly
created tables; it doesn't override the column limit for existing tables.
If you attempt to write to a table and exceed the column limit, the write
request fails and InfluxDB returns an error.
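As a rough illustration of how columns count toward the limit, the following Python sketch tallies the distinct columns that a batch of points would create in each table. The point structure (dictionaries with `table`, `tags`, and `fields` keys) and the `count_columns` and `tables_over_limit` helpers are assumptions made for this example and aren't part of any InfluxDB client library. The sketch only sees the batch you pass in, so it doesn't account for columns that already exist in a table.

```python
def count_columns(points):
    """Return {table: number of distinct columns} for a batch of points."""
    columns = {}
    for point in points:
        # Every table has a time column in addition to its tags and fields.
        names = columns.setdefault(point["table"], {"time"})
        names.update(point.get("tags", {}))    # adds tag keys
        names.update(point.get("fields", {}))  # adds field keys
    return {table: len(names) for table, names in columns.items()}


def tables_over_limit(points, max_columns=1000):
    """List tables in the batch whose column count exceeds max_columns."""
    counts = count_columns(points)
    return sorted(table for table, n in counts.items() if n > max_columns)


# Hypothetical sample data: two points that share a tag key but use different fields.
points = [
    {"table": "home", "tags": {"room": "kitchen"}, "fields": {"temp": 72.1}},
    {"table": "home", "tags": {"room": "attic"}, "fields": {"hum": 35.2}},
]
print(count_columns(points))  # {'home': 4}
```

In this sample, the `home` table needs four columns: `time`, `room`, `temp`, and `hum`.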
Time, fields, and tags are each represented by a column in a table.
Increasing your column limit affects your {{% product-name omit=" Clustered" %}}
cluster in the following ways:
{{< expand-wrapper >}}
{{% expand "May adversely affect query performance" %}}
{{% expand "May adversely affect system performance" %}}
At query time, the InfluxDB query engine identifies what table contains the queried
data and then evaluates each row in the table to match the conditions of the query.
The more columns that are in each row, the longer it takes to evaluate each row.
Through performance testing, InfluxData has identified 250 columns as the
threshold beyond which query performance may be affected
(depending on the shape of and data types in your schema).
InfluxData identified 1000 columns as the safe limit for maintaining system
performance and stability.
Exceeding this threshold can result in
[wide schemas](/influxdb/clustered/write-data/best-practices/schema-design/#avoid-wide-schemas),
which can negatively impact performance and resource use,
depending on the shape of your schema and data types in the schema.
{{% /expand %}}
{{< /expand-wrapper >}}


@ -12,6 +12,7 @@ influxdb/clustered/tags: [query, performance, observability, errors, sql, influx
related:
- /influxdb/clustered/query-data/sql/
- /influxdb/clustered/query-data/influxql/
- /influxdb/clustered/query-data/execute-queries/analyze-query-plan/
aliases:
- /influxdb/clustered/query-data/execute-queries/optimize-queries/
- /influxdb/clustered/query-data/execute-queries/analyze-query-plan/
@ -22,6 +23,7 @@ Learn how to use observability tools to analyze query execution and view metrics
- [Why is my query slow?](#why-is-my-query-slow)
- [Strategies for improving query performance](#strategies-for-improving-query-performance)
- [Query only the data you need](#query-only-the-data-you-need)
- [Analyze and troubleshoot queries](#analyze-and-troubleshoot-queries)
## Why is my query slow?
@ -37,10 +39,7 @@ If a query is slower than you expect, it might be due to the following reasons:
The following design strategies generally improve query performance and resource use:
- Follow [schema design best practices](/influxdb/clustered/write-data/best-practices/schema-design/) to make querying easier and more performant.
- Query only the data you need--for example, include a [`WHERE` clause](/influxdb/clustered/reference/sql/where/) that filters data by a time range.
InfluxDB v3 stores data in a Parquet file for each measurement and day, and retrieves files from the Object store to answer a query.
The smaller the time range in your query, the fewer files InfluxDB needs to retrieve from the Object store.
- [Query only the data you need](#query-only-the-data-you-need).
- [Downsample data](/influxdb/clustered/process-data/downsample/) to reduce the amount of data you need to query.
Some bottlenecks may be out of your control and are the result of a suboptimal execution plan, such as:
@ -53,9 +52,39 @@ Some bottlenecks may be out of your control and are the result of a suboptimal e
{{% note %}}
#### Analyze query plans to view metrics and recognize bottlenecks
To view runtime metrics for a query, such as the number of files scanned, use the [`EXPLAIN ANALYZE` keywords](/influxdb/clustered/reference/sql/explain/#explain-analyze) and learn how to [analyze a query plan](/influxdb/clustered/query-data/troubleshoot-and-optimize/analyze-query-plan/).
To view runtime metrics for a query, such as the number of files scanned, use
the [`EXPLAIN ANALYZE` keywords](/influxdb/clustered/reference/sql/explain/#explain-analyze)
and learn how to [analyze a query plan](/influxdb/clustered/query-data/troubleshoot-and-optimize/analyze-query-plan/).
{{% /note %}}
### Query only the data you need
#### Include a WHERE clause
InfluxDB v3 stores data in a Parquet file for each measurement and day, and
retrieves files from the Object store to answer a query.
To reduce the number of files that a query needs to retrieve from the Object store,
include a [`WHERE` clause](/influxdb/clustered/reference/sql/where/) that
filters data by a time range.
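To get a feel for why a narrower time range helps, the following Python sketch counts the UTC calendar days that a query's time range overlaps. It assumes the per-measurement, per-day file layout described above; the `days_touched` helper is hypothetical, and the actual number of files retrieved depends on your partitioning and compaction state.

```python
from datetime import datetime, timedelta, timezone


def days_touched(start, stop):
    """Return the UTC calendar days overlapped by the half-open range [start, stop)."""
    if stop <= start:
        return []
    first = start.astimezone(timezone.utc).date()
    # Step back by the smallest unit so an exclusive stop at midnight
    # doesn't count the following day.
    last = (stop.astimezone(timezone.utc) - timedelta(microseconds=1)).date()
    return [first + timedelta(days=i) for i in range((last - first).days + 1)]


# A four-hour range that crosses midnight overlaps two daily partitions.
start = datetime(2024, 8, 19, 22, 0, tzinfo=timezone.utc)
stop = datetime(2024, 8, 20, 2, 0, tzinfo=timezone.utc)
print(len(days_touched(start, stop)))  # 2
```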
#### SELECT only columns you need
Because InfluxDB v3 is a columnar database, it only processes the columns
selected in a query, which can mitigate the query performance impact of
[wide schemas](/influxdb/clustered/write-data/best-practices/schema-design/#avoid-wide-schemas).
However, a non-specific query that retrieves a large number of columns from a
wide schema can be slower and less efficient than a more targeted
query--for example, consider the following queries:
- `SELECT time,a,b,c`
- `SELECT *`
If the table contains 10 columns, the difference in performance between the
two queries is minimal.
In a table with over 1000 columns, the `SELECT *` query is slower and
less efficient.
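One way to keep queries targeted is to build them from an explicit column list and a bounded time range. The following Python sketch only assembles a SQL string; the `build_select` and `quote_identifier` helpers and the `home` table with columns `a`, `b`, and `c` are hypothetical, and the sketch doesn't connect to InfluxDB.

```python
from datetime import datetime, timezone


def quote_identifier(name):
    """Double-quote a SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_select(table, columns, start, stop):
    """Build a SQL query that names its columns and bounds the time range."""
    if not columns:
        raise ValueError("name at least one column instead of using SELECT *")

    def literal(ts):
        # Format a timezone-aware datetime as an RFC 3339 UTC timestamp.
        return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

    column_list = ", ".join(quote_identifier(c) for c in columns)
    return (
        f"SELECT {column_list} FROM {quote_identifier(table)} "
        f"WHERE time >= '{literal(start)}' AND time < '{literal(stop)}'"
    )


query = build_select(
    "home",
    ["time", "a", "b", "c"],
    datetime(2024, 8, 19, tzinfo=timezone.utc),
    datetime(2024, 8, 20, tzinfo=timezone.utc),
)
print(query)
```

Rejecting an empty column list is a deliberate choice in this sketch: it forces the caller to decide which columns the query needs.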
## Analyze and troubleshoot queries
Learn how to [analyze a query plan](/influxdb/clustered/query-data/troubleshoot-and-optimize/analyze-query-plan/)


@ -103,7 +103,7 @@ influxctl database create [flags] <DATABASE_NAME>
| :--- | :---------------------- | :--------------------------------------------------------------------------------------------------------------------------------------- |
| | `--retention-period` | Database retention period (default is `0s`, infinite) |
| | `--max-tables` | Maximum tables per database (default is 500, `0` uses default) |
| | `--max-columns` | Maximum columns per table (default is 250, `0` uses default) |
| | `--max-columns` | Maximum columns per table (default is 1000, `0` uses default) |
| | `--template-tag` | Tag to add to partition template (can include multiple of this flag) |
| | `--template-tag-bucket` | Tag and number of buckets to partition tag values into separated by a comma--for example: `tag1,100` (can include multiple of this flag) |
| | `--template-timeformat` | Timestamp format for partition template (default is `%Y-%m-%d`) |


@ -8,6 +8,10 @@ menu:
name: Schema design
weight: 201
parent: write-best-practices
related:
- /influxdb/clustered/admin/databases/
- /influxdb/clustered/reference/cli/influxctl/
- /influxdb/clustered/query-data/troubleshoot-and-optimize/
---
Use the following guidelines to design your [schema](/influxdb/clustered/reference/glossary/#schema)
@ -18,7 +22,7 @@ for simpler and more performant queries.
- [Tags versus fields](#tags-versus-fields)
- [Schema restrictions](#schema-restrictions)
- [Do not use duplicate names for tags and fields](#do-not-use-duplicate-names-for-tags-and-fields)
- [Tables can contain up to 250 columns](#tables-can-contain-up-to-250-columns)
- [Maximum number of columns per table](#maximum-number-of-columns-per-table)
- [Design for performance](#design-for-performance)
- [Avoid wide schemas](#avoid-wide-schemas)
- [Avoid sparse schemas](#avoid-sparse-schemas)
@ -37,10 +41,13 @@ Tables contain multiple tags and fields.
<!-- vale InfluxDataDocs.v3Schema = NO -->
- **Database**: A named location where time series data is stored.
In {{% product-name %}}, _database_ is synonymous with _bucket_ in InfluxDB Cloud Serverless and InfluxDB TSM implementations.
In {{% product-name %}}, _database_ is synonymous with _bucket_ in InfluxDB
Cloud Serverless and InfluxDB TSM implementations.
A database can contain multiple _tables_.
- **Table**: A logical grouping for time series data.
In {{% product-name %}}, _table_ is synonymous with _measurement_ in InfluxDB Cloud Serverless and InfluxDB TSM implementations.
In {{% product-name %}}, _table_ is synonymous with _measurement_ in
InfluxDB Cloud Serverless and InfluxDB TSM implementations.
All _points_ in a given table should have the same _tags_.
A table contains multiple _tags_ and _fields_.
- **Tags**: Key-value pairs that store metadata string values for each point--for example,
@ -52,7 +59,9 @@ Tables contain multiple tags and fields.
Field values may be null, but at least one field value is not null on any given row.
- **Timestamp**: Timestamp associated with the data.
When stored on disk and queried, all data is ordered by time.
In InfluxDB, a timestamp is a nanosecond-scale [unix timestamp](/influxdb/clustered/reference/glossary/#unix-timestamp) in UTC.
In InfluxDB, a timestamp is a nanosecond-scale
[Unix timestamp](/influxdb/clustered/reference/glossary/#unix-timestamp)
in UTC.
A timestamp is never null.
{{% note %}}
@ -91,8 +100,9 @@ question as you design your schema.
- String
- Boolean
{{% product-name %}} doesn't index tag values or field values.
Tag keys, field keys, and other metadata are indexed to optimize performance.
{{% product-name %}} indexes tag keys, field keys, and other metadata
to optimize performance.
It doesn't index tag values or field values.
{{% note %}}
The InfluxDB v3 storage engine supports infinite tag value and series cardinality.
@ -106,26 +116,37 @@ cardinality doesn't affect the overall performance of your database.
### Do not use duplicate names for tags and fields
Tags and fields within the same table can't be named the same.
All tags and fields are stored as unique columns in a table representing the
table on disk.
Use unique names for tags and fields within the same table.
{{% product-name %}} stores each tag and field as a unique column in the
on-disk representation of the table.
If you attempt to write a table that contains tags or fields with the same name,
the write fails due to a column conflict.
### Tables can contain up to 250 columns
### Maximum number of columns per table
A table can contain **up to 250 columns**. Each row requires a time column,
but the rest represent tags and fields stored in the table.
Therefore, a table can contain one time column and 249 total field and tag columns.
If you attempt to write to a table and exceed the 250 column limit, the
write request fails and InfluxDB returns an error.
A table has a [maximum number of columns](/influxdb/clustered/admin/databases/#column-limit).
Each row must include a time column.
As a result, a table can have the following:
- one time column
- field and tag columns that, together with the time column, total no more than the configured maximum
If you attempt to write to a table and exceed the column limit, then the write
request fails and InfluxDB returns an error.
InfluxData identified 1000 columns as the safe limit for maintaining system
performance and stability.
Exceeding this threshold can result in
[wide schemas](#avoid-wide-schemas), which can negatively impact performance
and resource use, depending on the shape of and data types in your schema.
---
## Design for performance
How you structure your schema within a table can affect the overall
performance of queries against that table.
How you structure your schema within a table can affect resource use and
the performance of queries against that table.
The following guidelines help to optimize query performance:
- [Avoid wide schemas](#avoid-wide-schemas)
@ -135,26 +156,45 @@ The following guidelines help to optimize query performance:
### Avoid wide schemas
A wide schema is one with many tags and fields and corresponding columns for each.
With the InfluxDB v3 storage engine, wide schemas don't impact query execution performance.
Because InfluxDB v3 is a columnar database, it executes queries only against columns selected in the query.
A wide schema refers to a schema with a large number of columns (tags and fields).
Although a wide schema won't affect query performance, it can lead to the following:
Wide schemas can lead to the following issues:
- More resources required for persisting and compacting data during ingestion.
- Decreased sorting performance due to complex primary keys with [too many tags](#avoid-too-many-tags).
- Increased resource usage for persisting and compacting data during ingestion.
- Reduced sorting performance due to complex primary keys with [too many tags](#avoid-too-many-tags).
- Reduced query performance when [using non-specific queries](#avoid-non-specific-queries).
The InfluxDB v3 storage engine has a
[limit of 250 columns per table](#tables-can-contain-up-to-250-columns).
To prevent wide schema issues, limit the number of tags and fields stored in a table.
If you need to store more than the [maximum number of columns](/influxdb/clustered/admin/databases/#column-limit),
consider segmenting your fields into separate tables.
To avoid a wide schema, limit the number of tags and fields stored in a table.
If you need to store more than 249 total tags and fields, consider segmenting
your fields into a separate table.
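As one possible way to segment fields, the following Python sketch groups fields into separate tables by a naming rule before you write them. The `segment_fields` helper, the `vehicle` table prefix, and the rule of grouping by the text before the first underscore are all assumptions made for this example; choose a grouping that matches how you query the data.

```python
def first_word(name):
    """Default grouping rule: the part of the field name before the first underscore."""
    return name.split("_", 1)[0]


def segment_fields(fields, table_prefix, group_of=first_word):
    """Split one wide mapping of fields into smaller per-table mappings."""
    tables = {}
    for name, value in fields.items():
        # The grouping rule depends only on the field name, so a field
        # always lands in the same table, even when new fields are added.
        table = f"{table_prefix}_{group_of(name)}"
        tables.setdefault(table, {})[name] = value
    return tables


# Hypothetical fields that would otherwise share one wide table.
fields = {"engine_temp": 90.5, "engine_rpm": 2100, "cabin_temp": 21.0}
print(segment_fields(fields, "vehicle"))
```

Grouping by name rather than by position keeps the mapping stable, so later writes don't scatter an existing field across tables.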
#### Avoid non-specific queries
Because InfluxDB v3 is a columnar database, it only processes the columns
selected in a query, which can mitigate the query performance impact of wide schemas.
If you [query only the data that you need](/influxdb/clustered/query-data/troubleshoot-and-optimize/optimize-queries/#query-only-the-data-you-need),
then a wide schema might not impact query performance.
However, a non-specific query that retrieves a large number of columns from a
wide schema
is slower and less efficient than a more targeted query--for example, consider
the following queries:
- `SELECT time,a,b,c`
- `SELECT *`
If the table contains 10 columns, the difference in performance between the
two queries is minimal.
In a table with over 1000 columns, the `SELECT *` query is slower and
less efficient.
#### Avoid too many tags
In InfluxDB, the primary key for a row is the combination of the point's timestamp and _tag set_ - the collection of [tag keys](/influxdb/clustered/reference/glossary/#tag-key) and [tag values](/influxdb/clustered/reference/glossary/#tag-value) on the point.
A point that contains more tags has a more complex primary key, which could impact sorting performance if you sort using all parts of the key.
In InfluxDB, the primary key for a row is the combination of the point's
timestamp and _tag set_ - the collection of [tag keys](/influxdb/clustered/reference/glossary/#tag-key)
and [tag values](/influxdb/clustered/reference/glossary/#tag-value) on the point.
A point that contains more tags has a more complex primary key, which could
impact sorting performance if you sort using all parts of the key.
### Avoid sparse schemas
@ -275,7 +315,8 @@ Without regular expressions, your queries will be easier to write and more perfo
#### Not recommended {.orange}
For example, consider the following [line protocol](/influxdb/clustered/reference/syntax/line-protocol/) that embeds multiple attributes (location, model, and ID) into a `sensor` tag value:
For example, consider the following [line protocol](/influxdb/clustered/reference/syntax/line-protocol/)
that embeds multiple attributes (location, model, and ID) into a `sensor` tag value:
```text
home,sensor=loc-kitchen.model-A612.id-1726ZA temp=72.1