fix(v3): Update Cloud Dedicated and Clustered column limit to 1000 and clarify the potential impact of wide schemas.
parent 78b99259bc
commit 7f8bde4abf
@@ -11,6 +11,14 @@ menu:
parent: Administer InfluxDB Cloud
weight: 101
influxdb/cloud-dedicated/tags: [databases]
related:
- /influxdb/cloud-dedicated/write-data/best-practices/schema-design/
- /influxdb/cloud-dedicated/reference/cli/influxctl/
alt_links:
cloud: /influxdb/cloud/admin/buckets/
cloud_serverless: /influxdb/cloud-serverless/admin/buckets/
clustered: /influxdb/clustered/admin/databases/
oss: /influxdb/v2/admin/buckets/
---

An InfluxDB database is a named location where time series data is stored.
@@ -19,11 +27,13 @@ Each InfluxDB database has a [retention period](#retention-periods).
{{% note %}}
**If coming from InfluxDB v1**, the concepts of databases and retention policies
have been combined into a single concept--database. Retention policies are no
longer part of the InfluxDB data model. However, InfluxDB Cloud Dedicated does
longer part of the InfluxDB data model.
However, {{% product-name %}} does
support InfluxQL, which requires databases and retention policies.
See [InfluxQL DBRP naming convention](/influxdb/cloud-dedicated/admin/databases/create/#influxql-dbrp-naming-convention).

**If coming from InfluxDB v2 or InfluxDB Cloud**, _database_ and _bucket_ are synonymous.
**If coming from InfluxDB v2, InfluxDB Cloud (TSM), or InfluxDB Cloud Serverless**,
_database_ and _bucket_ are synonymous.
{{% /note %}}

## Retention periods
@@ -40,9 +50,10 @@ never be removed by the retention enforcement service.

## Table and column limits

In {{< product-name >}}, table (measurement) and column limits can be
customized when [creating](#create-a-database) or
[updating a database](#update-a-database).
You can customize [table (measurement) limits](#table-limit) and
[table column limits](#column-limit) when you
[create](#create-a-database) or
[update a database](#update-a-database) in {{< product-name >}}.

### Table limit

@@ -72,7 +83,7 @@ data by measurement and time range and stores each partition as a Parquet
file in your cluster's object store. By increasing the number of measurements
(tables) you can store in your database, you also increase the potential for
more `PUT` requests into your object store as InfluxDB creates more partitions.
Each `PUT` request incurs a monetary cost and will increase the operating cost of
Each `PUT` request incurs a monetary cost and increases the operating cost of
your cluster.

{{% /expand %}}
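The paragraph above links object-store `PUT` requests to the number of partitions InfluxDB creates. As a rough illustration only, the following Python sketch counts the distinct partitions a batch of points would land in. It assumes the default partition template: one partition per table per UTC day, formatted with `%Y-%m-%d` as in the `--template-timeformat` default. It ignores custom tag-based templates, compaction, and the fact that one partition can hold several Parquet files, so read the result as a partition count, not a count of `PUT` requests.

```python
from datetime import datetime, timezone


def partition_keys(points):
    """Return the distinct (table, day) partitions for (table, unix_ns) points.

    Assumes the default partition template: one partition per table per UTC
    day, with the day formatted as %Y-%m-%d. Tag-based templates are ignored.
    """
    keys = set()
    for table, ts_ns in points:
        # Integer division keeps nanosecond timestamps exact near midnight.
        day = datetime.fromtimestamp(ts_ns // 1_000_000_000, tz=timezone.utc)
        keys.add((table, day.strftime("%Y-%m-%d")))
    return keys


if __name__ == "__main__":
    batch = [
        ("home", 1704067200_000_000_000),     # 2024-01-01T00:00:00Z
        ("home", 1704153600_000_000_000),     # 2024-01-02T00:00:00Z
        ("weather", 1704067200_000_000_000),  # second table, same day
    ]
    # Two tables over two days already produce three partitions.
    print(len(partition_keys(batch)))  # prints: 3
```

Each additional table multiplies the partitions created per day, which is why raising the table limit can raise object-store costs.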
@@ -89,22 +100,33 @@ operating cost of your cluster.

### Column limit

**Default maximum number of columns**: 250
**Default maximum number of columns**: 1000

A table can contain **up to 1000 columns**.
Each row must include a time column, with the remaining columns representing
tags and fields.
As a result, a table can have one time column and up to 999 field and tag columns.

When creating or updating a database, you can configure the table column limit to be
lower than 1000, based on your requirements.
After you update the column limit for a database, the limit applies to newly
created tables; it doesn't override the column limit for existing tables.

If you attempt to write to a table and exceed the column limit, the write
request fails and InfluxDB returns an error.

Time, fields, and tags are each represented by a column in a table.
Increasing your column limit affects your {{% product-name omit=" Clustered" %}}
cluster in the following ways:

{{< expand-wrapper >}}
{{% expand "May adversely affect query performance" %}}
{{% expand "May adversely affect system performance" %}}

At query time, the InfluxDB query engine identifies what table contains the queried
data and then evaluates each row in the table to match the conditions of the query.
The more columns that are in each row, the longer it takes to evaluate each row.

Through performance testing, InfluxData has identified 250 columns as the
threshold where query performance may be affected
(depending on the shape of and data types in your schema).
InfluxData identified 1000 columns as the safe limit for maintaining system
performance and stability.
Exceeding this threshold can result in
[wide schemas](/influxdb/cloud-dedicated/write-data/best-practices/schema-design/#avoid-wide-schemas),
which can negatively impact performance and resource use,
depending on the shape of your schema and data types in the schema.

{{% /expand %}}
{{< /expand-wrapper >}}
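Each tag and each field becomes a column alongside the single time column, so you can estimate a table's column count before you write. The following Python sketch is a client-side approximation, not an InfluxDB API. It parses only simple line protocol (no escaped characters, and no quoted values that contain spaces, commas, or equals signs). It counts columns from the batch alone, not from columns that already exist in the table, and it uses 1000 as the default limit described in this section.

```python
DEFAULT_MAX_COLUMNS = 1000  # default per-table limit described in this section


def parse_line(line):
    """Split simple line protocol into (table, tag keys, field keys).

    Assumes no escaped characters and no quoted string values containing
    spaces, commas, or equals signs.
    """
    parts = line.strip().split(" ")
    series, field_set = parts[0], parts[1]
    table, *tag_pairs = series.split(",")
    tag_keys = {pair.split("=", 1)[0] for pair in tag_pairs}
    field_keys = {pair.split("=", 1)[0] for pair in field_set.split(",")}
    return table, tag_keys, field_keys


def check_column_limit(lines, max_columns=DEFAULT_MAX_COLUMNS):
    """Return {table: column_count} for tables whose batch exceeds max_columns.

    The count is 1 (time) plus every distinct tag and field key seen for
    the table in this batch.
    """
    columns = {}
    for line in lines:
        table, tag_keys, field_keys = parse_line(line)
        columns.setdefault(table, set()).update(tag_keys | field_keys)
    counts = {table: len(keys) + 1 for table, keys in columns.items()}
    return {table: n for table, n in counts.items() if n > max_columns}
```

A database configured with a lower limit can pass that value as `max_columns`. An empty result means no table in the batch exceeded the limit.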
@@ -15,6 +15,7 @@ related:
- /influxdb/cloud-dedicated/query-data/execute-queries/analyze-query-plan/
aliases:
- /influxdb/cloud-dedicated/query-data/execute-queries/optimize-queries/
- /influxdb/cloud-dedicated/query-data/execute-queries/analyze-query-plan/
---

Optimize SQL and InfluxQL queries to improve performance and reduce their memory and compute (CPU) requirements.
@@ -22,6 +23,7 @@ Learn how to use observability tools to analyze query execution and view metrics

- [Why is my query slow?](#why-is-my-query-slow)
- [Strategies for improving query performance](#strategies-for-improving-query-performance)
- [Query only the data you need](#query-only-the-data-you-need)
- [Analyze and troubleshoot queries](#analyze-and-troubleshoot-queries)

## Why is my query slow?
@@ -29,7 +31,7 @@ Learn how to use observability tools to analyze query execution and view metrics
Query performance depends on time range and complexity.
If a query is slower than you expect, it might be due to the following reasons:

- It queries data from a large time range.
- It queries data from a large time range.
- It includes intensive operations, such as querying many string values or `ORDER BY` sorting or re-sorting large amounts of data.

## Strategies for improving query performance
@@ -37,9 +39,7 @@ If a query is slower than you expect, it might be due to the following reasons:
The following design strategies generally improve query performance and resource use:

- Follow [schema design best practices](/influxdb/cloud-dedicated/write-data/best-practices/schema-design/) to make querying easier and more performant.
- Query only the data you need--for example, include a [`WHERE` clause](/influxdb/cloud-dedicated/reference/sql/where/) that filters data by a time range.
InfluxDB v3 stores data in a Parquet file for each measurement and day, and retrieves files from the Object store to answer a query.
The smaller the time range in your query, the fewer files InfluxDB needs to retrieve from the Object store.
- [Query only the data you need](#query-only-the-data-you-need).
- [Downsample data](/influxdb/cloud-dedicated/process-data/downsample/) to reduce the amount of data you need to query.

Some bottlenecks may be out of your control and are the result of a suboptimal execution plan, such as:
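To see why narrowing the time range helps, you can count how many daily files a range could touch. The following Python sketch assumes one Parquet file per table per UTC day, as the text above describes. In practice a day can hold several files before compaction, so the count is only a rough proxy for object-store reads. The range is half-open, which matches a `WHERE time >= start AND time < stop` predicate.

```python
from datetime import datetime, timedelta, timezone


def days_spanned(start, stop):
    """Return the UTC days (YYYY-MM-DD) touched by the half-open range [start, stop)."""
    if start.tzinfo is None or stop.tzinfo is None:
        raise ValueError("use timezone-aware datetimes")
    if stop <= start:
        return []
    day = start.astimezone(timezone.utc).date()
    # Step back one microsecond so a stop at exactly midnight excludes that day.
    last = (stop - timedelta(microseconds=1)).astimezone(timezone.utc).date()
    days = []
    while day <= last:
        days.append(day.isoformat())
        day += timedelta(days=1)
    return days


if __name__ == "__main__":
    utc = timezone.utc
    week = days_spanned(datetime(2024, 1, 1, tzinfo=utc), datetime(2024, 1, 8, tzinfo=utc))
    hour = days_spanned(datetime(2024, 1, 1, 9, tzinfo=utc), datetime(2024, 1, 1, 10, tzinfo=utc))
    print(len(week), len(hour))  # prints: 7 1
```

Under these assumptions, a one-week range touches seven daily files per table, and a one-hour range touches one.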
@@ -52,9 +52,39 @@ Some bottlenecks may be out of your control and are the result of a suboptimal e
{{% note %}}
#### Analyze query plans to view metrics and recognize bottlenecks

To view runtime metrics for a query, such as the number of files scanned, use the [`EXPLAIN ANALYZE` keywords](/influxdb/cloud-dedicated/reference/sql/explain/#explain-analyze) and learn how to [analyze a query plan](/influxdb/cloud-dedicated/query-data/troubleshoot-and-optimize/analyze-query-plan/).
To view runtime metrics for a query, such as the number of files scanned, use
the [`EXPLAIN ANALYZE` keywords](/influxdb/cloud-dedicated/reference/sql/explain/#explain-analyze)
and learn how to [analyze a query plan](/influxdb/cloud-dedicated/query-data/troubleshoot-and-optimize/analyze-query-plan/).
{{% /note %}}

### Query only the data you need

#### Include a WHERE clause

InfluxDB v3 stores data in a Parquet file for each measurement and day, and
retrieves files from the object store to answer a query.
To reduce the number of files that a query needs to retrieve from the object store,
include a [`WHERE` clause](/influxdb/cloud-dedicated/reference/sql/where/) that
filters data by a time range.

#### SELECT only columns you need

Because InfluxDB v3 is a columnar database, it only processes the columns
selected in a query, which can mitigate the query performance impact of
[wide schemas](/influxdb/cloud-dedicated/write-data/best-practices/schema-design/#avoid-wide-schemas).

However, a non-specific query that retrieves a large number of columns from a
wide schema can be slower and less efficient than a more targeted
query--for example, consider the following queries:

- `SELECT time,a,b,c`
- `SELECT *`

If the table contains 10 columns, the difference in performance between the
two queries is minimal.
In a table that approaches the 1000-column limit, the `SELECT *` query is
slower and less efficient.

## Analyze and troubleshoot queries

Use the following tools to analyze and troubleshoot queries and find performance bottlenecks:
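One way to avoid `SELECT *` in application code is to build queries from an explicit column list. The following helper is a hypothetical Python sketch, not part of any InfluxDB client library. It quotes identifiers, always includes `time`, and adds a time-range filter using `now() - INTERVAL`, which assumes the SQL dialect accepts that expression. Pass only a trusted `interval` string, because the sketch interpolates it directly.

```python
def quote_ident(name):
    """Double-quote a SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def targeted_query(table, columns, interval="1 hour"):
    """Build a query that selects only the listed columns from a recent time window.

    `interval` is interpolated as-is, so pass only trusted values such as "1 hour".
    """
    wanted = [c for c in columns if c != "time"]
    if not wanted:
        raise ValueError("list the columns you need instead of using SELECT *")
    select_list = ", ".join(["time"] + [quote_ident(c) for c in wanted])
    return (
        f"SELECT {select_list} FROM {quote_ident(table)} "
        f"WHERE time >= now() - INTERVAL '{interval}'"
    )


if __name__ == "__main__":
    print(targeted_query("home", ["a", "b", "c"]))
```

Raising an error on an empty column list forces callers to name their columns, which keeps queries against wide tables targeted.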
@@ -104,7 +104,7 @@ influxctl database create [flags] <DATABASE_NAME>
| :--- | :---------------------- | :--------------------------------------------------------------------------------------------------------------------------------------- |
| | `--retention-period` | Database retention period (default is `0s`, infinite) |
| | `--max-tables` | Maximum tables per database (default is 500, `0` uses default) |
| | `--max-columns` | Maximum columns per table (default is 250, `0` uses default) |
| | `--max-columns` | Maximum columns per table (default is 1000, `0` uses default) |
| | `--template-tag` | Tag to add to partition template (can include multiple of this flag) |
| | `--template-tag-bucket` | Tag and number of buckets to partition tag values into separated by a comma--for example: `tag1,100` (can include multiple of this flag) |
| | `--template-timeformat` | Timestamp format for partition template (default is `%Y-%m-%d`) |
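If you script database creation, you can assemble the `influxctl database create [flags] <DATABASE_NAME>` command from the flags in the table above. The following Python sketch only builds the argument list. The helper name is hypothetical. The check that `--max-columns` stays at or below 1000 is an assumption based on the 1000-column limit described on the databases page, not documented `influxctl` behavior.

```python
MAX_COLUMNS_LIMIT = 1000  # assumption: per-table column limit described for this product


def influxctl_create_args(name, max_tables=0, max_columns=0, retention_period=None):
    """Build argv for `influxctl database create [flags] <DATABASE_NAME>`.

    Uses only flags listed in the table above; 0 keeps the default limit.
    """
    if max_tables < 0 or max_columns < 0:
        raise ValueError("limits must be zero (default) or positive")
    if max_columns > MAX_COLUMNS_LIMIT:
        raise ValueError(f"--max-columns can't exceed {MAX_COLUMNS_LIMIT}")
    args = ["influxctl", "database", "create"]
    if retention_period:
        args += ["--retention-period", retention_period]
    if max_tables:
        args += ["--max-tables", str(max_tables)]
    if max_columns:
        args += ["--max-columns", str(max_columns)]
    args.append(name)  # the database name comes after the flags
    return args


if __name__ == "__main__":
    print(" ".join(influxctl_create_args("mydb", max_columns=500)))
```

You could then pass the list to `subprocess.run(..., check=True)`. In that call, `mydb` is a placeholder database name.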
@@ -8,6 +8,10 @@ menu:
name: Schema design
weight: 201
parent: write-best-practices
related:
- /influxdb/cloud-dedicated/admin/databases/
- /influxdb/cloud-dedicated/reference/cli/influxctl/
- /influxdb/cloud-dedicated/query-data/troubleshoot-and-optimize/
---

Use the following guidelines to design your [schema](/influxdb/cloud-dedicated/reference/glossary/#schema)
@@ -18,7 +22,7 @@ for simpler and more performant queries.
- [Tags versus fields](#tags-versus-fields)
- [Schema restrictions](#schema-restrictions)
- [Do not use duplicate names for tags and fields](#do-not-use-duplicate-names-for-tags-and-fields)
- [Tables can contain up to 250 columns](#tables-can-contain-up-to-250-columns)
- [Maximum number of columns per table](#maximum-number-of-columns-per-table)
- [Design for performance](#design-for-performance)
- [Avoid wide schemas](#avoid-wide-schemas)
- [Avoid sparse schemas](#avoid-sparse-schemas)
@@ -37,10 +41,13 @@ Tables contain multiple tags and fields.
<!-- vale InfluxDataDocs.v3Schema = NO -->

- **Database**: A named location where time series data is stored.
In {{% product-name %}}, _database_ is synonymous with _bucket_ in InfluxDB Cloud Serverless and InfluxDB TSM implementations.
In {{% product-name %}}, _database_ is synonymous with _bucket_ in InfluxDB
Cloud Serverless and InfluxDB TSM implementations.

A database can contain multiple _tables_.
- **Table**: A logical grouping for time series data.
In {{% product-name %}}, _table_ is synonymous with _measurement_ in InfluxDB Cloud Serverless and InfluxDB TSM implementations.
In {{% product-name %}}, _table_ is synonymous with _measurement_ in
InfluxDB Cloud Serverless and InfluxDB TSM implementations.
All _points_ in a given table should have the same _tags_.
A table contains multiple _tags_ and _fields_.
- **Tags**: Key-value pairs that store metadata string values for each point--for example,
@@ -52,7 +59,9 @@ Tables contain multiple tags and fields.
Field values may be null, but at least one field value is not null on any given row.
- **Timestamp**: Timestamp associated with the data.
When stored on disk and queried, all data is ordered by time.
In InfluxDB, a timestamp is a nanosecond-scale [Unix timestamp](/influxdb/cloud-dedicated/reference/glossary/#unix-timestamp) in UTC.
In InfluxDB, a timestamp is a nanosecond-scale
[Unix timestamp](/influxdb/cloud-dedicated/reference/glossary/#unix-timestamp)
in UTC.
A timestamp is never null.

{{% note %}}
@@ -91,8 +100,9 @@ question as you design your schema.
- String
- Boolean

{{% product-name %}} doesn't index tag values or field values.
Tag keys, field keys, and other metadata are indexed to optimize performance.
{{% product-name %}} indexes tag keys, field keys, and other metadata
to optimize performance.
It doesn't index tag values or field values.

{{% note %}}
The InfluxDB v3 storage engine supports infinite tag value and series cardinality.
@@ -106,26 +116,37 @@ cardinality doesn't affect the overall performance of your database.

### Do not use duplicate names for tags and fields

Tags and fields within the same table can't be named the same.
All tags and fields are stored as unique columns in a table representing the
table on disk.
Use unique names for tags and fields within the same table.
{{% product-name %}} stores each tag and field as a unique column in the
table on disk.
If you attempt to write a table that contains tags or fields with the same name,
the write fails due to a column conflict.

### Tables can contain up to 250 columns
### Maximum number of columns per table

A table can contain **up to 250 columns**. Each row requires a time column,
but the rest represent tags and fields stored in the table.
Therefore, a table can contain one time column and 249 total field and tag columns.
If you attempt to write to a table and exceed the 250 column limit, the
write request fails and InfluxDB returns an error.
A table has a [maximum number of columns](/influxdb/cloud-dedicated/admin/databases/#column-limit).
Each row must include a time column.
As a result, a table can have the following:

- a time column
- field and tag columns, up to one fewer than the configured maximum.

If you attempt to write to a table and exceed the column limit, then the write
request fails and InfluxDB returns an error.

InfluxData identified 1000 columns as the safe limit for maintaining system
performance and stability.
Exceeding this threshold can result in
[wide schemas](#avoid-wide-schemas), which can negatively impact performance
and resource use, depending on the shape and data types in your schema.

---

## Design for performance

How you structure your schema within a table can affect the overall
performance of queries against that table.
How you structure your schema within a table can affect resource use and
the performance of queries against that table.

The following guidelines help to optimize query performance:

- [Avoid wide schemas](#avoid-wide-schemas)
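A tag and a field with the same name in one table cause a column conflict, even when they arrive in different lines of a batch. The following Python sketch checks a batch of simple line protocol for keys used as both a tag and a field in the same table. It parses naively (no escaped spaces, commas, or equals signs, and no quoted values that contain them). It sees only the batch, not columns that already exist in the table.

```python
def column_conflicts(lines):
    """Return {table: [keys]} for keys used as both a tag and a field in a batch.

    Assumes simple line protocol with no escaped characters and no quoted
    string values containing spaces, commas, or equals signs.
    """
    tags, fields = {}, {}
    for line in lines:
        series, field_set = line.strip().split(" ")[:2]
        table, *tag_pairs = series.split(",")
        tags.setdefault(table, set()).update(p.split("=", 1)[0] for p in tag_pairs)
        fields.setdefault(table, set()).update(
            p.split("=", 1)[0] for p in field_set.split(",")
        )
    return {t: sorted(tags[t] & fields[t]) for t in tags if tags[t] & fields[t]}


if __name__ == "__main__":
    batch = [
        "home,room=kitchen temp=72.1",
        'home,host=a room="x"',  # `room` is a field here but a tag above
    ]
    print(column_conflicts(batch))  # prints: {'home': ['room']}
```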
@@ -135,26 +156,45 @@ The following guidelines help to optimize query performance:

### Avoid wide schemas

A wide schema is one with many tags and fields and corresponding columns for each.
With the InfluxDB v3 storage engine, wide schemas don't impact query execution performance.
Because InfluxDB v3 is a columnar database, it executes queries only against columns selected in the query.
A wide schema refers to a schema with a large number of columns (tags and fields).

Although a wide schema won't affect query performance, it can lead to the following:
Wide schemas can lead to the following issues:

- More resources required for persisting and compacting data during ingestion.
- Decreased sorting performance due to complex primary keys with [too many tags](#avoid-too-many-tags).
- Increased resource usage for persisting and compacting data during ingestion.
- Reduced sorting performance due to complex primary keys with [too many tags](#avoid-too-many-tags).
- Reduced query performance when [using non-specific queries](#avoid-non-specific-queries).

The InfluxDB v3 storage engine has a
[limit of 250 columns per table](#tables-can-contain-up-to-250-columns).
To prevent wide schema issues, limit the number of tags and fields stored in a table.
If you need to store more than the [maximum number of columns](/influxdb/cloud-dedicated/admin/databases/#column-limit),
consider segmenting your fields into separate tables.

To avoid a wide schema, limit the number of tags and fields stored in a table.
If you need to store more than 249 total tags and fields, consider segmenting
your fields into a separate table.
#### Avoid non-specific queries

Because InfluxDB v3 is a columnar database, it only processes the columns
selected in a query, which can mitigate the query performance impact of wide schemas.
If you [query only the data that you need](/influxdb/cloud-dedicated/query-data/troubleshoot-and-optimize/optimize-queries/#strategies-for-improving-query-performance),
then a wide schema might not impact query performance.

However, a non-specific query that retrieves a large number of columns from a
wide schema
is slower and less efficient than a more targeted query--for example, consider
the following queries:

- `SELECT time,a,b,c`
- `SELECT *`

If the table contains 10 columns, the difference in performance between the
two queries is minimal.
In a table that approaches the 1000-column limit, the `SELECT *` query is
slower and less efficient.

#### Avoid too many tags

In InfluxDB, the primary key for a row is the combination of the point's timestamp and _tag set_ - the collection of [tag keys](/influxdb/cloud-dedicated/reference/glossary/#tag-key) and [tag values](/influxdb/cloud-dedicated/reference/glossary/#tag-value) on the point.
A point that contains more tags has a more complex primary key, which could impact sorting performance if you sort using all parts of the key.
In InfluxDB, the primary key for a row is the combination of the point's
timestamp and _tag set_ (the collection of [tag keys](/influxdb/cloud-dedicated/reference/glossary/#tag-key)
and [tag values](/influxdb/cloud-dedicated/reference/glossary/#tag-value) on the point).
A point that contains more tags has a more complex primary key, which could
impact sorting performance if you sort using all parts of the key.

### Avoid sparse schemas

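The guidance to segment fields into separate tables can be sketched as splitting one wide point into several narrower ones. In the following Python sketch, each output table repeats the tag set and has its own time column. Each table can therefore hold at most `max_columns - 1 - len(tags)` fields. The `<table>_<n>` naming and the alphabetical chunking are hypothetical. Grouping related fields (for example, by subsystem) usually makes the resulting tables easier to query.

```python
def segment_fields(table, fields, tags, max_columns=1000):
    """Split one wide point's fields across tables that each fit max_columns.

    Each output table repeats the full tag set and has its own time column,
    so it can hold at most max_columns - 1 - len(tags) fields. Write every
    returned field group with the same tags and timestamp.
    """
    per_table = max_columns - 1 - len(tags)
    if per_table < 1:
        raise ValueError("the tag set alone reaches the column limit")
    names = sorted(fields)
    chunks = [names[i:i + per_table] for i in range(0, len(names), per_table)]
    # Hypothetical naming scheme: <table>_1, <table>_2, ...
    return {
        f"{table}_{n}": {key: fields[key] for key in chunk}
        for n, chunk in enumerate(chunks, start=1)
    }


if __name__ == "__main__":
    wide = {f"f{i}": float(i) for i in range(10)}
    parts = segment_fields("machine", wide, {"host": "a"}, max_columns=5)
    print({name: len(group) for name, group in parts.items()})
    # prints: {'machine_1': 3, 'machine_2': 3, 'machine_3': 3, 'machine_4': 1}
```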
@@ -275,7 +315,8 @@ Without regular expressions, your queries will be easier to write and more perfo

#### Not recommended {.orange}

For example, consider the following [line protocol](/influxdb/cloud-dedicated/reference/syntax/line-protocol/) that embeds multiple attributes (location, model, and ID) into a `sensor` tag value:
For example, consider the following [line protocol](/influxdb/cloud-dedicated/reference/syntax/line-protocol/)
that embeds multiple attributes (location, model, and ID) into a `sensor` tag value:

```text
home,sensor=loc-kitchen.model-A612.id-1726ZA temp=72.1
@@ -12,8 +12,12 @@ weight: 105
influxdb/cloud-serverless/tags: [buckets]
aliases:
- /influxdb/cloud-serverless/organizations/buckets/
- /influxdb/cloud-serverless/admin/databases/
alt_links:
cloud: /influxdb/cloud/admin/buckets/
cloud_dedicated: /influxdb/cloud-dedicated/admin/databases/
clustered: /influxdb/clustered/admin/databases/
oss: /influxdb/v2/admin/buckets/
---

A **bucket** is a named location where time series data is stored.
@@ -30,6 +34,8 @@ support InfluxQL and the InfluxDB v1 API `/write` and `/query` endpoints, which
See how to [map v1 databases and retention policies to buckets](/influxdb/cloud-serverless/guides/api-compatibility/v1/#map-v1-databases-and-retention-policies-to-buckets).

**If coming from InfluxDB v2 or InfluxDB Cloud**, _buckets_ are functionally equivalent.

**If coming from InfluxDB Cloud Dedicated or InfluxDB Clustered**, _database_ and _bucket_ are synonymous.
{{% /note %}}

## Retention period
@@ -36,11 +36,9 @@ If a query is slower than you expect, it might be due to the following reasons:

The following design strategies generally improve query performance and resource use:

- Follow [schema design best practices](/influxdb/cloud-serverless/write-data/best-practices/schema-design/) to make querying easier and more performant.
- Query only the data you need--for example, include a [`WHERE` clause](/influxdb/cloud-serverless/reference/sql/where/) that filters data by a time range.
InfluxDB v3 stores data in a Parquet file for each measurement and day, and retrieves files from the Object store to answer a query.
The smaller the time range in your query, the fewer files InfluxDB needs to retrieve from the Object store.
- [Downsample data](/influxdb/cloud-serverless/process-data/downsample/) to reduce the amount of data you need to query.
- Follow [schema design best practices](/influxdb/cloud-serverless/write-data/best-practices/schema-design/) to make querying easier and more performant.
- [Query only the data you need](#query-only-the-data-you-need).
- [Downsample data](/influxdb/cloud-serverless/process-data/downsample/) to reduce the amount of data you need to query.

Some bottlenecks may be out of your control and are the result of a suboptimal execution plan, such as:

@@ -55,6 +53,34 @@ Some bottlenecks may be out of your control and are the result of a suboptimal e
To view runtime metrics for a query, such as the number of files scanned, use the [`EXPLAIN ANALYZE` keywords](/influxdb/cloud-serverless/reference/sql/explain/#explain-analyze) and learn how to [analyze a query plan](/influxdb/cloud-serverless/query-data/troubleshoot-and-optimize/analyze-query-plan/).
{{% /note %}}

### Query only the data you need

#### Include a WHERE clause

InfluxDB v3 stores data in a Parquet file for each measurement and day, and
retrieves files from the object store to answer a query.
To reduce the number of files that a query needs to retrieve from the object store,
include a [`WHERE` clause](/influxdb/cloud-serverless/reference/sql/where/) that
filters data by a time range.

#### SELECT only columns you need

Because InfluxDB v3 is a columnar database, it only processes the columns
selected in a query, which can mitigate the query performance impact of
[wide schemas](/influxdb/cloud-serverless/write-data/best-practices/schema-design/#avoid-wide-schemas).

However, a non-specific query that retrieves a large number of columns from a
wide schema can be slower and less efficient than a more targeted
query--for example, consider the following queries:

- `SELECT time,a,b,c`
- `SELECT *`

If the table contains 10 columns, the difference in performance between the
two queries is minimal.
In a table with close to the maximum number of columns, the `SELECT *` query
is slower and less efficient.

## Analyze and troubleshoot queries

Use the following tools to analyze and troubleshoot queries and find performance bottlenecks:
@@ -8,25 +8,25 @@ menu:
name: Schema design
weight: 201
parent: write-best-practices
related:
- /influxdb/cloud-serverless/admin/buckets/
- /influxdb/cloud-serverless/query-data/troubleshoot-and-optimize/
---

Use the following guidelines to design your [schema](/influxdb/cloud-serverless/reference/glossary/#schema)
for simpler and more performant queries.

<!-- TOC -->

- [InfluxDB data structure](#influxdb-data-structure)
- [Primary keys](#primary-keys)
- [Tags versus fields](#tags-versus-fields)
- [Schema restrictions](#schema-restrictions)
- [Do not use duplicate names for tags and fields](#do-not-use-duplicate-names-for-tags-and-fields)
- [Measurements can contain up to 200 columns](#measurements-can-contain-up-to-200-columns)
- [Maximum number of columns per measurement](#maximum-number-of-columns-per-measurement)
- [Design for performance](#design-for-performance)
- [Avoid wide schemas](#avoid-wide-schemas)
- [Avoid too many tags](#avoid-too-many-tags)
- [Avoid sparse schemas](#avoid-sparse-schemas)
- [Writing individual fields with different timestamps](#writing-individual-fields-with-different-timestamps)
- [Measurement schemas should be homogenous](#measurement-schemas-should-be-homogenous)
- [Use the best data type for your data](#use-the-best-data-type-for-your-data)
- [Design for query simplicity](#design-for-query-simplicity)
- [Keep measurement names, tags, and fields simple](#keep-measurement-names-tags-and-fields-simple)
- [Avoid keywords and special characters](#avoid-keywords-and-special-characters)
@@ -55,7 +55,7 @@ tags and fields.
Field values may be null, but at least one field value is not null on any given row.
- **Timestamp**: Timestamp associated with the data.
When stored on disk and queried, all data is ordered by time.
In InfluxDB, a timestamp is a nanosecond-scale [unix timestamp](#unix-timestamp) in UTC.
In InfluxDB, a timestamp is a nanosecond-scale [Unix timestamp](#unix-timestamp) in UTC.
A timestamp is never null.

### Primary keys
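The **Timestamp** definition above describes nanosecond-scale Unix timestamps in UTC. The following Python sketch converts a timezone-aware `datetime` to that form using integer arithmetic. It is an illustration, not a client library function. Python's `datetime` resolves only to microseconds, so the last three digits of the result are always zero.

```python
from datetime import datetime, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_unix_ns(dt):
    """Convert a timezone-aware datetime to a nanosecond Unix timestamp (UTC)."""
    if dt.tzinfo is None:
        raise ValueError("use a timezone-aware datetime so the UTC offset is explicit")
    delta = dt - UNIX_EPOCH
    # Integer arithmetic avoids float rounding at nanosecond scale.
    seconds = delta.days * 86_400 + delta.seconds
    return seconds * 1_000_000_000 + delta.microseconds * 1_000


if __name__ == "__main__":
    print(to_unix_ns(datetime(2024, 1, 1, tzinfo=timezone.utc)))  # prints: 1704067200000000000
```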
@@ -80,13 +80,14 @@ question as you design your schema.
- String
- Boolean

{{% product-name %}} doesn't index tag values or field values.
Tag keys, field keys, and other metadata are indexed to optimize performance.
{{% product-name %}} indexes tag keys, field keys, and other metadata
to optimize performance.
It doesn't index tag values or field values.

{{% note %}}
The InfluxDB v3 storage engine supports infinite tag value and series cardinality.
Unlike InfluxDB backed by the TSM storage engine, **tag value**
cardinality doesn't affect the overall performance of your database.
cardinality doesn't affect the overall performance of your bucket.
{{% /note %}}

---
@@ -95,19 +96,23 @@ cardinality doesn't affect the overall performance of your database.

### Do not use duplicate names for tags and fields

Tags and fields within the same measurement can't be named the same.
All tags and fields are stored as unique columns in a table representing the
measurement on disk.
Use unique names for tags and fields within the same measurement.
{{% product-name %}} stores tags and fields as unique columns in a table that
represents the measurement on disk.
If you attempt to write a measurement that contains tags or fields with the same name,
the write fails due to a column conflict.

### Measurements can contain up to 200 columns
### Maximum number of columns per measurement

A measurement can contain **up to 200 columns**. Each row requires a time column,
but the rest represent tags and fields stored in the measurement.
Therefore, a measurement can contain one time column and 199 total field and tag columns.
If you attempt to write to a measurement and exceed the 200 column limit, the
write request fails and InfluxDB returns an error.
A measurement has a [maximum number of columns](/influxdb/cloud-serverless/admin/buckets/#column-limit).
Each row must include a time column.
As a result, a measurement can have the following:

- a time column
- field and tag columns, up to one fewer than the maximum number of columns.

If you attempt to write to a measurement and exceed the column limit, then the write
request fails and InfluxDB returns an error.

---

@ -124,21 +129,37 @@ The following guidelines help to optimize query performance:
|
|||
|
||||
### Avoid wide schemas

A wide schema is one with many tags and fields and corresponding columns for each.
With the InfluxDB v3 storage engine, wide schemas don't impact query execution performance.
Because InfluxDB v3 is a columnar database, it executes queries only against columns selected in the query.
A wide schema refers to a schema with a large number of columns (tags and fields).

Although a wide schema won't affect query performance, it can lead to the following:
Wide schemas can lead to the following issues:

- More resources required for persisting and compacting data during ingestion.
- Decreased sorting performance due to complex primary keys with [too many tags](#avoid-too-many-tags).
- Increased resource usage for persisting and compacting data during ingestion.
- Reduced sorting performance due to complex primary keys with [too many tags](#avoid-too-many-tags).
- Reduced query performance when [using non-specific queries](#avoid-non-specific-queries).

The InfluxDB v3 storage engine has a
[limit of 200 columns per measurement](#measurements-can-contain-up-to-200-columns).
To prevent wide schema issues, limit the number of tags and fields stored in a measurement.
If you need to store more than the [maximum number of columns](/influxdb/cloud-serverless/admin/buckets/),
consider segmenting your fields into separate measurements.

To avoid a wide schema, limit the number of tags and fields stored in a measurement.
If you need to store more than 199 total tags and fields, consider segmenting
your fields into a separate measurement.
#### Avoid non-specific queries

Because InfluxDB v3 is a columnar database, it only processes the columns
selected in a query, which can mitigate the query performance impact of wide schemas.
If you [query only the data that you need](/influxdb/cloud-serverless/query-data/troubleshoot-and-optimize/optimize-queries/#strategies-for-improving-query-performance),
then a wide schema might not impact query performance.

However, a non-specific query that retrieves a large number of columns from a
wide schema
is slower and less efficient than a more targeted query--for example, consider
the following queries:

- `SELECT time,a,b,c`
- `SELECT *`

If the measurement contains 10 columns, the difference in performance between the
two queries is minimal.
In a measurement with over 1000 columns, the `SELECT *` query is slower and
less efficient.

#### Avoid too many tags

@@ -225,6 +246,12 @@ full of null values (also known as a _sparse schema_):
{{% /expand %}}
{{< /expand-wrapper >}}

### Use the best data type for your data

When writing data to a field, use the most appropriate [data type](/influxdb/cloud-serverless/reference/glossary/#data-type) for your data--write integers as integers, decimals as floats, and booleans as booleans.
A query against a field that stores integers outperforms a query against string data;
querying over many long string values can negatively affect performance.
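
As a sketch of writing each value with its native type, the following Python function formats field values using line protocol's type conventions: integers get an `i` suffix, floats are written as plain numbers, booleans as `true` or `false`, and strings are double-quoted. `format_field_value` and `format_fields` are hypothetical helpers. They assume field keys need no escaping, and they don't handle unsigned integers (`u` suffix) or non-finite floats.

```python
def format_field_value(value) -> str:
    """Format a Python value as a line protocol field value."""
    # Check bool before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integer field
    if isinstance(value, float):
        return repr(value)  # float field (the default numeric type)
    if isinstance(value, str):
        # Escape backslashes and double quotes inside string field values.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    raise TypeError(f"unsupported field type: {type(value).__name__}")


def format_fields(fields: dict) -> str:
    """Join key=value pairs into a line protocol field set."""
    return ",".join(f"{key}={format_field_value(value)}" for key, value in fields.items())


print(format_fields({"temp": 72.1, "count": 3, "ok": True, "status": "idle"}))
# temp=72.1,count=3i,ok=true,status="idle"
```

Writing `3` as `3i` stores an integer field; writing it as `3` with no suffix stores a float.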

## Design for query simplicity

Naming conventions for measurements, tag keys, and field keys can simplify or

@@ -11,6 +11,14 @@ menu:
parent: Administer InfluxDB Clustered
weight: 103
influxdb/clustered/tags: [databases]
related:
- /influxdb/clustered/write-data/best-practices/schema-design/
- /influxdb/clustered/reference/cli/influxctl/
alt_links:
cloud: /influxdb/cloud/admin/buckets/
cloud_dedicated: /influxdb/cloud-dedicated/admin/databases/
cloud_serverless: /influxdb/cloud-serverless/admin/buckets/
oss: /influxdb/v2/admin/buckets/
---

An InfluxDB database is a named location where time series data is stored.

@@ -19,7 +27,7 @@ Each InfluxDB database has a [retention period](#retention-periods).
{{% note %}}
**If coming from InfluxDB v1**, the concepts of databases and retention policies
have been combined into a single concept--database. Retention policies are no
longer part of the InfluxDB data model. However, InfluxDB Clustered does
longer part of the InfluxDB data model. However, {{% product-name %}} does
support InfluxQL, which requires databases and retention policies.
See [InfluxQL DBRP naming convention](/influxdb/clustered/admin/databases/create/#influxql-dbrp-naming-convention).

@@ -41,9 +49,10 @@ never be removed by the retention enforcement service.

## Table and column limits

In {{< product-name >}}, table (measurement) and column limits can be
custom configured when [creating](#create-a-database) or
[updating a database](#update-a-database).
You can customize [table (measurement) limits](#table-limit) and
[table column limits](#column-limit) when you
[create](#create-a-database) or
[update a database](#update-a-database) in {{< product-name >}}.

### Table limit

@@ -90,22 +99,33 @@ operating cost of your cluster.

### Column limit

**Default maximum number of columns**: 250
**Default maximum number of columns**: 1000

A table can contain **up to 1000 columns**.
Each row must include a time column, with the remaining columns representing
tags and fields.
As a result, a table can have one time column and up to 999 field and tag columns.

When creating or updating a database, you can configure the column limit to be
lower than 1000, based on your requirements.
After you update the column limit for a database, the limit applies to newly
created tables; it doesn't override the column limit for existing tables.

If you attempt to write to a table and exceed the column limit, the write
request fails and InfluxDB returns an error.
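
The following Python sketch is a toy model of the column limit rules described above. It treats `0` as "use the default" (matching the `influxctl` `--max-columns` flag description), always reserves one column for time, and applies an updated limit only to tables created afterward. The `Database` class is hypothetical and doesn't call any InfluxDB API.

```python
DEFAULT_MAX_COLUMNS = 1000  # default column limit per table


class Database:
    """Toy model of per-table column limits; not an InfluxDB API."""

    def __init__(self, max_columns: int = 0):
        self.max_columns = max_columns or DEFAULT_MAX_COLUMNS  # 0 uses the default
        self.table_limits: dict[str, int] = {}

    def update(self, max_columns: int) -> None:
        # The new limit applies only to tables created after this call.
        self.max_columns = max_columns or DEFAULT_MAX_COLUMNS

    def create_table(self, name: str) -> None:
        # A table keeps the limit that was in effect when it was created.
        self.table_limits[name] = self.max_columns

    def tag_and_field_budget(self, table: str) -> int:
        # One column is always the time column.
        return self.table_limits[table] - 1


db = Database()
db.create_table("home")
db.update(max_columns=300)
db.create_table("weather")
print(db.table_limits)                  # {'home': 1000, 'weather': 300}
print(db.tag_and_field_budget("home"))  # 999
```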

Time, fields, and tags are each represented by a column in a table.
Increasing your column limit affects your {{% product-name omit=" Clustered" %}}
cluster in the following ways:

{{< expand-wrapper >}}
{{% expand "May adversely affect query performance" %}}
{{% expand "May adversely affect system performance" %}}

At query time, the InfluxDB query engine identifies what table contains the queried
data and then evaluates each row in the table to match the conditions of the query.
The more columns that are in each row, the longer it takes to evaluate each row.

Through performance testing, InfluxData has identified 250 columns as the
threshold beyond which query performance may be affected
(depending on the shape of and data types in your schema).
InfluxData identified 1000 columns as the safe limit for maintaining system
performance and stability.
Exceeding this threshold can result in
[wide schemas](/influxdb/clustered/write-data/best-practices/schema-design/#avoid-wide-schemas),
which can negatively impact performance and resource use,
depending on the shape of your schema and data types in the schema.

{{% /expand %}}
{{< /expand-wrapper >}}

@@ -12,6 +12,7 @@ influxdb/clustered/tags: [query, performance, observability, errors, sql, influx
related:
- /influxdb/clustered/query-data/sql/
- /influxdb/clustered/query-data/influxql/
- /influxdb/clustered/query-data/execute-queries/analyze-query-plan/
aliases:
- /influxdb/clustered/query-data/execute-queries/optimize-queries/
- /influxdb/clustered/query-data/execute-queries/analyze-query-plan/

@@ -22,6 +23,7 @@ Learn how to use observability tools to analyze query execution and view metrics

- [Why is my query slow?](#why-is-my-query-slow)
- [Strategies for improving query performance](#strategies-for-improving-query-performance)
- [Query only the data you need](#query-only-the-data-you-need)
- [Analyze and troubleshoot queries](#analyze-and-troubleshoot-queries)

## Why is my query slow?

@@ -37,10 +39,7 @@ If a query is slower than you expect, it might be due to the following reasons:
The following design strategies generally improve query performance and resource use:

- Follow [schema design best practices](/influxdb/clustered/write-data/best-practices/schema-design/) to make querying easier and more performant.
- Query only the data you need--for example, include a [`WHERE` clause](/influxdb/clustered/reference/sql/where/) that filters data by a time range.
InfluxDB v3 stores data in a Parquet file for each measurement and day, and retrieves files from the Object store to answer a query.
The smaller the time range in your query, the fewer files InfluxDB needs to retrieve from the Object store.

- [Query only the data you need](#query-only-the-data-you-need).
- [Downsample data](/influxdb/clustered/process-data/downsample/) to reduce the amount of data you need to query.

Some bottlenecks may be out of your control and are the result of a suboptimal execution plan, such as:

@@ -53,9 +52,39 @@ Some bottlenecks may be out of your control and are the result of a suboptimal e
{{% note %}}
#### Analyze query plans to view metrics and recognize bottlenecks

To view runtime metrics for a query, such as the number of files scanned, use the [`EXPLAIN ANALYZE` keywords](/influxdb/clustered/reference/sql/explain/#explain-analyze) and learn how to [analyze a query plan](/influxdb/clustered/query-data/troubleshoot-and-optimize/analyze-query-plan/).
To view runtime metrics for a query, such as the number of files scanned, use
the [`EXPLAIN ANALYZE` keywords](/influxdb/clustered/reference/sql/explain/#explain-analyze)
and learn how to [analyze a query plan](/influxdb/clustered/query-data/troubleshoot-and-optimize/analyze-query-plan/).
{{% /note %}}

### Query only the data you need

#### Include a WHERE clause

InfluxDB v3 stores data in a Parquet file for each measurement and day, and
retrieves files from the Object store to answer a query.
To reduce the number of files that a query needs to retrieve from the Object store,
include a [`WHERE` clause](/influxdb/clustered/reference/sql/where/) that
filters data by a time range.
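
To see why a narrower time range helps, the following Python sketch counts how many UTC calendar days a half-open time range `[start, stop)` overlaps. Under the simplified one-file-per-table-per-day model described above, that count approximates how many daily files a query needs for each table. `days_touched` is a hypothetical helper; it assumes UTC day boundaries, and real file counts may also depend on data volume and compaction.

```python
from datetime import datetime, timedelta, timezone


def days_touched(start: datetime, stop: datetime) -> int:
    """Count the UTC calendar days overlapped by the half-open range [start, stop)."""
    if stop <= start:
        return 0
    first_day = start.astimezone(timezone.utc).date()
    # Subtract a microsecond so an exclusive stop at midnight doesn't add a day.
    last_day = (stop.astimezone(timezone.utc) - timedelta(microseconds=1)).date()
    return (last_day - first_day).days + 1


utc = timezone.utc
week = days_touched(datetime(2024, 1, 1, tzinfo=utc), datetime(2024, 1, 8, tzinfo=utc))
hour = days_touched(datetime(2024, 1, 7, 12, tzinfo=utc), datetime(2024, 1, 7, 13, tzinfo=utc))
print(week, hour)  # 7 1
```

Under this model, narrowing a week-long range to a single hour could reduce the daily files to read for each table from seven to one. The matching SQL filter might look like `WHERE time >= '2024-01-07T12:00:00Z' AND time < '2024-01-07T13:00:00Z'`.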

#### SELECT only columns you need

Because InfluxDB v3 is a columnar database, it only processes the columns
selected in a query, which can mitigate the query performance impact of
[wide schemas](/influxdb/clustered/write-data/best-practices/schema-design/#avoid-wide-schemas).

However, a non-specific query that retrieves a large number of columns from a
wide schema can be slower and less efficient than a more targeted
query--for example, consider the following queries:

- `SELECT time,a,b,c`
- `SELECT *`

If the table contains 10 columns, the difference in performance between the
two queries is minimal.
In a table with over 1000 columns, the `SELECT *` query is slower and
less efficient.
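
As a sketch of a targeted query, the following Python helper builds SQL that names only the needed columns and bounds the time range, instead of using `SELECT *`. `build_query` and `quote_ident` are hypothetical helpers. They double-quote identifiers and assume that the caller passes trusted RFC3339 timestamp strings, which they don't escape or validate.

```python
def quote_ident(name: str) -> str:
    """Double-quote a SQL identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_query(table: str, columns: list[str], start: str, stop: str) -> str:
    """Select only the listed columns (plus time) within [start, stop)."""
    selected = ", ".join(["time"] + [quote_ident(c) for c in columns if c != "time"])
    return (
        f"SELECT {selected} FROM {quote_ident(table)} "
        f"WHERE time >= '{start}' AND time < '{stop}'"
    )


print(build_query("home", ["temp", "hum"], "2024-01-07T00:00:00Z", "2024-01-08T00:00:00Z"))
# SELECT time, "temp", "hum" FROM "home" WHERE time >= '2024-01-07T00:00:00Z' AND time < '2024-01-08T00:00:00Z'
```

Listing columns explicitly ties the query's cost to the columns you need rather than to the width of the table.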

## Analyze and troubleshoot queries

Learn how to [analyze a query plan](/influxdb/clustered/query-data/troubleshoot-and-optimize/analyze-query-plan/)

@@ -103,7 +103,7 @@ influxctl database create [flags] <DATABASE_NAME>
| :--- | :---------------------- | :--------------------------------------------------------------------------------------------------------------------------------------- |
| | `--retention-period` | Database retention period (default is `0s`, infinite) |
| | `--max-tables` | Maximum tables per database (default is 500, `0` uses default) |
| | `--max-columns` | Maximum columns per table (default is 250, `0` uses default) |
| | `--max-columns` | Maximum columns per table (default is 1000, `0` uses default) |
| | `--template-tag` | Tag to add to partition template (can include multiple of this flag) |
| | `--template-tag-bucket` | Tag and number of buckets to partition tag values into separated by a comma--for example: `tag1,100` (can include multiple of this flag) |
| | `--template-timeformat` | Timestamp format for partition template (default is `%Y-%m-%d`) |

@@ -8,6 +8,10 @@ menu:
name: Schema design
weight: 201
parent: write-best-practices
related:
- /influxdb/clustered/admin/databases/
- /influxdb/clustered/reference/cli/influxctl/
- /influxdb/clustered/query-data/troubleshoot-and-optimize/
---

Use the following guidelines to design your [schema](/influxdb/clustered/reference/glossary/#schema)

@@ -18,7 +22,7 @@ for simpler and more performant queries.
- [Tags versus fields](#tags-versus-fields)
- [Schema restrictions](#schema-restrictions)
- [Do not use duplicate names for tags and fields](#do-not-use-duplicate-names-for-tags-and-fields)
- [Tables can contain up to 250 columns](#tables-can-contain-up-to-250-columns)
- [Maximum number of columns per table](#maximum-number-of-columns-per-table)
- [Design for performance](#design-for-performance)
- [Avoid wide schemas](#avoid-wide-schemas)
- [Avoid sparse schemas](#avoid-sparse-schemas)

@@ -37,10 +41,13 @@ Tables contain multiple tags and fields.
<!-- vale InfluxDataDocs.v3Schema = NO -->

- **Database**: A named location where time series data is stored.
In {{% product-name %}}, _database_ is synonymous with _bucket_ in InfluxDB Cloud Serverless and InfluxDB TSM implementations.
In {{% product-name %}}, _database_ is synonymous with _bucket_ in InfluxDB
Cloud Serverless and InfluxDB TSM implementations.

A database can contain multiple _tables_.
- **Table**: A logical grouping for time series data.
In {{% product-name %}}, _table_ is synonymous with _measurement_ in InfluxDB Cloud Serverless and InfluxDB TSM implementations.
In {{% product-name %}}, _table_ is synonymous with _measurement_ in
InfluxDB Cloud Serverless and InfluxDB TSM implementations.
All _points_ in a given table should have the same _tags_.
A table contains multiple _tags_ and _fields_.
- **Tags**: Key-value pairs that store metadata string values for each point--for example,

@@ -52,7 +59,9 @@ Tables contain multiple tags and fields.
Field values may be null, but at least one field value is not null on any given row.
- **Timestamp**: Timestamp associated with the data.
When stored on disk and queried, all data is ordered by time.
In InfluxDB, a timestamp is a nanosecond-scale [unix timestamp](/influxdb/clustered/reference/glossary/#unix-timestamp) in UTC.
In InfluxDB, a timestamp is a nanosecond-scale
[Unix timestamp](/influxdb/clustered/reference/glossary/#unix-timestamp)
in UTC.
A timestamp is never null.

{{% note %}}

@@ -91,8 +100,9 @@ question as you design your schema.
- String
- Boolean

{{% product-name %}} doesn't index tag values or field values.
Tag keys, field keys, and other metadata are indexed to optimize performance.
{{% product-name %}} indexes tag keys, field keys, and other metadata
to optimize performance.
It doesn't index tag values or field values.

{{% note %}}
The InfluxDB v3 storage engine supports infinite tag value and series cardinality.

@@ -106,26 +116,37 @@ cardinality doesn't affect the overall performance of your database.

### Do not use duplicate names for tags and fields

Tags and fields within the same table can't be named the same.
All tags and fields are stored as unique columns in a table representing the
table on disk.
Use unique names for tags and fields within the same table.
{{% product-name %}} stores tags and fields as unique columns in a table that
represents the table on disk.
If you attempt to write a table that contains tags or fields with the same name,
the write fails due to a column conflict.
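
A client can catch this kind of conflict before writing. The following Python sketch uses a hypothetical `find_name_conflicts` helper (not part of any InfluxDB client) to report keys that are used as both a tag key and a field key across points destined for the same table.

```python
def find_name_conflicts(points: list[tuple[dict, dict]]) -> set[str]:
    """Return names used as both a tag key and a field key across the points.

    Each point is a (tags, fields) pair of dicts for the same table.
    """
    tag_keys: set[str] = set()
    field_keys: set[str] = set()
    for tags, fields in points:
        tag_keys.update(tags)
        field_keys.update(fields)
    return tag_keys & field_keys


points = [
    ({"room": "kitchen"}, {"temp": 72.1}),
    ({"room": "den"}, {"room": 2}),  # "room" reused as a field key
]
print(find_name_conflicts(points))  # {'room'}
```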

### Tables can contain up to 250 columns
### Maximum number of columns per table

A table can contain **up to 250 columns**. Each row requires a time column,
but the rest represent tags and fields stored in the table.
Therefore, a table can contain one time column and 249 total field and tag columns.
If you attempt to write to a table and exceed the 250 column limit, the
write request fails and InfluxDB returns an error.
A table has a [maximum number of columns](/influxdb/clustered/admin/databases/#column-limit).
Each row must include a time column.
As a result, a table can have the following:

- a time column
- field and tag columns up to the configured maximum.

If you attempt to write to a table and exceed the column limit, then the write
request fails and InfluxDB returns an error.

InfluxData identified 1000 columns as the safe limit for maintaining system
performance and stability.
Exceeding this threshold can result in
[wide schemas](#avoid-wide-schemas), which can negatively impact performance
and resource use, depending on the shape and data types in your schema.

---

## Design for performance

How you structure your schema within a table can affect the overall
performance of queries against that table.
How you structure your schema within a table can affect resource use and
the performance of queries against that table.

The following guidelines help to optimize query performance:

- [Avoid wide schemas](#avoid-wide-schemas)

@@ -135,26 +156,45 @@ The following guidelines help to optimize query performance:

### Avoid wide schemas

A wide schema is one with many tags and fields and corresponding columns for each.
With the InfluxDB v3 storage engine, wide schemas don't impact query execution performance.
Because InfluxDB v3 is a columnar database, it executes queries only against columns selected in the query.
A wide schema refers to a schema with a large number of columns (tags and fields).

Although a wide schema won't affect query performance, it can lead to the following:
Wide schemas can lead to the following issues:

- More resources required for persisting and compacting data during ingestion.
- Decreased sorting performance due to complex primary keys with [too many tags](#avoid-too-many-tags).
- Increased resource usage for persisting and compacting data during ingestion.
- Reduced sorting performance due to complex primary keys with [too many tags](#avoid-too-many-tags).
- Reduced query performance when [using non-specific queries](#avoid-non-specific-queries).

The InfluxDB v3 storage engine has a
[limit of 250 columns per table](#tables-can-contain-up-to-250-columns).
To prevent wide schema issues, limit the number of tags and fields stored in a table.
If you need to store more than the [maximum number of columns](/influxdb/clustered/admin/databases/),
consider segmenting your fields into separate tables.
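
One way to segment fields is to split the field keys into groups that each fit under the column limit, keeping in mind that every segmented table still needs its own time column and the same tag columns. The following Python sketch assumes that layout; `segment_fields` is a hypothetical helper. It groups fields in the order given; in practice, you might instead group related fields so that a typical query reads from only one table.

```python
def segment_fields(field_keys: list[str], tag_count: int, max_columns: int) -> list[list[str]]:
    """Split field keys into groups that fit in tables of at most max_columns columns.

    Each table needs one time column plus the same tag_count tag columns,
    so each group can hold max_columns - 1 - tag_count fields.
    """
    per_table = max_columns - 1 - tag_count
    if per_table < 1:
        raise ValueError("max_columns leaves no room for fields")
    return [field_keys[i:i + per_table] for i in range(0, len(field_keys), per_table)]


fields = [f"f{i}" for i in range(10)]
groups = segment_fields(fields, tag_count=2, max_columns=6)
print([len(g) for g in groups])  # [3, 3, 3, 1]
```

With the default limit of 1000 columns and, for example, 9 tags, each segmented table could hold up to 990 fields.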

To avoid a wide schema, limit the number of tags and fields stored in a table.
If you need to store more than 249 total tags and fields, consider segmenting
your fields into a separate table.
#### Avoid non-specific queries

Because InfluxDB v3 is a columnar database, it only processes the columns
selected in a query, which can mitigate the query performance impact of wide schemas.
If you [query only the data that you need](/influxdb/clustered/query-data/troubleshoot-and-optimize/optimize-queries/#strategies-for-improving-query-performance),
then a wide schema might not impact query performance.

However, a non-specific query that retrieves a large number of columns from a
wide schema
is slower and less efficient than a more targeted query--for example, consider
the following queries:

- `SELECT time,a,b,c`
- `SELECT *`

If the table contains 10 columns, the difference in performance between the
two queries is minimal.
In a table with over 1000 columns, the `SELECT *` query is slower and
less efficient.

#### Avoid too many tags

In InfluxDB, the primary key for a row is the combination of the point's timestamp and _tag set_ - the collection of [tag keys](/influxdb/clustered/reference/glossary/#tag-key) and [tag values](/influxdb/clustered/reference/glossary/#tag-value) on the point.
A point that contains more tags has a more complex primary key, which could impact sorting performance if you sort using all parts of the key.
In InfluxDB, the primary key for a row is the combination of the point's
timestamp and _tag set_ - the collection of [tag keys](/influxdb/clustered/reference/glossary/#tag-key)
and [tag values](/influxdb/clustered/reference/glossary/#tag-value) on the point.
A point that contains more tags has a more complex primary key, which could
impact sorting performance if you sort using all parts of the key.
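
To illustrate why more tags make a more complex key, the following Python sketch builds a sort key from a point's tag values followed by its timestamp and then sorts points by it. This is a hypothetical model: `primary_key` is not an InfluxDB function, and ordering tag columns alphabetically by key is an assumption for illustration, not necessarily the column order InfluxDB uses. Each additional tag adds one more component that comparisons may need to examine.

```python
def primary_key(point: dict, tag_keys: list[str]) -> tuple:
    """Build a sort key from the point's tag values followed by its timestamp.

    Missing tags sort as empty strings (an assumption for this sketch).
    """
    return tuple(point["tags"].get(k, "") for k in tag_keys) + (point["time"],)


points = [
    {"tags": {"room": "kitchen", "sensor": "s2"}, "time": 2},
    {"tags": {"room": "den", "sensor": "s1"}, "time": 3},
    {"tags": {"room": "kitchen", "sensor": "s1"}, "time": 1},
]
tag_keys = sorted({k for p in points for k in p["tags"]})  # ['room', 'sensor']
ordered = sorted(points, key=lambda p: primary_key(p, tag_keys))
print([primary_key(p, tag_keys) for p in ordered])
# [('den', 's1', 3), ('kitchen', 's1', 1), ('kitchen', 's2', 2)]
```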

### Avoid sparse schemas

@@ -275,7 +315,8 @@ Without regular expressions, your queries will be easier to write and more perfo

#### Not recommended {.orange}

For example, consider the following [line protocol](/influxdb/clustered/reference/syntax/line-protocol/) that embeds multiple attributes (location, model, and ID) into a `sensor` tag value:
For example, consider the following [line protocol](/influxdb/clustered/reference/syntax/line-protocol/)
that embeds multiple attributes (location, model, and ID) into a `sensor` tag value:

```text
home,sensor=loc-kitchen.model-A612.id-1726ZA temp=72.1