Merge pull request #5734 from influxdata/jstirnaman/DAR-463

docs(partitioning): enhance best practices and time part templates do…
pull/5740/head^2
Jason Stirnaman 2025-01-09 10:41:49 -06:00 committed by GitHub
commit 37183896f4
20 changed files with 1071 additions and 1842 deletions

@ -11,409 +11,9 @@ weight: 103
influxdb/cloud-dedicated/tags: [storage]
related:
- /influxdb/cloud-dedicated/reference/internals/storage-engine/
source: /shared/v3-distributed-admin-custom-partitions/_index.md
---
When writing data to {{< product-name >}}, the InfluxDB v3 storage engine stores
data in the [Object store](/influxdb/cloud-dedicated/reference/internals/storage-engine/#object-store)
in [Apache Parquet](https://parquet.apache.org/) format.
Each Parquet file represents a _partition_--a logical grouping of data.
By default, InfluxDB partitions each table by day.
{{< product-name >}} lets you customize the partitioning strategy and partition
by tag values and different time intervals.
Customize your partitioning strategy to optimize query performance for your
specific schema and workload.
- [Advantages](#advantages)
- [Disadvantages](#disadvantages)
- [Limitations](#limitations)
- [How partitioning works](#how-partitioning-works)
- [Partition templates](#partition-templates)
- [Partition keys](#partition-keys)
- [Partitions in the query life cycle](#partitions-in-the-query-life-cycle)
- [Partition guides](#partition-guides)
{{< children type="anchored-list" >}}
## Advantages
The primary advantage of custom partitioning is that it lets you customize your
storage structure to improve query performance specific to your schema and workload.
- **Optimized storage for improved performance on specific types of queries**.
For example, if queries often select data with a specific tag value, you can
partition by that tag to improve the performance of those queries.
- **Optimized storage for specific types of data**. For example, if the data you
store is sparse and the time ranges you query are often much larger than a day,
you could partition your data by week instead of by day.
## Disadvantages
Using custom partitioning may increase the load on other parts of the
[InfluxDB v3 storage engine](/influxdb/cloud-dedicated/reference/internals/storage-engine/),
but each can be scaled individually to address the added load.
{{% note %}}
_The following disadvantages assume that your custom partitioning strategy includes
additional tags to partition by or partition intervals smaller than a day._
{{% /note %}}
- **Increased load on the [Ingester](/influxdb/cloud-dedicated/reference/internals/storage-engine/#ingester)**
as it groups data into smaller partitions and files.
- **Increased load on the [Catalog](/influxdb/cloud-dedicated/reference/internals/storage-engine/#catalog)**
as more references to partition Parquet file locations are stored and queried.
- **Increased load on the [Compactor](/influxdb/cloud-dedicated/reference/internals/storage-engine/#compactor)**
as more partition Parquet files need to be compacted.
- **Increased costs associated with [Object storage](/influxdb/cloud-dedicated/reference/internals/storage-engine/#object-store)**
as more partition Parquet files are created and stored.
- **Risk of decreased performance for queries that don't use tags in the `WHERE` clause**.
These queries may end up reading many partitions and smaller files, degrading performance.
## Limitations
Custom partitioning has the following limitations:
- Database and table partition templates can only be defined at creation time.
You cannot update the partition strategy of a database or table after it has
been created.
- A partition template must include a time part.
- You can partition by up to eight dimensions (seven tags and a time interval).
## How partitioning works
### Partition templates
A partition template defines the pattern used for _[partition keys](#partition-keys)_
and determines the time interval that data is partitioned by.
Partition templates use tag values and
[Rust strftime date and time formatting syntax](https://docs.rs/chrono/latest/chrono/format/strftime/index.html).
_For more detailed information, see [Partition templates](/influxdb/cloud-dedicated/admin/custom-partitions/partition-templates/)._
### Partition keys
A partition key uniquely identifies a partition.
A _[partition template](#partition-templates)_ defines the partition key format.
Partition keys are
composed of up to 8 dimensions (1 time part and up to 7 tag or tag bucket parts).
Each part is delimited by the partition key separator (`|`).
The default format for partition keys is `%Y-%m-%d` (for example, `2024-01-01`).
{{< expand-wrapper >}}
{{% expand "View example partition templates and keys" %}}
Given the following line protocol, with points at each of these timestamps:
- 2023-12-31T23:00:00Z
- 2024-01-01T00:00:00Z
- 2024-01-01T01:00:00Z
```text
production,line=A,station=cnc temp=81.2,qty=35i 1704063600000000000
production,line=A,station=wld temp=92.8,qty=35i 1704063600000000000
production,line=B,station=cnc temp=101.1,qty=43i 1704063600000000000
production,line=B,station=wld temp=102.4,qty=43i 1704063600000000000
production,line=A,station=cnc temp=81.9,qty=36i 1704067200000000000
production,line=A,station=wld temp=110.0,qty=22i 1704067200000000000
production,line=B,station=cnc temp=101.8,qty=44i 1704067200000000000
production,line=B,station=wld temp=105.7,qty=44i 1704067200000000000
production,line=A,station=cnc temp=82.2,qty=35i 1704070800000000000
production,line=A,station=wld temp=92.1,qty=30i 1704070800000000000
production,line=B,station=cnc temp=102.4,qty=43i 1704070800000000000
production,line=B,station=wld temp=106.5,qty=43i 1704070800000000000
```
---
{{% flex %}}
<!---------------------- BEGIN PARTITION EXAMPLES GROUP 1 --------------------->
{{% flex-content "half" %}}
##### Partition template parts
- `%Y-%m-%d` <em class="op50">time (by day, default format)</em>
{{% /flex-content %}}
{{% flex-content %}}
##### Partition keys
- `2023-12-31`
- `2024-01-01`
{{% /flex-content %}}
<!----------------------- END PARTITION EXAMPLES GROUP 1 ---------------------->
{{% /flex %}}
---
{{% flex %}}
<!---------------------- BEGIN PARTITION EXAMPLES GROUP 2 --------------------->
{{% flex-content "half" %}}
##### Partition template parts
- `line` <em class="op50">tag</em>
- `%d %b %Y` <em class="op50">time (by day, non-default format)</em>
{{% /flex-content %}}
{{% flex-content %}}
##### Partition keys
- `A | 31 Dec 2023`
- `B | 31 Dec 2023`
- `A | 01 Jan 2024`
- `B | 01 Jan 2024`
{{% /flex-content %}}
<!----------------------- END PARTITION EXAMPLES GROUP 2 ---------------------->
{{% /flex %}}
---
{{% flex %}}
<!---------------------- BEGIN PARTITION EXAMPLES GROUP 3 --------------------->
{{% flex-content "half" %}}
##### Partition template parts
- `line` <em class="op50">tag</em>
- `station` <em class="op50">tag</em>
- `%Y-%m-%d` <em class="op50">time (by day, default format)</em>
{{% /flex-content %}}
{{% flex-content %}}
##### Partition keys
- `A | cnc | 2023-12-31`
- `A | wld | 2023-12-31`
- `B | cnc | 2023-12-31`
- `B | wld | 2023-12-31`
- `A | cnc | 2024-01-01`
- `A | wld | 2024-01-01`
- `B | cnc | 2024-01-01`
- `B | wld | 2024-01-01`
{{% /flex-content %}}
<!----------------------- END PARTITION EXAMPLES GROUP 3 ---------------------->
{{% /flex %}}
---
{{% flex %}}
<!---------------------- BEGIN PARTITION EXAMPLES GROUP 4 --------------------->
{{% flex-content "half" %}}
##### Partition template parts
- `line` <em class="op50">tag</em>
- `station,3` <em class="op50">tag bucket</em>
- `%Y-%m-%d` <em class="op50">time (by day, default format)</em>
{{% /flex-content %}}
{{% flex-content %}}
##### Partition keys
- `A | 0 | 2023-12-31`
- `B | 0 | 2023-12-31`
- `A | 0 | 2024-01-01`
- `B | 0 | 2024-01-01`
{{% /flex-content %}}
<!----------------------- END PARTITION EXAMPLES GROUP 4 ---------------------->
{{% /flex %}}
---
{{% flex %}}
<!---------------------- BEGIN PARTITION EXAMPLES GROUP 5 --------------------->
{{% flex-content "half" %}}
##### Partition template parts
- `line` <em class="op50">tag</em>
- `station` <em class="op50">tag</em>
- `%Y-%m-%d %H:00` <em class="op50">time (by hour)</em>
{{% /flex-content %}}
{{% flex-content %}}
##### Partition keys
- `A | cnc | 2023-12-31 23:00`
- `A | wld | 2023-12-31 23:00`
- `B | cnc | 2023-12-31 23:00`
- `B | wld | 2023-12-31 23:00`
- `A | cnc | 2024-01-01 00:00`
- `A | wld | 2024-01-01 00:00`
- `B | cnc | 2024-01-01 00:00`
- `B | wld | 2024-01-01 00:00`
- `A | cnc | 2024-01-01 01:00`
- `A | wld | 2024-01-01 01:00`
- `B | cnc | 2024-01-01 01:00`
- `B | wld | 2024-01-01 01:00`
{{% /flex-content %}}
<!----------------------- END PARTITION EXAMPLES GROUP 5 ---------------------->
{{% /flex %}}
---
{{% flex %}}
<!---------------------- BEGIN PARTITION EXAMPLES GROUP 6 --------------------->
{{% flex-content "half" %}}
##### Partition template parts
- `line` <em class="op50">tag</em>
- `station,50` <em class="op50">tag bucket</em>
- `%Y-%m-%d %H:00` <em class="op50">time (by hour)</em>
{{% /flex-content %}}
{{% flex-content %}}
##### Partition keys
- `A | 47 | 2023-12-31 23:00`
- `A | 9 | 2023-12-31 23:00`
- `B | 47 | 2023-12-31 23:00`
- `B | 9 | 2023-12-31 23:00`
- `A | 47 | 2024-01-01 00:00`
- `A | 9 | 2024-01-01 00:00`
- `B | 47 | 2024-01-01 00:00`
- `B | 9 | 2024-01-01 00:00`
- `A | 47 | 2024-01-01 01:00`
- `A | 9 | 2024-01-01 01:00`
- `B | 47 | 2024-01-01 01:00`
- `B | 9 | 2024-01-01 01:00`
{{% /flex-content %}}
<!----------------------- END PARTITION EXAMPLES GROUP 6 ---------------------->
{{% /flex %}}
{{% /expand %}}
{{< /expand-wrapper >}}
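The partition keys in the examples above can be reproduced with a short sketch.
It is illustrative only: it uses Python's `strftime` as a stand-in for the Rust strftime syntax (the `%Y`, `%m`, `%d`, and `%H` specifiers used here format the same way), and it joins parts with ` | ` to match how keys are displayed on this page.
The function name `partition_key` and its parameters are hypothetical and not part of any InfluxDB API.

```python
from datetime import datetime, timezone


def partition_key(tags, timestamp_ns, tag_parts, time_format="%Y-%m-%d"):
    """Build a partition key from zero or more tag parts plus one time part."""
    # Line protocol timestamps in the examples are nanoseconds since the Unix epoch.
    ts = datetime.fromtimestamp(timestamp_ns // 1_000_000_000, tz=timezone.utc)
    parts = [tags[key] for key in tag_parts]
    parts.append(ts.strftime(time_format))
    return " | ".join(parts)


point_tags = {"line": "A", "station": "cnc"}
ts_ns = 1704063600000000000  # 2023-12-31T23:00:00Z

print(partition_key(point_tags, ts_ns, []))                     # 2023-12-31
print(partition_key(point_tags, ts_ns, ["line", "station"]))    # A | cnc | 2023-12-31
print(partition_key(point_tags, ts_ns, ["line", "station"], "%Y-%m-%d %H:00"))
# A | cnc | 2023-12-31 23:00
```

Tag bucket parts are left out of this sketch because the bucket identifier depends on the hash function that InfluxDB uses internally.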
## Partitions in the query life cycle
When querying data:
1. The [Catalog](/influxdb/cloud-dedicated/reference/internals/storage-engine/#catalog)
provides the v3 query engine ([Querier](/influxdb/cloud-dedicated/reference/internals/storage-engine/#querier))
with the locations of partitions that contain the queried time series data.
2. The query engine reads all rows in the returned partitions to identify which
rows match the logic in the query and should be included in the query result.
The faster the query engine can identify what partitions to read and then read
the data in those partitions, the more performant queries are.
_For more information about the query life cycle, see
[InfluxDB v3 query life cycle](/influxdb/cloud-dedicated/reference/internals/storage-engine/#query-life-cycle)._
##### Query example
Consider the following query that selects everything in the `production` table
where the `line` tag is `A` and the `station` tag is `cnc`:
```sql
SELECT *
FROM production
WHERE
time >= now() - INTERVAL '1 week'
AND line = 'A'
AND station = 'cnc'
```
Using the default partitioning strategy (by day), the query engine
reads eight separate partitions (one partition for today and one for each of the
last seven days):
- {{< datetime/current-date trimTime=true >}}
- {{< datetime/current-date offset=-1 trimTime=true >}}
- {{< datetime/current-date offset=-2 trimTime=true >}}
- {{< datetime/current-date offset=-3 trimTime=true >}}
- {{< datetime/current-date offset=-4 trimTime=true >}}
- {{< datetime/current-date offset=-5 trimTime=true >}}
- {{< datetime/current-date offset=-6 trimTime=true >}}
- {{< datetime/current-date offset=-7 trimTime=true >}}
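To see why a one-week time range touches eight daily partitions, the following sketch enumerates the by-day partition keys that overlap a time range.
The `day_partitions` helper and the fixed value of `now` are hypothetical and exist only for illustration.

```python
from datetime import datetime, timedelta, timezone


def day_partitions(start, end):
    """Return the by-day partition keys that overlap the range [start, end]."""
    keys = []
    day = start.date()
    while day <= end.date():
        keys.append(day.strftime("%Y-%m-%d"))
        day += timedelta(days=1)
    return keys


now = datetime(2024, 1, 8, 15, 30, tzinfo=timezone.utc)  # hypothetical "now"
keys = day_partitions(now - timedelta(weeks=1), now)

print(len(keys))          # 8
print(keys[0], keys[-1])  # 2024-01-01 2024-01-08
```

A seven-day window that starts partway through a day always spans parts of eight calendar days, so eight partitions are read.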
The query engine must scan _all_ rows in the partitions to identify rows
where `line` is `A` and `station` is `cnc`. This process takes valuable time
and results in less performant queries.
However, if you partition by other tags, InfluxDB can identify partitions that
contain only the tag values your query needs and spend less time
scanning rows to see if they contain the tag values.
For example, if data is partitioned by `line`, `station`, and day, although
there are more partition files, the query engine can quickly identify and read
only those with data relevant to the query:
{{% columns 4 %}}
- <strong class="req normal green">A | cnc | {{< datetime/current-date trimTime=true >}}</strong>
- A | wld | {{< datetime/current-date trimTime=true >}}
- B | cnc | {{< datetime/current-date trimTime=true >}}
- B | wld | {{< datetime/current-date trimTime=true >}}
- <strong class="req normal green">A | cnc | {{< datetime/current-date offset=-1 trimTime=true >}}</strong>
- A | wld | {{< datetime/current-date offset=-1 trimTime=true >}}
- B | cnc | {{< datetime/current-date offset=-1 trimTime=true >}}
- B | wld | {{< datetime/current-date offset=-1 trimTime=true >}}
- <strong class="req normal green">A | cnc | {{< datetime/current-date offset=-2 trimTime=true >}}</strong>
- A | wld | {{< datetime/current-date offset=-2 trimTime=true >}}
- B | cnc | {{< datetime/current-date offset=-2 trimTime=true >}}
- B | wld | {{< datetime/current-date offset=-2 trimTime=true >}}
- <strong class="req normal green">A | cnc | {{< datetime/current-date offset=-3 trimTime=true >}}</strong>
- A | wld | {{< datetime/current-date offset=-3 trimTime=true >}}
- B | cnc | {{< datetime/current-date offset=-3 trimTime=true >}}
- B | wld | {{< datetime/current-date offset=-3 trimTime=true >}}
- <strong class="req normal green">A | cnc | {{< datetime/current-date offset=-4 trimTime=true >}}</strong>
- A | wld | {{< datetime/current-date offset=-4 trimTime=true >}}
- B | cnc | {{< datetime/current-date offset=-4 trimTime=true >}}
- B | wld | {{< datetime/current-date offset=-4 trimTime=true >}}
- <strong class="req normal green">A | cnc | {{< datetime/current-date offset=-5 trimTime=true >}}</strong>
- A | wld | {{< datetime/current-date offset=-5 trimTime=true >}}
- B | cnc | {{< datetime/current-date offset=-5 trimTime=true >}}
- B | wld | {{< datetime/current-date offset=-5 trimTime=true >}}
- <strong class="req normal green">A | cnc | {{< datetime/current-date offset=-6 trimTime=true >}}</strong>
- A | wld | {{< datetime/current-date offset=-6 trimTime=true >}}
- B | cnc | {{< datetime/current-date offset=-6 trimTime=true >}}
- B | wld | {{< datetime/current-date offset=-6 trimTime=true >}}
- <strong class="req normal green">A | cnc | {{< datetime/current-date offset=-7 trimTime=true >}}</strong>
- A | wld | {{< datetime/current-date offset=-7 trimTime=true >}}
- B | cnc | {{< datetime/current-date offset=-7 trimTime=true >}}
- B | wld | {{< datetime/current-date offset=-7 trimTime=true >}}
{{% /columns %}}
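The effect shown above can be sketched as a filter over partition keys.
This is a simplified illustration, not a description of how the Catalog or Querier is implemented; the `prune` helper and the eight sample dates are hypothetical.

```python
from itertools import product

lines = ["A", "B"]
stations = ["cnc", "wld"]
days = [f"2024-01-{d:02d}" for d in range(1, 9)]  # eight hypothetical days

# One partition per combination of line, station, and day.
all_keys = [" | ".join(parts) for parts in product(lines, stations, days)]


def prune(keys, line, station):
    """Keep only keys whose tag parts match the query's WHERE predicates."""
    selected = []
    for key in keys:
        key_line, key_station, _day = key.split(" | ")
        if key_line == line and key_station == station:
            selected.append(key)
    return selected


matching = prune(all_keys, "A", "cnc")
print(len(all_keys), len(matching))  # 32 8
```

There are four times as many partitions, but only the eight that can contain `line = 'A'` and `station = 'cnc'` need to be read.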
---
## Partition guides
{{< children >}}
<!--
The content of this page is at /content/shared/v3-distributed-admin-custom-partitions/_index.md
-->

@ -8,49 +8,9 @@ menu:
name: Best practices
parent: Manage data partitioning
weight: 202
source: /shared/v3-distributed-admin-custom-partitions/best-practices.md
---
Use the following best practices when defining custom partitioning strategies
for your data stored in {{< product-name >}}.
- [Partition by tags that you commonly query for a specific value](#partition-by-tags-that-you-commonly-query-for-a-specific-value)
- [Only partition by tags that _always_ have a value](#only-partition-by-tags-that-always-have-a-value)
- [Avoid over-partitioning](#avoid-over-partitioning)
## Partition by tags that you commonly query for a specific value
Custom partitioning primarily benefits queries that look for a specific tag
value in the `WHERE` clause. For example, if you often query data related to a
specific ID, partitioning by the tag that stores the ID helps the InfluxDB
query engine more quickly identify which partitions contain the relevant data.
{{% note %}}
#### Use tag buckets for high-cardinality tags
Partitioning using distinct values of tags with many (10K+) unique values can
actually hurt query performance as partitions are created for each unique tag value.
Instead, use [tag buckets](/influxdb/cloud-dedicated/admin/custom-partitions/partition-templates/#tag-bucket-part-templates)
to partition by high-cardinality tags.
This method of partitioning groups tag values into "buckets" and partitions by bucket.
{{% /note %}}
## Only partition by tags that _always_ have a value
You should only partition by tags that _always_ have a value.
If points don't have a value for the tag, InfluxDB can't store them in the correct partitions and, at query time, must read all the partitions.
## Avoid over-partitioning
As you plan your partitioning strategy, keep in mind that data can be
"over-partitioned"--meaning partitions are so granular that queries end up
having to retrieve and read many partitions from the object store, which
hurts query performance.
- Balance the partition time interval with the actual amount of data written
during each interval. If a single interval doesn't contain a lot of data,
it is better to partition by larger time intervals.
- Don't partition by tags that you typically don't use in your query workload.
- Don't partition by distinct values of high-cardinality tags.
Instead, [use tag buckets](#use-tag-buckets-for-high-cardinality-tags) to
partition by these tags.
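A rough way to check a planned strategy for over-partitioning is to estimate how many partitions it can produce.
The sketch below computes an upper bound, assuming every combination of tag values (or tag buckets) receives data in every time interval; real counts are often lower.
The function name and the sample cardinalities are hypothetical.

```python
def estimated_partitions(tag_cardinalities, intervals):
    """Upper bound: one partition per tag-value combination per time interval."""
    count = intervals
    for cardinality in tag_cardinalities:
        count *= cardinality
    return count


days = 30  # partitioning by day over a 30-day window

print(estimated_partitions([], days))        # 30 (time only)
print(estimated_partitions([10_000], days))  # 300000 (distinct values of a 10K-value tag)
print(estimated_partitions([500], days))     # 15000 (same tag in 500 tag buckets)
```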
<!--
The content of this page is at /content/shared/v3-distributed-admin-custom-partitions/best-practices.md
-->

@ -10,161 +10,9 @@ weight: 202
related:
- /influxdb/cloud-dedicated/reference/cli/influxctl/database/create/
- /influxdb/cloud-dedicated/reference/cli/influxctl/table/create/
source: /shared/v3-distributed-admin-custom-partitions/define-custom-partitions.md
---
Use the [`influxctl` CLI](/influxdb/cloud-dedicated/reference/cli/influxctl/)
to define custom partition strategies when creating a database or table.
By default, {{< product-name >}} partitions data by day.
The partitioning strategy of a database or table is determined by a
[partition template](/influxdb/cloud-dedicated/admin/custom-partitions/#partition-templates)
which defines the naming pattern for [partition keys](/influxdb/cloud-dedicated/admin/custom-partitions/#partition-keys).
Partition keys uniquely identify each partition.
When a partition template is applied to a database, it becomes the default template
for all tables in that database, but can be overridden when creating a
table.
- [Create a database with a custom partition template](#create-a-database-with-a-custom-partition-template)
- [Create a table with a custom partition template](#create-a-table-with-a-custom-partition-template)
- [Example partition templates](#example-partition-templates)
{{% warn %}}
#### Partition templates can only be applied on create
You can only apply a partition template when creating a database or table.
You can't update a partition template on an existing resource.
{{% /warn %}}
Use the following command flags to identify
[partition template parts](/influxdb/cloud-dedicated/admin/custom-partitions/partition-templates/#tag-part-templates):
- `--template-tag`: An [InfluxDB tag](/influxdb/cloud-dedicated/reference/glossary/#tag)
to use in the partition template.
- `--template-tag-bucket`: An [InfluxDB tag](/influxdb/cloud-dedicated/reference/glossary/#tag)
and number of "buckets" to group tag values into.
Provide the tag key and the number of buckets,
separated by a comma: `tagKey,N`.
- `--template-timeformat`: A [Rust strftime date and time](/influxdb/cloud-dedicated/admin/custom-partitions/partition-templates/#time-part-templates)
string that specifies the time format in the partition template and determines
the time interval to partition by.
{{% note %}}
A partition template can include up to 7 total tag and tag bucket parts
and only 1 time part.
{{% /note %}}
_View [partition template part restrictions](/influxdb/cloud-dedicated/admin/custom-partitions/partition-templates/#restrictions)._
{{% note %}}
#### Always provide a time format when using custom partitioning
When defining a custom partition template for your database or table using any
of the `influxctl` `--template-*` flags, always include the `--template-timeformat`
flag with a time format to use in your partition template.
Otherwise, InfluxDB omits time from the partition template and won't compact partitions.
{{% /note %}}
## Create a database with a custom partition template
The following example creates a new `example-db` database and applies a partition
template that partitions by distinct values of two tags (`room` and `sensor-type`),
bucketed values of the `customerID` tag, and by day using the time format `%Y-%m-%d`:
<!--Skip database create and delete tests: namespaces aren't reusable-->
<!--pytest.mark.skip-->
```sh
influxctl database create \
--template-tag room \
--template-tag sensor-type \
--template-tag-bucket customerID,500 \
--template-timeformat '%Y-%m-%d' \
example-db
```
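If you script database creation, one possible approach is to assemble the argument list programmatically so that the time format is never omitted.
The following is a minimal sketch: the helper name `database_create_command` is hypothetical, and it uses only the `--template-*` flags described on this page.

```python
import shlex


def database_create_command(database, tags=(), tag_buckets=(), time_format="%Y-%m-%d"):
    """Assemble `influxctl database create` arguments for a partition template."""
    args = ["influxctl", "database", "create"]
    for tag in tags:
        args += ["--template-tag", tag]
    for tag, buckets in tag_buckets:
        args += ["--template-tag-bucket", f"{tag},{buckets}"]
    # Always pass a time format when any template flag is used.
    args += ["--template-timeformat", time_format]
    args.append(database)
    return args


cmd = database_create_command(
    "example-db",
    tags=["room", "sensor-type"],
    tag_buckets=[("customerID", 500)],
)
print(shlex.join(cmd))  # shell-escaped form of the command
```

The resulting list can be passed to `subprocess.run` without a shell, which avoids quoting problems with the `%` characters in the time format.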
## Create a table with a custom partition template
The following example creates a new `example-table` table in the specified
database and applies a partition template that partitions by distinct values of
two tags (`room` and `sensor-type`), bucketed values of the `customerID` tag,
and by month using the time format `%Y-%m`:
<!--Skip database create and delete tests: namespaces aren't reusable-->
<!--pytest.mark.skip-->
{{% code-placeholders "DATABASE_NAME" %}}
```sh
influxctl table create \
--template-tag room \
--template-tag sensor-type \
--template-tag-bucket customerID,500 \
--template-timeformat '%Y-%m' \
DATABASE_NAME \
example-table
```
{{% /code-placeholders %}}
Replace the following in your command:
- {{% code-placeholder-key %}}`DATABASE_NAME`{{% /code-placeholder-key %}}: your {{% product-name %}} [database](/influxdb/cloud-dedicated/admin/databases/)
<!--actual test
```sh
# Test the preceding command outside of the code block.
# influxctl authentication requires TTY interaction--
# output the auth URL to a file that the host can open.
TABLE_NAME=table_TEST_RUN
script -c "influxctl table create \
--template-tag room \
--template-tag sensor-type \
--template-tag-bucket customerID,500 \
--template-timeformat '%Y-%m' \
DATABASE_NAME \
$TABLE_NAME" \
/dev/null > /shared/urls.txt
script -c "influxctl query \
--database DATABASE_NAME \
--token DATABASE_TOKEN \
'SHOW TABLES'" > /shared/temp_tables.txt
grep -q $TABLE_NAME /shared/temp_tables.txt
rm /shared/temp_tables.txt
```
<!--
The content of this page is at /content/shared/v3-distributed-admin-custom-partitions/_define-custom-partitions.md
-->
## Example partition templates
Given the following [line protocol](/influxdb/cloud-dedicated/reference/syntax/line-protocol/)
with a `2024-01-01T00:00:00Z` timestamp:
```text
prod,line=A,station=weld1 temp=81.9,qty=36i 1704067200000000000
```
##### Partitioning by distinct tag values
| Description | Tag parts | Time part | Resulting partition key |
| :---------------------- | :---------------- | :--------- | :----------------------- |
| By day (default) | | `%Y-%m-%d` | 2024-01-01 |
| By month | | `%Y-%m` | 2024-01 |
| By year | | `%Y` | 2024 |
| Single tag, by day | `line` | `%Y-%m-%d` | A \| 2024-01-01 |
| Single tag, by month | `line` | `%Y-%m` | A \| 2024-01 |
| Single tag, by year | `line` | `%Y` | A \| 2024 |
| Multiple tags, by day | `line`, `station` | `%Y-%m-%d` | A \| weld1 \| 2024-01-01 |
| Multiple tags, by month | `line`, `station` | `%Y-%m` | A \| weld1 \| 2024-01 |
| Multiple tags, by year | `line`, `station` | `%Y` | A \| weld1 \| 2024 |
##### Partition by tag buckets
| Description | Tag part | Tag bucket part | Time part | Resulting partition key |
| :---------------------------------- | :------- | :-------------- | :--------- | :---------------------- |
| Distinct tag, tag buckets, by day | `line` | `station,100` | `%Y-%m-%d` | A \| 3 \| 2024-01-01 |
| Distinct tag, tag buckets, by month | `line` | `station,500` | `%Y-%m` | A \| 303 \| 2024-01 |

@ -8,124 +8,9 @@ menu:
influxdb_cloud_dedicated:
parent: Manage data partitioning
weight: 202
source: /shared/v3-distributed-admin-custom-partitions/partition-templates.md
---
Use partition templates to define the patterns used to generate partition keys.
A partition key uniquely identifies a partition and is used to name the partition
Parquet file in the [Object store](/influxdb/cloud-dedicated/reference/internals/storage-engine/#object-store).
A partition template consists of 1-8 _template parts_---dimensions to partition data by.
Three types of template parts exist:
- **tag**: An [InfluxDB tag](/influxdb/cloud-dedicated/reference/glossary/#tag)
to partition by.
- **tag bucket**: An [InfluxDB tag](/influxdb/cloud-dedicated/reference/glossary/#tag)
and number of "buckets" to group tag values into. Data is partitioned by the
tag bucket rather than each distinct tag value.
- {{< req type="key" >}} **time**: A Rust strftime date and time string that specifies the time interval
to partition data by. The smallest unit of time included in the time part
template is the interval used to partition data.
{{% note %}}
A partition template must include 1 [time part](#time-part-templates)
and can include up to 7 total [tag](#tag-part-templates) and [tag bucket](#tag-bucket-part-templates) parts.
{{% /note %}}
<!-- TOC -->
- [Restrictions](#restrictions)
- [Template part size limit](#template-part-size-limit)
- [Partition key size limit](#partition-key-size-limit)
- [Reserved keywords](#reserved-keywords)
- [Reserved characters](#reserved-characters)
- [Tag part templates](#tag-part-templates)
- [Tag bucket part templates](#tag-bucket-part-templates)
- [Time part templates](#time-part-templates)
<!-- /TOC -->
## Restrictions
### Template part size limit
Each template part is limited to 200 bytes in length.
Anything longer will be truncated at 200 bytes and appended with `#`.
### Partition key size limit
With the truncation of template parts, the maximum length of a partition key is
1,607 bytes (1.57 KiB).
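The 1,607-byte figure is consistent with the following arithmetic, which assumes eight parts at the 200-byte limit and a one-byte `|` delimiter between adjacent parts.
This is an interpretation of the stated limits, not a confirmed derivation.

```python
MAX_PARTS = 8         # 1 time part + up to 7 tag or tag bucket parts
MAX_PART_BYTES = 200  # per-part size limit
DELIMITER_BYTES = 1   # the `|` between adjacent parts (assumed to be 1 byte)

max_key_bytes = MAX_PARTS * MAX_PART_BYTES + (MAX_PARTS - 1) * DELIMITER_BYTES

print(max_key_bytes)                   # 1607
print(round(max_key_bytes / 1024, 2))  # 1.57
```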
### Reserved keywords
The following reserved keywords cannot be used in partition templates:
- `time`
### Reserved characters
If used in template parts, non-ASCII characters and the following reserved
characters must be [percent encoded](https://developer.mozilla.org/en-US/docs/Glossary/Percent-encoding):
- `|`: Partition key part delimiter
- `!`: Null or missing partition key part
- `^`: Empty string partition key part
- `#`: Key part truncation marker
- `%`: Required for unambiguous reversal of percent encoding
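As an illustration of the encoding described above, the following sketch percent-encodes the listed reserved characters and any non-ASCII bytes, and leaves all other characters unchanged.
The helper name `encode_part` is hypothetical, and whether InfluxDB also encodes other characters (such as control characters) is not covered here.

```python
RESERVED = set(b"|!^#%")  # byte values of the reserved characters listed above


def encode_part(value):
    """Percent-encode reserved characters and non-ASCII bytes in a template part."""
    out = []
    for byte in value.encode("utf-8"):
        if byte in RESERVED or byte >= 0x80:
            out.append(f"%{byte:02X}")
        else:
            out.append(chr(byte))
    return "".join(out)


print(encode_part("a|b"))   # a%7Cb
print(encode_part("100%"))  # 100%25
print(encode_part("café"))  # caf%C3%A9
```

Encoding `%` itself is what makes the result reversible: a literal `%7C` in the input becomes `%257C` rather than being confused with an encoded `|`.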
## Tag part templates
Tag part templates consist of a _tag key_ to partition by.
Generated partition keys include the unique _tag value_ specific to each partition.
A partition template may include a given tag key only once in template parts
that operate on tags (tag value and tag bucket)--for example:
If a template partitions on unique values of `tag_A`, then
you can't use `tag_A` as a tag bucket part.
## Tag bucket part templates
Tag bucket part templates consist of a _tag key_ to partition by and the
_number of "buckets" to partition tag values into_--for example:
```
customerID,500
```
Values of the `customerID` tag are bucketed into 500 distinct "buckets."
Each bucket is identified by the remainder of the tag value hashed into a 32-bit
integer divided by the specified number of buckets:
```rust
hash(tagValue) % N
```
Generated partition keys include the unique _tag bucket identifier_ specific to
each partition.
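The bucketing step can be sketched as follows.
This page does not specify which hash function InfluxDB uses, so the sketch substitutes CRC-32 from Python's standard library as a stand-in 32-bit hash; the bucket numbers it produces will not match those in real partition keys.
The function name `bucket_id` is hypothetical.

```python
import zlib


def bucket_id(tag_value, num_buckets):
    """Map a tag value to a bucket: 32-bit hash modulo the number of buckets."""
    # zlib.crc32 is a stand-in; InfluxDB's actual hash function may differ.
    hashed = zlib.crc32(tag_value.encode("utf-8"))
    return hashed % num_buckets


for customer in ("cust-0001", "cust-0002", "cust-0003"):
    print(customer, bucket_id(customer, 500))
```

Because the hash is deterministic, every point with the same tag value lands in the same bucket, and the number of partitions per time interval is capped at the number of buckets.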
**Supported number of tag buckets**: 1-1,000
{{% note %}}
Tag buckets should be used to partition by high cardinality tags or tags with an
unknown number of distinct values.
{{% /note %}}
A partition template may include a given tag key only once in template parts
that operate on tags (tag value and tag bucket)--for example:
If a template partitions on unique values of `tag_A`, then
you can't use `tag_A` as a tag bucket part.
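The rules described on this page can be collected into a small pre-flight check to run before creating a database or table.
This is a minimal sketch under stated assumptions: the function name and its parameters are hypothetical, and treating the reserved keyword `time` as a disallowed tag key is one interpretation of the restriction.

```python
def validate_template(tag_parts, tag_bucket_parts, time_formats):
    """Return a list of rule violations for a proposed partition template."""
    errors = []
    if len(time_formats) != 1:
        errors.append("a template needs exactly 1 time part")
    if len(tag_parts) + len(tag_bucket_parts) > 7:
        errors.append("at most 7 tag and tag bucket parts are allowed")
    keys = list(tag_parts) + [key for key, _ in tag_bucket_parts]
    if len(keys) != len(set(keys)):
        errors.append("a tag key may appear only once across tag and tag bucket parts")
    if "time" in keys:
        errors.append("`time` is a reserved keyword")
    for key, buckets in tag_bucket_parts:
        if not 1 <= buckets <= 1000:
            errors.append(f"bucket count for `{key}` must be between 1 and 1,000")
    return errors


print(validate_template(["room"], [("customerID", 500)], ["%Y-%m-%d"]))  # []
print(validate_template(["tag_A"], [("tag_A", 50)], ["%Y-%m-%d"]))
```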
## Time part templates
Time part templates use a limited subset of the
[Rust strftime date and time formatting syntax](https://docs.rs/chrono/latest/chrono/format/strftime/index.html)
to specify time format in partition keys.
InfluxDB uses the smallest unit of time included in the time part template as
the partition interval.
### Date specifiers
| Variable | Example | Description |
| :------: | :----------- | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `%Y` | `2001` | The full proleptic Gregorian year, zero-padded to 4 digits. chrono supports years from -262144 to 262143. Note: years before 1 BCE or after 9999 CE require an initial sign (+/-). |
| `%m` | `07` | Month number (01--12), zero-padded to 2 digits. |
| `%d` | `08` | Day number (01--31), zero-padded to 2 digits. |
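To make the "smallest unit" rule concrete, the following sketch infers the partition interval from a time part template that uses only the date specifiers in the table above.
The function name `partition_interval` is hypothetical.

```python
def partition_interval(time_format):
    """Return the smallest time unit present in a time part template."""
    # Check from the smallest unit to the largest.
    for specifier, unit in (("%d", "day"), ("%m", "month"), ("%Y", "year")):
        if specifier in time_format:
            return unit
    raise ValueError("no supported date specifier found")


print(partition_interval("%Y-%m-%d"))  # day
print(partition_interval("%Y-%m"))     # month
print(partition_interval("%Y"))        # year
```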
<!--
The content of this page is at /content/shared/v3-distributed-admin-custom-partitions/_partition-templates.md
-->

@ -14,173 +14,9 @@ list_code_example: |
```
related:
- /influxdb/cloud-dedicated/admin/query-system-data/
source: /shared/v3-distributed-admin-custom-partitions/view-partitions.md
---
{{< product-name >}} stores partition information in InfluxDB v3 system tables.
Query partition information to view partition templates and verify partitions
are working as intended.
- [Query partition information from system tables](#query-partition-information-from-system-tables)
- [Partition-related queries](#partition-related-queries)
{{% warn %}}
#### Querying system tables may impact overall cluster performance
Partition information is stored in InfluxDB v3 system tables.
Querying system tables may impact the overall write and query performance of
your {{< product-name omit=" Clustered" >}} cluster.
<!--------------- UPDATE THE DATE BELOW AS EXAMPLES ARE UPDATED --------------->
#### System tables are subject to change
System tables are not part of InfluxDB's stable API and may change with new releases.
The provided schema information and query examples are valid as of **September 24, 2024**.
If you detect a schema change or a non-functioning query example, please
[submit an issue](https://github.com/influxdata/docs-v2/issues/new/choose).
<!--------------- UPDATE THE DATE ABOVE AS EXAMPLES ARE UPDATED --------------->
{{% /warn %}}
## Query partition information from system tables
Use the [`influxctl query` command](/influxdb/cloud-dedicated/reference/cli/influxctl/query/)
and SQL to query partition-related information from InfluxDB system tables.
Provide the following:
- **Enable system tables** with the `--enable-system-tables` command flag.
- **Database token**: A [database token](/influxdb/cloud-dedicated/admin/tokens/#database-tokens)
with read permissions on the specified database. Uses the `token` setting from
the [`influxctl` connection profile](/influxdb/cloud-dedicated/reference/cli/influxctl/#configure-connection-profiles)
or the `--token` command flag.
- **Database name**: The name of the database to query information about.
Uses the `database` setting from the
[`influxctl` connection profile](/influxdb/cloud-dedicated/reference/cli/influxctl/#configure-connection-profiles)
or the `--database` command flag.
- **SQL query**: The SQL query to execute.
Pass the query in one of the following ways:
- a string on the command line
- a path to a file that contains the query
- a single dash (`-`) to read the query from stdin
{{% code-placeholders "DATABASE_(TOKEN|NAME)|SQL_QUERY" %}}
```bash
influxctl query \
--enable-system-tables \
--database DATABASE_NAME \
--token DATABASE_TOKEN \
"SQL_QUERY"
```
{{% /code-placeholders %}}
Replace the following:
- {{% code-placeholder-key %}}`DATABASE_TOKEN`{{% /code-placeholder-key %}}:
A database token with read access to the specified database
- {{% code-placeholder-key %}}`DATABASE_NAME`{{% /code-placeholder-key %}}:
The name of the database to query information about.
- {{% code-placeholder-key %}}`SQL_QUERY`{{% /code-placeholder-key %}}:
The SQL query to execute. For examples, see
[Partition-related queries](#partition-related-queries).
When prompted, enter `y` to acknowledge the potential impact querying system
tables may have on your cluster.
## Partition-related queries
Use the following queries to return information about partitions in your
{{< product-name omit=" Clustered" >}} cluster.
- [View partition templates of all tables](#view-partition-templates-of-all-tables)
- [View the partition template of a specific table](#view-the-partition-template-of-a-specific-table)
- [View all partitions for a table](#view-all-partitions-for-a-table)
- [View the number of partitions per table](#view-the-number-of-partitions-per-table)
- [View the number of partitions for a specific table](#view-the-number-of-partitions-for-a-specific-table)
---
In the examples below, replace {{% code-placeholder-key %}}`TABLE_NAME`{{% /code-placeholder-key %}}
with the name of the table you want to query information about.
---
{{% code-placeholders "TABLE_NAME_(1|2|3)|TABLE_NAME" %}}
### View the partition template of a specific table
```sql
SELECT * FROM system.tables WHERE table_name = 'TABLE_NAME'
```
#### Example results
| table_name | partition_template |
| :--------- | :----------------------------------------------------------------------------------------- |
| weather | `{"parts":[{"timeFormat":"%Y-%m-%d"},{"bucket":{"tagName":"location","numBuckets":250}}]}` |
{{% note %}}
If a table doesn't include a partition template in the output of this command,
the table uses the default (1 day) partition strategy and doesn't partition
by tags or tag buckets.
{{% /note %}}
### View all partitions for a table
```sql
SELECT * FROM system.partitions WHERE table_name = 'TABLE_NAME'
```
#### Example results
| partition_id | table_name | partition_key | last_new_file_created_at | num_files | total_size_mb |
| -----------: | :--------- | :---------------- | -----------------------: | --------: | ------------: |
| 1362 | weather | 43 \| 2020-05-27 | 1683747418763813713 | 1 | 0 |
| 800 | weather | 234 \| 2021-08-02 | 1683747421899400796 | 1 | 0 |
| 630 | weather | 325 \| 2022-03-17 | 1683747417616689036 | 1 | 0 |
| 1401 | weather | 12 \| 2021-01-09 | 1683747417786122295 | 1 | 0 |
| 1012 | weather | 115 \| 2022-07-04 | 1683747417614219148 | 1 | 0 |
### View the number of partitions per table
```sql
SELECT
table_name,
COUNT(*) AS partition_count
FROM
system.partitions
WHERE
table_name IN ('TABLE_NAME_1', 'TABLE_NAME_2', 'TABLE_NAME_3')
GROUP BY
table_name
```
#### Example results
| table_name | partition_count |
| :--------- | --------------: |
| weather | 1096 |
| home | 24 |
| numbers | 1 |
### View the number of partitions for a specific table
```sql
SELECT
COUNT(*) AS partition_count
FROM
system.partitions
WHERE
table_name = 'TABLE_NAME'
```
#### Example results
| partition_count |
| --------------: |
|            1096 |
{{% /code-placeholders %}}
<!--
The content of this page is at /content/shared/v3-distributed-admin-custom-partitions/view-partitions.md
-->

View File

@ -46,7 +46,8 @@ Related entries:
### aggregate
A function that returns an aggregated value across a set of points.
For a list of available aggregation functions, see [SQL aggregate functions](/influxdb/cloud-dedicated/reference/sql/functions/aggregate/).
For a list of available aggregation functions,
see [SQL aggregate functions](/influxdb/cloud-dedicated/reference/sql/functions/aggregate/).
<!-- TODO: Add a link to InfluxQL aggregate functions -->
@ -330,6 +331,7 @@ Related entries:
[field](#field),
[field key](#field-key),
[field set](#field-set),
[tag set](#tag-set),
[tag value](#tag-value),
[timestamp](#timestamp)
@ -356,7 +358,7 @@ Related entries:
Flush jitter prevents every Telegraf output plugin from sending writes
simultaneously, which can overwhelm some data sinks.
Each flush interval, every Telegraf output plugin will sleep for a random time
Each flush interval, every Telegraf output plugin sleeps for a random time
between zero and the flush jitter before emitting metrics.
Flush jitter smooths out write spikes when running a large number of Telegraf instances.
@ -400,10 +402,10 @@ Identifiers are tokens that refer to specific database objects such as database
names, field keys, measurement names, tag keys, etc.
Related entries:
[database](#database)
[database](#database),
[field key](#field-key),
[measurement](#measurement),
[tag key](#tag-key),
[tag key](#tag-key)
### influx
@ -422,8 +424,7 @@ and other required processes.
### InfluxDB
An open source time series database (TSDB) developed by InfluxData.
Written in Go and optimized for fast, high-availability storage and retrieval of
An open source time series database (TSDB) developed by InfluxData, optimized for fast, high-availability storage and retrieval of
time series data in fields such as operations monitoring, application metrics,
Internet of Things sensor data, and real-time analytics.
@ -435,8 +436,8 @@ The SQL-like query language used to query data in InfluxDB.
Telegraf input plugins actively gather metrics and deliver them to the core agent,
where aggregator, processor, and output plugins can operate on the metrics.
In order to activate an input plugin, it needs to be enabled and configured in
Telegraf's configuration file.
To activate an input plugin, enable and configure it in the
Telegraf configuration file.
Related entries:
[aggregator plugin](#aggregator-plugin),
@ -760,7 +761,7 @@ in the cluster (replication factor), and the time range covered by shard groups
(shard group duration). RPs are unique per database and along with the measurement
and tag set define a series.
In {{< product-name >}} the equivalent is [retention period](#retention-period),
In {{< product-name >}}, the equivalent is [retention period](#retention-period),
however retention periods are not part of the data model.
The retention period describes the data persistence behavior of a database.
@ -837,8 +838,8 @@ Related entries:
### series
A collection of data in the InfluxDB data structure that share a common
_measurement_, _tag set_, and _field key_.
In the InfluxDB 3 data structure, a collection of data that share a common
_measurement_ and _tag set_.
Related entries:
[field set](#field-set),
@ -847,12 +848,13 @@ Related entries:
### series cardinality
The number of unique measurement, tag set, and field key combinations in an InfluxDB database.
The number of unique measurement (table), tag set, and field key combinations in an InfluxDB database.
For example, assume that an InfluxDB bucket has one measurement.
For example, assume that an InfluxDB database has one measurement.
The single measurement has two tag keys: `email` and `status`.
If there are three different `email`s, and each email address is associated with two
different `status`es, the series cardinality for the measurement is 6
If there are three different `email` tag values,
and each email address is associated with two
different `status` tag values, then the series cardinality for the measurement is 6
(3 × 2 = 6):
| email | status |
@ -867,7 +869,7 @@ different `status`es, the series cardinality for the measurement is 6
In some cases, performing this multiplication may overestimate series cardinality
because of the presence of dependent tags.
Dependent tags are scoped by another tag and do not increase series cardinality.
If we add the tag `firstname` to the example above, the series cardinality
If we add the tag `firstname` to the preceding example, the series cardinality
would not be 18 (3 × 2 × 3 = 18).
The series cardinality would remain unchanged at 6, as `firstname` is already scoped by the `email` tag:
@ -892,7 +894,7 @@ A series key identifies a particular series by measurement, tag set, and field k
For example:
```
```text
# measurement, tag set, field key
h2o_level, location=santa_monica, h2o_feet
```
@ -1129,18 +1131,17 @@ A statement that sets or updates the value stored in a variable.
## W
### WAL (Write Ahead Log) - enterprise
### WAL (Write-Ahead Log)
The temporary cache for recently written points.
To reduce the frequency that permanent storage files are accessed, InfluxDB
caches new points in the WAL until their total size or age triggers a flush to
more permanent storage. This allows for efficient batching of the writes into the TSM.
more permanent storage. This allows for efficient batching of the writes into
the storage engine.
Points in the WAL can be queried and persist through a system reboot.
On process start, all points in the WAL must be flushed before the system accepts new writes.
Related entries:
[tsm](#tsm-time-structured-merge-tree)
Points in the WAL are queryable and persist through a system reboot.
On process start, all points in the WAL must be flushed before the system
accepts new writes.
### windowing

View File

@ -340,6 +340,7 @@ Related entries:
[field](#field),
[field key](#field-key),
[field set](#field-set),
[tag set](#tag-set),
[tag value](#tag-value),
[timestamp](#timestamp)
@ -366,7 +367,7 @@ Related entries:
Flush jitter prevents every Telegraf output plugin from sending writes
simultaneously, which can overwhelm some data sinks.
Each flush interval, every Telegraf output plugin will sleep for a random time
Each flush interval, every Telegraf output plugin sleeps for a random time
between zero and the flush jitter before emitting metrics.
Flush jitter smooths out write spikes when running a large number of Telegraf instances.
@ -434,8 +435,7 @@ and other required processes.
### InfluxDB
An open source time series database (TSDB) developed by InfluxData.
Written in Go and optimized for fast, high-availability storage and retrieval of
An open source time series database (TSDB) developed by InfluxData, optimized for fast, high-availability storage and retrieval of
time series data in fields such as operations monitoring, application metrics,
Internet of Things sensor data, and real-time analytics.
@ -447,8 +447,8 @@ The SQL-like query language used to query data in InfluxDB.
Telegraf input plugins actively gather metrics and deliver them to the core agent,
where aggregator, processor, and output plugins can operate on the metrics.
In order to activate an input plugin, it needs to be enabled and configured in
Telegraf's configuration file.
To activate an input plugin, enable and configure it in the
Telegraf configuration file.
Related entries:
[aggregator plugin](#aggregator-plugin),
@ -471,8 +471,9 @@ Related entries:
### IOx
The IOx (InfluxDB v3) storage engine is a real-time, columnar database optimized for time series
data built in Rust on top of [Apache Arrow](https://arrow.apache.org/) and
The IOx storage engine (InfluxDB v3 storage engine) is a real-time, columnar
database optimized for time series data built in Rust on top of
[Apache Arrow](https://arrow.apache.org/) and
[DataFusion](https://arrow.apache.org/datafusion/user-guide/introduction.html).
IOx replaces the [TSM (Time Structured Merge tree)](#tsm-time-structured-merge-tree) storage engine.
@ -848,8 +849,8 @@ Related entries:
### series
A collection of data in the InfluxDB data structure that share a common
_measurement_, _tag set_, and _field key_.
In the InfluxDB 3 data structure, a collection of data that share a common
_measurement_ and _tag set_.
Related entries:
[field set](#field-set),
@ -860,10 +861,11 @@ Related entries:
The number of unique measurement, tag set, and field key combinations in an {{% product-name %}} bucket.
For example, assume that an InfluxDB bucket has one measurement.
For example, assume that an InfluxDB database has one measurement.
The single measurement has two tag keys: `email` and `status`.
If there are three different `email`s, and each email address is associated with two
different `status`es, the series cardinality for the measurement is 6
If there are three different `email` tag values,
and each email address is associated with two
different `status` tag values, then the series cardinality for the measurement is 6
(3 × 2 = 6):
| email | status |
@ -878,7 +880,7 @@ different `status`es, the series cardinality for the measurement is 6
In some cases, performing this multiplication may overestimate series cardinality
because of the presence of dependent tags.
Dependent tags are scoped by another tag and do not increase series cardinality.
If we add the tag `firstname` to the example above, the series cardinality
If we add the tag `firstname` to the preceding example, the series cardinality
would not be 18 (3 × 2 × 3 = 18).
The series cardinality would remain unchanged at 6, as `firstname` is already scoped by the `email` tag:
@ -1136,18 +1138,17 @@ A statement that sets or updates the value stored in a variable.
## W
### WAL (Write Ahead Log) - enterprise
### WAL (Write-Ahead Log)
The temporary cache for recently written points.
To reduce the frequency that permanent storage files are accessed, InfluxDB
caches new points in the WAL until their total size or age triggers a flush to
more permanent storage. This allows for efficient batching of the writes into the TSM.
more permanent storage. This allows for efficient batching of the writes into
the storage engine.
Points in the WAL can be queried and persist through a system reboot.
On process start, all points in the WAL must be flushed before the system accepts new writes.
Related entries:
[tsm](#tsm-time-structured-merge-tree)
Points in the WAL are queryable and persist through a system reboot.
On process start, all points in the WAL must be flushed before the system
accepts new writes.
### windowing

View File

@ -11,409 +11,9 @@ weight: 104
influxdb/clustered/tags: [storage]
related:
- /influxdb/clustered/reference/internals/storage-engine/
source: /shared/v3-distributed-admin-custom-partitions/_index.md
---
When writing data to {{< product-name >}}, the InfluxDB v3 storage engine stores
data in the [Object store](/influxdb/clustered/reference/internals/storage-engine/#object-store)
in [Apache Parquet](https://parquet.apache.org/) format.
Each Parquet file represents a _partition_--a logical grouping of data.
By default, InfluxDB partitions each table by day.
{{< product-name >}} lets you customize the partitioning strategy and partition
by tag values and different time intervals.
Customize your partitioning strategy to optimize query performance for your
specific schema and workload.
- [Advantages](#advantages)
- [Disadvantages](#disadvantages)
- [Limitations](#limitations)
- [How partitioning works](#how-partitioning-works)
- [Partition templates](#partition-templates)
- [Partition keys](#partition-keys)
- [Partitions in the query life cycle](#partitions-in-the-query-life-cycle)
- [Partition guides](#partition-guides)
{{< children type="anchored-list" >}}
## Advantages
The primary advantage of custom partitioning is that it lets you customize your
storage structure to improve query performance specific to your schema and workload.
- **Optimized storage for improved performance on specific types of queries**.
For example, if queries often select data with a specific tag value, you can
partition by that tag to improve the performance of those queries.
- **Optimized storage for specific types of data**. For example, if the data you
store is sparse and the time ranges you query are often much larger than a day,
you could partition your data by week instead of by day.
## Disadvantages
Using custom partitioning may increase the load on other parts of the
[InfluxDB v3 storage engine](/influxdb/clustered/reference/internals/storage-engine/),
but each can be scaled individually to address the added load.
{{% note %}}
_The following disadvantages assume that your custom partitioning strategy includes
additional tags to partition by or partition intervals smaller than a day._
{{% /note %}}
- **Increased load on the [Ingester](/influxdb/clustered/reference/internals/storage-engine/#ingester)**
as it groups data into smaller partitions and files.
- **Increased load on the [Catalog](/influxdb/clustered/reference/internals/storage-engine/#catalog)**
as more references to partition Parquet file locations are stored and queried.
- **Increased load on the [Compactor](/influxdb/clustered/reference/internals/storage-engine/#compactor)**
as more partition Parquet files need to be compacted.
- **Increased costs associated with [Object storage](/influxdb/clustered/reference/internals/storage-engine/#object-storage)**
as more partition Parquet files are created and stored.
- **Risk of decreased performance for queries that don't use tags in the WHERE clause**.
These queries may end up reading many partitions and smaller files, degrading performance.
## Limitations
Custom partitioning has the following limitations:
- Database and table partitions can only be defined on create.
You cannot update the partition strategy of a database or table after it has
been created.
- A partition template must include a time part.
- You can partition by up to eight dimensions (seven tags and a time interval).
## How partitioning works
### Partition templates
A partition template defines the pattern used for _[partition keys](#partition-keys)_
and determines the time interval that data is partitioned by.
Partition templates use tag values and
[Rust strftime date and time formatting syntax](https://docs.rs/chrono/latest/chrono/format/strftime/index.html).
_For more detailed information, see [Partition templates](/influxdb/clustered/admin/custom-partitions/partition-templates/)._
### Partition keys
A partition key uniquely identifies a partition.
A _[partition template](#partition-templates)_ defines the partition key format.
Partition keys are
composed of up to 8 dimensions (1 time part and up to 7 tag or tag bucket parts).
Each part is delimited by the partition key separator (`|`).
The default format for partition keys is `%Y-%m-%d` (for example, `2024-01-01`).
{{< expand-wrapper >}}
{{% expand "View example partition templates and keys" %}}
The following examples show the partition keys generated for line protocol with these timestamps:
- 2023-12-31T23:00:00Z
- 2024-01-01T00:00:00Z
- 2024-01-01T01:00:00Z
```text
production,line=A,station=cnc temp=81.2,qty=35i 1704063600000000000
production,line=A,station=wld temp=92.8,qty=35i 1704063600000000000
production,line=B,station=cnc temp=101.1,qty=43i 1704063600000000000
production,line=B,station=wld temp=102.4,qty=43i 1704063600000000000
production,line=A,station=cnc temp=81.9,qty=36i 1704067200000000000
production,line=A,station=wld temp=110.0,qty=22i 1704067200000000000
production,line=B,station=cnc temp=101.8,qty=44i 1704067200000000000
production,line=B,station=wld temp=105.7,qty=44i 1704067200000000000
production,line=A,station=cnc temp=82.2,qty=35i 1704070800000000000
production,line=A,station=wld temp=92.1,qty=30i 1704070800000000000
production,line=B,station=cnc temp=102.4,qty=43i 1704070800000000000
production,line=B,station=wld temp=106.5,qty=43i 1704070800000000000
```
---
{{% flex %}}
<!---------------------- BEGIN PARTITION EXAMPLES GROUP 1 --------------------->
{{% flex-content "half" %}}
##### Partition template parts
- `%Y-%m-%d` <em class="op50">time (by day, default format)</em>
{{% /flex-content %}}
{{% flex-content %}}
##### Partition keys
- `2023-12-31`
- `2024-01-01`
{{% /flex-content %}}
<!----------------------- END PARTITION EXAMPLES GROUP 1 ---------------------->
{{% /flex %}}
---
{{% flex %}}
<!---------------------- BEGIN PARTITION EXAMPLES GROUP 2 --------------------->
{{% flex-content "half" %}}
##### Partition template parts
- `line` <em class="op50">tag</em>
- `%d %b %Y` <em class="op50">time (by day, non-default format)</em>
{{% /flex-content %}}
{{% flex-content %}}
##### Partition keys
- `A | 31 Dec 2023`
- `B | 31 Dec 2023`
- `A | 01 Jan 2024`
- `B | 01 Jan 2024`
{{% /flex-content %}}
<!----------------------- END PARTITION EXAMPLES GROUP 2 ---------------------->
{{% /flex %}}
---
{{% flex %}}
<!---------------------- BEGIN PARTITION EXAMPLES GROUP 3 --------------------->
{{% flex-content "half" %}}
##### Partition template parts
- `line` <em class="op50">tag</em>
- `station` <em class="op50">tag</em>
- `%Y-%m-%d` <em class="op50">time (by day, default format)</em>
{{% /flex-content %}}
{{% flex-content %}}
##### Partition keys
- `A | cnc | 2023-12-31`
- `A | wld | 2023-12-31`
- `B | cnc | 2023-12-31`
- `B | wld | 2023-12-31`
- `A | cnc | 2024-01-01`
- `A | wld | 2024-01-01`
- `B | cnc | 2024-01-01`
- `B | wld | 2024-01-01`
{{% /flex-content %}}
<!----------------------- END PARTITION EXAMPLES GROUP 3 ---------------------->
{{% /flex %}}
---
{{% flex %}}
<!---------------------- BEGIN PARTITION EXAMPLES GROUP 4 --------------------->
{{% flex-content "half" %}}
##### Partition template parts
- `line` <em class="op50">tag</em>
- `station,3` <em class="op50">tag bucket</em>
- `%Y-%m-%d` <em class="op50">time (by day, default format)</em>
{{% /flex-content %}}
{{% flex-content %}}
##### Partition keys
- `A | 0 | 2023-12-31`
- `B | 0 | 2023-12-31`
- `A | 0 | 2024-01-01`
- `B | 0 | 2024-01-01`
{{% /flex-content %}}
<!----------------------- END PARTITION EXAMPLES GROUP 4 ---------------------->
{{% /flex %}}
---
{{% flex %}}
<!---------------------- BEGIN PARTITION EXAMPLES GROUP 5 --------------------->
{{% flex-content "half" %}}
##### Partition template parts
- `line` <em class="op50">tag</em>
- `station` <em class="op50">tag</em>
- `%Y-%m-%d %H:00` <em class="op50">time (by hour)</em>
{{% /flex-content %}}
{{% flex-content %}}
##### Partition keys
- `A | cnc | 2023-12-31 23:00`
- `A | wld | 2023-12-31 23:00`
- `B | cnc | 2023-12-31 23:00`
- `B | wld | 2023-12-31 23:00`
- `A | cnc | 2024-01-01 00:00`
- `A | wld | 2024-01-01 00:00`
- `B | cnc | 2024-01-01 00:00`
- `B | wld | 2024-01-01 00:00`
- `A | cnc | 2024-01-01 01:00`
- `A | wld | 2024-01-01 01:00`
- `B | cnc | 2024-01-01 01:00`
- `B | wld | 2024-01-01 01:00`
{{% /flex-content %}}
<!----------------------- END PARTITION EXAMPLES GROUP 5 ---------------------->
{{% /flex %}}
---
{{% flex %}}
<!---------------------- BEGIN PARTITION EXAMPLES GROUP 6 --------------------->
{{% flex-content "half" %}}
##### Partition template parts
- `line` <em class="op50">tag</em>
- `station,50` <em class="op50">tag bucket</em>
- `%Y-%m-%d %H:00` <em class="op50">time (by hour)</em>
{{% /flex-content %}}
{{% flex-content %}}
##### Partition keys
- `A | 47 | 2023-12-31 23:00`
- `A | 9 | 2023-12-31 23:00`
- `B | 47 | 2023-12-31 23:00`
- `B | 9 | 2023-12-31 23:00`
- `A | 47 | 2024-01-01 00:00`
- `A | 9 | 2024-01-01 00:00`
- `B | 47 | 2024-01-01 00:00`
- `B | 9 | 2024-01-01 00:00`
- `A | 47 | 2024-01-01 01:00`
- `A | 9 | 2024-01-01 01:00`
- `B | 47 | 2024-01-01 01:00`
- `B | 9 | 2024-01-01 01:00`
{{% /flex-content %}}
<!----------------------- END PARTITION EXAMPLES GROUP 6 ---------------------->
{{% /flex %}}
{{% /expand %}}
{{< /expand-wrapper >}}
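The way tag and time parts combine into a key can be sketched in shell. This is an illustration only, not how InfluxDB generates keys internally: the `partition_key` function is a hypothetical helper, it assumes GNU `date`, it takes the timestamp in seconds (the line protocol above uses nanoseconds), and it assumes the basic `strftime` specifiers behave the same as in the Rust syntax. Tag bucket parts are omitted because the bucket hash function isn't described here.

```bash
# Sketch only: builds a partition key from tag values and a time part.
# Assumes GNU `date` (for `-d @EPOCH`).
# Usage: partition_key TIME_FORMAT EPOCH_SECONDS [TAG_VALUE...]
partition_key() {
  local time_format="$1" epoch_seconds="$2"
  shift 2
  local key="" value
  # Append each tag value followed by the partition key separator.
  for value in "$@"; do
    key="${key}${value} | "
  done
  # Format the timestamp in UTC using the strftime-style time part.
  printf '%s%s\n' "$key" \
    "$(LC_ALL=C date -u -d "@${epoch_seconds}" "+${time_format}")"
}

partition_key '%Y-%m-%d' 1704063600 A cnc        # A | cnc | 2023-12-31
partition_key '%Y-%m-%d %H:00' 1704067200 B wld  # B | wld | 2024-01-01 00:00
partition_key '%d %b %Y' 1704070800 A            # A | 01 Jan 2024
```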
## Partitions in the query life cycle
When querying data:
1. The [Catalog](/influxdb/clustered/reference/internals/storage-engine/#catalog)
provides the v3 query engine ([Querier](/influxdb/clustered/reference/internals/storage-engine/#querier))
with the locations of partitions that contain the queried time series data.
2. The query engine reads all rows in the returned partitions to identify which
rows match the logic in the query and should be included in the query result.
The faster the query engine can identify which partitions to read and then read
the data in those partitions, the more performant queries are.
_For more information about the query life cycle, see
[InfluxDB v3 query life cycle](/influxdb/clustered/reference/internals/storage-engine/#query-life-cycle)._
##### Query example
Consider the following query that selects everything in the `production` table
where the `line` tag is `A` and the `station` tag is `cnc`:
```sql
SELECT *
FROM production
WHERE
time >= now() - INTERVAL '1 week'
AND line = 'A'
AND station = 'cnc'
```
Using the default partitioning strategy (by day), the query engine
reads eight separate partitions (one partition for today and one for each of the
last seven days):
- {{< datetime/current-date trimTime=true >}}
- {{< datetime/current-date offset=-1 trimTime=true >}}
- {{< datetime/current-date offset=-2 trimTime=true >}}
- {{< datetime/current-date offset=-3 trimTime=true >}}
- {{< datetime/current-date offset=-4 trimTime=true >}}
- {{< datetime/current-date offset=-5 trimTime=true >}}
- {{< datetime/current-date offset=-6 trimTime=true >}}
- {{< datetime/current-date offset=-7 trimTime=true >}}
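To see why a one-week range touches eight day partitions, you can enumerate the day keys that overlap the range. The following is a minimal sketch with a hypothetical `day_partitions` helper; it assumes GNU `date` and UTC timestamps in seconds.

```bash
# Sketch only: lists the day partition keys (default `%Y-%m-%d` format) that
# overlap a time range ending at END_EPOCH_SECONDS and starting DAYS earlier.
# Assumes GNU `date`; times are UTC.
# Usage: day_partitions END_EPOCH_SECONDS DAYS
day_partitions() {
  local end_epoch="$1" days="$2" offset=0
  while [ "$offset" -le "$days" ]; do
    date -u -d "@$((end_epoch - offset * 86400))" '+%Y-%m-%d'
    offset=$((offset + 1))
  done
}

# A one-week range ending 2024-01-01T00:00:00Z spans 2023-12-25 through
# 2024-01-01, which is eight day partitions.
day_partitions 1704067200 7
```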
The query engine must scan _all_ rows in the partitions to identify rows
where `line` is `A` and `station` is `cnc`. This process takes valuable time
and results in less performant queries.
However, if you partition by other tags, InfluxDB can identify partitions that
contain only the tag values your query needs and spend less time
scanning rows to see if they contain the tag values.
For example, if data is partitioned by `line`, `station`, and day, although
there are more partition files, the query engine can quickly identify and read
only those with data relevant to the query:
{{% columns 4 %}}
- <strong class="req normal green">A | cnc | {{< datetime/current-date trimTime=true >}}</strong>
- A | wld | {{< datetime/current-date trimTime=true >}}
- B | cnc | {{< datetime/current-date trimTime=true >}}
- B | wld | {{< datetime/current-date trimTime=true >}}
- <strong class="req normal green">A | cnc | {{< datetime/current-date offset=-1 trimTime=true >}}</strong>
- A | wld | {{< datetime/current-date offset=-1 trimTime=true >}}
- B | cnc | {{< datetime/current-date offset=-1 trimTime=true >}}
- B | wld | {{< datetime/current-date offset=-1 trimTime=true >}}
- <strong class="req normal green">A | cnc | {{< datetime/current-date offset=-2 trimTime=true >}}</strong>
- A | wld | {{< datetime/current-date offset=-2 trimTime=true >}}
- B | cnc | {{< datetime/current-date offset=-2 trimTime=true >}}
- B | wld | {{< datetime/current-date offset=-2 trimTime=true >}}
- <strong class="req normal green">A | cnc | {{< datetime/current-date offset=-3 trimTime=true >}}</strong>
- A | wld | {{< datetime/current-date offset=-3 trimTime=true >}}
- B | cnc | {{< datetime/current-date offset=-3 trimTime=true >}}
- B | wld | {{< datetime/current-date offset=-3 trimTime=true >}}
- <strong class="req normal green">A | cnc | {{< datetime/current-date offset=-4 trimTime=true >}}</strong>
- A | wld | {{< datetime/current-date offset=-4 trimTime=true >}}
- B | cnc | {{< datetime/current-date offset=-4 trimTime=true >}}
- B | wld | {{< datetime/current-date offset=-4 trimTime=true >}}
- <strong class="req normal green">A | cnc | {{< datetime/current-date offset=-5 trimTime=true >}}</strong>
- A | wld | {{< datetime/current-date offset=-5 trimTime=true >}}
- B | cnc | {{< datetime/current-date offset=-5 trimTime=true >}}
- B | wld | {{< datetime/current-date offset=-5 trimTime=true >}}
- <strong class="req normal green">A | cnc | {{< datetime/current-date offset=-6 trimTime=true >}}</strong>
- A | wld | {{< datetime/current-date offset=-6 trimTime=true >}}
- B | cnc | {{< datetime/current-date offset=-6 trimTime=true >}}
- B | wld | {{< datetime/current-date offset=-6 trimTime=true >}}
- <strong class="req normal green">A | cnc | {{< datetime/current-date offset=-7 trimTime=true >}}</strong>
- A | wld | {{< datetime/current-date offset=-7 trimTime=true >}}
- B | cnc | {{< datetime/current-date offset=-7 trimTime=true >}}
- B | wld | {{< datetime/current-date offset=-7 trimTime=true >}}
{{% /columns %}}
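The selection shown above can be sketched as a filter over partition keys. This is an illustration of the idea only, not the query engine's implementation: the `prune_partitions` helper is hypothetical and assumes keys are formatted as `line | station | time`.

```bash
# Sketch only: keeps the partition keys whose `line` and `station` parts
# match the values in the query's WHERE clause.
# Assumes keys are formatted as `line | station | time`, one per line on stdin.
# Usage: prune_partitions LINE STATION < keys
prune_partitions() {
  awk -F ' [|] ' -v line="$1" -v station="$2" \
    '$1 == line && $2 == station'
}

# Prints only the two `A | cnc` keys.
printf '%s\n' \
  'A | cnc | 2024-01-01' \
  'A | wld | 2024-01-01' \
  'B | cnc | 2024-01-01' \
  'B | wld | 2024-01-01' \
  'A | cnc | 2023-12-31' \
  'A | wld | 2023-12-31' |
  prune_partitions A cnc
```

Because the tag values are part of the key, the filter never needs to open a partition to decide whether it is relevant.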
---
## Partition guides
{{< children >}}
<!--
The content of this page is at /content/shared/v3-distributed-admin-custom-partitions/_index.md
-->

View File

@ -8,49 +8,9 @@ menu:
name: Best practices
parent: Manage data partitioning
weight: 202
source: /shared/v3-distributed-admin-custom-partitions/best-practices.md
---
Use the following best practices when defining custom partitioning strategies
for your data stored in {{< product-name >}}.
- [Partition by tags that you commonly query for a specific value](#partition-by-tags-that-you-commonly-query-for-a-specific-value)
- [Only partition by tags that _always_ have a value](#only-partition-by-tags-that-always-have-a-value)
- [Avoid over-partitioning](#avoid-over-partitioning)
## Partition by tags that you commonly query for a specific value
Custom partitioning primarily benefits queries that look for a specific tag
value in the `WHERE` clause. For example, if you often query data related to a
specific ID, partitioning by the tag that stores the ID helps the InfluxDB
query engine to more quickly identify what partitions contain the relevant data.
{{% note %}}
#### Use tag buckets for high-cardinality tags
Partitioning using distinct values of tags with many (10K+) unique values can
actually hurt query performance as partitions are created for each unique tag value.
Instead, use [tag buckets](/influxdb/clustered/admin/custom-partitions/partition-templates/#tag-bucket-part-templates)
to partition by high-cardinality tags.
This method of partitioning groups tag values into "buckets" and partitions by bucket.
{{% /note %}}
## Only partition by tags that _always_ have a value
You should only partition by tags that _always_ have a value.
If points don't have a value for the tag, InfluxDB can't store them in the correct partitions and, at query time, must read all the partitions.
## Avoid over-partitioning
As you plan your partitioning strategy, keep in mind that data can be
"over-partitioned"--meaning partitions are so granular that queries end up
having to retrieve and read many partitions from the object store, which
hurts query performance.
- Balance the partition time interval with the actual amount of data written
during each interval. If a single interval doesn't contain a lot of data,
it is better to partition by larger time intervals.
- Don't partition by tags that you typically don't use in your query workload.
- Don't partition by distinct values of high-cardinality tags.
Instead, [use tag buckets](#use-tag-buckets-for-high-cardinality-tags) to
partition by these tags.
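A rough upper bound on partition count can help you spot over-partitioning before you create a database or table. The following is a minimal sketch with a hypothetical `max_partitions` helper. It assumes that every combination of tag values (or tag buckets) occurs in every time interval, so real counts are often lower.

```bash
# Sketch only: upper bound on the number of partitions a template can create,
# assuming every combination of tag values (or buckets) occurs in every interval.
# Usage: max_partitions INTERVALS [VALUES_OR_BUCKETS_PER_TAG_PART...]
max_partitions() {
  local total="$1" count
  shift
  for count in "$@"; do
    total=$((total * count))
  done
  printf '%s\n' "$total"
}

max_partitions 365 10000  # one year by day, tag with 10,000 distinct values: 3650000
max_partitions 365 500    # same tag grouped into 500 buckets: 182500
```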
<!--
The content of this page is at /content/shared/v3-distributed-admin-custom-partitions/best-practices.md
-->

View File

@ -10,161 +10,9 @@ weight: 202
related:
- /influxdb/clustered/reference/cli/influxctl/database/create/
- /influxdb/clustered/reference/cli/influxctl/table/create/
source: /shared/v3-distributed-admin-custom-partitions/define-custom-partitions.md
---
Use the [`influxctl` CLI](/influxdb/clustered/reference/cli/influxctl/)
to define custom partition strategies when creating a database or table.
By default, {{< product-name >}} partitions data by day.
The partitioning strategy of a database or table is determined by a
[partition template](/influxdb/clustered/admin/custom-partitions/#partition-templates)
which defines the naming pattern for [partition keys](/influxdb/clustered/admin/custom-partitions/#partition-keys).
Partition keys uniquely identify each partition.
When a partition template is applied to a database, it becomes the default template
for all tables in that database, but can be overridden when creating a
table.
- [Create a database with a custom partition template](#create-a-database-with-a-custom-partition-template)
- [Create a table with a custom partition template](#create-a-table-with-a-custom-partition-template)
- [Example partition templates](#example-partition-templates)
{{% warn %}}
#### Partition templates can only be applied on create
You can only apply a partition template when creating a database or table.
You can't update a partition template on an existing resource.
{{% /warn %}}
Use the following command flags to identify
[partition template parts](/influxdb/clustered/admin/custom-partitions/partition-templates/#tag-part-templates):
- `--template-tag`: An [InfluxDB tag](/influxdb/clustered/reference/glossary/#tag)
to use in the partition template.
- `--template-tag-bucket`: An [InfluxDB tag](/influxdb/clustered/reference/glossary/#tag)
and the number of "buckets" to group tag values into.
Provide the tag key and the number of buckets, separated by a comma:
`tagKey,N`.
- `--template-timeformat`: A [Rust strftime date and time](/influxdb/clustered/admin/custom-partitions/partition-templates/#time-part-templates)
string that specifies the time format in the partition template and determines
the time interval to partition by.
{{% note %}}
A partition template can include up to 7 total tag and tag bucket parts
and only 1 time part.
{{% /note %}}
_View [partition template part restrictions](/influxdb/clustered/admin/custom-partitions/partition-templates/#restrictions)._
{{% note %}}
#### Always provide a time format when using custom partitioning
When defining a custom partition template for your database or table using any
of the `influxctl` `--template-*` flags, always include the `--template-timeformat`
flag with a time format to use in your partition template.
Otherwise, InfluxDB omits time from the partition template and won't compact partitions.
{{% /note %}}
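If you generate `influxctl` commands from scripts, it can help to check the template limits before running them. The following Python sketch is illustrative only: `build_create_args` is a hypothetical helper, not part of `influxctl`. It uses only the three `--template-*` flags described above and enforces the limits stated on this page (a required time format, at most 7 tag and tag bucket parts, and each tag key used only once).

```python
from typing import List, Optional, Tuple

MAX_TAG_PARTS = 7  # up to 7 total tag and tag bucket parts


def build_create_args(
    database: str,
    time_format: str,
    tags: Optional[List[str]] = None,
    tag_buckets: Optional[List[Tuple[str, int]]] = None,
) -> List[str]:
    """Hypothetical helper: build an `influxctl database create` argument list."""
    tags = tags or []
    tag_buckets = tag_buckets or []
    if not time_format:
        raise ValueError("always provide a time format with custom partitioning")
    if len(tags) + len(tag_buckets) > MAX_TAG_PARTS:
        raise ValueError("a template can include at most 7 tag and tag bucket parts")
    tag_keys = tags + [key for key, _ in tag_buckets]
    if len(set(tag_keys)) != len(tag_keys):
        raise ValueError("a tag key can appear only once in a partition template")

    args = ["influxctl", "database", "create"]
    for tag in tags:
        args += ["--template-tag", tag]
    for key, buckets in tag_buckets:
        args += ["--template-tag-bucket", f"{key},{buckets}"]
    args += ["--template-timeformat", time_format, database]
    return args


print(" ".join(build_create_args(
    "example-db",
    "%Y-%m-%d",
    tags=["room", "sensor-type"],
    tag_buckets=[("customerID", 500)],
)))
```

The helper only builds the argument list; it doesn't run the command, so you can review the output before passing it to your shell.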
## Create a database with a custom partition template
The following example creates a new `example-db` database and applies a partition
template that partitions by distinct values of two tags (`room` and `sensor-type`),
bucketed values of the `customerID` tag, and by day using the time format `%Y-%m-%d`:
<!--Skip database create and delete tests: namespaces aren't reusable-->
<!--pytest.mark.skip-->
```sh
influxctl database create \
--template-tag room \
--template-tag sensor-type \
--template-tag-bucket customerID,500 \
--template-timeformat '%Y-%m-%d' \
example-db
```
## Create a table with a custom partition template
The following example creates a new `example-table` table in the specified
database and applies a partition template that partitions by distinct values of
two tags (`room` and `sensor-type`), bucketed values of the `customerID` tag,
and by month using the time format `%Y-%m`:
<!--Skip database create and delete tests: namespaces aren't reusable-->
<!--pytest.mark.skip-->
{{% code-placeholders "DATABASE_NAME" %}}
```sh
influxctl table create \
--template-tag room \
--template-tag sensor-type \
--template-tag-bucket customerID,500 \
--template-timeformat '%Y-%m' \
DATABASE_NAME \
example-table
```
{{% /code-placeholders %}}
Replace the following in your command:
- {{% code-placeholder-key %}}`DATABASE_NAME`{{% /code-placeholder-key %}}: your {{% product-name %}} [database](/influxdb/clustered/admin/databases/)
<!--actual test
```sh
# Test the preceding command outside of the code block.
# influxctl authentication requires TTY interaction--
# output the auth URL to a file that the host can open.
TABLE_NAME=table_TEST_RUN
script -c "influxctl table create \
--template-tag room \
--template-tag sensor-type \
--template-tag-bucket customerID,500 \
--template-timeformat '%Y-%m' \
DATABASE_NAME \
$TABLE_NAME" \
/dev/null > /shared/urls.txt
script -c "influxctl query \
--database DATABASE_NAME \
--token DATABASE_TOKEN \
'SHOW TABLES'" > /shared/temp_tables.txt
grep -q $TABLE_NAME /shared/temp_tables.txt
rm /shared/temp_tables.txt
```
<!--
The content of this page is at /content/shared/v3-distributed-admin-custom-partitions/_define-custom-partitions.md
-->
## Example partition templates
Given the following [line protocol](/influxdb/clustered/reference/syntax/line-protocol/)
with a `2024-01-01T00:00:00Z` timestamp:
```text
prod,line=A,station=weld1 temp=81.9,qty=36i 1704067200000000000
```
##### Partitioning by distinct tag values
| Description | Tag parts | Time part | Resulting partition key |
| :---------------------- | :---------------- | :--------- | :----------------------- |
| By day (default) | | `%Y-%m-%d` | 2024-01-01 |
| By month | | `%Y-%m` | 2024-01 |
| By year | | `%Y` | 2024 |
| Single tag, by day | `line` | `%Y-%m-%d` | A \| 2024-01-01 |
| Single tag, by month | `line` | `%Y-%m` | A \| 2024-01 |
| Single tag, by year | `line` | `%Y` | A \| 2024 |
| Multiple tags, by day | `line`, `station` | `%Y-%m-%d` | A \| weld1 \| 2024-01-01 |
| Multiple tags, by month | `line`, `station` | `%Y-%m` | A \| weld1 \| 2024-01 |
| Multiple tags, by year | `line`, `station` | `%Y` | A \| weld1 \| 2024 |
##### Partitioning by tag buckets
| Description | Tag part | Tag bucket part | Time part | Resulting partition key |
| :---------------------------------- | :------- | :-------------- | :--------- | :---------------------- |
| Distinct tag, tag buckets, by day | `line` | `station,100` | `%Y-%m-%d` | A \| 3 \| 2024-01-01 |
| Distinct tag, tag buckets, by month | `line` | `station,500` | `%Y-%m` | A \| 303 \| 2024-01 |
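To see how the tag and time parts in the preceding tables combine into a key, consider the following Python sketch. It is a simplified illustration, not InfluxDB's implementation: `partition_key` is a hypothetical function, it joins parts with ` | ` to mirror how keys are displayed in the tables, and it relies on Python's `strftime`, which matches the Rust behavior for `%Y`, `%m`, and `%d` with four-digit years. Tag bucket parts are omitted because the bucket hash isn't reproduced here.

```python
from datetime import datetime, timezone
from typing import Dict, List


def partition_key(tags: Dict[str, str], timestamp_ns: int,
                  tag_parts: List[str], time_format: str) -> str:
    """Illustrative only: join tag values and the formatted time with ' | '."""
    ts = datetime.fromtimestamp(timestamp_ns // 1_000_000_000, tz=timezone.utc)
    parts = [tags[key] for key in tag_parts]
    parts.append(ts.strftime(time_format))
    return " | ".join(parts)


point_tags = {"line": "A", "station": "weld1"}
timestamp_ns = 1704067200000000000  # 2024-01-01T00:00:00Z

print(partition_key(point_tags, timestamp_ns, [], "%Y-%m-%d"))             # 2024-01-01
print(partition_key(point_tags, timestamp_ns, ["line"], "%Y-%m"))          # A | 2024-01
print(partition_key(point_tags, timestamp_ns, ["line", "station"], "%Y"))  # A | weld1 | 2024
```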


@ -8,124 +8,9 @@ menu:
influxdb_clustered:
parent: Manage data partitioning
weight: 202
source: /shared/v3-distributed-admin-custom-partitions/partition-templates.md
---
Use partition templates to define the patterns used to generate partition keys.
A partition key uniquely identifies a partition and is used to name the partition
Parquet file in the [Object store](/influxdb/clustered/reference/internals/storage-engine/#object-store).
A partition template consists of 1-8 _template parts_---dimensions to partition data by.
Three types of template parts exist:
- **tag**: An [InfluxDB tag](/influxdb/clustered/reference/glossary/#tag)
to partition by.
- **tag bucket**: An [InfluxDB tag](/influxdb/clustered/reference/glossary/#tag)
and number of "buckets" to group tag values into. Data is partitioned by the
tag bucket rather than each distinct tag value.
- {{< req type="key" >}} **time**: A Rust strftime date and time string that specifies the time interval
to partition data by. The smallest unit of time included in the time part
template is the interval used to partition data.
{{% note %}}
A partition template must include 1 [time part](#time-part-templates)
and can include up to 7 total [tag](#tag-part-templates) and [tag bucket](#tag-bucket-part-templates) parts.
{{% /note %}}
<!-- TOC -->
- [Restrictions](#restrictions)
- [Template part size limit](#template-part-size-limit)
- [Partition key size limit](#partition-key-size-limit)
- [Reserved keywords](#reserved-keywords)
- [Reserved Characters](#reserved-characters)
- [Tag part templates](#tag-part-templates)
- [Tag bucket part templates](#tag-bucket-part-templates)
- [Time part templates](#time-part-templates)
<!-- /TOC -->
## Restrictions
### Template part size limit
Each template part is limited to 200 bytes in length.
Anything longer will be truncated at 200 bytes and appended with `#`.
### Partition key size limit
With the truncation of template parts, the maximum length of a partition key is
1,607 bytes (1.57 KiB).
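The 1,607-byte figure is consistent with the other limits on this page. The following Python arithmetic is a sketch that assumes the maximum of 8 template parts, each truncated to 200 bytes, with a single-byte `|` delimiter between adjacent parts (the delimiter size is an assumption, not stated above):

```python
MAX_PARTS = 8         # 1 time part + up to 7 tag or tag bucket parts
MAX_PART_BYTES = 200  # each template part is truncated to 200 bytes
DELIMITER_BYTES = 1   # assumption: one `|` byte between adjacent parts

max_key_bytes = MAX_PARTS * MAX_PART_BYTES + (MAX_PARTS - 1) * DELIMITER_BYTES
print(max_key_bytes)                   # 1607
print(round(max_key_bytes / 1024, 2))  # 1.57 (KiB)
```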
### Reserved keywords
The following reserved keywords cannot be used in partition templates:
- `time`
### Reserved Characters
If used in template parts, non-ASCII characters and the following reserved
characters must be [percent encoded](https://developer.mozilla.org/en-US/docs/Glossary/Percent-encoding):
- `|`: Partition key part delimiter
- `!`: Null or missing partition key part
- `^`: Empty string partition key part
- `#`: Key part truncation marker
- `%`: Required for unambiguous reversal of percent encoding
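The following Python sketch shows one way to percent-encode a value before using it in a template part. It is illustrative only: `encode_template_part` is a hypothetical helper, and it assumes that ASCII characters other than the five reserved characters listed above can pass through unchanged.

```python
RESERVED = set("|!^#%")


def encode_template_part(value: str) -> str:
    """Percent-encode reserved and non-ASCII characters; leave other ASCII as-is."""
    out = []
    for ch in value:
        if ch in RESERVED or ord(ch) > 127:
            # Encode each UTF-8 byte of the character as %XX.
            out.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
        else:
            out.append(ch)
    return "".join(out)


print(encode_template_part("a|b"))   # a%7Cb
print(encode_template_part("100%"))  # 100%25
print(encode_template_part("café"))  # caf%C3%A9
```

Encoding `%` itself is what keeps decoding unambiguous: a literal `%7C` in the input becomes `%257C`, so it can't be mistaken for an encoded `|`.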
## Tag part templates
Tag part templates consist of a _tag key_ to partition by.
Generated partition keys include the unique _tag value_ specific to each partition.
A partition template can include a given tag key only once across the template parts
that operate on tags (tag and tag bucket). For example, if a template partitions on
unique values of `tag_A`, you can't also use `tag_A` in a tag bucket part.
## Tag bucket part templates
Tag bucket part templates consist of a _tag key_ to partition by and the
_number of "buckets" to partition tag values into_--for example:
```
customerID,500
```
Values of the `customerID` tag are bucketed into 500 distinct "buckets."
InfluxDB hashes each tag value into a 32-bit integer and identifies the bucket by
the remainder of that hash divided by the specified number of buckets:
```rust
hash(tagValue) % N
```
Generated partition keys include the unique _tag bucket identifier_ specific to
each partition.
**Supported number of tag buckets**: 1-1,000
{{% note %}}
Use tag buckets to partition by high-cardinality tags or tags with an
unknown number of distinct values.
{{% /note %}}
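The `hash(tagValue) % N` formula can be sketched in Python as follows. This is an illustration under a stated assumption: the hash function that InfluxDB uses isn't specified on this page, so the sketch substitutes `zlib.crc32` as a stand-in 32-bit hash. The bucket identifiers it prints won't match the ones InfluxDB generates; the point is that every tag value maps deterministically to one of `N` buckets, regardless of how many distinct values exist.

```python
import zlib


def tag_bucket(tag_value: str, num_buckets: int) -> int:
    """Illustrative only: map a tag value to a bucket ID in [0, num_buckets)."""
    if not 1 <= num_buckets <= 1000:
        raise ValueError("supported number of tag buckets: 1-1,000")
    # Stand-in 32-bit hash (assumption); not the hash InfluxDB uses.
    hashed = zlib.crc32(tag_value.encode("utf-8"))
    return hashed % num_buckets


for customer in ("cust-001", "cust-002", "cust-003"):
    print(customer, tag_bucket(customer, 500))
```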
A partition template can include a given tag key only once across the template parts
that operate on tags (tag and tag bucket). For example, if a template partitions on
unique values of `tag_A`, you can't also use `tag_A` in a tag bucket part.
## Time part templates
Time part templates use a limited subset of the
[Rust strftime date and time formatting syntax](https://docs.rs/chrono/latest/chrono/format/strftime/index.html)
to specify time format in partition keys.
InfluxDB uses the smallest unit of time included in the time part template as
the partition interval.
### Date specifiers
| Variable | Example | Description |
| :------: | :----------- | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `%Y` | `2001` | The full proleptic Gregorian year, zero-padded to 4 digits. chrono supports years from -262144 to 262143. Note: years before 1 BCE or after 9999 CE require an initial sign (+/-). |
| `%m` | `07` | Month number (01--12), zero-padded to 2 digits. |
| `%d` | `08` | Day number (01--31), zero-padded to 2 digits. |
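Because the smallest unit in the time part template determines the partition interval, you can read the interval directly from the format string. The following Python sketch is a hypothetical helper, not part of InfluxDB, that applies this rule to templates built from the `%Y`, `%m`, and `%d` specifiers:

```python
def partition_interval(time_format: str) -> str:
    """Return the smallest time unit in a template built from %Y, %m, and %d."""
    if "%d" in time_format:
        return "day"
    if "%m" in time_format:
        return "month"
    if "%Y" in time_format:
        return "year"
    raise ValueError("time format must include %Y, %m, or %d")


print(partition_interval("%Y-%m-%d"))  # day
print(partition_interval("%Y-%m"))     # month
print(partition_interval("%Y"))        # year
```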
<!--
The content of this page is at /content/shared/v3-distributed-admin-custom-partitions/_partition-templates.md
-->


@ -14,173 +14,9 @@ list_code_example: |
```
related:
- /influxdb/clustered/admin/query-system-data/
source: /shared/v3-distributed-admin-custom-partitions/view-partitions.md
---
{{< product-name >}} stores partition information in InfluxDB v3 system tables.
Query partition information to view partition templates and verify partitions
are working as intended.
- [Query partition information from system tables](#query-partition-information-from-system-tables)
- [Partition-related queries](#partition-related-queries)
{{% warn %}}
#### Querying system tables may impact overall cluster performance
Partition information is stored in InfluxDB v3 system tables.
Querying system tables may impact the overall write and query performance of
your {{< product-name omit=" Clustered" >}} cluster.
<!--------------- UPDATE THE DATE BELOW AS EXAMPLES ARE UPDATED --------------->
#### System tables are subject to change
System tables are not part of InfluxDB's stable API and may change with new releases.
The provided schema information and query examples are valid as of **September 24, 2024**.
If you detect a schema change or a non-functioning query example, please
[submit an issue](https://github.com/influxdata/docs-v2/issues/new/choose).
<!--------------- UPDATE THE DATE ABOVE AS EXAMPLES ARE UPDATED --------------->
{{% /warn %}}
## Query partition information from system tables
Use the [`influxctl query` command](/influxdb/clustered/reference/cli/influxctl/query/)
and SQL to query partition-related information from InfluxDB system tables.
Provide the following:
- **Enable system tables** with the `--enable-system-tables` command flag.
- **Database token**: A [database token](/influxdb/clustered/admin/tokens/#database-tokens)
with read permissions on the specified database. Uses the `token` setting from
the [`influxctl` connection profile](/influxdb/clustered/reference/cli/influxctl/#configure-connection-profiles)
or the `--token` command flag.
- **Database name**: The name of the database to query information about.
Uses the `database` setting from the
[`influxctl` connection profile](/influxdb/clustered/reference/cli/influxctl/#configure-connection-profiles)
or the `--database` command flag.
- **SQL query**: The SQL query to execute.
Pass the query in one of the following ways:
- a string on the command line
- a path to a file that contains the query
- a single dash (`-`) to read the query from stdin
{{% code-placeholders "DATABASE_(TOKEN|NAME)|SQL_QUERY" %}}
```bash
influxctl query \
--enable-system-tables \
--database DATABASE_NAME \
--token DATABASE_TOKEN \
"SQL_QUERY"
```
{{% /code-placeholders %}}
Replace the following:
- {{% code-placeholder-key %}}`DATABASE_TOKEN`{{% /code-placeholder-key %}}:
A database token with read access to the specified database
- {{% code-placeholder-key %}}`DATABASE_NAME`{{% /code-placeholder-key %}}:
The name of the database to query information about.
- {{% code-placeholder-key %}}`SQL_QUERY`{{% /code-placeholder-key %}}:
The SQL query to execute. For examples, see
[Partition-related queries](#partition-related-queries).
When prompted, enter `y` to acknowledge the potential impact querying system
tables may have on your cluster.
## Partition-related queries
Use the following queries to return information about partitions in your
{{< product-name omit=" Clustered" >}} cluster.
- [View partition templates of all tables](#view-partition-templates-of-all-tables)
- [View the partition template of a specific table](#view-the-partition-template-of-a-specific-table)
- [View all partitions for a table](#view-all-partitions-for-a-table)
- [View the number of partitions per table](#view-the-number-of-partitions-per-table)
- [View the number of partitions for a specific table](#view-the-number-of-partitions-for-a-specific-table)
---
In the examples below, replace {{% code-placeholder-key %}}`TABLE_NAME`{{% /code-placeholder-key %}}
with the name of the table you want to query information about.
---
{{% code-placeholders "TABLE_NAME_(1|2|3)|TABLE_NAME" %}}
### View the partition template of a specific table
```sql
SELECT * FROM system.tables WHERE table_name = 'TABLE_NAME'
```
#### Example results
| table_name | partition_template |
| :--------- | :----------------------------------------------------------------------------------------- |
| weather | `{"parts":[{"timeFormat":"%Y-%m-%d"},{"bucket":{"tagName":"location","numBuckets":250}}]}` |
{{% note %}}
If a table doesn't include a partition template in the output of this command,
the table uses the default (1 day) partition strategy and doesn't partition
by tags or tag buckets.
{{% /note %}}
### View all partitions for a table
```sql
SELECT * FROM system.partitions WHERE table_name = 'TABLE_NAME'
```
#### Example results
| partition_id | table_name | partition_key | last_new_file_created_at | num_files | total_size_mb |
| -----------: | :--------- | :---------------- | -----------------------: | --------: | ------------: |
| 1362 | weather | 43 \| 2020-05-27 | 1683747418763813713 | 1 | 0 |
| 800 | weather | 234 \| 2021-08-02 | 1683747421899400796 | 1 | 0 |
| 630 | weather | 325 \| 2022-03-17 | 1683747417616689036 | 1 | 0 |
| 1401 | weather | 12 \| 2021-01-09 | 1683747417786122295 | 1 | 0 |
| 1012 | weather | 115 \| 2022-07-04 | 1683747417614219148 | 1 | 0 |
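The `last_new_file_created_at` values in the example results are 19-digit integers, which suggests nanoseconds since the Unix epoch. Treating that as an assumption (the column's unit isn't documented on this page), the following Python sketch converts a value to a readable UTC time:

```python
from datetime import datetime, timezone


def ns_to_utc(timestamp_ns: int) -> datetime:
    """Convert nanoseconds since the Unix epoch to a UTC datetime (whole seconds)."""
    return datetime.fromtimestamp(timestamp_ns // 1_000_000_000, tz=timezone.utc)


print(ns_to_utc(1683747418763813713).isoformat())  # 2023-05-10T19:36:58+00:00
```

Integer division drops the sub-second portion and avoids floating-point rounding on large nanosecond values.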
### View the number of partitions per table
```sql
SELECT
table_name,
COUNT(*) AS partition_count
FROM
system.partitions
WHERE
table_name IN ('TABLE_NAME_1', 'TABLE_NAME_2', 'TABLE_NAME_3')
GROUP BY
table_name
```
#### Example results
| table_name | partition_count |
| :--------- | --------------: |
| weather | 1096 |
| home | 24 |
| numbers | 1 |
### View the number of partitions for a specific table
```sql
SELECT
COUNT(*) AS partition_count
FROM
system.partitions
WHERE
table_name = 'TABLE_NAME'
```
#### Example results
| partition_count |
| --------------: |
| 1096 |
{{% /code-placeholders %}}
<!--
The content of this page is at /content/shared/v3-distributed-admin-custom-partitions/view-partitions.md
-->


@ -46,7 +46,8 @@ Related entries:
### aggregate
A function that returns an aggregated value across a set of points.
For a list of available aggregation functions, see [SQL aggregate functions](/influxdb/clustered/reference/sql/functions/aggregate/).
For a list of available aggregation functions,
see [SQL aggregate functions](/influxdb/clustered/reference/sql/functions/aggregate/).
<!-- TODO: Add a link to InfluxQL aggregate functions -->
@ -333,6 +334,7 @@ Related entries:
[field](#field),
[field key](#field-key),
[field set](#field-set),
[tag set](#tag-set),
[tag value](#tag-value),
[timestamp](#timestamp)
@ -403,10 +405,10 @@ Identifiers are tokens that refer to specific database objects such as database
names, field keys, measurement names, tag keys, etc.
Related entries:
[database](#database)
[database](#database),
[field key](#field-key),
[measurement](#measurement),
[tag key](#tag-key),
[tag key](#tag-key)
### influx
@ -425,8 +427,8 @@ and other required processes.
### InfluxDB
An open source time series database (TSDB) developed by InfluxData.
Written in Go and optimized for fast, high-availability storage and retrieval of
An open source time series database (TSDB) developed by InfluxData, optimized
for fast, high-availability storage and retrieval of
time series data in fields such as operations monitoring, application metrics,
Internet of Things sensor data, and real-time analytics.
@ -438,8 +440,8 @@ The SQL-like query language used to query data in InfluxDB.
Telegraf input plugins actively gather metrics and deliver them to the core agent,
where aggregator, processor, and output plugins can operate on the metrics.
In order to activate an input plugin, it needs to be enabled and configured in
Telegraf's configuration file.
To activate an input plugin, enable and configure it in the
Telegraf configuration file.
Related entries:
[aggregator plugin](#aggregator-plugin),
@ -752,7 +754,7 @@ relative to [now](#now).
The minimum retention period is **one hour**.
Related entries:
[bucket](#bucket),
[bucket](#bucket)
### retention policy (RP)
@ -839,8 +841,8 @@ Related entries:
### series
A collection of data in the InfluxDB data structure that share a common
_measurement_, _tag set_, and _field key_.
In the InfluxDB 3 data structure, a collection of data that share a common
_measurement_ and _tag set_.
Related entries:
[field set](#field-set),
@ -849,12 +851,13 @@ Related entries:
### series cardinality
The number of unique measurement, tag set, and field key combinations in an InfluxDB database.
The number of unique measurement (table), tag set, and field key combinations in an InfluxDB database.
For example, assume that an InfluxDB database has one measurement.
The single measurement has two tag keys: `email` and `status`.
If there are three different `email`s, and each email address is associated with two
different `status`es, the series cardinality for the measurement is 6
If there are three different `email` tag values,
and each email address is associated with two
different `status` tag values, then the series cardinality for the measurement is 6
(3 × 2 = 6):
| email | status |
@ -869,7 +872,7 @@ different `status`es, the series cardinality for the measurement is 6
In some cases, performing this multiplication may overestimate series cardinality
because of the presence of dependent tags.
Dependent tags are scoped by another tag and do not increase series cardinality.
If we add the tag `firstname` to the example above, the series cardinality
If we add the tag `firstname` to the preceding example, the series cardinality
would not be 18 (3 × 2 × 3 = 18).
The series cardinality would remain unchanged at 6, as `firstname` is already scoped by the `email` tag:
@ -1048,7 +1051,7 @@ Related entries: [aggregate](#aggregate), [function](#function), [selector](#sel
The InfluxDB v1 and v2 data storage format that allows greater compaction and
higher write and read throughput than B+ or LSM tree implementations.
The TSM storage engine has been replaced by [the InfluxDB v3 storage engine (IOx)](#iox).
The TSM storage engine has been replaced by the [InfluxDB v3 storage engine (IOx)](#iox).
Related entries:
[IOx](#iox)
@ -1143,9 +1146,6 @@ Points in the WAL are queryable and persist through a system reboot.
On process start, all points in the WAL must be flushed before the system
accepts new writes.
Related entries:
[tsm](#tsm-time-structured-merge-tree)
### windowing
Grouping data based on specified time intervals.

25
content/shared/README.md Normal file

@ -0,0 +1,25 @@
# Shared content
This section is for content shared across multiple products and versions.
The `/shared/_index.md` frontmatter marks the `/shared` directory and its
children as draft so they
don't get rendered when the site is built, but the contents of each shared
document are included in pages that use the file as a `source` in their
frontmatter.
## Use shared content
1. Create a new folder for the content in the `content/shared/` directory.
2. Copy the markdown files into the new folder.
3. Remove the frontmatter from the markdown files in the shared directory. If the first line starts with a shortcode, add an HTML comment before the first line; otherwise, Hugo returns an error.
4. In each of the files that use the shared content, add a source to the frontmatter that points to the shared markdown file—for example:
```markdown
source: /shared/influxql-v3-reference/regular-expressions.md
```
5. In the doc body, remove the shared Markdown text and add a comment that points to the shared file, in case someone happens upon the page in the repo--for example, in `/content/3/core/reference/influxql/regular-expressions.md`, add the following:
<!--
The content of this page is at /content/shared/influxql-v3-reference/regular-expressions.md
-->


@ -9,3 +9,6 @@ The `/shared` directory and all of its children are marked as draft so they
don't get rendered when the site is built, but the contents of each shared
document are included in pages that use the file as a `source` in their
frontmatter.
See the `/shared/README.md` for instructions on creating and using shared content.


@ -0,0 +1,414 @@
When writing data to {{< product-name >}}, the InfluxDB v3 storage engine stores data in [Apache Parquet](https://parquet.apache.org/) format in the [Object store](/influxdb/cloud-dedicated/reference/internals/storage-engine/#object-store). Each Parquet file represents a _partition_--a logical grouping of data.
By default, InfluxDB partitions each table _by day_.
If this default strategy yields unsatisfactory performance for single-series queries,
you can define a custom partitioning strategy by specifying tag values and different time intervals to optimize query performance for your specific schema and workload.
- [Advantages](#advantages)
- [Disadvantages](#disadvantages)
- [Limitations](#limitations)
- [Plan for custom partitioning](#plan-for-custom-partitioning)
- [How partitioning works](#how-partitioning-works)
- [Partition templates](#partition-templates)
- [Partition keys](#partition-keys)
- [Partitions in the query life cycle](#partitions-in-the-query-life-cycle)
- [Partition guides](#partition-guides)
{{< children type="anchored-list" >}}
> [!Note]
>
> #### When to consider custom partitioning
>
> Consider custom partitioning if:
>
> 1. You have taken steps to [optimize your queries](/influxdb/cloud-dedicated/query-data/troubleshoot-and-optimize/optimize-queries/), and
> 2. Performance for _single-series queries_ (querying for a specific [tag value](/influxdb/cloud-dedicated/reference/glossary/#tag-value) or [tag set](/influxdb/cloud-dedicated/reference/glossary/#tag-set)) is still unsatisfactory.
>
> Before choosing a partitioning strategy, weigh the [advantages](#advantages), [disadvantages](#disadvantages), and [limitations](#limitations) of custom partitioning.
## Advantages
The primary advantage of custom partitioning is that it lets you customize your
storage structure to improve query performance specific to your schema and workload.
- **Optimized storage for improved performance on specific types of queries**.
For example, if queries often select data with a specific tag value, you can
partition by that tag to improve the performance of those queries.
- **Optimized storage for specific types of data**. For example, if the data you
store is sparse and the time ranges you query are often much larger than a day,
you could partition your data by month instead of by day.
## Disadvantages
Using custom partitioning may increase the load on other parts of the
[InfluxDB v3 storage engine](/influxdb/cloud-dedicated/reference/internals/storage-engine/),
but you can scale each part individually to address the added load.
{{% note %}}
_The weight of these disadvantages depends upon the cardinality of
tags and the specificity of time intervals used for partitioning._
{{% /note %}}
- **Increased load on the [Ingester](/influxdb/cloud-dedicated/reference/internals/storage-engine/#ingester)**
as it groups data into smaller partitions and files.
- **Increased load on the [Catalog](/influxdb/cloud-dedicated/reference/internals/storage-engine/#catalog)**
as more references to partition Parquet file locations are stored and queried.
- **Increased load on the [Compactor](/influxdb/cloud-dedicated/reference/internals/storage-engine/#compactor)**
as it needs to compact more partition Parquet files.
- **Increased costs associated with [Object storage](/influxdb/cloud-dedicated/reference/internals/storage-engine/#object-storage)**
as more partition Parquet files are created and stored.
- **Increased latency**. The amount of time for InfluxDB to process a query and return results increases linearly, although slightly, with the total partition count for a table.
- **Risk of decreased performance for queries that don't use tags in the WHERE clause**.
These queries might read many partitions and smaller files, which can degrade performance.
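To get a rough sense of how partition count grows with tag cardinality and time interval, you can estimate an upper bound before choosing a template. The following Python sketch is a back-of-the-envelope calculation, not an InfluxDB API: `max_partitions` is a hypothetical function, the tag names and counts are made up for illustration, and the result assumes every combination of tag values receives data in every time interval, which real workloads rarely do.

```python
from math import prod
from typing import Dict


def max_partitions(distinct_values: Dict[str, int], intervals: int) -> int:
    """Upper bound: every tag value combination appears in every time interval."""
    return prod(distinct_values.values()) * intervals


# Two `line` values and two `station` values written across two days.
print(max_partitions({"line": 2, "station": 2}, intervals=2))      # 8
# Higher-cardinality tags over a year of daily partitions.
print(max_partitions({"line": 20, "station": 50}, intervals=365))  # 365000
```

If the estimate is large, consider a coarser time interval or tag buckets, which cap the number of distinct values a tag contributes.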
## Limitations
Custom partitioning has the following limitations:
- You can define a partition template only when creating a database or table; you can't update the partition strategy afterward.
- A partition template must include a time part.
- You can partition by up to eight dimensions (seven tag or tag bucket parts and one time part).
## Plan for custom partitioning
After you have considered the [advantages](#advantages), [disadvantages](#disadvantages), and [limitations](#limitations) of
custom partitioning, use the guides in this section to:
1. Learn [how partitioning works](#how-partitioning-works)
2. Follow [best practices](/influxdb/cloud-dedicated/admin/custom-partitions/best-practices/) for defining partitions and managing partition
growth
3. [Define custom partitions](/influxdb/cloud-dedicated/admin/custom-partitions/define-custom-partitions/) for your data
4. Take steps to [limit the number of partition files](/influxdb/cloud-dedicated/admin/custom-partitions/best-practices/#limit-the-number-of-partition-files)
## How partitioning works
### Partition templates
A partition template defines the pattern used for _[partition keys](#partition-keys)_
and determines the time interval that InfluxDB partitions data by.
Partition templates use tag values and
[Rust strftime date and time formatting syntax](https://docs.rs/chrono/latest/chrono/format/strftime/index.html).
_For more detailed information, see [Partition templates](/influxdb/cloud-dedicated/admin/custom-partitions/partition-templates/)._
### Partition keys
A partition key uniquely identifies a partition.
A _[partition template](#partition-templates)_ defines the partition key format.
Partition keys are
composed of up to 8 dimensions (1 time part and up to 7 tag or tag bucket parts).
A partition key uses the partition key separator (`|`) to delimit parts.
The default format for partition keys is `%Y-%m-%d` (for example, `2024-01-01`),
which creates 1 partition for each day.
{{< expand-wrapper >}}
{{% expand "View example partition templates and keys" %}}
Given the following line protocol, with points written at these three timestamps:
- 2023-12-31T23:00:00Z
- 2024-01-01T00:00:00Z
- 2024-01-01T01:00:00Z
```text
production,line=A,station=cnc temp=81.2,qty=35i 1704063600000000000
production,line=A,station=wld temp=92.8,qty=35i 1704063600000000000
production,line=B,station=cnc temp=101.1,qty=43i 1704063600000000000
production,line=B,station=wld temp=102.4,qty=43i 1704063600000000000
production,line=A,station=cnc temp=81.9,qty=36i 1704067200000000000
production,line=A,station=wld temp=110.0,qty=22i 1704067200000000000
production,line=B,station=cnc temp=101.8,qty=44i 1704067200000000000
production,line=B,station=wld temp=105.7,qty=44i 1704067200000000000
production,line=A,station=cnc temp=82.2,qty=35i 1704070800000000000
production,line=A,station=wld temp=92.1,qty=30i 1704070800000000000
production,line=B,station=cnc temp=102.4,qty=43i 1704070800000000000
production,line=B,station=wld temp=106.5,qty=43i 1704070800000000000
```
---
{{% flex %}}
<!---------------------- BEGIN PARTITION EXAMPLES GROUP 1 --------------------->
{{% flex-content "half" %}}
##### Partition template parts
- `%Y-%m-%d` <em class="op50">time (by day, default format)</em>
{{% /flex-content %}}
{{% flex-content %}}
##### Partition keys
- `2023-12-31`
- `2024-01-01`
{{% /flex-content %}}
<!----------------------- END PARTITION EXAMPLES GROUP 1 ---------------------->
{{% /flex %}}
---
{{% flex %}}
<!---------------------- BEGIN PARTITION EXAMPLES GROUP 2 --------------------->
{{% flex-content "half" %}}
##### Partition template parts
- `line` <em class="op50">tag</em>
- `%d %b %Y` <em class="op50">time (by day, non-default format)</em>
{{% /flex-content %}}
{{% flex-content %}}
##### Partition keys
- `A | 31 Dec 2023`
- `B | 31 Dec 2023`
- `A | 01 Jan 2024`
- `B | 01 Jan 2024`
{{% /flex-content %}}
<!----------------------- END PARTITION EXAMPLES GROUP 2 ---------------------->
{{% /flex %}}
---
{{% flex %}}
<!---------------------- BEGIN PARTITION EXAMPLES GROUP 3 --------------------->
{{% flex-content "half" %}}
##### Partition template parts
- `line` <em class="op50">tag</em>
- `station` <em class="op50">tag</em>
- `%Y-%m-%d` <em class="op50">time (by day, default format)</em>
{{% /flex-content %}}
{{% flex-content %}}
##### Partition keys
- `A | cnc | 2023-12-31`
- `A | wld | 2023-12-31`
- `B | cnc | 2023-12-31`
- `B | wld | 2023-12-31`
- `A | cnc | 2024-01-01`
- `A | wld | 2024-01-01`
- `B | cnc | 2024-01-01`
- `B | wld | 2024-01-01`
{{% /flex-content %}}
<!----------------------- END PARTITION EXAMPLES GROUP 3 ---------------------->
{{% /flex %}}
---
{{% flex %}}
<!---------------------- BEGIN PARTITION EXAMPLES GROUP 4 --------------------->
{{% flex-content "half" %}}
##### Partition template parts
- `line` <em class="op50">tag</em>
- `station,3` <em class="op50">tag bucket</em>
- `%Y-%m-%d` <em class="op50">time (by day, default format)</em>
{{% /flex-content %}}
{{% flex-content %}}
##### Partition keys
- `A | 0 | 2023-12-31`
- `B | 0 | 2023-12-31`
- `A | 0 | 2024-01-01`
- `B | 0 | 2024-01-01`
{{% /flex-content %}}
<!----------------------- END PARTITION EXAMPLES GROUP 4 ---------------------->
{{% /flex %}}
---
{{% flex %}}
<!---------------------- BEGIN PARTITION EXAMPLES GROUP 5 --------------------->
{{% flex-content "half" %}}
##### Partition template parts
- `line` <em class="op50">tag</em>
- `station` <em class="op50">tag</em>
- `%Y-%m` <em class="op50">time (by month)</em>
{{% /flex-content %}}
{{% flex-content %}}
##### Partition keys
- `A | cnc | 2023-12`
- `A | wld | 2023-12`
- `B | cnc | 2023-12`
- `B | wld | 2023-12`
- `A | cnc | 2024-01`
- `A | wld | 2024-01`
- `B | cnc | 2024-01`
- `B | wld | 2024-01`
{{% /flex-content %}}
<!----------------------- END PARTITION EXAMPLES GROUP 5 ---------------------->
{{% /flex %}}
---
{{% flex %}}
<!---------------------- BEGIN PARTITION EXAMPLES GROUP 6 --------------------->
{{% flex-content "half" %}}
##### Partition template parts
- `line` <em class="op50">tag</em>
- `station,50` <em class="op50">tag bucket</em>
- `%Y-%m` <em class="op50">time (by month)</em>
{{% /flex-content %}}
{{% flex-content %}}
##### Partition keys
- `A | 47 | 2023-12`
- `A | 9 | 2023-12`
- `B | 47 | 2023-12`
- `B | 9 | 2023-12`
- `A | 47 | 2024-01`
- `A | 9 | 2024-01`
- `B | 47 | 2024-01`
- `B | 9 | 2024-01`
{{% /flex-content %}}
<!----------------------- END PARTITION EXAMPLES GROUP 6 ---------------------->
{{% /flex %}}
{{% /expand %}}
{{< /expand-wrapper >}}
## Partitions in the query life cycle
When querying data:
1. The [Catalog](/influxdb/cloud-dedicated/reference/internals/storage-engine/#catalog)
provides the v3 query engine ([Querier](/influxdb/cloud-dedicated/reference/internals/storage-engine/#querier))
with the locations of partitions that contain the queried time series data.
2. The query engine reads all rows in the returned partitions to identify which
rows match the logic in the query and should be included in the query result.
The faster the query engine can identify which partitions to read and then read
the data in those partitions, the more performant queries are.
_For more information about the query life cycle, see
[InfluxDB v3 query life cycle](/influxdb/cloud-dedicated/reference/internals/storage-engine/#query-life-cycle)._
##### Query example
Consider the following query that selects everything in the `production` table
where the `line` tag is `A` and the `station` tag is `cnc`:
```sql
SELECT *
FROM production
WHERE
time >= now() - INTERVAL '1 week'
AND line = 'A'
AND station = 'cnc'
```
Using the default partitioning strategy (by day), the query engine
reads eight separate partitions (one partition for today and one for each of the
last seven days):
- {{< datetime/current-date trimTime=true >}}
- {{< datetime/current-date offset=-1 trimTime=true >}}
- {{< datetime/current-date offset=-2 trimTime=true >}}
- {{< datetime/current-date offset=-3 trimTime=true >}}
- {{< datetime/current-date offset=-4 trimTime=true >}}
- {{< datetime/current-date offset=-5 trimTime=true >}}
- {{< datetime/current-date offset=-6 trimTime=true >}}
- {{< datetime/current-date offset=-7 trimTime=true >}}
The query engine must scan _all_ rows in the partitions to identify rows
where `line` is `A` and `station` is `cnc`. This process takes valuable time
and results in less performant queries.
However, including tags in your partitioning strategy allows the query engine to
identify partitions containing only the required tag values.
This avoids scanning rows for tag values.
For example, if you partition data by `line`, `station`, and day, although
the number of files increases, the query engine can quickly identify and read
only those with data relevant to the query:
{{% columns 4 %}}
- <strong class="req normal green">A | cnc | {{< datetime/current-date trimTime=true >}}</strong>
- A | wld | {{< datetime/current-date trimTime=true >}}
- B | cnc | {{< datetime/current-date trimTime=true >}}
- B | wld | {{< datetime/current-date trimTime=true >}}
- <strong class="req normal green">A | cnc | {{< datetime/current-date offset=-1 trimTime=true >}}</strong>
- A | wld | {{< datetime/current-date offset=-1 trimTime=true >}}
- B | cnc | {{< datetime/current-date offset=-1 trimTime=true >}}
- B | wld | {{< datetime/current-date offset=-1 trimTime=true >}}
- <strong class="req normal green">A | cnc | {{< datetime/current-date offset=-2 trimTime=true >}}</strong>
- A | wld | {{< datetime/current-date offset=-2 trimTime=true >}}
- B | cnc | {{< datetime/current-date offset=-2 trimTime=true >}}
- B | wld | {{< datetime/current-date offset=-2 trimTime=true >}}
- <strong class="req normal green">A | cnc | {{< datetime/current-date offset=-3 trimTime=true >}}</strong>
- A | wld | {{< datetime/current-date offset=-3 trimTime=true >}}
- B | cnc | {{< datetime/current-date offset=-3 trimTime=true >}}
- B | wld | {{< datetime/current-date offset=-3 trimTime=true >}}
- <strong class="req normal green">A | cnc | {{< datetime/current-date offset=-4 trimTime=true >}}</strong>
- A | wld | {{< datetime/current-date offset=-4 trimTime=true >}}
- B | cnc | {{< datetime/current-date offset=-4 trimTime=true >}}
- B | wld | {{< datetime/current-date offset=-4 trimTime=true >}}
- <strong class="req normal green">A | cnc | {{< datetime/current-date offset=-5 trimTime=true >}}</strong>
- A | wld | {{< datetime/current-date offset=-5 trimTime=true >}}
- B | cnc | {{< datetime/current-date offset=-5 trimTime=true >}}
- B | wld | {{< datetime/current-date offset=-5 trimTime=true >}}
- <strong class="req normal green">A | cnc | {{< datetime/current-date offset=-6 trimTime=true >}}</strong>
- A | wld | {{< datetime/current-date offset=-6 trimTime=true >}}
- B | cnc | {{< datetime/current-date offset=-6 trimTime=true >}}
- B | wld | {{< datetime/current-date offset=-6 trimTime=true >}}
- <strong class="req normal green">A | cnc | {{< datetime/current-date offset=-7 trimTime=true >}}</strong>
- A | wld | {{< datetime/current-date offset=-7 trimTime=true >}}
- B | cnc | {{< datetime/current-date offset=-7 trimTime=true >}}
- B | wld | {{< datetime/current-date offset=-7 trimTime=true >}}
{{% /columns %}}
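
To make the difference concrete, the following Python sketch enumerates the partition keys from the example and filters them the way the query's `WHERE` clause would. It is an illustration only: `partition_keys` is a hypothetical helper, the date is fixed for reproducibility, and none of this reflects how the query engine is implemented.

```python
from datetime import date, timedelta
from itertools import product


def partition_keys(lines, stations, days):
    """Enumerate one partition key per (line, station, day) combination."""
    return [(line, station, day) for day, line, station in product(days, lines, stations)]


today = date(2024, 1, 8)  # fixed date (assumption) so the example is reproducible
days = [today - timedelta(days=offset) for offset in range(8)]  # today plus the last 7 days

keys = partition_keys(["A", "B"], ["cnc", "wld"], days)

# Keep only partitions whose key matches the tag predicates in the example query.
matching = [key for key in keys if key[0] == "A" and key[1] == "cnc"]

print(len(keys))      # partitions that exist in the queried time range
print(len(matching))  # partitions that contain only line=A, station=cnc data
```

With two lines and two stations, the time range holds four times as many partitions, but only the highlighted quarter of them is relevant to the query.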
---
## Partition guides
{{< children >}}


@ -0,0 +1,78 @@
Use the following best practices when defining custom partitioning strategies
for your data stored in {{< product-name >}}.
- [Partition by tags that you commonly query for a specific value](#partition-by-tags-that-you-commonly-query-for-a-specific-value)
- [Only partition by tags that _always_ have a value](#only-partition-by-tags-that-always-have-a-value)
- [Avoid over-partitioning](#avoid-over-partitioning)
- [Limit the number of partition files](#limit-the-number-of-partition-files)
- [Estimate the total partition count](#estimate-the-total-partition-count)
## Partition by tags that you commonly query for a specific value
Custom partitioning primarily benefits single-series queries that look for a specific tag
value in the `WHERE` clause.
For example, if you often query data related to a
specific ID, partitioning by the tag that stores the ID helps the InfluxDB
query engine to more quickly identify what partitions contain the relevant data.
{{% note %}}
#### Use tag buckets for high-cardinality tags
Partitioning using distinct values of tags with many (10K+) unique values can
actually hurt query performance as partitions are created for each unique tag value.
Instead, use [tag buckets](/influxdb/cloud-dedicated/admin/custom-partitions/partition-templates/#tag-bucket-part-templates)
to partition by high-cardinality tags.
This method of partitioning groups tag values into "buckets" and partitions by bucket.
{{% /note %}}
## Only partition by tags that _always_ have a value
You should only partition by tags that _always_ have a value.
If points don't have a value for the tag, InfluxDB can't store them in the correct partitions and, at query time, must read all the partitions.
## Avoid over-partitioning
As you plan your partitioning strategy, keep in mind that data can be
"over-partitioned"--meaning partitions are so granular that queries end up
having to retrieve and read many partitions from the object store, which
hurts query performance.
- Balance the partition time interval with the actual amount of data written
during each interval. If a single interval doesn't contain a lot of data,
it is better to partition by larger time intervals.
- Don't partition by tags that you typically don't use in your query workload.
- Don't partition by distinct values of high-cardinality tags.
Instead, [use tag buckets](#use-tag-buckets-for-high-cardinality-tags) to
partition by these tags.
## Limit the number of partition files
Avoid exceeding **10,000** total partition files.
Limiting the total partition count can help manage system performance and costs.
While planning your strategy, include the following steps to keep the total
partition count below 10,000 files over the next few years:
- [Estimate the total partition count](#estimate-the-total-partition-count) for the lifespan of your data
- Take the following steps to limit the total partition count:
- **Set a [database retention period](/influxdb/cloud-dedicated/admin/databases/#retention-period)**
to prevent the number of files from growing unbounded.
- **Partition by month or year** to [avoid over-partitioning](#avoid-over-partitioning)
and creating too many partition files.
- **Don't partition on high-cardinality tags** unless you also use [tag buckets](#use-tag-buckets-for-high-cardinality-tags).
### Estimate the total partition count
Use the following formula to estimate the total partition file count over the
lifetime of the database (or retention period):
```text
total_partition_count = (cardinality_of_partitioned_tag) * (data_lifespan / partition_duration)
```
- `total_partition_count`: The number of partition files in [Object storage](/influxdb/cloud-dedicated/reference/internals/storage-engine/#object-storage)
- `cardinality_of_partitioned_tag`: The number of distinct values for a tag
- `data_lifespan`: The [database retention period](/influxdb/cloud-dedicated/admin/databases/#retention-period), if set, or the expected lifetime of the database
- `partition_duration`: The partition time interval, defined by the [time part template](/influxdb/cloud-dedicated/admin/custom-partitions/partition-templates/#time-part-templates)
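
As a worked example, the following Python sketch applies the formula to a hypothetical workload. The function name, the tag cardinality of 50, the 730-day lifespan, and the 30-day approximation of a month are all assumptions for illustration. Rounding up is also an assumption, made because a partially filled interval still produces a partition.

```python
import math


def estimate_partition_count(tag_cardinality: int, lifespan_days: float, partition_days: float) -> int:
    """Estimate partition files: tag cardinality times the number of time intervals."""
    # Round up: a partial interval at the end of the lifespan still creates a partition.
    return tag_cardinality * math.ceil(lifespan_days / partition_days)


# Hypothetical workload: one partitioned tag with 50 distinct values, 2-year retention.
daily = estimate_partition_count(50, 730, 1)     # partition by day
monthly = estimate_partition_count(50, 730, 30)  # partition by month (~30 days)

print(daily)    # exceeds the 10,000-file guideline
print(monthly)  # stays well below it
```

In this scenario, partitioning by day yields 36,500 partitions, while partitioning by month yields 1,250.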


@ -0,0 +1,156 @@
Use the [`influxctl` CLI](/influxdb/cloud-dedicated/reference/cli/influxctl/)
to define custom partition strategies when creating a database or table.
By default, {{< product-name >}} partitions data by day.
The partitioning strategy of a database or table is determined by a
[partition template](/influxdb/cloud-dedicated/admin/custom-partitions/#partition-templates)
which defines the naming pattern for [partition keys](/influxdb/cloud-dedicated/admin/custom-partitions/#partition-keys).
Partition keys uniquely identify each partition.
When a partition template is applied to a database, it becomes the default template
for all tables in that database, but can be overridden when creating a
table.
- [Create a database with a custom partition template](#create-a-database-with-a-custom-partition-template)
- [Create a table with a custom partition template](#create-a-table-with-a-custom-partition-template)
- [Example partition templates](#example-partition-templates)
{{% warn %}}
#### Partition templates can only be applied on create
You can only apply a partition template when creating a database or table.
You can't update a partition template on an existing resource.
{{% /warn %}}
Use the following command flags to identify
[partition template parts](/influxdb/cloud-dedicated/admin/custom-partitions/partition-templates/#tag-part-templates):
- `--template-tag`: An [InfluxDB tag](/influxdb/cloud-dedicated/reference/glossary/#tag)
to use in the partition template.
- `--template-tag-bucket`: An [InfluxDB tag](/influxdb/cloud-dedicated/reference/glossary/#tag)
and number of "buckets" to group tag values into.
Provide the tag key and the number of buckets, separated by a comma: `tagKey,N`.
- `--template-timeformat`: A [Rust strftime date and time](/influxdb/cloud-dedicated/admin/custom-partitions/partition-templates/#time-part-templates)
string that specifies the time format in the partition template and determines
the time interval to partition by.
{{% note %}}
A partition template can include up to 7 total tag and tag bucket parts
and only 1 time part.
{{% /note %}}
_View [partition template part restrictions](/influxdb/cloud-dedicated/admin/custom-partitions/partition-templates/#restrictions)._
{{% note %}}
#### Always provide a time format when using custom partitioning
When defining a custom partition template for your database or table using any
of the `influxctl` `--template-*` flags, always include the `--template-timeformat`
flag with a time format to use in your partition template.
Otherwise, InfluxDB omits time from the partition template and won't compact partitions.
{{% /note %}}
## Create a database with a custom partition template
The following example creates a new `example-db` database and applies a partition
template that partitions by distinct values of two tags (`room` and `sensor-type`),
bucketed values of the `customerID` tag, and by day using the time format `%Y-%m-%d`:
<!--Skip database create and delete tests: namespaces aren't reusable-->
<!--pytest.mark.skip-->
```sh
influxctl database create \
--template-tag room \
--template-tag sensor-type \
--template-tag-bucket customerID,500 \
--template-timeformat '%Y-%m-%d' \
example-db
```
## Create a table with a custom partition template
The following example creates a new `example-table` table in the specified
database and applies a partition template that partitions by distinct values of
two tags (`room` and `sensor-type`), bucketed values of the `customerID` tag,
and by month using the time format `%Y-%m`:
<!--Skip database create and delete tests: namespaces aren't reusable-->
<!--pytest.mark.skip-->
{{% code-placeholders "DATABASE_NAME" %}}
```sh
influxctl table create \
--template-tag room \
--template-tag sensor-type \
--template-tag-bucket customerID,500 \
--template-timeformat '%Y-%m' \
DATABASE_NAME \
example-table
```
{{% /code-placeholders %}}
Replace the following in your command:
- {{% code-placeholder-key %}}`DATABASE_NAME`{{% /code-placeholder-key %}}: your {{% product-name %}} [database](/influxdb/cloud-dedicated/admin/databases/)
<!--actual test
```sh
# Test the preceding command outside of the code block.
# influxctl authentication requires TTY interaction--
# output the auth URL to a file that the host can open.
TABLE_NAME=table_TEST_RUN
script -c "influxctl table create \
--template-tag room \
--template-tag sensor-type \
--template-tag-bucket customerID,500 \
--template-timeformat '%Y-%m' \
DATABASE_NAME \
$TABLE_NAME" \
/dev/null > /shared/urls.txt
script -c "influxctl query \
--database DATABASE_NAME \
--token DATABASE_TOKEN \
'SHOW TABLES'" > /shared/temp_tables.txt
grep -q $TABLE_NAME /shared/temp_tables.txt
rm /shared/temp_tables.txt
```
-->
## Example partition templates
Given the following [line protocol](/influxdb/cloud-dedicated/reference/syntax/line-protocol/)
with a `2024-01-01T00:00:00Z` timestamp:
```text
prod,line=A,station=weld1 temp=81.9,qty=36i 1704067200000000000
```
##### Partitioning by distinct tag values
| Description | Tag parts | Time part | Resulting partition key |
| :---------------------- | :---------------- | :--------- | :----------------------- |
| By day (default) | | `%Y-%m-%d` | 2024-01-01 |
| By month | | `%Y-%m` | 2024-01 |
| By year | | `%Y` | 2024 |
| Single tag, by day | `line` | `%Y-%m-%d` | A \| 2024-01-01 |
| Single tag, by month | `line` | `%Y-%m` | A \| 2024-01 |
| Single tag, by year | `line` | `%Y` | A \| 2024 |
| Multiple tags, by day | `line`, `station` | `%Y-%m-%d` | A \| weld1 \| 2024-01-01 |
| Multiple tags, by month | `line`, `station` | `%Y-%m` | A \| weld1 \| 2024-01 |
| Multiple tags, by year | `line`, `station` | `%Y` | A \| weld1 \| 2024 |
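
The following Python sketch reproduces a few rows of this table from the example point. It only illustrates how tag values and the formatted timestamp combine into a key: `partition_key` is a hypothetical helper, and the ` | ` separator mirrors how keys are displayed in this table rather than how InfluxDB stores them.

```python
from datetime import datetime, timezone


def partition_key(tags: dict, tag_parts: list, time_format: str, timestamp_ns: int) -> str:
    """Join the selected tag values and the formatted time with " | "."""
    ts = datetime.fromtimestamp(timestamp_ns // 1_000_000_000, tz=timezone.utc)
    parts = [tags[key] for key in tag_parts] + [ts.strftime(time_format)]
    return " | ".join(parts)


# Tags and timestamp taken from the example line protocol above.
tags = {"line": "A", "station": "weld1"}
ts_ns = 1704067200000000000

print(partition_key(tags, [], "%Y-%m-%d", ts_ns))              # by day (default)
print(partition_key(tags, ["line"], "%Y-%m", ts_ns))           # single tag, by month
print(partition_key(tags, ["line", "station"], "%Y", ts_ns))   # multiple tags, by year
```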
##### Partition by tag buckets
| Description | Tag part | Tag bucket part | Time part | Resulting partition key |
| :---------------------------------- | :------- | :-------------- | :--------- | :---------------------- |
| Distinct tag, tag buckets, by day | `line` | `station,100` | `%Y-%m-%d` | A \| 3 \| 2024-01-01 |
| Distinct tag, tag buckets, by month | `line` | `station,500` | `%Y-%m` | A \| 303 \| 2024-01 |


@ -0,0 +1,124 @@
Use partition templates to define the patterns used to generate partition keys.
A partition key uniquely identifies a partition and is used to name the partition
Parquet file in the [Object store](/influxdb/cloud-dedicated/reference/internals/storage-engine/#object-store).
A partition template consists of 1-8 _template parts_---dimensions to partition data by.
Three types of template parts exist:
- **tag**: An [InfluxDB tag](/influxdb/cloud-dedicated/reference/glossary/#tag)
to partition by.
- **tag bucket**: An [InfluxDB tag](/influxdb/cloud-dedicated/reference/glossary/#tag)
and number of "buckets" to group tag values into. Data is partitioned by the
tag bucket rather than each distinct tag value.
- {{< req type="key" >}} **time**: A Rust strftime date and time string that specifies the time interval
to partition data by. The smallest unit of time included in the time part
template is the interval used to partition data.
{{% note %}}
A partition template must include 1 [time part](#time-part-templates)
and can include up to 7 total [tag](#tag-part-templates) and [tag bucket](#tag-bucket-part-templates) parts.
{{% /note %}}
<!-- TOC -->
- [Restrictions](#restrictions)
- [Template part size limit](#template-part-size-limit)
- [Partition key size limit](#partition-key-size-limit)
- [Reserved keywords](#reserved-keywords)
- [Reserved Characters](#reserved-characters)
- [Tag part templates](#tag-part-templates)
- [Tag bucket part templates](#tag-bucket-part-templates)
- [Time part templates](#time-part-templates)
<!-- /TOC -->
## Restrictions
### Template part size limit
Each template part is limited to 200 bytes in length.
InfluxDB truncates anything longer at 200 bytes and appends `#` to mark the truncation.
### Partition key size limit
With the truncation of template parts, the maximum length of a partition key is
1,607 bytes (1.57 KiB).
### Reserved keywords
The following reserved keywords cannot be used in partition templates:
- `time`
### Reserved Characters
If used in template parts, non-ASCII characters and the following reserved
characters must be [percent encoded](https://developer.mozilla.org/en-US/docs/Glossary/Percent-encoding):
- `|`: Partition key part delimiter
- `!`: Null or missing partition key part
- `^`: Empty string partition key part
- `#`: Key part truncation marker
- `%`: Required for unambiguous reversal of percent encoding
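
As an illustration of the rule above, the following Python sketch percent-encodes the reserved characters and any non-ASCII characters in a value. `encode_part` is a hypothetical helper written for this example; details such as uppercase hexadecimal digits are assumptions and may not match the encoding InfluxDB produces.

```python
RESERVED = set("|!^#%")


def encode_part(value: str) -> str:
    """Percent-encode reserved and non-ASCII characters; leave everything else as-is."""
    encoded = []
    for ch in value:
        if ch in RESERVED or ord(ch) > 127:
            # Encode each UTF-8 byte of the character as %XX.
            encoded.append("".join(f"%{byte:02X}" for byte in ch.encode("utf-8")))
        else:
            encoded.append(ch)
    return "".join(encoded)


print(encode_part("a|b"))   # the key part delimiter is encoded
print(encode_part("50%"))   # a literal percent sign is encoded
print(encode_part("café"))  # non-ASCII characters are encoded byte by byte
```

Encoding `%` itself is what allows the encoding to be reversed without ambiguity.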
## Tag part templates
Tag part templates consist of a _tag key_ to partition by.
Generated partition keys include the unique _tag value_ specific to each partition.
A partition template can include a given tag key only once across its tag and
tag bucket parts.
For example, if a template partitions on unique values of `tag_A`, then
you can't also use `tag_A` in a tag bucket part.
## Tag bucket part templates
Tag bucket part templates consist of a _tag key_ to partition by and the
_number of "buckets" to partition tag values into_--for example:
```
customerID,500
```
Values of the `customerID` tag are bucketed into 500 distinct "buckets."
InfluxDB hashes each tag value into a 32-bit integer and identifies the bucket
by the remainder of that integer divided by the specified number of buckets:
```rust
hash(tagValue) % N
```
Generated partition keys include the unique _tag bucket identifier_ specific to
each partition.
**Supported number of tag buckets**: 1-1,000
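
The following Python sketch shows the bucketing idea. It is not InfluxDB's implementation: `bucket_id` is a hypothetical helper and `zlib.crc32` is only a stand-in 32-bit hash, so the bucket numbers it produces won't match the ones in real partition keys.

```python
import zlib


def bucket_id(tag_value: str, num_buckets: int) -> int:
    """Map a tag value to a bucket identifier in the range 0..num_buckets-1."""
    if not 1 <= num_buckets <= 1000:
        raise ValueError("number of tag buckets must be between 1 and 1,000")
    # Stand-in 32-bit hash (assumption); InfluxDB's actual hash function may differ.
    return zlib.crc32(tag_value.encode("utf-8")) % num_buckets


# The same tag value always lands in the same bucket.
print(bucket_id("customer-42", 500))
print(bucket_id("customer-42", 500))
```

Because the bucket depends only on the tag value and the bucket count, the number of partitions per time interval is capped at the bucket count, regardless of how many distinct tag values exist.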
{{% note %}}
Tag buckets should be used to partition by high cardinality tags or tags with an
unknown number of distinct values.
{{% /note %}}
A partition template can include a given tag key only once across its tag and
tag bucket parts.
For example, if a template partitions on unique values of `tag_A`, then
you can't also use `tag_A` in a tag bucket part.
## Time part templates
Time part templates use a limited subset of the
[Rust strftime date and time formatting syntax](https://docs.rs/chrono/latest/chrono/format/strftime/index.html)
to specify time format in partition keys.
Time part templates can be daily (`%Y-%m-%d`), monthly (`%Y-%m`), or yearly (`%Y`).
InfluxDB uses the smallest unit of time included in the time part template as
the partition interval.
InfluxDB supports only [date specifiers](#date-specifiers) in time part templates.
### Date specifiers
Time part templates allow only the following date specifiers:
| Variable | Example | Description |
| :------: | :----------- | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `%Y` | `2001` | The full proleptic Gregorian year, zero-padded to 4 digits. chrono supports years from -262144 to 262143. Note: years before 1 BCE or after 9999 CE require an initial sign (+/-). |
| `%m` | `07` | Month number (01--12), zero-padded to 2 digits. |
| `%d` | `08` | Day number (01--31), zero-padded to 2 digits. |
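
Before creating a database or table, you might check a candidate time format against this list. The following Python sketch is one way to do that. `uses_only_date_specifiers` is a hypothetical helper, not part of InfluxDB or `influxctl`, and it checks only which specifiers appear, not whether their combination is one of the supported daily, monthly, or yearly formats.

```python
import re

ALLOWED_SPECIFIERS = {"Y", "m", "d"}


def uses_only_date_specifiers(time_format: str) -> bool:
    """Return True if the format has at least one specifier and all are %Y, %m, or %d."""
    specifiers = re.findall(r"%(.)", time_format)
    return bool(specifiers) and set(specifiers) <= ALLOWED_SPECIFIERS


print(uses_only_date_specifiers("%Y-%m-%d"))      # daily
print(uses_only_date_specifiers("%Y-%m"))         # monthly
print(uses_only_date_specifiers("%Y-%m-%dT%H"))   # includes an hour specifier
```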


@ -0,0 +1,169 @@
<!--Allow shortcode-->
{{< product-name >}} stores partition information in InfluxDB v3 system tables.
Query partition information to view partition templates and verify partitions
are working as intended.
- [Query partition information from system tables](#query-partition-information-from-system-tables)
- [Partition-related queries](#partition-related-queries)
{{% warn %}}
#### Querying system tables may impact overall cluster performance
Partition information is stored in InfluxDB v3 system tables.
Querying system tables may impact the overall write and query performance of
your {{< product-name omit=" Clustered" >}} cluster.
<!--------------- UPDATE THE DATE BELOW AS EXAMPLES ARE UPDATED --------------->
#### System tables are subject to change
System tables are not part of InfluxDB's stable API and may change with new releases.
The provided schema information and query examples are valid as of **September 24, 2024**.
If you detect a schema change or a non-functioning query example, please
[submit an issue](https://github.com/influxdata/docs-v2/issues/new/choose).
<!--------------- UPDATE THE DATE ABOVE AS EXAMPLES ARE UPDATED --------------->
{{% /warn %}}
## Query partition information from system tables
Use the [`influxctl query` command](/influxdb/cloud-dedicated/reference/cli/influxctl/query/)
and SQL to query partition-related information from InfluxDB system tables.
Provide the following:
- **Enable system tables** with the `--enable-system-tables` command flag.
- **Database token**: A [database token](/influxdb/cloud-dedicated/admin/tokens/#database-tokens)
with read permissions on the specified database. Uses the `token` setting from
the [`influxctl` connection profile](/influxdb/cloud-dedicated/reference/cli/influxctl/#configure-connection-profiles)
or the `--token` command flag.
- **Database name**: The name of the database to query information about.
Uses the `database` setting from the
[`influxctl` connection profile](/influxdb/cloud-dedicated/reference/cli/influxctl/#configure-connection-profiles)
or the `--database` command flag.
- **SQL query**: The SQL query to execute.
Pass the query in one of the following ways:
- a string on the command line
- a path to a file that contains the query
- a single dash (`-`) to read the query from stdin
{{% code-placeholders "DATABASE_(TOKEN|NAME)|SQL_QUERY" %}}
```bash
influxctl query \
--enable-system-tables \
--database DATABASE_NAME \
--token DATABASE_TOKEN \
"SQL_QUERY"
```
{{% /code-placeholders %}}
Replace the following:
- {{% code-placeholder-key %}}`DATABASE_TOKEN`{{% /code-placeholder-key %}}:
A database token with read access to the specified database
- {{% code-placeholder-key %}}`DATABASE_NAME`{{% /code-placeholder-key %}}:
The name of the database to query information about.
- {{% code-placeholder-key %}}`SQL_QUERY`{{% /code-placeholder-key %}}:
The SQL query to execute. For examples, see
[Partition-related queries](#partition-related-queries).
When prompted, enter `y` to acknowledge the potential impact querying system
tables may have on your cluster.
## Partition-related queries
Use the following queries to return information about partitions in your
{{< product-name omit=" Clustered" >}} cluster.
- [View partition templates of all tables](#view-partition-templates-of-all-tables)
- [View the partition template of a specific table](#view-the-partition-template-of-a-specific-table)
- [View all partitions for a table](#view-all-partitions-for-a-table)
- [View the number of partitions per table](#view-the-number-of-partitions-per-table)
- [View the number of partitions for a specific table](#view-the-number-of-partitions-for-a-specific-table)
---
In the examples below, replace {{% code-placeholder-key %}}`TABLE_NAME`{{% /code-placeholder-key %}}
with the name of the table you want to query information about.
---
{{% code-placeholders "TABLE_NAME_(1|2|3)|TABLE_NAME" %}}
### View the partition template of a specific table
```sql
SELECT * FROM system.tables WHERE table_name = 'TABLE_NAME'
```
#### Example results
| table_name | partition_template |
| :--------- | :----------------------------------------------------------------------------------------- |
| weather | `{"parts":[{"timeFormat":"%Y-%m-%d"},{"bucket":{"tagName":"location","numBuckets":250}}]}` |
{{% note %}}
If a table doesn't include a partition template in the output of this command,
the table uses the default (1 day) partition strategy and doesn't partition
by tags or tag buckets.
{{% /note %}}
### View all partitions for a table
```sql
SELECT * FROM system.partitions WHERE table_name = 'TABLE_NAME'
```
#### Example results
| partition_id | table_name | partition_key | last_new_file_created_at | num_files | total_size_mb |
| -----------: | :--------- | :---------------- | -----------------------: | --------: | ------------: |
| 1362 | weather | 43 \| 2020-05-27 | 1683747418763813713 | 1 | 0 |
| 800 | weather | 234 \| 2021-08-02 | 1683747421899400796 | 1 | 0 |
| 630 | weather | 325 \| 2022-03-17 | 1683747417616689036 | 1 | 0 |
| 1401 | weather | 12 \| 2021-01-09 | 1683747417786122295 | 1 | 0 |
| 1012 | weather | 115 \| 2022-07-04 | 1683747417614219148 | 1 | 0 |
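
The `last_new_file_created_at` values have the magnitude of Unix timestamps in nanoseconds. Assuming that interpretation (the column's unit isn't stated here), the following Python sketch converts a value into a readable UTC time. `ns_to_utc` is a hypothetical helper.

```python
from datetime import datetime, timezone


def ns_to_utc(timestamp_ns: int) -> datetime:
    """Convert a Unix timestamp in nanoseconds (assumed unit) to a UTC datetime."""
    # Integer division drops sub-second precision, which is enough for inspection.
    return datetime.fromtimestamp(timestamp_ns // 1_000_000_000, tz=timezone.utc)


created_at = ns_to_utc(1683747418763813713)  # value from the first example row
print(created_at.isoformat())
```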
### View the number of partitions per table
```sql
SELECT
table_name,
COUNT(*) AS partition_count
FROM
system.partitions
WHERE
table_name IN ('TABLE_NAME_1', 'TABLE_NAME_2', 'TABLE_NAME_3')
GROUP BY
table_name
```
#### Example results
| table_name | partition_count |
| :--------- | --------------: |
| weather | 1096 |
| home | 24 |
| numbers | 1 |
### View the number of partitions for a specific table
```sql
SELECT
COUNT(*) AS partition_count
FROM
system.partitions
WHERE
table_name = 'TABLE_NAME'
```
#### Example results
| partition_count |
| --------------: |
| 1096 |
{{% /code-placeholders %}}