Merge pull request #5734 from influxdata/jstirnaman/DAR-463
docs(partitioning): enhance best practices and time part templates do…
commit
37183896f4
|
@ -11,409 +11,9 @@ weight: 103
|
|||
influxdb/cloud-dedicated/tags: [storage]
|
||||
related:
|
||||
- /influxdb/cloud-dedicated/reference/internals/storage-engine/
|
||||
source: /shared/v3-distributed-admin-custom-partitions/_index.md
|
||||
---
|
||||
|
||||
When writing data to {{< product-name >}}, the InfluxDB v3 storage engine stores
|
||||
data in the [Object store](/influxdb/cloud-dedicated/reference/internals/storage-engine/#object-store)
|
||||
in [Apache Parquet](https://parquet.apache.org/) format.
|
||||
Each Parquet file represents a _partition_--a logical grouping of data.
|
||||
By default, InfluxDB partitions each table by day.
|
||||
{{< product-name >}} lets you customize the partitioning strategy and partition
|
||||
by tag values and different time intervals.
|
||||
Customize your partitioning strategy to optimize query performance for your
|
||||
specific schema and workload.
|
||||
|
||||
- [Advantages](#advantages)
|
||||
- [Disadvantages](#disadvantages)
|
||||
- [Limitations](#limitations)
|
||||
- [How partitioning works](#how-partitioning-works)
|
||||
- [Partition templates](#partition-templates)
|
||||
- [Partition keys](#partition-keys)
|
||||
- [Partitions in the query life cycle](#partitions-in-the-query-life-cycle)
|
||||
- [Partition guides](#partition-guides)
|
||||
{{< children type="anchored-list" >}}
|
||||
|
||||
## Advantages
|
||||
|
||||
The primary advantage of custom partitioning is that it lets you customize your
|
||||
storage structure to improve query performance specific to your schema and workload.
|
||||
|
||||
- **Optimized storage for improved performance on specific types of queries**.
|
||||
For example, if queries often select data with a specific tag value, you can
|
||||
partition by that tag to improve the performance of those queries.
|
||||
- **Optimized storage for specific types of data**. For example, if the data you
|
||||
store is sparse and the time ranges you query are often much larger than a day,
|
||||
you could partition your data by week instead of by day.
|
||||
|
||||
## Disadvantages
|
||||
|
||||
Using custom partitioning may increase the load on other parts of the
|
||||
[InfluxDB v3 storage engine](/influxdb/cloud-dedicated/reference/internals/storage-engine/),
|
||||
but each can be scaled individually to address the added load.
|
||||
|
||||
{{% note %}}
|
||||
_The following disadvantages assume that your custom partitioning strategy includes
|
||||
additional tags to partition by or partition intervals smaller than a day._
|
||||
{{% /note %}}
|
||||
|
||||
- **Increased load on the [Ingester](/influxdb/cloud-dedicated/reference/internals/storage-engine/#ingester)**
|
||||
as it groups data into smaller partitions and files.
|
||||
- **Increased load on the [Catalog](/influxdb/cloud-dedicated/reference/internals/storage-engine/#catalog)**
|
||||
as more references to partition Parquet file locations are stored and queried.
|
||||
- **Increased load on the [Compactor](/influxdb/cloud-dedicated/reference/internals/storage-engine/#compactor)**
|
||||
as more partition Parquet files need to be compacted.
|
||||
- **Increased costs associated with [Object storage](/influxdb/cloud-dedicated/reference/internals/storage-engine/#object-storage)**
|
||||
as more partition Parquet files are created and stored.
|
||||
- **Risk of decreased performance for queries that don't use tags in the WHERE clause**.
|
||||
These queries may end up reading many partitions and smaller files, degrading performance.
|
||||
|
||||
## Limitations
|
||||
|
||||
Custom partitioning has the following limitations:
|
||||
|
||||
- Database and table partitions can only be defined on create.
|
||||
You cannot update the partition strategy of a database or table after it has
|
||||
been created.
|
||||
- A partition template must include a time part.
|
||||
- You can partition by up to eight dimensions (seven tags and a time interval).
|
||||
|
||||
## How partitioning works
|
||||
|
||||
### Partition templates
|
||||
|
||||
A partition template defines the pattern used for _[partition keys](#partition-keys)_
|
||||
and determines the time interval that data is partitioned by.
|
||||
Partition templates use tag values and
|
||||
[Rust strftime date and time formatting syntax](https://docs.rs/chrono/latest/chrono/format/strftime/index.html).
|
||||
|
||||
_For more detailed information, see [Partition templates](/influxdb/cloud-dedicated/admin/custom-partitions/partition-templates/)._
|
||||
|
||||
### Partition keys
|
||||
|
||||
A partition key uniquely identifies a partition.
|
||||
A _[partition template](#partition-templates)_ defines the partition key format.
|
||||
Partition keys are
|
||||
composed of up to 8 dimensions (1 time part and up to 7 tag or tag bucket parts).
|
||||
Each part is delimited by the partition key separator (`|`).
|
||||
|
||||
The default format for partition keys is `%Y-%m-%d` (for example, `2024-01-01`).
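
To make the key format concrete, the following Python sketch assembles a partition key from a point's tags and timestamp. It is an illustration only, not InfluxDB's implementation: the `partition_key` function and its parameters are hypothetical names, it assumes every listed tag is present on the point, and it joins parts with a bare `|` (the examples on this page pad the separator with spaces for readability).

```python
from datetime import datetime, timezone


def partition_key(tags, timestamp_ns, tag_parts, time_format="%Y-%m-%d", sep="|"):
    """Build an illustrative partition key: tag values first, then the time part."""
    # Convert the nanosecond Unix timestamp to a UTC datetime.
    ts = datetime.fromtimestamp(timestamp_ns // 1_000_000_000, tz=timezone.utc)
    # Hypothetical simplification: every tag in tag_parts exists on the point.
    parts = [tags[key] for key in tag_parts]
    parts.append(ts.strftime(time_format))
    return sep.join(parts)


point_tags = {"line": "A", "station": "cnc"}
# Default template: time part only, partitioned by day.
print(partition_key(point_tags, 1704063600000000000, []))
# Two tag parts followed by the time part.
print(partition_key(point_tags, 1704063600000000000, ["line", "station"]))
```

With the first timestamp from the example line protocol, the two calls print `2023-12-31` and `A|cnc|2023-12-31`.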
|
||||
|
||||
{{< expand-wrapper >}}
|
||||
{{% expand "View example partition templates and keys" %}}
|
||||
|
||||
Given the following line protocol with the following timestamps:
|
||||
|
||||
- 2023-12-31T23:00:00Z
|
||||
- 2024-01-01T00:00:00Z
|
||||
- 2024-01-01T01:00:00Z
|
||||
|
||||
```text
production,line=A,station=cnc temp=81.2,qty=35i 1704063600000000000
production,line=A,station=wld temp=92.8,qty=35i 1704063600000000000
production,line=B,station=cnc temp=101.1,qty=43i 1704063600000000000
production,line=B,station=wld temp=102.4,qty=43i 1704063600000000000
production,line=A,station=cnc temp=81.9,qty=36i 1704067200000000000
production,line=A,station=wld temp=110.0,qty=22i 1704067200000000000
production,line=B,station=cnc temp=101.8,qty=44i 1704067200000000000
production,line=B,station=wld temp=105.7,qty=44i 1704067200000000000
production,line=A,station=cnc temp=82.2,qty=35i 1704070800000000000
production,line=A,station=wld temp=92.1,qty=30i 1704070800000000000
production,line=B,station=cnc temp=102.4,qty=43i 1704070800000000000
production,line=B,station=wld temp=106.5,qty=43i 1704070800000000000
```
|
||||
|
||||
---
|
||||
|
||||
{{% flex %}}
|
||||
|
||||
<!---------------------- BEGIN PARTITION EXAMPLES GROUP 1 --------------------->
|
||||
|
||||
{{% flex-content "half" %}}
|
||||
|
||||
##### Partition template parts
|
||||
|
||||
- `%Y-%m-%d` <em class="op50">time (by day, default format)</em>
|
||||
|
||||
{{% /flex-content %}}
|
||||
{{% flex-content %}}
|
||||
|
||||
##### Partition keys
|
||||
|
||||
- `2023-12-31`
|
||||
- `2024-01-01`
|
||||
|
||||
{{% /flex-content %}}
|
||||
|
||||
<!----------------------- END PARTITION EXAMPLES GROUP 1 ---------------------->
|
||||
|
||||
{{% /flex %}}
|
||||
|
||||
---
|
||||
|
||||
{{% flex %}}
|
||||
|
||||
<!---------------------- BEGIN PARTITION EXAMPLES GROUP 2 --------------------->
|
||||
|
||||
{{% flex-content "half" %}}
|
||||
|
||||
##### Partition template parts
|
||||
|
||||
- `line` <em class="op50">tag</em>
|
||||
- `%d %b %Y` <em class="op50">time (by day, non-default format)</em>
|
||||
|
||||
{{% /flex-content %}}
|
||||
{{% flex-content %}}
|
||||
|
||||
##### Partition keys
|
||||
|
||||
- `A | 31 Dec 2023`
|
||||
- `B | 31 Dec 2023`
|
||||
- `A | 01 Jan 2024`
|
||||
- `B | 01 Jan 2024`
|
||||
|
||||
{{% /flex-content %}}
|
||||
|
||||
<!----------------------- END PARTITION EXAMPLES GROUP 2 ---------------------->
|
||||
|
||||
{{% /flex %}}
|
||||
|
||||
---
|
||||
|
||||
{{% flex %}}
|
||||
|
||||
<!---------------------- BEGIN PARTITION EXAMPLES GROUP 3 --------------------->
|
||||
|
||||
{{% flex-content "half" %}}
|
||||
|
||||
##### Partition template parts
|
||||
|
||||
- `line` <em class="op50">tag</em>
|
||||
- `station` <em class="op50">tag</em>
|
||||
- `%Y-%m-%d` <em class="op50">time (by day, default format)</em>
|
||||
|
||||
{{% /flex-content %}}
|
||||
{{% flex-content %}}
|
||||
|
||||
##### Partition keys
|
||||
|
||||
- `A | cnc | 2023-12-31`
|
||||
- `A | wld | 2023-12-31`
|
||||
- `B | cnc | 2023-12-31`
|
||||
- `B | wld | 2023-12-31`
|
||||
- `A | cnc | 2024-01-01`
|
||||
- `A | wld | 2024-01-01`
|
||||
- `B | cnc | 2024-01-01`
|
||||
- `B | wld | 2024-01-01`
|
||||
|
||||
{{% /flex-content %}}
|
||||
|
||||
<!----------------------- END PARTITION EXAMPLES GROUP 3 ---------------------->
|
||||
|
||||
{{% /flex %}}
|
||||
|
||||
---
|
||||
|
||||
{{% flex %}}
|
||||
|
||||
<!---------------------- BEGIN PARTITION EXAMPLES GROUP 4 --------------------->
|
||||
|
||||
{{% flex-content "half" %}}
|
||||
|
||||
##### Partition template parts
|
||||
|
||||
- `line` <em class="op50">tag</em>
|
||||
- `station,3` <em class="op50">tag bucket</em>
|
||||
- `%Y-%m-%d` <em class="op50">time (by day, default format)</em>
|
||||
|
||||
{{% /flex-content %}}
|
||||
{{% flex-content %}}
|
||||
|
||||
##### Partition keys
|
||||
|
||||
- `A | 0 | 2023-12-31`
|
||||
- `B | 0 | 2023-12-31`
|
||||
- `A | 0 | 2024-01-01`
|
||||
- `B | 0 | 2024-01-01`
|
||||
|
||||
{{% /flex-content %}}
|
||||
|
||||
<!----------------------- END PARTITION EXAMPLES GROUP 4 ---------------------->
|
||||
|
||||
{{% /flex %}}
|
||||
|
||||
---
|
||||
|
||||
{{% flex %}}
|
||||
|
||||
<!---------------------- BEGIN PARTITION EXAMPLES GROUP 5 --------------------->
|
||||
|
||||
{{% flex-content "half" %}}
|
||||
|
||||
##### Partition template parts
|
||||
|
||||
- `line` <em class="op50">tag</em>
|
||||
- `station` <em class="op50">tag</em>
|
||||
- `%Y-%m-%d %H:00` <em class="op50">time (by hour)</em>
|
||||
|
||||
{{% /flex-content %}}
|
||||
{{% flex-content %}}
|
||||
|
||||
##### Partition keys
|
||||
|
||||
- `A | cnc | 2023-12-31 23:00`
|
||||
- `A | wld | 2023-12-31 23:00`
|
||||
- `B | cnc | 2023-12-31 23:00`
|
||||
- `B | wld | 2023-12-31 23:00`
|
||||
- `A | cnc | 2024-01-01 00:00`
|
||||
- `A | wld | 2024-01-01 00:00`
|
||||
- `B | cnc | 2024-01-01 00:00`
|
||||
- `B | wld | 2024-01-01 00:00`
|
||||
- `A | cnc | 2024-01-01 01:00`
|
||||
- `A | wld | 2024-01-01 01:00`
|
||||
- `B | cnc | 2024-01-01 01:00`
|
||||
- `B | wld | 2024-01-01 01:00`
|
||||
|
||||
{{% /flex-content %}}
|
||||
|
||||
<!----------------------- END PARTITION EXAMPLES GROUP 5 ---------------------->
|
||||
|
||||
{{% /flex %}}
|
||||
|
||||
---
|
||||
|
||||
{{% flex %}}
|
||||
|
||||
<!---------------------- BEGIN PARTITION EXAMPLES GROUP 6 --------------------->
|
||||
|
||||
{{% flex-content "half" %}}
|
||||
|
||||
##### Partition template parts
|
||||
|
||||
- `line` <em class="op50">tag</em>
|
||||
- `station,50` <em class="op50">tag bucket</em>
|
||||
- `%Y-%m-%d %H:00` <em class="op50">time (by hour)</em>
|
||||
|
||||
{{% /flex-content %}}
|
||||
{{% flex-content %}}
|
||||
|
||||
##### Partition keys
|
||||
|
||||
- `A | 47 | 2023-12-31 23:00`
|
||||
- `A | 9 | 2023-12-31 23:00`
|
||||
- `B | 47 | 2023-12-31 23:00`
|
||||
- `B | 9 | 2023-12-31 23:00`
|
||||
- `A | 47 | 2024-01-01 00:00`
|
||||
- `A | 9 | 2024-01-01 00:00`
|
||||
- `B | 47 | 2024-01-01 00:00`
|
||||
- `B | 9 | 2024-01-01 00:00`
|
||||
- `A | 47 | 2024-01-01 01:00`
|
||||
- `A | 9 | 2024-01-01 01:00`
|
||||
- `B | 47 | 2024-01-01 01:00`
|
||||
- `B | 9 | 2024-01-01 01:00`
|
||||
|
||||
{{% /flex-content %}}
|
||||
|
||||
<!----------------------- END PARTITION EXAMPLES GROUP 6 ---------------------->
|
||||
|
||||
{{% /flex %}}
|
||||
|
||||
{{% /expand %}}
|
||||
{{< /expand-wrapper >}}
|
||||
|
||||
## Partitions in the query life cycle
|
||||
|
||||
When querying data:
|
||||
|
||||
1. The [Catalog](/influxdb/cloud-dedicated/reference/internals/storage-engine/#catalog)
|
||||
provides the v3 query engine ([Querier](/influxdb/cloud-dedicated/reference/internals/storage-engine/#querier))
|
||||
with the locations of partitions that contain the queried time series data.
|
||||
2. The query engine reads all rows in the returned partitions to identify what
|
||||
rows match the logic in the query and should be included in the query result.
|
||||
|
||||
The faster the query engine can identify what partitions to read and then read
|
||||
the data in those partitions, the more performant queries are.
|
||||
|
||||
_For more information about the query life cycle, see
|
||||
[InfluxDB v3 query life cycle](/influxdb/cloud-dedicated/reference/internals/storage-engine/#query-life-cycle)._
|
||||
|
||||
##### Query example
|
||||
|
||||
Consider the following query that selects everything in the `production` table
|
||||
where the `line` tag is `A` and the `station` tag is `cnc`:
|
||||
|
||||
```sql
SELECT *
FROM production
WHERE
  time >= now() - INTERVAL '1 week'
  AND line = 'A'
  AND station = 'cnc'
```
|
||||
|
||||
Using the default partitioning strategy (by day), the query engine
|
||||
reads eight separate partitions (one partition for today and one for each of the
|
||||
last seven days):
|
||||
|
||||
- {{< datetime/current-date trimTime=true >}}
|
||||
- {{< datetime/current-date offset=-1 trimTime=true >}}
|
||||
- {{< datetime/current-date offset=-2 trimTime=true >}}
|
||||
- {{< datetime/current-date offset=-3 trimTime=true >}}
|
||||
- {{< datetime/current-date offset=-4 trimTime=true >}}
|
||||
- {{< datetime/current-date offset=-5 trimTime=true >}}
|
||||
- {{< datetime/current-date offset=-6 trimTime=true >}}
|
||||
- {{< datetime/current-date offset=-7 trimTime=true >}}
|
||||
|
||||
The query engine must scan _all_ rows in the partitions to identify rows
|
||||
where `line` is `A` and `station` is `cnc`. This process takes valuable time
|
||||
and results in less performant queries.
|
||||
|
||||
However, if you partition by other tags, InfluxDB can identify partitions that
|
||||
contain only the tag values your query needs and spend less time
|
||||
scanning rows to see if they contain the tag values.
|
||||
|
||||
For example, if data is partitioned by `line`, `station`, and day, although
|
||||
there are more partition files, the query engine can quickly identify and read
|
||||
only those with data relevant to the query:
|
||||
|
||||
{{% columns 4 %}}
|
||||
|
||||
- <strong class="req normal green">A | cnc | {{< datetime/current-date trimTime=true >}}</strong>
|
||||
- A | wld | {{< datetime/current-date trimTime=true >}}
|
||||
- B | cnc | {{< datetime/current-date trimTime=true >}}
|
||||
- B | wld | {{< datetime/current-date trimTime=true >}}
|
||||
- <strong class="req normal green">A | cnc | {{< datetime/current-date offset=-1 trimTime=true >}}</strong>
|
||||
- A | wld | {{< datetime/current-date offset=-1 trimTime=true >}}
|
||||
- B | cnc | {{< datetime/current-date offset=-1 trimTime=true >}}
|
||||
- B | wld | {{< datetime/current-date offset=-1 trimTime=true >}}
|
||||
- <strong class="req normal green">A | cnc | {{< datetime/current-date offset=-2 trimTime=true >}}</strong>
|
||||
- A | wld | {{< datetime/current-date offset=-2 trimTime=true >}}
|
||||
- B | cnc | {{< datetime/current-date offset=-2 trimTime=true >}}
|
||||
- B | wld | {{< datetime/current-date offset=-2 trimTime=true >}}
|
||||
- <strong class="req normal green">A | cnc | {{< datetime/current-date offset=-3 trimTime=true >}}</strong>
|
||||
- A | wld | {{< datetime/current-date offset=-3 trimTime=true >}}
|
||||
- B | cnc | {{< datetime/current-date offset=-3 trimTime=true >}}
|
||||
- B | wld | {{< datetime/current-date offset=-3 trimTime=true >}}
|
||||
- <strong class="req normal green">A | cnc | {{< datetime/current-date offset=-4 trimTime=true >}}</strong>
|
||||
- A | wld | {{< datetime/current-date offset=-4 trimTime=true >}}
|
||||
- B | cnc | {{< datetime/current-date offset=-4 trimTime=true >}}
|
||||
- B | wld | {{< datetime/current-date offset=-4 trimTime=true >}}
|
||||
- <strong class="req normal green">A | cnc | {{< datetime/current-date offset=-5 trimTime=true >}}</strong>
|
||||
- A | wld | {{< datetime/current-date offset=-5 trimTime=true >}}
|
||||
- B | cnc | {{< datetime/current-date offset=-5 trimTime=true >}}
|
||||
- B | wld | {{< datetime/current-date offset=-5 trimTime=true >}}
|
||||
- <strong class="req normal green">A | cnc | {{< datetime/current-date offset=-6 trimTime=true >}}</strong>
|
||||
- A | wld | {{< datetime/current-date offset=-6 trimTime=true >}}
|
||||
- B | cnc | {{< datetime/current-date offset=-6 trimTime=true >}}
|
||||
- B | wld | {{< datetime/current-date offset=-6 trimTime=true >}}
|
||||
- <strong class="req normal green">A | cnc | {{< datetime/current-date offset=-7 trimTime=true >}}</strong>
|
||||
- A | wld | {{< datetime/current-date offset=-7 trimTime=true >}}
|
||||
- B | cnc | {{< datetime/current-date offset=-7 trimTime=true >}}
|
||||
- B | wld | {{< datetime/current-date offset=-7 trimTime=true >}}
|
||||
|
||||
{{% /columns %}}
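
The pruning effect described above can be modeled with a short Python sketch. This is a conceptual illustration, not the logic the Catalog or Querier actually runs: `daily_keys` and `prune` are hypothetical helpers, and the fixed end date is an arbitrary assumption chosen to keep the output deterministic.

```python
from datetime import date, timedelta
from itertools import product


def daily_keys(lines, stations, end_day, days):
    """List (line, station, day) partition keys for the last `days` days."""
    keys = []
    for offset, line, station in product(range(days), lines, stations):
        day = end_day - timedelta(days=offset)
        keys.append((line, station, day.isoformat()))
    return keys


def prune(keys, line, station):
    """Keep only the partitions whose key matches both tag predicates."""
    return [key for key in keys if key[0] == line and key[1] == station]


# Hypothetical data set: 2 lines x 2 stations x 8 days (today plus seven days back).
all_keys = daily_keys(["A", "B"], ["cnc", "wld"], date(2024, 1, 8), days=8)
matching = prune(all_keys, line="A", station="cnc")
print(len(all_keys), len(matching))
```

In this model, 32 partitions exist but only 8 need to be read, and each of those 8 contains only rows for `line=A` and `station=cnc`.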
|
||||
|
||||
---
|
||||
|
||||
## Partition guides
|
||||
|
||||
{{< children >}}
|
||||
<!--
|
||||
The content of this page is at /content/shared/v3-distributed-admin-custom-partitions/_index.md
|
||||
-->
|
||||
|
|
|
@ -8,49 +8,9 @@ menu:
|
|||
name: Best practices
|
||||
parent: Manage data partitioning
|
||||
weight: 202
|
||||
source: /shared/v3-distributed-admin-custom-partitions/best-practices.md
|
||||
---
|
||||
|
||||
Use the following best practices when defining custom partitioning strategies
|
||||
for your data stored in {{< product-name >}}.
|
||||
|
||||
- [Partition by tags that you commonly query for a specific value](#partition-by-tags-that-you-commonly-query-for-a-specific-value)
|
||||
- [Only partition by tags that _always_ have a value](#only-partition-by-tags-that-always-have-a-value)
|
||||
- [Avoid over-partitioning](#avoid-over-partitioning)
|
||||
|
||||
## Partition by tags that you commonly query for a specific value
|
||||
|
||||
Custom partitioning primarily benefits queries that look for a specific tag
|
||||
value in the `WHERE` clause. For example, if you often query data related to a
|
||||
specific ID, partitioning by the tag that stores the ID helps the InfluxDB
|
||||
query engine to more quickly identify what partitions contain the relevant data.
|
||||
|
||||
{{% note %}}
|
||||
|
||||
#### Use tag buckets for high-cardinality tags
|
||||
|
||||
Partitioning using distinct values of tags with many (10K+) unique values can
|
||||
actually hurt query performance as partitions are created for each unique tag value.
|
||||
Instead, use [tag buckets](/influxdb/cloud-dedicated/admin/custom-partitions/partition-templates/#tag-bucket-part-templates)
|
||||
to partition by high-cardinality tags.
|
||||
This method of partitioning groups tag values into "buckets" and partitions by bucket.
|
||||
{{% /note %}}
|
||||
|
||||
## Only partition by tags that _always_ have a value
|
||||
|
||||
You should only partition by tags that _always_ have a value.
|
||||
If points don't have a value for the tag, InfluxDB can't store them in the correct partitions and, at query time, must read all the partitions.
|
||||
|
||||
## Avoid over-partitioning
|
||||
|
||||
As you plan your partitioning strategy, keep in mind that data can be
|
||||
"over-partitioned"--meaning partitions are so granular that queries end up
|
||||
having to retrieve and read many partitions from the object store, which
|
||||
hurts query performance.
|
||||
|
||||
- Balance the partition time interval with the actual amount of data written
|
||||
during each interval. If a single interval doesn't contain a lot of data,
|
||||
it is better to partition by larger time intervals.
|
||||
- Don't partition by tags that you typically don't use in your query workload.
|
||||
- Don't partition by distinct values of high-cardinality tags.
|
||||
Instead, [use tag buckets](#use-tag-buckets-for-high-cardinality-tags) to
|
||||
partition by these tags.
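
One rough way to check a strategy for over-partitioning is to estimate how many partitions it can produce. The Python sketch below computes an upper bound under a simplifying assumption: every combination of tag values (or tag buckets) appears in every time interval. The `max_partitions` function, the tag names, and the cardinalities are all hypothetical.

```python
from math import prod


def max_partitions(tag_cardinalities, tag_buckets, intervals):
    """Upper bound on partition count for a template.

    tag_cardinalities: {tag: distinct values} for tag parts.
    tag_buckets: {tag: (distinct values, bucket count)} for tag bucket parts.
    intervals: number of time intervals covered (for example, days).
    """
    distinct = prod(tag_cardinalities.values())
    # A bucketed tag contributes at most its bucket count.
    bucketed = prod(min(values, buckets) for values, buckets in tag_buckets.values())
    return distinct * bucketed * intervals


# Partitioning by distinct customerID values over 30 days.
print(max_partitions({"room": 20, "customerID": 50_000}, {}, 30))
# The same data with customerID grouped into 500 tag buckets.
print(max_partitions({"room": 20}, {"customerID": (50_000, 500)}, 30))
```

Under these assumed numbers, bucketing the high-cardinality tag lowers the bound from 30,000,000 partitions to 300,000.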
|
||||
<!--
|
||||
The content of this page is at /content/shared/v3-distributed-admin-custom-partitions/best-practices.md
|
||||
-->
|
||||
|
|
|
@ -10,161 +10,9 @@ weight: 202
|
|||
related:
|
||||
- /influxdb/cloud-dedicated/reference/cli/influxctl/database/create/
|
||||
- /influxdb/cloud-dedicated/reference/cli/influxctl/table/create/
|
||||
source: /shared/v3-distributed-admin-custom-partitions/define-custom-partitions.md
|
||||
---
|
||||
|
||||
Use the [`influxctl` CLI](/influxdb/cloud-dedicated/reference/cli/influxctl/)
|
||||
to define custom partition strategies when creating a database or table.
|
||||
By default, {{< product-name >}} partitions data by day.
|
||||
|
||||
The partitioning strategy of a database or table is determined by a
|
||||
[partition template](/influxdb/cloud-dedicated/admin/custom-partitions/#partition-templates)
|
||||
which defines the naming pattern for [partition keys](/influxdb/cloud-dedicated/admin/custom-partitions/#partition-keys).
|
||||
Partition keys uniquely identify each partition.
|
||||
When a partition template is applied to a database, it becomes the default template
|
||||
for all tables in that database, but can be overridden when creating a
|
||||
table.
|
||||
|
||||
- [Create a database with a custom partition template](#create-a-database-with-a-custom-partition-template)
|
||||
- [Create a table with a custom partition template](#create-a-table-with-a-custom-partition-template)
|
||||
- [Example partition templates](#example-partition-templates)
|
||||
|
||||
{{% warn %}}
|
||||
|
||||
#### Partition templates can only be applied on create
|
||||
|
||||
You can only apply a partition template when creating a database or table.
|
||||
You can't update a partition template on an existing resource.
|
||||
{{% /warn %}}
|
||||
|
||||
Use the following command flags to identify
|
||||
[partition template parts](/influxdb/cloud-dedicated/admin/custom-partitions/partition-templates/#tag-part-templates):
|
||||
|
||||
- `--template-tag`: An [InfluxDB tag](/influxdb/cloud-dedicated/reference/glossary/#tag)
|
||||
to use in the partition template.
|
||||
- `--template-tag-bucket`: An [InfluxDB tag](/influxdb/cloud-dedicated/reference/glossary/#tag)
|
||||
and number of "buckets" to group tag values into.
|
||||
Provide the tag key and the number of buckets, separated by a comma: `tagKey,N`.
|
||||
- `--template-timeformat`: A [Rust strftime date and time](/influxdb/cloud-dedicated/admin/custom-partitions/partition-templates/#time-part-templates)
|
||||
string that specifies the time format in the partition template and determines
|
||||
the time interval to partition by.
|
||||
|
||||
{{% note %}}
|
||||
A partition template can include up to 7 total tag and tag bucket parts
|
||||
and only 1 time part.
|
||||
{{% /note %}}
|
||||
|
||||
_View [partition template part restrictions](/influxdb/cloud-dedicated/admin/custom-partitions/partition-templates/#restrictions)._
|
||||
|
||||
{{% note %}}
|
||||
#### Always provide a time format when using custom partitioning
|
||||
|
||||
When defining a custom partition template for your database or table using any
|
||||
of the `influxctl` `--template-*` flags, always include the `--template-timeformat`
|
||||
flag with a time format to use in your partition template.
|
||||
Otherwise, InfluxDB omits time from the partition template and won't compact partitions.
|
||||
{{% /note %}}
|
||||
|
||||
## Create a database with a custom partition template
|
||||
|
||||
The following example creates a new `example-db` database and applies a partition
|
||||
template that partitions by distinct values of two tags (`room` and `sensor-type`),
|
||||
bucketed values of the `customerID` tag, and by day using the time format `%Y-%m-%d`:
|
||||
|
||||
<!--Skip database create and delete tests: namespaces aren't reusable-->
|
||||
<!--pytest.mark.skip-->
|
||||
|
||||
```sh
influxctl database create \
  --template-tag room \
  --template-tag sensor-type \
  --template-tag-bucket customerID,500 \
  --template-timeformat '%Y-%m-%d' \
  example-db
```
|
||||
|
||||
## Create a table with a custom partition template
|
||||
|
||||
The following example creates a new `example-table` table in the specified
|
||||
database and applies a partition template that partitions by distinct values of
|
||||
two tags (`room` and `sensor-type`), bucketed values of the `customerID` tag,
|
||||
and by month using the time format `%Y-%m`:
|
||||
|
||||
<!--Skip database create and delete tests: namespaces aren't reusable-->
|
||||
<!--pytest.mark.skip-->
|
||||
|
||||
{{% code-placeholders "DATABASE_NAME" %}}
|
||||
|
||||
```sh
influxctl table create \
  --template-tag room \
  --template-tag sensor-type \
  --template-tag-bucket customerID,500 \
  --template-timeformat '%Y-%m' \
  DATABASE_NAME \
  example-table
```
|
||||
|
||||
{{% /code-placeholders %}}
|
||||
|
||||
Replace the following in your command:
|
||||
|
||||
- {{% code-placeholder-key %}}`DATABASE_NAME`{{% /code-placeholder-key %}}: your {{% product-name %}} [database](/influxdb/cloud-dedicated/admin/databases/)
|
||||
|
||||
<!--actual test
|
||||
|
||||
```sh

# Test the preceding command outside of the code block.
# influxctl authentication requires TTY interaction--
# output the auth URL to a file that the host can open.

TABLE_NAME=table_TEST_RUN
script -c "influxctl table create \
  --template-tag room \
  --template-tag sensor-type \
  --template-tag-bucket customerID,500 \
  --template-timeformat '%Y-%m' \
  DATABASE_NAME \
  $TABLE_NAME" \
  /dev/null > /shared/urls.txt

script -c "influxctl query \
  --database DATABASE_NAME \
  --token DATABASE_TOKEN \
  'SHOW TABLES'" > /shared/temp_tables.txt
grep -q $TABLE_NAME /shared/temp_tables.txt
rm /shared/temp_tables.txt
```
|
||||
|
||||
<!--
|
||||
The content of this page is at /content/shared/v3-distributed-admin-custom-partitions/_define-custom-partitions.md
|
||||
-->
|
||||
|
||||
## Example partition templates
|
||||
|
||||
Given the following [line protocol](/influxdb/cloud-dedicated/reference/syntax/line-protocol/)
|
||||
with a `2024-01-01T00:00:00Z` timestamp:
|
||||
|
||||
```text
prod,line=A,station=weld1 temp=81.9,qty=36i 1704067200000000000
```
|
||||
|
||||
##### Partitioning by distinct tag values
|
||||
|
||||
| Description             | Tag parts         | Time part  | Resulting partition key  |
| :---------------------- | :---------------- | :--------- | :----------------------- |
| By day (default)        |                   | `%Y-%m-%d` | 2024-01-01               |
| By month                |                   | `%Y-%m`    | 2024-01                  |
| By year                 |                   | `%Y`       | 2024                     |
| Single tag, by day      | `line`            | `%Y-%m-%d` | A \| 2024-01-01          |
| Single tag, by month    | `line`            | `%Y-%m`    | A \| 2024-01             |
| Single tag, by year     | `line`            | `%Y`       | A \| 2024                |
| Multiple tags, by day   | `line`, `station` | `%Y-%m-%d` | A \| weld1 \| 2024-01-01 |
| Multiple tags, by month | `line`, `station` | `%Y-%m`    | A \| weld1 \| 2024-01    |
| Multiple tags, by year  | `line`, `station` | `%Y`       | A \| weld1 \| 2024       |
|
||||
|
||||
##### Partition by tag buckets
|
||||
|
||||
| Description                         | Tag part | Tag bucket part | Time part  | Resulting partition key |
| :---------------------------------- | :------- | :-------------- | :--------- | :---------------------- |
| Distinct tag, tag buckets, by day   | `line`   | `station,100`   | `%Y-%m-%d` | A \| 3 \| 2024-01-01    |
| Distinct tag, tag buckets, by month | `line`   | `station,500`   | `%Y-%m`    | A \| 303 \| 2024-01     |
|
||||
|
|
|
@ -8,124 +8,9 @@ menu:
|
|||
influxdb_cloud_dedicated:
|
||||
parent: Manage data partitioning
|
||||
weight: 202
|
||||
source: /shared/v3-distributed-admin-custom-partitions/partition-templates.md
|
||||
---
|
||||
|
||||
Use partition templates to define the patterns used to generate partition keys.
|
||||
A partition key uniquely identifies a partition and is used to name the partition
|
||||
Parquet file in the [Object store](/influxdb/cloud-dedicated/reference/internals/storage-engine/#object-store).
|
||||
|
||||
A partition template consists of 1-8 _template parts_---dimensions to partition data by.
|
||||
Three types of template parts exist:
|
||||
|
||||
- **tag**: An [InfluxDB tag](/influxdb/cloud-dedicated/reference/glossary/#tag)
|
||||
to partition by.
|
||||
- **tag bucket**: An [InfluxDB tag](/influxdb/cloud-dedicated/reference/glossary/#tag)
|
||||
and number of "buckets" to group tag values into. Data is partitioned by the
|
||||
tag bucket rather than each distinct tag value.
|
||||
- {{< req type="key" >}} **time**: A Rust strftime date and time string that specifies the time interval
|
||||
to partition data by. The smallest unit of time included in the time part
|
||||
template is the interval used to partition data.
|
||||
|
||||
{{% note %}}
|
||||
A partition template must include 1 [time part](#time-part-templates)
|
||||
and can include up to 7 total [tag](#tag-part-templates) and [tag bucket](#tag-bucket-part-templates) parts.
|
||||
{{% /note %}}
|
||||
|
||||
<!-- TOC -->
|
||||
- [Restrictions](#restrictions)
|
||||
- [Template part size limit](#template-part-size-limit)
|
||||
- [Reserved keywords](#reserved-keywords)
|
||||
- [Reserved Characters](#reserved-characters)
|
||||
- [Tag part templates](#tag-part-templates)
|
||||
- [Tag bucket part templates](#tag-bucket-part-templates)
|
||||
- [Time part templates](#time-part-templates)
|
||||
<!-- /TOC -->
|
||||
|
||||
## Restrictions
|
||||
|
||||
### Template part size limit
|
||||
|
||||
Each template part is limited to 200 bytes in length.
|
||||
Anything longer will be truncated at 200 bytes and appended with `#`.
|
||||
|
||||
### Partition key size limit
|
||||
|
||||
With the truncation of template parts, the maximum length of a partition key is
1,607 bytes (1.57 KiB): up to 8 parts of 200 bytes each plus 7 one-byte `|` delimiters.
|
||||
|
||||
### Reserved keywords
|
||||
|
||||
The following reserved keywords cannot be used in partition templates:
|
||||
|
||||
- `time`
|
||||
|
||||
### Reserved Characters
|
||||
|
||||
If used in template parts, non-ASCII characters and the following reserved
|
||||
characters must be [percent encoded](https://developer.mozilla.org/en-US/docs/Glossary/Percent-encoding):
|
||||
|
||||
- `|`: Partition key part delimiter
|
||||
- `!`: Null or missing partition key part
|
||||
- `^`: Empty string partition key part
|
||||
- `#`: Key part truncation marker
|
||||
- `%`: Required for unambiguous reversal of percent encoding
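
As an illustration of the rule above, the following Python sketch percent-encodes the reserved characters and any non-ASCII characters in a value. It is a sketch under stated assumptions, not InfluxDB's encoder: `encode_part` is a hypothetical helper, it encodes non-ASCII characters as their UTF-8 bytes, and it leaves every other ASCII character unchanged.

```python
# Reserved characters listed above: delimiter, null marker, empty-string
# marker, truncation marker, and the percent sign itself.
RESERVED = set("|!^#%")


def encode_part(value: str) -> str:
    """Percent-encode reserved and non-ASCII characters in a key part value."""
    out = []
    for ch in value:
        if ch in RESERVED or ord(ch) > 127:
            # Emit one %XX escape per UTF-8 byte of the character.
            out.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
        else:
            out.append(ch)
    return "".join(out)


print(encode_part("a|b"))
print(encode_part("100%"))
print(encode_part("café"))
```

The three calls print `a%7Cb`, `100%25`, and `caf%C3%A9`.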
|
||||
|
||||
## Tag part templates
|
||||
|
||||
Tag part templates consist of a _tag key_ to partition by.
|
||||
Generated partition keys include the unique _tag value_ specific to each partition.
|
||||
|
||||
A partition template may include a given tag key only once in template parts
|
||||
that operate on tags (tag value and tag bucket)--for example:
|
||||
|
||||
If a template partitions on unique values of `tag_A`, then
|
||||
you can't use `tag_A` as a tag bucket part.
|
||||
|
||||
## Tag bucket part templates
|
||||
|
||||
Tag bucket part templates consist of a _tag key_ to partition by and the
|
||||
_number of "buckets" to partition tag values into_--for example:
|
||||
|
||||
```
customerID,500
```
|
||||
|
||||
Values of the `customerID` tag are grouped into 500 distinct "buckets."
Each bucket is identified by the remainder of dividing a 32-bit integer hash of
the tag value by the specified number of buckets:

```rust
hash(tagValue) % N
```
|
||||
|
||||
Generated partition keys include the unique _tag bucket identifier_ specific to
|
||||
each partition.
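
The `hash(tagValue) % N` formula can be sketched in Python as follows. This page does not name the hash function InfluxDB uses, so the sketch substitutes CRC-32 from the standard library purely as a stand-in 32-bit hash. The `tag_bucket` function is hypothetical, and the bucket IDs it returns are not expected to match the ones InfluxDB generates.

```python
import zlib


def tag_bucket(tag_value: str, num_buckets: int) -> int:
    """Map a tag value to a bucket ID in the range 0..num_buckets-1."""
    if not 1 <= num_buckets <= 1000:
        raise ValueError("number of tag buckets must be between 1 and 1,000")
    # ASSUMPTION: CRC-32 stands in for InfluxDB's unspecified 32-bit hash.
    hashed = zlib.crc32(tag_value.encode("utf-8")) & 0xFFFFFFFF
    return hashed % num_buckets


# The same tag value always lands in the same bucket.
bucket = tag_bucket("customer-12345", 500)
print(0 <= bucket < 500, bucket == tag_bucket("customer-12345", 500))
```

Because the bucket ID depends only on the tag value and `N`, all points with the same tag value land in the same bucket, regardless of how many distinct values the tag has.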
|
||||
|
||||
**Supported number of tag buckets**: 1-1,000
|
||||
|
||||
{{% note %}}
|
||||
Tag buckets should be used to partition by high cardinality tags or tags with an
|
||||
unknown number of distinct values.
|
||||
{{% /note %}}
|
||||
|
||||
A partition template may include a given tag key only once in template parts
|
||||
that operate on tags (tag value and tag bucket)--for example:
|
||||
|
||||
If a template partitions on unique values of `tag_A`, then
|
||||
you can't use `tag_A` as a tag bucket part.
|
||||
|
||||
## Time part templates
|
||||
|
||||
Time part templates use a limited subset of the
|
||||
[Rust strftime date and time formatting syntax](https://docs.rs/chrono/latest/chrono/format/strftime/index.html)
|
||||
to specify time format in partition keys.
|
||||
InfluxDB uses the smallest unit of time included in the time part template as
|
||||
the partition interval.
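To see how the smallest unit in the format determines the partition interval, consider this Python sketch. It relies on Python's `strftime`, which handles `%Y`, `%m`, and `%d` the same way for these inputs, and it assumes timestamps are interpreted in UTC. The helper name `time_key_part` is hypothetical.

```python
from datetime import datetime, timezone


def time_key_part(ts_seconds: int, time_format: str) -> str:
    """Format a Unix timestamp (seconds, assumed UTC) with a time part template."""
    return datetime.fromtimestamp(ts_seconds, tz=timezone.utc).strftime(time_format)


# 2023-12-31T23:00:00Z, 2024-01-01T00:00:00Z, 2024-01-01T01:00:00Z
timestamps = [1704063600, 1704067200, 1704070800]

# Smallest unit is the day: one partition per day.
daily = sorted({time_key_part(t, "%Y-%m-%d") for t in timestamps})
# ['2023-12-31', '2024-01-01']

# Smallest unit is the month: one partition per month.
monthly = sorted({time_key_part(t, "%Y-%m") for t in timestamps})
# ['2023-12', '2024-01']

# Smallest unit is the year: one partition per year.
yearly = sorted({time_key_part(t, "%Y") for t in timestamps})
# ['2023', '2024']
```

Dropping a smaller unit from the format widens the interval, so the same three points always fall into two partitions here only because they straddle a year boundary.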
|
||||
|
||||
### Date specifiers
|
||||
|
||||
| Variable | Example | Description |
| :------: | :------ | :---------- |
| `%Y` | `2001` | The full proleptic Gregorian year, zero-padded to 4 digits. chrono supports years from -262144 to 262143. Note: years before 1 BCE or after 9999 CE require an initial sign (+/-). |
| `%m` | `07` | Month number (01--12), zero-padded to 2 digits. |
| `%d` | `08` | Day number (01--31), zero-padded to 2 digits. |
|
||||
<!--
|
||||
The content of this page is at /content/shared/v3-distributed-admin-custom-partitions/_partition-templates.md
|
||||
-->
|
||||
|
|
|
@ -14,173 +14,9 @@ list_code_example: |
|
|||
```
|
||||
related:
|
||||
- /influxdb/cloud-dedicated/admin/query-system-data/
|
||||
source: /shared/v3-distributed-admin-custom-partitions/view-partitions.md
|
||||
---
|
||||
|
||||
{{< product-name >}} stores partition information in InfluxDB v3 system tables.
|
||||
Query partition information to view partition templates and verify partitions
|
||||
are working as intended.
|
||||
|
||||
- [Query partition information from system tables](#query-partition-information-from-system-tables)
|
||||
- [Partition-related queries](#partition-related-queries)
|
||||
|
||||
{{% warn %}}
|
||||
#### Querying system tables may impact overall cluster performance
|
||||
|
||||
Partition information is stored in InfluxDB v3 system tables.
|
||||
Querying system tables may impact the overall write and query performance of
|
||||
your {{< product-name omit=" Clustered" >}} cluster.
|
||||
|
||||
<!--------------- UPDATE THE DATE BELOW AS EXAMPLES ARE UPDATED --------------->
|
||||
|
||||
#### System tables are subject to change
|
||||
|
||||
System tables are not part of InfluxDB's stable API and may change with new releases.
|
||||
The provided schema information and query examples are valid as of **September 24, 2024**.
|
||||
If you detect a schema change or a non-functioning query example, please
|
||||
[submit an issue](https://github.com/influxdata/docs-v2/issues/new/choose).
|
||||
|
||||
<!--------------- UPDATE THE DATE ABOVE AS EXAMPLES ARE UPDATED --------------->
|
||||
|
||||
{{% /warn %}}
|
||||
|
||||
## Query partition information from system tables
|
||||
|
||||
Use the [`influxctl query` command](/influxdb/cloud-dedicated/reference/cli/influxctl/query/)
|
||||
and SQL to query partition-related information from InfluxDB system tables.
|
||||
Provide the following:
|
||||
|
||||
- **Enable system tables** with the `--enable-system-tables` command flag.
|
||||
- **Database token**: A [database token](/influxdb/cloud-dedicated/admin/tokens/#database-tokens)
|
||||
with read permissions on the specified database. Uses the `token` setting from
|
||||
the [`influxctl` connection profile](/influxdb/cloud-dedicated/reference/cli/influxctl/#configure-connection-profiles)
|
||||
or the `--token` command flag.
|
||||
- **Database name**: The name of the database to query information about.
|
||||
Uses the `database` setting from the
|
||||
[`influxctl` connection profile](/influxdb/cloud-dedicated/reference/cli/influxctl/#configure-connection-profiles)
|
||||
or the `--database` command flag.
|
||||
- **SQL query**: The SQL query to execute.
|
||||
Pass the query in one of the following ways:
|
||||
|
||||
- a string on the command line
|
||||
- a path to a file that contains the query
|
||||
- a single dash (`-`) to read the query from stdin
|
||||
|
||||
{{% code-placeholders "DATABASE_(TOKEN|NAME)|SQL_QUERY" %}}
|
||||
|
||||
```bash
influxctl query \
  --enable-system-tables \
  --database DATABASE_NAME \
  --token DATABASE_TOKEN \
  "SQL_QUERY"
```
|
||||
|
||||
{{% /code-placeholders %}}
|
||||
|
||||
Replace the following:
|
||||
|
||||
- {{% code-placeholder-key %}}`DATABASE_TOKEN`{{% /code-placeholder-key %}}:
|
||||
A database token with read access to the specified database
|
||||
- {{% code-placeholder-key %}}`DATABASE_NAME`{{% /code-placeholder-key %}}:
|
||||
The name of the database to query information about.
|
||||
- {{% code-placeholder-key %}}`SQL_QUERY`{{% /code-placeholder-key %}}:
|
||||
The SQL query to execute. For examples, see
|
||||
[Partition-related queries](#partition-related-queries).
|
||||
|
||||
When prompted, enter `y` to acknowledge the potential impact querying system
|
||||
tables may have on your cluster.
|
||||
|
||||
## Partition-related queries
|
||||
|
||||
Use the following queries to return information about partitions in your
|
||||
{{< product-name omit=" Clustered" >}} cluster.
|
||||
|
||||
- [View partition templates of all tables](#view-partition-templates-of-all-tables)
|
||||
- [View the partition template of a specific table](#view-the-partition-template-of-a-specific-table)
|
||||
- [View all partitions for a table](#view-all-partitions-for-a-table)
|
||||
- [View the number of partitions per table](#view-the-number-of-partitions-per-table)
|
||||
- [View the number of partitions for a specific table](#view-the-number-of-partitions-for-a-specific-table)
|
||||
|
||||
---
|
||||
|
||||
In the examples below, replace {{% code-placeholder-key %}}`TABLE_NAME`{{% /code-placeholder-key %}}
|
||||
with the name of the table you want to query information about.
|
||||
|
||||
---
|
||||
|
||||
{{% code-placeholders "TABLE_NAME_(1|2|3)|TABLE_NAME" %}}
|
||||
|
||||
### View the partition template of a specific table
|
||||
|
||||
```sql
SELECT * FROM system.tables WHERE table_name = 'TABLE_NAME'
```
|
||||
|
||||
#### Example results
|
||||
|
||||
| table_name | partition_template |
| :--------- | :----------------- |
| weather | `{"parts":[{"timeFormat":"%Y-%m-%d"},{"bucket":{"tagName":"location","numBuckets":250}}]}` |
|
||||
|
||||
{{% note %}}
|
||||
If a table doesn't include a partition template in the output of this command,
|
||||
the table uses the default (1 day) partition strategy and doesn't partition
|
||||
by tags or tag buckets.
|
||||
{{% /note %}}
|
||||
|
||||
### View all partitions for a table
|
||||
|
||||
```sql
SELECT * FROM system.partitions WHERE table_name = 'TABLE_NAME'
```
|
||||
|
||||
#### Example results
|
||||
|
||||
| partition_id | table_name | partition_key | last_new_file_created_at | num_files | total_size_mb |
| -----------: | :--------- | :---------------- | -----------------------: | --------: | ------------: |
| 1362 | weather | 43 \| 2020-05-27 | 1683747418763813713 | 1 | 0 |
| 800 | weather | 234 \| 2021-08-02 | 1683747421899400796 | 1 | 0 |
| 630 | weather | 325 \| 2022-03-17 | 1683747417616689036 | 1 | 0 |
| 1401 | weather | 12 \| 2021-01-09 | 1683747417786122295 | 1 | 0 |
| 1012 | weather | 115 \| 2022-07-04 | 1683747417614219148 | 1 | 0 |
|
||||
|
||||
### View the number of partitions per table
|
||||
|
||||
```sql
SELECT
  table_name,
  COUNT(*) AS partition_count
FROM
  system.partitions
WHERE
  table_name IN ('TABLE_NAME_1', 'TABLE_NAME_2', 'TABLE_NAME_3')
GROUP BY
  table_name
```
|
||||
|
||||
#### Example results
|
||||
|
||||
| table_name | partition_count |
| :--------- | --------------: |
| weather | 1096 |
| home | 24 |
| numbers | 1 |
|
||||
|
||||
### View the number of partitions for a specific table
|
||||
|
||||
```sql
SELECT
  COUNT(*) AS partition_count
FROM
  system.partitions
WHERE
  table_name = 'TABLE_NAME'
```
|
||||
|
||||
#### Example results
|
||||
|
||||
| partition_count |
| --------------: |
| 1096 |
|
||||
|
||||
{{% /code-placeholders %}}
|
||||
<!--
|
||||
The content of this page is at /content/shared/v3-distributed-admin-custom-partitions/view-partitions.md
|
||||
-->
|
||||
|
|
|
@ -46,7 +46,8 @@ Related entries:
|
|||
### aggregate
|
||||
|
||||
A function that returns an aggregated value across a set of points.
|
||||
For a list of available aggregation functions, see [SQL aggregate functions](/influxdb/cloud-dedicated/reference/sql/functions/aggregate/).
|
||||
For a list of available aggregation functions,
|
||||
see [SQL aggregate functions](/influxdb/cloud-dedicated/reference/sql/functions/aggregate/).
|
||||
|
||||
<!-- TODO: Add a link to InfluxQL aggregate functions -->
|
||||
|
||||
|
@ -330,6 +331,7 @@ Related entries:
|
|||
[field](#field),
|
||||
[field key](#field-key),
|
||||
[field set](#field-set),
|
||||
[tag set](#tag-set),
|
||||
[tag value](#tag-value),
|
||||
[timestamp](#timestamp)
|
||||
|
||||
|
@ -356,7 +358,7 @@ Related entries:
|
|||
|
||||
Flush jitter prevents every Telegraf output plugin from sending writes
|
||||
simultaneously, which can overwhelm some data sinks.
|
||||
Each flush interval, every Telegraf output plugin will sleep for a random time
|
||||
Each flush interval, every Telegraf output plugin sleeps for a random time
|
||||
between zero and the flush jitter before emitting metrics.
|
||||
Flush jitter smooths out write spikes when running a large number of Telegraf instances.
|
||||
|
||||
|
@ -400,10 +402,10 @@ Identifiers are tokens that refer to specific database objects such as database
|
|||
names, field keys, measurement names, tag keys, etc.
|
||||
|
||||
Related entries:
|
||||
[database](#database)
|
||||
[database](#database),
|
||||
[field key](#field-key),
|
||||
[measurement](#measurement),
|
||||
[tag key](#tag-key),
|
||||
[tag key](#tag-key)
|
||||
|
||||
### influx
|
||||
|
||||
|
@ -422,8 +424,7 @@ and other required processes.
|
|||
|
||||
### InfluxDB
|
||||
|
||||
An open source time series database (TSDB) developed by InfluxData.
|
||||
Written in Go and optimized for fast, high-availability storage and retrieval of
|
||||
An open source time series database (TSDB) developed by InfluxData, optimized for fast, high-availability storage and retrieval of
|
||||
time series data in fields such as operations monitoring, application metrics,
|
||||
Internet of Things sensor data, and real-time analytics.
|
||||
|
||||
|
@ -435,8 +436,8 @@ The SQL-like query language used to query data in InfluxDB.
|
|||
|
||||
Telegraf input plugins actively gather metrics and deliver them to the core agent,
|
||||
where aggregator, processor, and output plugins can operate on the metrics.
|
||||
In order to activate an input plugin, it needs to be enabled and configured in
|
||||
Telegraf's configuration file.
|
||||
To activate an input plugin, enable and configure it in the
|
||||
Telegraf configuration file.
|
||||
|
||||
Related entries:
|
||||
[aggregator plugin](#aggregator-plugin),
|
||||
|
@ -760,7 +761,7 @@ in the cluster (replication factor), and the time range covered by shard groups
|
|||
(shard group duration). RPs are unique per database and along with the measurement
|
||||
and tag set define a series.
|
||||
|
||||
In {{< product-name >}} the equivalent is [retention period](#retention-period),
|
||||
In {{< product-name >}}, the equivalent is [retention period](#retention-period),
|
||||
however retention periods are not part of the data model.
|
||||
The retention period describes the data persistence behavior of a database.
|
||||
|
||||
|
@ -837,8 +838,8 @@ Related entries:
|
|||
|
||||
### series
|
||||
|
||||
A collection of data in the InfluxDB data structure that share a common
|
||||
_measurement_, _tag set_, and _field key_.
|
||||
In the InfluxDB 3 data structure, a collection of data that share a common
|
||||
_measurement_ and _tag set_.
|
||||
|
||||
Related entries:
|
||||
[field set](#field-set),
|
||||
|
@ -847,12 +848,13 @@ Related entries:
|
|||
|
||||
### series cardinality
|
||||
|
||||
The number of unique measurement, tag set, and field key combinations in an InfluxDB database.
|
||||
The number of unique measurement (table), tag set, and field key combinations in an InfluxDB database.
|
||||
|
||||
For example, assume that an InfluxDB bucket has one measurement.
|
||||
For example, assume that an InfluxDB database has one measurement.
|
||||
The single measurement has two tag keys: `email` and `status`.
|
||||
If there are three different `email`s, and each email address is associated with two
|
||||
different `status`es, the series cardinality for the measurement is 6
|
||||
If there are three different `email` tag values,
|
||||
and each email address is associated with two
|
||||
different `status` tag values, then the series cardinality for the measurement is 6
|
||||
(3 × 2 = 6):
|
||||
|
||||
| email | status |
|
||||
|
@ -867,7 +869,7 @@ different `status`es, the series cardinality for the measurement is 6
|
|||
In some cases, performing this multiplication may overestimate series cardinality
|
||||
because of the presence of dependent tags.
|
||||
Dependent tags are scoped by another tag and do not increase series cardinality.
|
||||
If we add the tag `firstname` to the example above, the series cardinality
|
||||
If we add the tag `firstname` to the preceding example, the series cardinality
|
||||
would not be 18 (3 × 2 × 3 = 18).
|
||||
The series cardinality would remain unchanged at 6, as `firstname` is already scoped by the `email` tag:
|
||||
|
||||
|
@ -892,7 +894,7 @@ A series key identifies a particular series by measurement, tag set, and field k
|
|||
|
||||
For example:
|
||||
|
||||
```
|
||||
```text
|
||||
# measurement, tag set, field key
|
||||
h2o_level, location=santa_monica, h2o_feet
|
||||
```
|
||||
|
@ -1129,18 +1131,17 @@ A statement that sets or updates the value stored in a variable.
|
|||
|
||||
## W
|
||||
|
||||
### WAL (Write Ahead Log) - enterprise
|
||||
### WAL (Write-Ahead Log)
|
||||
|
||||
The temporary cache for recently written points.
|
||||
To reduce the frequency that permanent storage files are accessed, InfluxDB
|
||||
caches new points in the WAL until their total size or age triggers a flush to
|
||||
more permanent storage. This allows for efficient batching of the writes into the TSM.
|
||||
more permanent storage. This allows for efficient batching of the writes into
|
||||
the storage engine.
|
||||
|
||||
Points in the WAL can be queried and persist through a system reboot.
|
||||
On process start, all points in the WAL must be flushed before the system accepts new writes.
|
||||
|
||||
Related entries:
|
||||
[tsm](#tsm-time-structured-merge-tree)
|
||||
Points in the WAL are queryable and persist through a system reboot.
|
||||
On process start, all points in the WAL must be flushed before the system
|
||||
accepts new writes.
|
||||
|
||||
### windowing
|
||||
|
||||
|
|
|
@ -340,6 +340,7 @@ Related entries:
|
|||
[field](#field),
|
||||
[field key](#field-key),
|
||||
[field set](#field-set),
|
||||
[tag set](#tag-set),
|
||||
[tag value](#tag-value),
|
||||
[timestamp](#timestamp)
|
||||
|
||||
|
@ -366,7 +367,7 @@ Related entries:
|
|||
|
||||
Flush jitter prevents every Telegraf output plugin from sending writes
|
||||
simultaneously, which can overwhelm some data sinks.
|
||||
Each flush interval, every Telegraf output plugin will sleep for a random time
|
||||
Each flush interval, every Telegraf output plugin sleeps for a random time
|
||||
between zero and the flush jitter before emitting metrics.
|
||||
Flush jitter smooths out write spikes when running a large number of Telegraf instances.
|
||||
|
||||
|
@ -434,8 +435,7 @@ and other required processes.
|
|||
|
||||
### InfluxDB
|
||||
|
||||
An open source time series database (TSDB) developed by InfluxData.
|
||||
Written in Go and optimized for fast, high-availability storage and retrieval of
|
||||
An open source time series database (TSDB) developed by InfluxData, optimized for fast, high-availability storage and retrieval of
|
||||
time series data in fields such as operations monitoring, application metrics,
|
||||
Internet of Things sensor data, and real-time analytics.
|
||||
|
||||
|
@ -447,8 +447,8 @@ The SQL-like query language used to query data in InfluxDB.
|
|||
|
||||
Telegraf input plugins actively gather metrics and deliver them to the core agent,
|
||||
where aggregator, processor, and output plugins can operate on the metrics.
|
||||
In order to activate an input plugin, it needs to be enabled and configured in
|
||||
Telegraf's configuration file.
|
||||
To activate an input plugin, enable and configure it in the
|
||||
Telegraf configuration file.
|
||||
|
||||
Related entries:
|
||||
[aggregator plugin](#aggregator-plugin),
|
||||
|
@ -471,8 +471,9 @@ Related entries:
|
|||
|
||||
### IOx
|
||||
|
||||
The IOx (InfluxDB v3) storage engine is a real-time, columnar database optimized for time series
|
||||
data built in Rust on top of [Apache Arrow](https://arrow.apache.org/) and
|
||||
The IOx storage engine (InfluxDB v3 storage engine) is a real-time, columnar
|
||||
database optimized for time series data built in Rust on top of
|
||||
[Apache Arrow](https://arrow.apache.org/) and
|
||||
[DataFusion](https://arrow.apache.org/datafusion/user-guide/introduction.html).
|
||||
IOx replaces the [TSM (Time Structured Merge tree)](#tsm-time-structured-merge-tree) storage engine.
|
||||
|
||||
|
@ -848,8 +849,8 @@ Related entries:
|
|||
|
||||
### series
|
||||
|
||||
A collection of data in the InfluxDB data structure that share a common
|
||||
_measurement_, _tag set_, and _field key_.
|
||||
In the InfluxDB 3 data structure, a collection of data that share a common
|
||||
_measurement_ and _tag set_.
|
||||
|
||||
Related entries:
|
||||
[field set](#field-set),
|
||||
|
@ -860,10 +861,11 @@ Related entries:
|
|||
|
||||
The number of unique measurement, tag set, and field key combinations in an {{% product-name %}} bucket.
|
||||
|
||||
For example, assume that an InfluxDB bucket has one measurement.
|
||||
For example, assume that an InfluxDB database has one measurement.
|
||||
The single measurement has two tag keys: `email` and `status`.
|
||||
If there are three different `email`s, and each email address is associated with two
|
||||
different `status`es, the series cardinality for the measurement is 6
|
||||
If there are three different `email` tag values,
|
||||
and each email address is associated with two
|
||||
different `status` tag values, then the series cardinality for the measurement is 6
|
||||
(3 × 2 = 6):
|
||||
|
||||
| email | status |
|
||||
|
@ -878,7 +880,7 @@ different `status`es, the series cardinality for the measurement is 6
|
|||
In some cases, performing this multiplication may overestimate series cardinality
|
||||
because of the presence of dependent tags.
|
||||
Dependent tags are scoped by another tag and do not increase series cardinality.
|
||||
If we add the tag `firstname` to the example above, the series cardinality
|
||||
If we add the tag `firstname` to the preceding example, the series cardinality
|
||||
would not be 18 (3 × 2 × 3 = 18).
|
||||
The series cardinality would remain unchanged at 6, as `firstname` is already scoped by the `email` tag:
|
||||
|
||||
|
@ -1136,18 +1138,17 @@ A statement that sets or updates the value stored in a variable.
|
|||
|
||||
## W
|
||||
|
||||
### WAL (Write Ahead Log) - enterprise
|
||||
### WAL (Write-Ahead Log)
|
||||
|
||||
The temporary cache for recently written points.
|
||||
To reduce the frequency that permanent storage files are accessed, InfluxDB
|
||||
caches new points in the WAL until their total size or age triggers a flush to
|
||||
more permanent storage. This allows for efficient batching of the writes into the TSM.
|
||||
more permanent storage. This allows for efficient batching of the writes into
|
||||
the storage engine.
|
||||
|
||||
Points in the WAL can be queried and persist through a system reboot.
|
||||
On process start, all points in the WAL must be flushed before the system accepts new writes.
|
||||
|
||||
Related entries:
|
||||
[tsm](#tsm-time-structured-merge-tree)
|
||||
Points in the WAL are queryable and persist through a system reboot.
|
||||
On process start, all points in the WAL must be flushed before the system
|
||||
accepts new writes.
|
||||
|
||||
### windowing
|
||||
|
||||
|
|
|
@ -11,409 +11,9 @@ weight: 104
|
|||
influxdb/clustered/tags: [storage]
|
||||
related:
|
||||
- /influxdb/clustered/reference/internals/storage-engine/
|
||||
source: /shared/v3-distributed-admin-custom-partitions/_index.md
|
||||
---
|
||||
|
||||
When writing data to {{< product-name >}}, the InfluxDB v3 storage engine stores
|
||||
data in the [Object store](/influxdb/clustered/reference/internals/storage-engine/#object-store)
|
||||
in [Apache Parquet](https://parquet.apache.org/) format.
|
||||
Each Parquet file represents a _partition_--a logical grouping of data.
|
||||
By default, InfluxDB partitions each table by day.
|
||||
{{< product-name >}} lets you customize the partitioning strategy and partition
|
||||
by tag values and different time intervals.
|
||||
Customize your partitioning strategy to optimize query performance for your
|
||||
specific schema and workload.
|
||||
|
||||
- [Advantages](#advantages)
|
||||
- [Disadvantages](#disadvantages)
|
||||
- [Limitations](#limitations)
|
||||
- [How partitioning works](#how-partitioning-works)
|
||||
- [Partition templates](#partition-templates)
|
||||
- [Partition keys](#partition-keys)
|
||||
- [Partitions in the query life cycle](#partitions-in-the-query-life-cycle)
|
||||
- [Partition guides](#partition-guides)
|
||||
{{< children type="anchored-list" >}}
|
||||
|
||||
## Advantages
|
||||
|
||||
The primary advantage of custom partitioning is that it lets you customize your
|
||||
storage structure to improve query performance specific to your schema and workload.
|
||||
|
||||
- **Optimized storage for improved performance on specific types of queries**.
|
||||
For example, if queries often select data with a specific tag value, you can
|
||||
partition by that tag to improve the performance of those queries.
|
||||
- **Optimized storage for specific types of data**. For example, if the data you
|
||||
store is sparse and the time ranges you query are often much larger than a day,
|
||||
you could partition your data by week instead of by day.
|
||||
|
||||
## Disadvantages
|
||||
|
||||
Using custom partitioning may increase the load on other parts of the
|
||||
[InfluxDB v3 storage engine](/influxdb/clustered/reference/internals/storage-engine/),
|
||||
but each can be scaled individually to address the added load.
|
||||
|
||||
{{% note %}}
|
||||
_The following disadvantages assume that your custom partitioning strategy includes
|
||||
additional tags to partition by or partition intervals smaller than a day._
|
||||
{{% /note %}}
|
||||
|
||||
- **Increased load on the [Ingester](/influxdb/clustered/reference/internals/storage-engine/#ingester)**
|
||||
as it groups data into smaller partitions and files.
|
||||
- **Increased load on the [Catalog](/influxdb/clustered/reference/internals/storage-engine/#catalog)**
|
||||
as more references to partition Parquet file locations are stored and queried.
|
||||
- **Increased load on the [Compactor](/influxdb/clustered/reference/internals/storage-engine/#compactor)**
|
||||
as more partition Parquet files need to be compacted.
|
||||
- **Increased costs associated with [Object storage](/influxdb/clustered/reference/internals/storage-engine/#object-storage)**
|
||||
as more partition Parquet files are created and stored.
|
||||
- **Risk of decreased performance for queries that don't use tags in the WHERE clause**.
|
||||
These queries may end up reading many partitions and smaller files, degrading performance.
|
||||
|
||||
## Limitations
|
||||
|
||||
Custom partitioning has the following limitations:
|
||||
|
||||
- Database and table partitions can only be defined on create.
|
||||
You cannot update the partition strategy of a database or table after it has
|
||||
been created.
|
||||
- A partition template must include a time part.
|
||||
- You can partition by up to eight dimensions (seven tags and a time interval).
|
||||
|
||||
## How partitioning works
|
||||
|
||||
### Partition templates
|
||||
|
||||
A partition template defines the pattern used for _[partition keys](#partition-keys)_
|
||||
and determines the time interval that data is partitioned by.
|
||||
Partition templates use tag values and
|
||||
[Rust strftime date and time formatting syntax](https://docs.rs/chrono/latest/chrono/format/strftime/index.html).
|
||||
|
||||
_For more detailed information, see [Partition templates](/influxdb/clustered/admin/custom-partitions/partition-templates/)._
|
||||
|
||||
### Partition keys
|
||||
|
||||
A partition key uniquely identifies a partition.
|
||||
A _[partition template](#partition-templates)_ defines the partition key format.
|
||||
Partition keys are
|
||||
composed of up to 8 dimensions (1 time part and up to 7 tag or tag bucket parts).
|
||||
Each part is delimited by the partition key separator (`|`).
|
||||
|
||||
The default format for partition keys is `%Y-%m-%d` (for example, `2024-01-01`).
|
||||
|
||||
{{< expand-wrapper >}}
|
||||
{{% expand "View example partition templates and keys" %}}
|
||||
|
||||
Consider the following line protocol, which contains points at these timestamps:
|
||||
|
||||
- 2023-12-31T23:00:00Z
|
||||
- 2024-01-01T00:00:00Z
|
||||
- 2024-01-01T01:00:00Z
|
||||
|
||||
```text
production,line=A,station=cnc temp=81.2,qty=35i 1704063600000000000
production,line=A,station=wld temp=92.8,qty=35i 1704063600000000000
production,line=B,station=cnc temp=101.1,qty=43i 1704063600000000000
production,line=B,station=wld temp=102.4,qty=43i 1704063600000000000
production,line=A,station=cnc temp=81.9,qty=36i 1704067200000000000
production,line=A,station=wld temp=110.0,qty=22i 1704067200000000000
production,line=B,station=cnc temp=101.8,qty=44i 1704067200000000000
production,line=B,station=wld temp=105.7,qty=44i 1704067200000000000
production,line=A,station=cnc temp=82.2,qty=35i 1704070800000000000
production,line=A,station=wld temp=92.1,qty=30i 1704070800000000000
production,line=B,station=cnc temp=102.4,qty=43i 1704070800000000000
production,line=B,station=wld temp=106.5,qty=43i 1704070800000000000
```
|
||||
|
||||
---
|
||||
|
||||
{{% flex %}}
|
||||
|
||||
<!---------------------- BEGIN PARTITION EXAMPLES GROUP 1 --------------------->
|
||||
|
||||
{{% flex-content "half" %}}
|
||||
|
||||
##### Partition template parts
|
||||
|
||||
- `%Y-%m-%d` <em class="op50">time (by day, default format)</em>
|
||||
|
||||
{{% /flex-content %}}
|
||||
{{% flex-content %}}
|
||||
|
||||
##### Partition keys
|
||||
|
||||
- `2023-12-31`
|
||||
- `2024-01-01`
|
||||
|
||||
{{% /flex-content %}}
|
||||
|
||||
<!----------------------- END PARTITION EXAMPLES GROUP 1 ---------------------->
|
||||
|
||||
{{% /flex %}}
|
||||
|
||||
---
|
||||
|
||||
{{% flex %}}
|
||||
|
||||
<!---------------------- BEGIN PARTITION EXAMPLES GROUP 2 --------------------->
|
||||
|
||||
{{% flex-content "half" %}}
|
||||
|
||||
##### Partition template parts
|
||||
|
||||
- `line` <em class="op50">tag</em>
|
||||
- `%d %b %Y` <em class="op50">time (by day, non-default format)</em>
|
||||
|
||||
{{% /flex-content %}}
|
||||
{{% flex-content %}}
|
||||
|
||||
##### Partition keys
|
||||
|
||||
- `A | 31 Dec 2023`
|
||||
- `B | 31 Dec 2023`
|
||||
- `A | 01 Jan 2024`
|
||||
- `B | 01 Jan 2024`
|
||||
|
||||
{{% /flex-content %}}
|
||||
|
||||
<!----------------------- END PARTITION EXAMPLES GROUP 2 ---------------------->
|
||||
|
||||
{{% /flex %}}
|
||||
|
||||
---
|
||||
|
||||
{{% flex %}}
|
||||
|
||||
<!---------------------- BEGIN PARTITION EXAMPLES GROUP 3 --------------------->
|
||||
|
||||
{{% flex-content "half" %}}
|
||||
|
||||
##### Partition template parts
|
||||
|
||||
- `line` <em class="op50">tag</em>
|
||||
- `station` <em class="op50">tag</em>
|
||||
- `%Y-%m-%d` <em class="op50">time (by day, default format)</em>
|
||||
|
||||
{{% /flex-content %}}
|
||||
{{% flex-content %}}
|
||||
|
||||
##### Partition keys
|
||||
|
||||
- `A | cnc | 2023-12-31`
|
||||
- `A | wld | 2023-12-31`
|
||||
- `B | cnc | 2023-12-31`
|
||||
- `B | wld | 2023-12-31`
|
||||
- `A | cnc | 2024-01-01`
|
||||
- `A | wld | 2024-01-01`
|
||||
- `B | cnc | 2024-01-01`
|
||||
- `B | wld | 2024-01-01`
|
||||
|
||||
{{% /flex-content %}}
|
||||
|
||||
<!----------------------- END PARTITION EXAMPLES GROUP 3 ---------------------->
|
||||
|
||||
{{% /flex %}}
|
||||
|
||||
---
|
||||
|
||||
{{% flex %}}
|
||||
|
||||
<!---------------------- BEGIN PARTITION EXAMPLES GROUP 4 --------------------->
|
||||
|
||||
{{% flex-content "half" %}}
|
||||
|
||||
##### Partition template parts
|
||||
|
||||
- `line` <em class="op50">tag</em>
|
||||
- `station,3` <em class="op50">tag bucket</em>
|
||||
- `%Y-%m-%d` <em class="op50">time (by day, default format)</em>
|
||||
|
||||
{{% /flex-content %}}
|
||||
{{% flex-content %}}
|
||||
|
||||
##### Partition keys
|
||||
|
||||
- `A | 0 | 2023-12-31`
|
||||
- `B | 0 | 2023-12-31`
|
||||
- `A | 0 | 2024-01-01`
|
||||
- `B | 0 | 2024-01-01`
|
||||
|
||||
{{% /flex-content %}}
|
||||
|
||||
<!----------------------- END PARTITION EXAMPLES GROUP 4 ---------------------->
|
||||
|
||||
{{% /flex %}}
|
||||
|
||||
---
|
||||
|
||||
{{% flex %}}
|
||||
|
||||
<!---------------------- BEGIN PARTITION EXAMPLES GROUP 5 --------------------->
|
||||
|
||||
{{% flex-content "half" %}}
|
||||
|
||||
##### Partition template parts
|
||||
|
||||
- `line` <em class="op50">tag</em>
|
||||
- `station` <em class="op50">tag</em>
|
||||
- `%Y-%m-%d %H:00` <em class="op50">time (by hour)</em>
|
||||
|
||||
{{% /flex-content %}}
|
||||
{{% flex-content %}}
|
||||
|
||||
##### Partition keys
|
||||
|
||||
- `A | cnc | 2023-12-31 23:00`
|
||||
- `A | wld | 2023-12-31 23:00`
|
||||
- `B | cnc | 2023-12-31 23:00`
|
||||
- `B | wld | 2023-12-31 23:00`
|
||||
- `A | cnc | 2024-01-01 00:00`
|
||||
- `A | wld | 2024-01-01 00:00`
|
||||
- `B | cnc | 2024-01-01 00:00`
|
||||
- `B | wld | 2024-01-01 00:00`
|
||||
- `A | cnc | 2024-01-01 01:00`
|
||||
- `A | wld | 2024-01-01 01:00`
|
||||
- `B | cnc | 2024-01-01 01:00`
|
||||
- `B | wld | 2024-01-01 01:00`
|
||||
|
||||
{{% /flex-content %}}
|
||||
|
||||
<!----------------------- END PARTITION EXAMPLES GROUP 5 ---------------------->
|
||||
|
||||
{{% /flex %}}
|
||||
|
||||
---
|
||||
|
||||
{{% flex %}}
|
||||
|
||||
<!---------------------- BEGIN PARTITION EXAMPLES GROUP 6 --------------------->
|
||||
|
||||
{{% flex-content "half" %}}
|
||||
|
||||
##### Partition template parts
|
||||
|
||||
- `line` <em class="op50">tag</em>
|
||||
- `station,50` <em class="op50">tag bucket</em>
|
||||
- `%Y-%m-%d %H:00` <em class="op50">time (by hour)</em>
|
||||
|
||||
{{% /flex-content %}}
|
||||
{{% flex-content %}}
|
||||
|
||||
##### Partition keys
|
||||
|
||||
- `A | 47 | 2023-12-31 23:00`
|
||||
- `A | 9 | 2023-12-31 23:00`
|
||||
- `B | 47 | 2023-12-31 23:00`
|
||||
- `B | 9 | 2023-12-31 23:00`
|
||||
- `A | 47 | 2024-01-01 00:00`
|
||||
- `A | 9 | 2024-01-01 00:00`
|
||||
- `B | 47 | 2024-01-01 00:00`
|
||||
- `B | 9 | 2024-01-01 00:00`
|
||||
- `A | 47 | 2024-01-01 01:00`
|
||||
- `A | 9 | 2024-01-01 01:00`
|
||||
- `B | 47 | 2024-01-01 01:00`
|
||||
- `B | 9 | 2024-01-01 01:00`
|
||||
|
||||
{{% /flex-content %}}
|
||||
|
||||
<!----------------------- END PARTITION EXAMPLES GROUP 6 ---------------------->
|
||||
|
||||
{{% /flex %}}
|
||||
|
||||
{{% /expand %}}
|
||||
{{< /expand-wrapper >}}
|
||||
|
||||
## Partitions in the query life cycle
|
||||
|
||||
When querying data:
|
||||
|
||||
1. The [Catalog](/influxdb/clustered/reference/internals/storage-engine/#catalog)
|
||||
provides the v3 query engine ([Querier](/influxdb/clustered/reference/internals/storage-engine/#querier))
|
||||
with the locations of partitions that contain the queried time series data.
|
||||
2. The query engine reads all rows in the returned partitions to identify which
rows match the logic in the query and should be included in the query result.
|
||||
|
||||
The faster the query engine can identify which partitions to read and then read
the data in those partitions, the better queries perform.
|
||||
|
||||
_For more information about the query life cycle, see
|
||||
[InfluxDB v3 query life cycle](/influxdb/clustered/reference/internals/storage-engine/#query-life-cycle)._
|
||||
|
||||
##### Query example
|
||||
|
||||
Consider the following query that selects everything in the `production` table
|
||||
where the `line` tag is `A` and the `station` tag is `cnc`:
|
||||
|
||||
```sql
|
||||
SELECT *
|
||||
FROM production
|
||||
WHERE
|
||||
time >= now() - INTERVAL '1 week'
|
||||
AND line = 'A'
|
||||
AND station = 'cnc'
|
||||
```
|
||||
|
||||
Using the default partitioning strategy (by day), the query engine
|
||||
reads eight separate partitions (one partition for today and one for each of the
|
||||
last seven days):
|
||||
|
||||
- {{< datetime/current-date trimTime=true >}}
|
||||
- {{< datetime/current-date offset=-1 trimTime=true >}}
|
||||
- {{< datetime/current-date offset=-2 trimTime=true >}}
|
||||
- {{< datetime/current-date offset=-3 trimTime=true >}}
|
||||
- {{< datetime/current-date offset=-4 trimTime=true >}}
|
||||
- {{< datetime/current-date offset=-5 trimTime=true >}}
|
||||
- {{< datetime/current-date offset=-6 trimTime=true >}}
|
||||
- {{< datetime/current-date offset=-7 trimTime=true >}}
|
||||
|
||||
The query engine must scan _all_ rows in the partitions to identify rows
|
||||
where `line` is `A` and `station` is `cnc`. This process takes valuable time
|
||||
and results in less performant queries.
|
||||
|
||||
However, if you partition by other tags, InfluxDB can identify partitions that
|
||||
contain only the tag values your query needs and spend less time
|
||||
scanning rows to see if they contain the tag values.
|
||||
|
||||
For example, if data is partitioned by `line`, `station`, and day, although
|
||||
there are more partition files, the query engine can quickly identify and read
|
||||
only those with data relevant to the query:
|
||||
|
||||
{{% columns 4 %}}
|
||||
|
||||
- <strong class="req normal green">A | cnc | {{< datetime/current-date trimTime=true >}}</strong>
|
||||
- A | wld | {{< datetime/current-date trimTime=true >}}
|
||||
- B | cnc | {{< datetime/current-date trimTime=true >}}
|
||||
- B | wld | {{< datetime/current-date trimTime=true >}}
|
||||
- <strong class="req normal green">A | cnc | {{< datetime/current-date offset=-1 trimTime=true >}}</strong>
|
||||
- A | wld | {{< datetime/current-date offset=-1 trimTime=true >}}
|
||||
- B | cnc | {{< datetime/current-date offset=-1 trimTime=true >}}
|
||||
- B | wld | {{< datetime/current-date offset=-1 trimTime=true >}}
|
||||
- <strong class="req normal green">A | cnc | {{< datetime/current-date offset=-2 trimTime=true >}}</strong>
|
||||
- A | wld | {{< datetime/current-date offset=-2 trimTime=true >}}
|
||||
- B | cnc | {{< datetime/current-date offset=-2 trimTime=true >}}
|
||||
- B | wld | {{< datetime/current-date offset=-2 trimTime=true >}}
|
||||
- <strong class="req normal green">A | cnc | {{< datetime/current-date offset=-3 trimTime=true >}}</strong>
|
||||
- A | wld | {{< datetime/current-date offset=-3 trimTime=true >}}
|
||||
- B | cnc | {{< datetime/current-date offset=-3 trimTime=true >}}
|
||||
- B | wld | {{< datetime/current-date offset=-3 trimTime=true >}}
|
||||
- <strong class="req normal green">A | cnc | {{< datetime/current-date offset=-4 trimTime=true >}}</strong>
|
||||
- A | wld | {{< datetime/current-date offset=-4 trimTime=true >}}
|
||||
- B | cnc | {{< datetime/current-date offset=-4 trimTime=true >}}
|
||||
- B | wld | {{< datetime/current-date offset=-4 trimTime=true >}}
|
||||
- <strong class="req normal green">A | cnc | {{< datetime/current-date offset=-5 trimTime=true >}}</strong>
|
||||
- A | wld | {{< datetime/current-date offset=-5 trimTime=true >}}
|
||||
- B | cnc | {{< datetime/current-date offset=-5 trimTime=true >}}
|
||||
- B | wld | {{< datetime/current-date offset=-5 trimTime=true >}}
|
||||
- <strong class="req normal green">A | cnc | {{< datetime/current-date offset=-6 trimTime=true >}}</strong>
|
||||
- A | wld | {{< datetime/current-date offset=-6 trimTime=true >}}
|
||||
- B | cnc | {{< datetime/current-date offset=-6 trimTime=true >}}
|
||||
- B | wld | {{< datetime/current-date offset=-6 trimTime=true >}}
|
||||
- <strong class="req normal green">A | cnc | {{< datetime/current-date offset=-7 trimTime=true >}}</strong>
|
||||
- A | wld | {{< datetime/current-date offset=-7 trimTime=true >}}
|
||||
- B | cnc | {{< datetime/current-date offset=-7 trimTime=true >}}
|
||||
- B | wld | {{< datetime/current-date offset=-7 trimTime=true >}}
|
||||
|
||||
{{% /columns %}}
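
The pruning described above can be sketched in a few lines of Python.
This is an illustration only, not the query engine's implementation: the
`Partition` type and the `build_partitions` and `prune` functions are
hypothetical names, and partition keys are modeled as simple
(line, station, day) tuples.

```python
from datetime import date, timedelta
from itertools import product
from typing import NamedTuple


class Partition(NamedTuple):
    """Hypothetical model of a partition key: line | station | day."""
    line: str
    station: str
    day: date


def build_partitions(today: date, days_back: int = 7) -> list[Partition]:
    """One partition per (line, station, day) for today and the previous days."""
    days = [today - timedelta(days=n) for n in range(days_back + 1)]
    return [
        Partition(line, station, day)
        for day, line, station in product(days, ("A", "B"), ("cnc", "wld"))
    ]


def prune(partitions: list[Partition], line: str, station: str) -> list[Partition]:
    """Keep only partitions whose key matches the tag predicates in the query."""
    return [p for p in partitions if p.line == line and p.station == station]


partitions = build_partitions(date(2024, 1, 8))
selected = prune(partitions, line="A", station="cnc")
print(len(partitions), len(selected))
```

In this model, the tag predicates alone narrow 32 candidate partitions down to
the 8 highlighted in the list, before any rows are scanned.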
|
||||
|
||||
---
|
||||
|
||||
## Partition guides
|
||||
|
||||
{{< children >}}
|
||||
<!--
|
||||
The content of this page is at /content/shared/v3-distributed-admin-custom-partitions/_index.md
|
||||
-->
|
||||
|
|
|
@ -8,49 +8,9 @@ menu:
|
|||
name: Best practices
|
||||
parent: Manage data partitioning
|
||||
weight: 202
|
||||
source: /shared/v3-distributed-admin-custom-partitions/best-practices.md
|
||||
---
|
||||
|
||||
Use the following best practices when defining custom partitioning strategies
|
||||
for your data stored in {{< product-name >}}.
|
||||
|
||||
- [Partition by tags that you commonly query for a specific value](#partition-by-tags-that-you-commonly-query-for-a-specific-value)
|
||||
- [Only partition by tags that _always_ have a value](#only-partition-by-tags-that-always-have-a-value)
|
||||
- [Avoid over-partitioning](#avoid-over-partitioning)
|
||||
|
||||
## Partition by tags that you commonly query for a specific value
|
||||
|
||||
Custom partitioning primarily benefits queries that look for a specific tag
|
||||
value in the `WHERE` clause. For example, if you often query data related to a
|
||||
specific ID, partitioning by the tag that stores the ID helps the InfluxDB
|
||||
query engine to more quickly identify what partitions contain the relevant data.
|
||||
|
||||
{{% note %}}
|
||||
|
||||
#### Use tag buckets for high-cardinality tags
|
||||
|
||||
Partitioning by distinct values of tags with many (10K+) unique values can
hurt query performance because InfluxDB creates a partition for each unique tag value.
|
||||
Instead, use [tag buckets](/influxdb/clustered/admin/custom-partitions/partition-templates/#tag-bucket-part-templates)
|
||||
to partition by high-cardinality tags.
|
||||
This method of partitioning groups tag values into "buckets" and partitions by bucket.
|
||||
{{% /note %}}
|
||||
|
||||
## Only partition by tags that _always_ have a value
|
||||
|
||||
You should only partition by tags that _always_ have a value.
|
||||
If points don't have a value for the tag, InfluxDB can't store them in the correct partitions and, at query time, must read all the partitions.
|
||||
|
||||
## Avoid over-partitioning
|
||||
|
||||
As you plan your partitioning strategy, keep in mind that data can be
|
||||
"over-partitioned"--meaning partitions are so granular that queries end up
|
||||
having to retrieve and read many partitions from the object store, which
|
||||
hurts query performance.
|
||||
|
||||
- Balance the partition time interval with the actual amount of data written
|
||||
during each interval. If a single interval doesn't contain a lot of data,
|
||||
it is better to partition by larger time intervals.
|
||||
- Don't partition by tags that you typically don't use in your query workload.
|
||||
- Don't partition by distinct values of high-cardinality tags.
|
||||
Instead, [use tag buckets](#use-tag-buckets-for-high-cardinality-tags) to
|
||||
partition by these tags.
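
To get a feel for how quickly partitions multiply, the following Python sketch
estimates an upper bound on the partition count for a template.
The `max_partitions` helper and the schema numbers are hypothetical and only
illustrate the arithmetic; actual counts depend on which tag value combinations
appear in each interval.

```python
from math import prod


def max_partitions(distinct_values_per_part: list[int], intervals: int) -> int:
    """Upper bound: every combination of tag values (or buckets) in every interval."""
    return prod(distinct_values_per_part) * intervals


# Hypothetical schema: 2 lines, 50 stations, 30 days of data.
by_day = max_partitions([2, 50], intervals=30)
by_hour = max_partitions([2, 50], intervals=30 * 24)
print(by_day, by_hour)
```

Under these assumptions, moving from daily to hourly partitions raises the
upper bound from 3,000 to 72,000 partitions for the same data.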
|
||||
<!--
|
||||
The content of this page is at /content/shared/v3-distributed-admin-custom-partitions/best-practices.md
|
||||
-->
|
||||
|
|
|
@ -10,161 +10,9 @@ weight: 202
|
|||
related:
|
||||
- /influxdb/clustered/reference/cli/influxctl/database/create/
|
||||
- /influxdb/clustered/reference/cli/influxctl/table/create/
|
||||
source: /shared/v3-distributed-admin-custom-partitions/define-custom-partitions.md
|
||||
---
|
||||
|
||||
Use the [`influxctl` CLI](/influxdb/clustered/reference/cli/influxctl/)
|
||||
to define custom partition strategies when creating a database or table.
|
||||
By default, {{< product-name >}} partitions data by day.
|
||||
|
||||
The partitioning strategy of a database or table is determined by a
|
||||
[partition template](/influxdb/clustered/admin/custom-partitions/#partition-templates)
|
||||
which defines the naming pattern for [partition keys](/influxdb/clustered/admin/custom-partitions/#partition-keys).
|
||||
Partition keys uniquely identify each partition.
|
||||
When a partition template is applied to a database, it becomes the default template
|
||||
for all tables in that database, but can be overridden when creating a
|
||||
table.
|
||||
|
||||
- [Create a database with a custom partition template](#create-a-database-with-a-custom-partition-template)
|
||||
- [Create a table with a custom partition template](#create-a-table-with-a-custom-partition-template)
|
||||
- [Example partition templates](#example-partition-templates)
|
||||
|
||||
{{% warn %}}
|
||||
|
||||
#### Partition templates can only be applied on create
|
||||
|
||||
You can only apply a partition template when creating a database or table.
|
||||
You can't update a partition template on an existing resource.
|
||||
{{% /warn %}}
|
||||
|
||||
Use the following command flags to identify
|
||||
[partition template parts](/influxdb/clustered/admin/custom-partitions/partition-templates/#tag-part-templates):
|
||||
|
||||
- `--template-tag`: An [InfluxDB tag](/influxdb/clustered/reference/glossary/#tag)
|
||||
to use in the partition template.
|
||||
- `--template-tag-bucket`: An [InfluxDB tag](/influxdb/clustered/reference/glossary/#tag)
|
||||
and number of "buckets" to group tag values into.
|
||||
Provide the tag key and the number of buckets, separated by a comma:
`tagKey,N`.
|
||||
- `--template-timeformat`: A [Rust strftime date and time](/influxdb/clustered/admin/custom-partitions/partition-templates/#time-part-templates)
|
||||
string that specifies the time format in the partition template and determines
|
||||
the time interval to partition by.
|
||||
|
||||
{{% note %}}
|
||||
A partition template can include up to 7 total tag and tag bucket parts
|
||||
and only 1 time part.
|
||||
{{% /note %}}
|
||||
|
||||
_View [partition template part restrictions](/influxdb/clustered/admin/custom-partitions/partition-templates/#restrictions)._
|
||||
|
||||
{{% note %}}
|
||||
#### Always provide a time format when using custom partitioning
|
||||
|
||||
When defining a custom partition template for your database or table using any
|
||||
of the `influxctl` `--template-*` flags, always include the `--template-timeformat`
|
||||
flag with a time format to use in your partition template.
|
||||
Otherwise, InfluxDB omits time from the partition template and won't compact partitions.
|
||||
{{% /note %}}
|
||||
|
||||
## Create a database with a custom partition template
|
||||
|
||||
The following example creates a new `example-db` database and applies a partition
|
||||
template that partitions by distinct values of two tags (`room` and `sensor-type`),
|
||||
bucketed values of the `customerID` tag, and by day using the time format `%Y-%m-%d`:
|
||||
|
||||
<!--Skip database create and delete tests: namespaces aren't reusable-->
|
||||
<!--pytest.mark.skip-->
|
||||
|
||||
```sh
|
||||
influxctl database create \
|
||||
--template-tag room \
|
||||
--template-tag sensor-type \
|
||||
--template-tag-bucket customerID,500 \
|
||||
--template-timeformat '%Y-%m-%d' \
|
||||
example-db
|
||||
```
|
||||
|
||||
## Create a table with a custom partition template
|
||||
|
||||
The following example creates a new `example-table` table in the specified
|
||||
database and applies a partition template that partitions by distinct values of
|
||||
two tags (`room` and `sensor-type`), bucketed values of the `customerID` tag,
|
||||
and by month using the time format `%Y-%m`:
|
||||
|
||||
<!--Skip database create and delete tests: namespaces aren't reusable-->
|
||||
<!--pytest.mark.skip-->
|
||||
|
||||
{{% code-placeholders "DATABASE_NAME" %}}
|
||||
|
||||
```sh
|
||||
influxctl table create \
|
||||
--template-tag room \
|
||||
--template-tag sensor-type \
|
||||
--template-tag-bucket customerID,500 \
|
||||
--template-timeformat '%Y-%m' \
|
||||
DATABASE_NAME \
|
||||
example-table
|
||||
```
|
||||
|
||||
{{% /code-placeholders %}}
|
||||
|
||||
Replace the following in your command:
|
||||
|
||||
- {{% code-placeholder-key %}}`DATABASE_NAME`{{% /code-placeholder-key %}}: your {{% product-name %}} [database](/influxdb/clustered/admin/databases/)
|
||||
|
||||
<!--actual test
|
||||
|
||||
```sh
|
||||
|
||||
# Test the preceding command outside of the code block.
|
||||
# influxctl authentication requires TTY interaction--
|
||||
# output the auth URL to a file that the host can open.
|
||||
|
||||
TABLE_NAME=table_TEST_RUN
|
||||
script -c "influxctl table create \
|
||||
--template-tag room \
|
||||
--template-tag sensor-type \
|
||||
--template-tag-bucket customerID,500 \
|
||||
--template-timeformat '%Y-%m' \
|
||||
DATABASE_NAME \
|
||||
$TABLE_NAME" \
|
||||
/dev/null > /shared/urls.txt
|
||||
|
||||
script -c "influxctl query \
|
||||
--database DATABASE_NAME \
|
||||
--token DATABASE_TOKEN \
|
||||
'SHOW TABLES'" > /shared/temp_tables.txt
|
||||
grep -q $TABLE_NAME /shared/temp_tables.txt
|
||||
rm /shared/temp_tables.txt
|
||||
```
|
||||
|
||||
<!--
|
||||
The content of this page is at /content/shared/v3-distributed-admin-custom-partitions/define-custom-partitions.md
|
||||
-->
|
||||
|
||||
## Example partition templates
|
||||
|
||||
Given the following [line protocol](/influxdb/clustered/reference/syntax/line-protocol/)
|
||||
with a `2024-01-01T00:00:00Z` timestamp:
|
||||
|
||||
```text
|
||||
prod,line=A,station=weld1 temp=81.9,qty=36i 1704067200000000000
|
||||
```
|
||||
|
||||
##### Partitioning by distinct tag values
|
||||
|
||||
| Description | Tag parts | Time part | Resulting partition key |
|
||||
| :---------------------- | :---------------- | :--------- | :----------------------- |
|
||||
| By day (default) | | `%Y-%m-%d` | 2024-01-01 |
|
||||
| By month | | `%Y-%m` | 2024-01 |
|
||||
| By year | | `%Y` | 2024 |
|
||||
| Single tag, by day | `line` | `%Y-%m-%d` | A \| 2024-01-01 |
|
||||
| Single tag, by month | `line` | `%Y-%m` | A \| 2024-01 |
|
||||
| Single tag, by year | `line` | `%Y` | A \| 2024 |
|
||||
| Multiple tags, by day | `line`, `station` | `%Y-%m-%d` | A \| weld1 \| 2024-01-01 |
|
||||
| Multiple tags, by month | `line`, `station` | `%Y-%m` | A \| weld1 \| 2024-01 |
|
||||
| Multiple tags, by year | `line`, `station` | `%Y` | A \| weld1 \| 2024 |
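
The rows in the preceding table can be reproduced with a short Python sketch.
The `partition_key` function is a hypothetical helper, not part of InfluxDB or
`influxctl`; it assumes every tag in the template has a value and joins parts
with ` | ` as displayed in the table.

```python
from datetime import datetime, timezone


def partition_key(tags: dict[str, str], tag_parts: list[str],
                  time_format: str, timestamp_ns: int) -> str:
    """Join the selected tag values and the formatted timestamp with ' | '."""
    ts = datetime.fromtimestamp(timestamp_ns // 1_000_000_000, tz=timezone.utc)
    parts = [tags[key] for key in tag_parts]  # raises KeyError if a tag is missing
    parts.append(ts.strftime(time_format))
    return " | ".join(parts)


point_tags = {"line": "A", "station": "weld1"}
point_time = 1704067200000000000  # 2024-01-01T00:00:00Z in nanoseconds

print(partition_key(point_tags, [], "%Y-%m-%d", point_time))
print(partition_key(point_tags, ["line"], "%Y-%m", point_time))
print(partition_key(point_tags, ["line", "station"], "%Y", point_time))
```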
|
||||
|
||||
##### Partition by tag buckets
|
||||
|
||||
| Description | Tag part | Tag bucket part | Time part | Resulting partition key |
|
||||
| :---------------------------------- | :------- | :-------------- | :--------- | :---------------------- |
|
||||
| Distinct tag, tag buckets, by day | `line` | `station,100` | `%Y-%m-%d` | A \| 3 \| 2024-01-01 |
|
||||
| Distinct tag, tag buckets, by month | `line` | `station,500` | `%Y-%m` | A \| 303 \| 2024-01 |
|
||||
|
|
|
@ -8,124 +8,9 @@ menu:
|
|||
influxdb_clustered:
|
||||
parent: Manage data partitioning
|
||||
weight: 202
|
||||
source: /shared/v3-distributed-admin-custom-partitions/partition-templates.md
|
||||
---
|
||||
|
||||
Use partition templates to define the patterns used to generate partition keys.
|
||||
A partition key uniquely identifies a partition and is used to name the partition
|
||||
Parquet file in the [Object store](/influxdb/clustered/reference/internals/storage-engine/#object-store).
|
||||
|
||||
A partition template consists of 1-8 _template parts_---dimensions to partition data by.
|
||||
Three types of template parts exist:
|
||||
|
||||
- **tag**: An [InfluxDB tag](/influxdb/clustered/reference/glossary/#tag)
|
||||
to partition by.
|
||||
- **tag bucket**: An [InfluxDB tag](/influxdb/clustered/reference/glossary/#tag)
|
||||
and number of "buckets" to group tag values into. Data is partitioned by the
|
||||
tag bucket rather than each distinct tag value.
|
||||
- {{< req type="key" >}} **time**: A Rust strftime date and time string that specifies the time interval
|
||||
to partition data by. The smallest unit of time included in the time part
|
||||
template is the interval used to partition data.
|
||||
|
||||
{{% note %}}
|
||||
A partition template must include 1 [time part](#time-part-templates)
|
||||
and can include up to 7 total [tag](#tag-part-templates) and [tag bucket](#tag-bucket-part-templates) parts.
|
||||
{{% /note %}}
|
||||
|
||||
<!-- TOC -->
|
||||
- [Restrictions](#restrictions)
|
||||
- [Template part size limit](#template-part-size-limit)
- [Partition key size limit](#partition-key-size-limit)
|
||||
- [Reserved keywords](#reserved-keywords)
|
||||
- [Reserved characters](#reserved-characters)
|
||||
- [Tag part templates](#tag-part-templates)
|
||||
- [Tag bucket part templates](#tag-bucket-part-templates)
|
||||
- [Time part templates](#time-part-templates)
|
||||
<!-- /TOC -->
|
||||
|
||||
## Restrictions
|
||||
|
||||
### Template part size limit
|
||||
|
||||
Each template part is limited to 200 bytes in length.
|
||||
InfluxDB truncates anything longer at 200 bytes and appends `#`.
|
||||
|
||||
### Partition key size limit
|
||||
|
||||
With the truncation of template parts, the maximum length of a partition key is
|
||||
1,607 bytes (1.57 KiB).
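
One way to arrive at this figure is shown in the following Python sketch.
It assumes, and this page doesn't state it explicitly, that each of the
8 possible template parts contributes at most 200 bytes and that adjacent parts
are separated by a single-byte `|` delimiter.

```python
MAX_PARTS = 8          # 1 time part + up to 7 tag or tag bucket parts
MAX_PART_BYTES = 200   # assumed maximum size of each part after truncation
DELIMITER_BYTES = 1    # assumed: a single "|" between adjacent parts

max_key_bytes = MAX_PARTS * MAX_PART_BYTES + (MAX_PARTS - 1) * DELIMITER_BYTES
print(max_key_bytes, round(max_key_bytes / 1024, 2))
```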
|
||||
|
||||
### Reserved keywords
|
||||
|
||||
The following reserved keywords cannot be used in partition templates:
|
||||
|
||||
- `time`
|
||||
|
||||
### Reserved characters
|
||||
|
||||
If used in template parts, non-ASCII characters and the following reserved
|
||||
characters must be [percent encoded](https://developer.mozilla.org/en-US/docs/Glossary/Percent-encoding):
|
||||
|
||||
- `|`: Partition key part delimiter
|
||||
- `!`: Null or missing partition key part
|
||||
- `^`: Empty string partition key part
|
||||
- `#`: Key part truncation marker
|
||||
- `%`: Required for unambiguous reversal of percent encoding
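
As an illustration of the encoding rule, the following Python sketch
percent-encodes only the reserved characters listed above and any non-ASCII
characters. The `encode_part` function is a hypothetical helper written for
this example; it isn't an InfluxDB API.

```python
RESERVED = {"|", "!", "^", "#", "%"}


def encode_part(value: str) -> str:
    """Percent-encode reserved and non-ASCII characters, leaving the rest as-is."""
    encoded = []
    for ch in value:
        if ch in RESERVED or ord(ch) > 127:
            # Encode each UTF-8 byte of the character as %XX.
            encoded.append("".join(f"%{byte:02X}" for byte in ch.encode("utf-8")))
        else:
            encoded.append(ch)
    return "".join(encoded)


print(encode_part("a|b"))
print(encode_part("50%"))
print(encode_part("café"))
```

Encoding `%` itself is what keeps decoding unambiguous: a literal `%7C` in a
value becomes `%257C` and can't be mistaken for an encoded `|`.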
|
||||
|
||||
## Tag part templates
|
||||
|
||||
Tag part templates consist of a _tag key_ to partition by.
|
||||
Generated partition keys include the unique _tag value_ specific to each partition.
|
||||
|
||||
A partition template may include a given tag key only once in template parts
|
||||
that operate on tags (tag value and tag bucket)--for example:
|
||||
|
||||
If a template partitions on unique values of `tag_A`, then
|
||||
you can't use `tag_A` as a tag bucket part.
|
||||
|
||||
## Tag bucket part templates
|
||||
|
||||
Tag bucket part templates consist of a _tag key_ to partition by and the
|
||||
_number of "buckets" to partition tag values into_--for example:
|
||||
|
||||
```
|
||||
customerID,500
|
||||
```
|
||||
|
||||
Values of the `customerID` tag are bucketed into 500 distinct "buckets."
|
||||
Each bucket is identified by the remainder of the tag value hashed into a 32-bit
integer divided by the specified number of buckets:
|
||||
|
||||
```rust
|
||||
hash(tagValue) % N
|
||||
```
|
||||
|
||||
Generated partition keys include the unique _tag bucket identifier_ specific to
|
||||
each partition.
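
The following Python sketch shows the general shape of the bucket calculation.
It uses `zlib.crc32` as a stand-in 32-bit hash; that choice is an assumption
made for illustration, so the bucket IDs it produces may not match the ones
InfluxDB generates. The `tag_bucket` function is a hypothetical helper.

```python
import zlib


def tag_bucket(tag_value: str, num_buckets: int) -> int:
    """Map a tag value to a bucket ID in the range [0, num_buckets).

    zlib.crc32 is a stand-in 32-bit hash chosen for this sketch; it may not
    match the hash InfluxDB uses internally.
    """
    if num_buckets < 1:
        raise ValueError("num_buckets must be at least 1")
    hashed = zlib.crc32(tag_value.encode("utf-8"))  # unsigned 32-bit integer
    return hashed % num_buckets


bucket = tag_bucket("customer-0042", 500)
print(bucket)
```

Because the bucket ID depends only on the tag value and the bucket count, the
same tag value always lands in the same bucket, and the number of partitions
per time interval is capped by the bucket count rather than by tag cardinality.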
|
||||
|
||||
**Supported number of tag buckets**: 1-1,000
|
||||
|
||||
{{% note %}}
|
||||
Use tag buckets to partition by high-cardinality tags or tags with an
unknown number of distinct values.
|
||||
{{% /note %}}
|
||||
|
||||
A partition template may include a given tag key only once in template parts
|
||||
that operate on tags (tag value and tag bucket)--for example:
|
||||
|
||||
If a template partitions on unique values of `tag_A`, then
|
||||
you can't use `tag_A` as a tag bucket part.
|
||||
|
||||
## Time part templates
|
||||
|
||||
Time part templates use a limited subset of the
|
||||
[Rust strftime date and time formatting syntax](https://docs.rs/chrono/latest/chrono/format/strftime/index.html)
|
||||
to specify time format in partition keys.
|
||||
InfluxDB uses the smallest unit of time included in the time part template as
|
||||
the partition interval.
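
To see how the smallest unit in the format sets the interval, the following
Python sketch groups a few timestamps by different time formats.
It relies on Python's `strftime`, which handles the `%Y`, `%m`, and `%d`
specifiers the same way for these dates; the `group_by_time_part` function and
the sample timestamps are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone


def group_by_time_part(timestamps: list[datetime], time_format: str) -> dict[str, int]:
    """Count how many timestamps fall into each partition interval."""
    counts: dict[str, int] = defaultdict(int)
    for ts in timestamps:
        counts[ts.strftime(time_format)] += 1
    return dict(counts)


samples = [
    datetime(2023, 12, 31, 23, 30, tzinfo=timezone.utc),
    datetime(2024, 1, 1, 0, 15, tzinfo=timezone.utc),
    datetime(2024, 1, 1, 18, 45, tzinfo=timezone.utc),
]

print(group_by_time_part(samples, "%Y-%m-%d"))  # grouped by day
print(group_by_time_part(samples, "%Y-%m"))     # grouped by month
print(group_by_time_part(samples, "%Y"))        # grouped by year
```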
|
||||
|
||||
### Date specifiers
|
||||
|
||||
| Variable | Example | Description |
|
||||
| :------: | :----------- | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||
| `%Y` | `2001` | The full proleptic Gregorian year, zero-padded to 4 digits. chrono supports years from -262144 to 262143. Note: years before 1 BCE or after 9999 CE require an initial sign (+/-). |
|
||||
| `%m` | `07` | Month number (01--12), zero-padded to 2 digits. |
|
||||
| `%d` | `08` | Day number (01--31), zero-padded to 2 digits. |
|
||||
<!--
|
||||
The content of this page is at /content/shared/v3-distributed-admin-custom-partitions/partition-templates.md
|
||||
-->
|
||||
|
|
|
@ -14,173 +14,9 @@ list_code_example: |
|
|||
```
|
||||
related:
|
||||
- /influxdb/clustered/admin/query-system-data/
|
||||
source: /shared/v3-distributed-admin-custom-partitions/view-partitions.md
|
||||
---
|
||||
|
||||
{{< product-name >}} stores partition information in InfluxDB v3 system tables.
|
||||
Query partition information to view partition templates and verify partitions
|
||||
are working as intended.
|
||||
|
||||
- [Query partition information from system tables](#query-partition-information-from-system-tables)
|
||||
- [Partition-related queries](#partition-related-queries)
|
||||
|
||||
{{% warn %}}
|
||||
#### Querying system tables may impact overall cluster performance
|
||||
|
||||
Partition information is stored in InfluxDB v3 system tables.
|
||||
Querying system tables may impact the overall write and query performance of
|
||||
your {{< product-name omit=" Clustered" >}} cluster.
|
||||
|
||||
<!--------------- UPDATE THE DATE BELOW AS EXAMPLES ARE UPDATED --------------->
|
||||
|
||||
#### System tables are subject to change
|
||||
|
||||
System tables are not part of InfluxDB's stable API and may change with new releases.
|
||||
The provided schema information and query examples are valid as of **September 24, 2024**.
|
||||
If you detect a schema change or a non-functioning query example, please
|
||||
[submit an issue](https://github.com/influxdata/docs-v2/issues/new/choose).
|
||||
|
||||
<!--------------- UPDATE THE DATE ABOVE AS EXAMPLES ARE UPDATED --------------->
|
||||
|
||||
{{% /warn %}}
|
||||
|
||||
## Query partition information from system tables
|
||||
|
||||
Use the [`influxctl query` command](/influxdb/clustered/reference/cli/influxctl/query/)
|
||||
and SQL to query partition-related information from InfluxDB system tables.
|
||||
Provide the following:
|
||||
|
||||
- **Enable system tables** with the `--enable-system-tables` command flag.
|
||||
- **Database token**: A [database token](/influxdb/clustered/admin/tokens/#database-tokens)
|
||||
with read permissions on the specified database. Uses the `token` setting from
|
||||
the [`influxctl` connection profile](/influxdb/clustered/reference/cli/influxctl/#configure-connection-profiles)
|
||||
or the `--token` command flag.
|
||||
- **Database name**: The name of the database to query information about.
|
||||
Uses the `database` setting from the
|
||||
[`influxctl` connection profile](/influxdb/clustered/reference/cli/influxctl/#configure-connection-profiles)
|
||||
or the `--database` command flag.
|
||||
- **SQL query**: The SQL query to execute.
|
||||
Pass the query in one of the following ways:
|
||||
|
||||
- a string on the command line
|
||||
- a path to a file that contains the query
|
||||
- a single dash (`-`) to read the query from stdin
|
||||
|
||||
{{% code-placeholders "DATABASE_(TOKEN|NAME)|SQL_QUERY" %}}
|
||||
|
||||
```bash
|
||||
influxctl query \
|
||||
--enable-system-tables \
|
||||
--database DATABASE_NAME \
|
||||
--token DATABASE_TOKEN \
|
||||
"SQL_QUERY"
|
||||
```
|
||||
|
||||
{{% /code-placeholders %}}
|
||||
|
||||
Replace the following:
|
||||
|
||||
- {{% code-placeholder-key %}}`DATABASE_TOKEN`{{% /code-placeholder-key %}}:
|
||||
A database token with read access to the specified database
|
||||
- {{% code-placeholder-key %}}`DATABASE_NAME`{{% /code-placeholder-key %}}:
|
||||
The name of the database to query information about.
|
||||
- {{% code-placeholder-key %}}`SQL_QUERY`{{% /code-placeholder-key %}}:
|
||||
The SQL query to execute. For examples, see
|
||||
[Partition-related queries](#partition-related-queries).
|
||||
|
||||
When prompted, enter `y` to acknowledge the potential impact querying system
|
||||
tables may have on your cluster.
|
||||
|
||||
## Partition-related queries
|
||||
|
||||
Use the following queries to return information about partitions in your
|
||||
{{< product-name omit=" Clustered" >}} cluster.
|
||||
|
||||
- [View partition templates of all tables](#view-partition-templates-of-all-tables)
|
||||
- [View the partition template of a specific table](#view-the-partition-template-of-a-specific-table)
|
||||
- [View all partitions for a table](#view-all-partitions-for-a-table)
|
||||
- [View the number of partitions per table](#view-the-number-of-partitions-per-table)
|
||||
- [View the number of partitions for a specific table](#view-the-number-of-partitions-for-a-specific-table)
|
||||
|
||||
---
|
||||
|
||||
In the examples below, replace {{% code-placeholder-key %}}`TABLE_NAME`{{% /code-placeholder-key %}}
|
||||
with the name of the table you want to query information about.
|
||||
|
||||
---
|
||||
|
||||
{{% code-placeholders "TABLE_NAME_(1|2|3)|TABLE_NAME" %}}
|
||||
|
||||
### View the partition template of a specific table
|
||||
|
||||
```sql
|
||||
SELECT * FROM system.tables WHERE table_name = 'TABLE_NAME'
|
||||
```
|
||||
|
||||
#### Example results
|
||||
|
||||
| table_name | partition_template |
|
||||
| :--------- | :----------------------------------------------------------------------------------------- |
|
||||
| weather | `{"parts":[{"timeFormat":"%Y-%m-%d"},{"bucket":{"tagName":"location","numBuckets":250}}]}` |
|
||||
|
||||
{{% note %}}
|
||||
If a table doesn't include a partition template in the output of this command,
|
||||
the table uses the default (1 day) partition strategy and doesn't partition
|
||||
by tags or tag buckets.
|
||||
{{% /note %}}
|
||||
|
||||
### View all partitions for a table
|
||||
|
||||
```sql
|
||||
SELECT * FROM system.partitions WHERE table_name = 'TABLE_NAME'
|
||||
```
|
||||
|
||||
#### Example results
|
||||
|
||||
| partition_id | table_name | partition_key | last_new_file_created_at | num_files | total_size_mb |
|
||||
| -----------: | :--------- | :---------------- | -----------------------: | --------: | ------------: |
|
||||
| 1362 | weather | 43 \| 2020-05-27 | 1683747418763813713 | 1 | 0 |
|
||||
| 800 | weather | 234 \| 2021-08-02 | 1683747421899400796 | 1 | 0 |
|
||||
| 630 | weather | 325 \| 2022-03-17 | 1683747417616689036 | 1 | 0 |
|
||||
| 1401 | weather | 12 \| 2021-01-09 | 1683747417786122295 | 1 | 0 |
|
||||
| 1012 | weather | 115 \| 2022-07-04 | 1683747417614219148 | 1 | 0 |
|
||||
|
||||
### View the number of partitions per table
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
table_name,
|
||||
COUNT(*) AS partition_count
|
||||
FROM
|
||||
system.partitions
|
||||
WHERE
|
||||
table_name IN ('TABLE_NAME_1', 'TABLE_NAME_2', 'TABLE_NAME_3')
|
||||
GROUP BY
|
||||
table_name
|
||||
```
|
||||
|
||||
#### Example results
|
||||
|
||||
| table_name | partition_count |
|
||||
| :--------- | --------------: |
|
||||
| weather | 1096 |
|
||||
| home | 24 |
|
||||
| numbers | 1 |
|
||||
|
||||
### View the number of partitions for a specific table
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
COUNT(*) AS partition_count
|
||||
FROM
|
||||
system.partitions
|
||||
WHERE
|
||||
table_name = 'TABLE_NAME'
|
||||
```
|
||||
|
||||
#### Example results
|
||||
|
||||
| partition_count |
| --------------: |
|            1096 |
|
||||
|
||||
{{% /code-placeholders %}}
|
||||
<!--
|
||||
The content of this page is at /content/shared/v3-distributed-admin-custom-partitions/view-partitions.md
|
||||
-->
|
||||
|
|
|
@ -46,7 +46,8 @@ Related entries:
|
|||
### aggregate
|
||||
|
||||
A function that returns an aggregated value across a set of points.
|
||||
For a list of available aggregation functions, see [SQL aggregate functions](/influxdb/clustered/reference/sql/functions/aggregate/).
|
||||
For a list of available aggregation functions,
|
||||
see [SQL aggregate functions](/influxdb/clustered/reference/sql/functions/aggregate/).
|
||||
|
||||
<!-- TODO: Add a link to InfluxQL aggregate functions -->
|
||||
|
||||
|
@ -333,6 +334,7 @@ Related entries:
|
|||
[field](#field),
|
||||
[field key](#field-key),
|
||||
[field set](#field-set),
|
||||
[tag set](#tag-set),
|
||||
[tag value](#tag-value),
|
||||
[timestamp](#timestamp)
|
||||
|
||||
|
@ -403,10 +405,10 @@ Identifiers are tokens that refer to specific database objects such as database
|
|||
names, field keys, measurement names, tag keys, etc.
|
||||
|
||||
Related entries:
|
||||
[database](#database)
|
||||
[database](#database),
|
||||
[field key](#field-key),
|
||||
[measurement](#measurement),
|
||||
[tag key](#tag-key),
|
||||
[tag key](#tag-key)
|
||||
|
||||
### influx
|
||||
|
||||
|
@ -425,8 +427,8 @@ and other required processes.
|
|||
|
||||
### InfluxDB
|
||||
|
||||
An open source time series database (TSDB) developed by InfluxData.
|
||||
Written in Go and optimized for fast, high-availability storage and retrieval of
|
||||
An open source time series database (TSDB) developed by InfluxData, optimized
|
||||
for fast, high-availability storage and retrieval of
|
||||
time series data in fields such as operations monitoring, application metrics,
|
||||
Internet of Things sensor data, and real-time analytics.
|
||||
|
||||
|
@ -438,8 +440,8 @@ The SQL-like query language used to query data in InfluxDB.
|
|||
|
||||
Telegraf input plugins actively gather metrics and deliver them to the core agent,
|
||||
where aggregator, processor, and output plugins can operate on the metrics.
|
||||
In order to activate an input plugin, it needs to be enabled and configured in
|
||||
Telegraf's configuration file.
|
||||
To activate an input plugin, enable and configure it in the
|
||||
Telegraf configuration file.
|
||||
|
||||
Related entries:
|
||||
[aggregator plugin](#aggregator-plugin),
|
||||
|
@ -752,7 +754,7 @@ relative to [now](#now).
|
|||
The minimum retention period is **one hour**.
|
||||
|
||||
Related entries:
|
||||
[bucket](#bucket),
|
||||
[bucket](#bucket)
|
||||
|
||||
### retention policy (RP)
|
||||
|
||||
|
@ -839,8 +841,8 @@ Related entries:
|
|||
|
||||
### series
|
||||
|
||||
A collection of data in the InfluxDB data structure that share a common
|
||||
_measurement_, _tag set_, and _field key_.
|
||||
In the InfluxDB 3 data structure, a collection of data that share a common
|
||||
_measurement_ and _tag set_.
|
||||
|
||||
Related entries:
|
||||
[field set](#field-set),
|
||||
|
@ -849,12 +851,13 @@ Related entries:
|
|||
|
||||
### series cardinality
|
||||
|
||||
The number of unique measurement, tag set, and field key combinations in an InfluxDB database.
|
||||
The number of unique measurement (table), tag set, and field key combinations in an InfluxDB database.
|
||||
|
||||
For example, assume that an InfluxDB database has one measurement.
|
||||
The single measurement has two tag keys: `email` and `status`.
|
||||
If there are three different `email`s, and each email address is associated with two
|
||||
different `status`es, the series cardinality for the measurement is 6
|
||||
If there are three different `email` tag values,
|
||||
and each email address is associated with two
|
||||
different `status` tag values, then the series cardinality for the measurement is 6
|
||||
(3 × 2 = 6):
|
||||
|
||||
| email | status |
|
||||
|
@ -869,7 +872,7 @@ different `status`es, the series cardinality for the measurement is 6
|
|||
In some cases, performing this multiplication may overestimate series cardinality
|
||||
because of the presence of dependent tags.
|
||||
Dependent tags are scoped by another tag and do not increase series cardinality.
|
||||
If we add the tag `firstname` to the example above, the series cardinality
|
||||
If we add the tag `firstname` to the preceding example, the series cardinality
|
||||
would not be 18 (3 × 2 × 3 = 18).
|
||||
The series cardinality would remain unchanged at 6, as `firstname` is already scoped by the `email` tag:
|
||||
|
||||
|
@ -1048,7 +1051,7 @@ Related entries: [aggregate](#aggregate), [function](#function), [selector](#sel
|
|||
|
||||
The InfluxDB v1 and v2 data storage format that allows greater compaction and
|
||||
higher write and read throughput than B+ or LSM tree implementations.
|
||||
The TSM storage engine has been replaced by [the InfluxDB v3 storage engine (IOx)](#iox).
|
||||
The TSM storage engine has been replaced by the [InfluxDB v3 storage engine (IOx)](#iox).
|
||||
|
||||
Related entries:
|
||||
[IOx](#iox)
|
||||
|
@ -1143,9 +1146,6 @@ Points in the WAL are queryable and persist through a system reboot.
|
|||
On process start, all points in the WAL must be flushed before the system
|
||||
accepts new writes.
|
||||
|
||||
Related entries:
|
||||
[tsm](#tsm-time-structured-merge-tree)
|
||||
|
||||
### windowing
|
||||
|
||||
Grouping data based on specified time intervals.
|
||||
|
|
|
@ -0,0 +1,25 @@
|
|||
# Shared content
|
||||
|
||||
This section is for content shared across multiple products and versions.
|
||||
The `/shared/_index.md` frontmatter marks the `/shared` directory and its
|
||||
children as draft so they
|
||||
don't get rendered when the site is built, but the contents of each shared
|
||||
document is included in pages that use the file as a `source` in their
|
||||
frontmatter.
|
||||
|
||||
## Use shared content
|
||||
|
||||
1. Create a new folder for the content in the `content/shared/` directory.
|
||||
2. Copy the markdown files into the new folder.
|
||||
3. Remove the frontmatter from the markdown files in the shared directory. If the first line starts with a shortcode, add an HTML comment before the first line; otherwise, Hugo returns an error.
|
||||
4. In each of the files that use the shared content, add a source to the frontmatter that points to the shared markdown file—for example:
|
||||
|
||||
```markdown
|
||||
source: /shared/influxql-v3-reference/regular-expressions.md
|
||||
```
|
||||
|
||||
5. In the doc body, remove the shared Markdown text and add a comment that points to the shared file, in case someone happens upon the page in the repo--for example, in `/content/3/core/reference/influxql/regular-expressions.md`, add the following:
|
||||
|
||||
<!--
|
||||
The content of this page is at /content/shared/influxql-v3-reference/regular-expressions.md
|
||||
-->
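Step 3 can be scripted. The following is a minimal sketch, assuming the frontmatter is a YAML block that starts on line 1 and is delimited by `---` lines. `strip_frontmatter` is a hypothetical helper name, not part of the docs tooling:

```sh
# Print a Markdown file without its leading YAML frontmatter.
# Assumption: the frontmatter starts on line 1 and is delimited by `---` lines.
strip_frontmatter() {
  awk '
    NR == 1 && $0 == "---"        { in_frontmatter = 1; next }  # opening delimiter
    in_frontmatter && $0 == "---" { in_frontmatter = 0; next }  # closing delimiter
    !in_frontmatter               { print }                     # body lines
  ' "$1"
}

# Example (paths taken from the steps above):
# strip_frontmatter content/3/core/reference/influxql/regular-expressions.md \
#   > content/shared/influxql-v3-reference/regular-expressions.md
```

Only the first `---` pair is treated as frontmatter, so horizontal rules later in the file are left untouched.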
|
|
@ -9,3 +9,6 @@ The `/shared` directory and all of its children are marked as draft so they
|
|||
don't get rendered when the site is built, but the contents of each shared
|
||||
document is included in pages that use the file as a `source` in their
|
||||
frontmatter.
|
||||
|
||||
See the `/shared/README.md` for instructions on creating and using shared content.
|
||||
|
||||
|
|
|
@ -0,0 +1,414 @@
|
|||
When writing data to {{< product-name >}}, the InfluxDB v3 storage engine stores data in [Apache Parquet](https://parquet.apache.org/) format in the [Object store](/influxdb/cloud-dedicated/reference/internals/storage-engine/#object-store). Each Parquet file represents a _partition_--a logical grouping of data.
|
||||
By default, InfluxDB partitions each table _by day_.
|
||||
If this default strategy yields unsatisfactory performance for single-series queries,
|
||||
you can define a custom partitioning strategy by specifying tag values and different time intervals to optimize query performance for your specific schema and workload.
|
||||
|
||||
- [Advantages](#advantages)
|
||||
- [Disadvantages](#disadvantages)
|
||||
- [Limitations](#limitations)
|
||||
- [Plan for custom partitioning](#plan-for-custom-partitioning)
|
||||
- [How partitioning works](#how-partitioning-works)
|
||||
- [Partition templates](#partition-templates)
|
||||
- [Partition keys](#partition-keys)
|
||||
- [Partitions in the query life cycle](#partitions-in-the-query-life-cycle)
|
||||
- [Partition guides](#partition-guides)
|
||||
{{< children type="anchored-list" >}}
|
||||
|
||||
> [!Note]
|
||||
>
|
||||
> #### When to consider custom partitioning
|
||||
>
|
||||
> Consider custom partitioning if:
|
||||
>
|
||||
> 1. You have taken steps to [optimize your queries](/influxdb/cloud-dedicated/query-data/troubleshoot-and-optimize/optimize-queries/), and
|
||||
> 2. Performance for _single-series queries_ (querying for a specific [tag value](/influxdb/cloud-dedicated/reference/glossary/#tag-value) or [tag set](/influxdb/cloud-dedicated/reference/glossary/#tag-set)) is still unsatisfactory.
|
||||
>
|
||||
> Before choosing a partitioning strategy, weigh the [advantages](#advantages), [disadvantages](#disadvantages), and [limitations](#limitations) of custom partitioning.
|
||||
|
||||
## Advantages
|
||||
|
||||
The primary advantage of custom partitioning is that it lets you customize your
|
||||
storage structure to improve query performance specific to your schema and workload.
|
||||
|
||||
- **Optimized storage for improved performance on specific types of queries**.
|
||||
For example, if queries often select data with a specific tag value, you can
|
||||
partition by that tag to improve the performance of those queries.
|
||||
- **Optimized storage for specific types of data**. For example, if the data you
|
||||
store is sparse and the time ranges you query are often much larger than a day,
|
||||
you could partition your data by month instead of by day.
|
||||
|
||||
## Disadvantages
|
||||
|
||||
Using custom partitioning may increase the load on other parts of the
|
||||
[InfluxDB v3 storage engine](/influxdb/cloud-dedicated/reference/internals/storage-engine/),
|
||||
but you can scale each part individually to address the added load.
|
||||
|
||||
{{% note %}}
|
||||
_The weight of these disadvantages depends upon the cardinality of
|
||||
tags and the specificity of time intervals used for partitioning._
|
||||
{{% /note %}}
|
||||
|
||||
- **Increased load on the [Ingester](/influxdb/cloud-dedicated/reference/internals/storage-engine/#ingester)**
|
||||
as it groups data into smaller partitions and files.
|
||||
- **Increased load on the [Catalog](/influxdb/cloud-dedicated/reference/internals/storage-engine/#catalog)**
|
||||
as more references to partition Parquet file locations are stored and queried.
|
||||
- **Increased load on the [Compactor](/influxdb/cloud-dedicated/reference/internals/storage-engine/#compactor)**
|
||||
as it needs to compact more partition Parquet files.
|
||||
- **Increased costs associated with [Object storage](/influxdb/cloud-dedicated/reference/internals/storage-engine/#object-storage)**
|
||||
as more partition Parquet files are created and stored.
|
||||
- **Increased latency**. The amount of time for InfluxDB to process a query and return results increases linearly, although slightly, with the total partition count for a table.
|
||||
- **Risk of decreased performance for queries that don't use tags in the WHERE clause**.
|
||||
These queries might read many partitions and smaller files, which can degrade performance.
|
||||
|
||||
## Limitations
|
||||
|
||||
Custom partitioning has the following limitations:
|
||||
|
||||
- Define database and table partitions only during creation; you can't update the partition strategy afterward.
|
||||
- Include a time part in a partition template.
|
||||
- You can partition by up to eight dimensions (seven tags and a time interval).
|
||||
|
||||
## Plan for custom partitioning
|
||||
|
||||
After you have considered the [advantages](#advantages), [disadvantages](#disadvantages), and [limitations](#limitations) of
|
||||
custom partitioning, use the guides in this section to:
|
||||
|
||||
1. Learn [how partitioning works](#how-partitioning-works)
|
||||
2. Follow [best practices](/influxdb/cloud-dedicated/admin/custom-partitions/best-practices/) for defining partitions and managing partition
|
||||
growth
|
||||
3. [Define custom partitions](/influxdb/cloud-dedicated/admin/custom-partitions/define-custom-partitions/) for your data
|
||||
4. Take steps to [limit the number of partition files](/influxdb/cloud-dedicated/admin/custom-partitions/best-practices/#limit-the-number-of-partition-files)
|
||||
|
||||
## How partitioning works
|
||||
|
||||
### Partition templates
|
||||
|
||||
A partition template defines the pattern used for _[partition keys](#partition-keys)_
|
||||
and determines the time interval that InfluxDB partitions data by.
|
||||
Partition templates use tag values and
|
||||
[Rust strftime date and time formatting syntax](https://docs.rs/chrono/latest/chrono/format/strftime/index.html).
|
||||
|
||||
_For more detailed information, see [Partition templates](/influxdb/cloud-dedicated/admin/custom-partitions/partition-templates/)._
|
||||
|
||||
### Partition keys
|
||||
|
||||
A partition key uniquely identifies a partition.
|
||||
A _[partition template](#partition-templates)_ defines the partition key format.
|
||||
Partition keys are
|
||||
composed of up to 8 dimensions (1 time part and up to 7 tag or tag bucket parts).
|
||||
A partition key uses the partition key separator (`|`) to delimit parts.
|
||||
|
||||
The default format for partition keys is `%Y-%m-%d` (for example, `2024-01-01`),
|
||||
which creates 1 partition for each day.
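To see how a template maps a point to a partition key, the following sketch joins tag values and a formatted timestamp with the separator. It is an illustration only, not InfluxDB's implementation: `partition_key` is a hypothetical helper, it assumes GNU `date` (`-d "@SECONDS"`), it takes timestamps in seconds rather than nanoseconds, and it pads the separator with spaces to match the way keys are displayed in these docs. Tag bucket parts are not covered.

```sh
# partition_key TIME_FORMAT EPOCH_SECONDS [TAG_VALUE...]
# Joins the tag values and the formatted time with " | ".
partition_key() {
  time_format=$1
  epoch_seconds=$2
  shift 2
  key=""
  for tag_value in "$@"; do
    key="${key}${tag_value} | "
  done
  # Assumption: GNU date. -u formats in UTC; -d "@N" reads epoch seconds.
  printf '%s%s\n' "$key" "$(date -u -d "@${epoch_seconds}" "+${time_format}")"
}

partition_key '%Y-%m-%d' 1704063600        # 2023-12-31
partition_key '%Y-%m-%d' 1704067200 A cnc  # A | cnc | 2024-01-01
partition_key '%Y-%m' 1704070800 A cnc     # A | cnc | 2024-01
```

Switching the time format from `%Y-%m-%d` to `%Y-%m` is what moves points from separate daily partitions into a single monthly partition.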
|
||||
|
||||
{{< expand-wrapper >}}
|
||||
{{% expand "View example partition templates and keys" %}}
|
||||
|
||||
Given line protocol with the following timestamps:
|
||||
|
||||
- 2023-12-31T23:00:00Z
|
||||
- 2024-01-01T00:00:00Z
|
||||
- 2024-01-01T01:00:00Z
|
||||
|
||||
```text
|
||||
production,line=A,station=cnc temp=81.2,qty=35i 1704063600000000000
|
||||
production,line=A,station=wld temp=92.8,qty=35i 1704063600000000000
|
||||
production,line=B,station=cnc temp=101.1,qty=43i 1704063600000000000
|
||||
production,line=B,station=wld temp=102.4,qty=43i 1704063600000000000
|
||||
production,line=A,station=cnc temp=81.9,qty=36i 1704067200000000000
|
||||
production,line=A,station=wld temp=110.0,qty=22i 1704067200000000000
|
||||
production,line=B,station=cnc temp=101.8,qty=44i 1704067200000000000
|
||||
production,line=B,station=wld temp=105.7,qty=44i 1704067200000000000
|
||||
production,line=A,station=cnc temp=82.2,qty=35i 1704070800000000000
|
||||
production,line=A,station=wld temp=92.1,qty=30i 1704070800000000000
|
||||
production,line=B,station=cnc temp=102.4,qty=43i 1704070800000000000
|
||||
production,line=B,station=wld temp=106.5,qty=43i 1704070800000000000
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
{{% flex %}}
|
||||
|
||||
<!---------------------- BEGIN PARTITION EXAMPLES GROUP 1 --------------------->
|
||||
|
||||
{{% flex-content "half" %}}
|
||||
|
||||
##### Partition template parts
|
||||
|
||||
- `%Y-%m-%d` <em class="op50">time (by day, default format)</em>
|
||||
|
||||
{{% /flex-content %}}
|
||||
{{% flex-content %}}
|
||||
|
||||
##### Partition keys
|
||||
|
||||
- `2023-12-31`
|
||||
- `2024-01-01`
|
||||
|
||||
{{% /flex-content %}}
|
||||
|
||||
<!----------------------- END PARTITION EXAMPLES GROUP 1 ---------------------->
|
||||
|
||||
{{% /flex %}}
|
||||
|
||||
---
|
||||
|
||||
{{% flex %}}
|
||||
|
||||
<!---------------------- BEGIN PARTITION EXAMPLES GROUP 2 --------------------->
|
||||
|
||||
{{% flex-content "half" %}}
|
||||
|
||||
##### Partition template parts
|
||||
|
||||
- `line` <em class="op50">tag</em>
|
||||
- `%d %b %Y` <em class="op50">time (by day, non-default format)</em>
|
||||
|
||||
{{% /flex-content %}}
|
||||
{{% flex-content %}}
|
||||
|
||||
##### Partition keys
|
||||
|
||||
- `A | 31 Dec 2023`
|
||||
- `B | 31 Dec 2023`
|
||||
- `A | 01 Jan 2024`
|
||||
- `B | 01 Jan 2024`
|
||||
|
||||
{{% /flex-content %}}
|
||||
|
||||
<!----------------------- END PARTITION EXAMPLES GROUP 2 ---------------------->
|
||||
|
||||
{{% /flex %}}
|
||||
|
||||
---
|
||||
|
||||
{{% flex %}}
|
||||
|
||||
<!---------------------- BEGIN PARTITION EXAMPLES GROUP 3 --------------------->
|
||||
|
||||
{{% flex-content "half" %}}
|
||||
|
||||
##### Partition template parts
|
||||
|
||||
- `line` <em class="op50">tag</em>
|
||||
- `station` <em class="op50">tag</em>
|
||||
- `%Y-%m-%d` <em class="op50">time (by day, default format)</em>
|
||||
|
||||
{{% /flex-content %}}
|
||||
{{% flex-content %}}
|
||||
|
||||
##### Partition keys
|
||||
|
||||
- `A | cnc | 2023-12-31`
|
||||
- `A | wld | 2023-12-31`
|
||||
- `B | cnc | 2023-12-31`
|
||||
- `B | wld | 2023-12-31`
|
||||
- `A | cnc | 2024-01-01`
|
||||
- `A | wld | 2024-01-01`
|
||||
- `B | cnc | 2024-01-01`
|
||||
- `B | wld | 2024-01-01`
|
||||
|
||||
{{% /flex-content %}}
|
||||
|
||||
<!----------------------- END PARTITION EXAMPLES GROUP 3 ---------------------->
|
||||
|
||||
{{% /flex %}}
|
||||
|
||||
---
|
||||
|
||||
{{% flex %}}
|
||||
|
||||
<!---------------------- BEGIN PARTITION EXAMPLES GROUP 4 --------------------->
|
||||
|
||||
{{% flex-content "half" %}}
|
||||
|
||||
##### Partition template parts
|
||||
|
||||
- `line` <em class="op50">tag</em>
|
||||
- `station,3` <em class="op50">tag bucket</em>
|
||||
- `%Y-%m-%d` <em class="op50">time (by day, default format)</em>
|
||||
|
||||
{{% /flex-content %}}
|
||||
{{% flex-content %}}
|
||||
|
||||
##### Partition keys
|
||||
|
||||
- `A | 0 | 2023-12-31`
|
||||
- `B | 0 | 2023-12-31`
|
||||
- `A | 0 | 2024-01-01`
|
||||
- `B | 0 | 2024-01-01`
|
||||
|
||||
{{% /flex-content %}}
|
||||
|
||||
<!----------------------- END PARTITION EXAMPLES GROUP 4 ---------------------->
|
||||
|
||||
{{% /flex %}}
|
||||
|
||||
---
|
||||
|
||||
{{% flex %}}
|
||||
|
||||
<!---------------------- BEGIN PARTITION EXAMPLES GROUP 5 --------------------->
|
||||
|
||||
{{% flex-content "half" %}}
|
||||
|
||||
##### Partition template parts
|
||||
|
||||
- `line` <em class="op50">tag</em>
|
||||
- `station` <em class="op50">tag</em>
|
||||
- `%Y-%m` <em class="op50">time (by month)</em>
|
||||
|
||||
{{% /flex-content %}}
|
||||
{{% flex-content %}}
|
||||
|
||||
##### Partition keys
|
||||
|
||||
- `A | cnc | 2023-12`
|
||||
- `A | wld | 2023-12`
|
||||
- `B | cnc | 2023-12`
|
||||
- `B | wld | 2023-12`
|
||||
- `A | cnc | 2024-01`
|
||||
- `A | wld | 2024-01`
|
||||
- `B | cnc | 2024-01`
|
||||
- `B | wld | 2024-01`
|
||||
|
||||
{{% /flex-content %}}
|
||||
|
||||
<!----------------------- END PARTITION EXAMPLES GROUP 5 ---------------------->
|
||||
|
||||
{{% /flex %}}
|
||||
|
||||
---
|
||||
|
||||
{{% flex %}}
|
||||
|
||||
<!---------------------- BEGIN PARTITION EXAMPLES GROUP 6 --------------------->
|
||||
|
||||
{{% flex-content "half" %}}
|
||||
|
||||
##### Partition template parts
|
||||
|
||||
- `line` <em class="op50">tag</em>
|
||||
- `station,50` <em class="op50">tag bucket</em>
|
||||
- `%Y-%m` <em class="op50">time (by month)</em>
|
||||
|
||||
{{% /flex-content %}}
|
||||
{{% flex-content %}}
|
||||
|
||||
##### Partition keys
|
||||
|
||||
- `A | 47 | 2023-12`
|
||||
- `A | 9 | 2023-12`
|
||||
- `B | 47 | 2023-12`
|
||||
- `B | 9 | 2023-12`
|
||||
- `A | 47 | 2024-01`
|
||||
- `A | 9 | 2024-01`
|
||||
- `B | 47 | 2024-01`
|
||||
- `B | 9 | 2024-01`
|
||||
|
||||
{{% /flex-content %}}
|
||||
|
||||
<!----------------------- END PARTITION EXAMPLES GROUP 6 ---------------------->
|
||||
|
||||
{{% /flex %}}
|
||||
|
||||
{{% /expand %}}
|
||||
{{< /expand-wrapper >}}
|
||||
|
||||
## Partitions in the query life cycle
|
||||
|
||||
When querying data:
|
||||
|
||||
1. The [Catalog](/influxdb/cloud-dedicated/reference/internals/storage-engine/#catalog)
|
||||
provides the v3 query engine ([Querier](/influxdb/cloud-dedicated/reference/internals/storage-engine/#querier))
|
||||
with the locations of partitions that contain the queried time series data.
|
||||
2. The query engine reads all rows in the returned partitions to identify what
|
||||
rows match the logic in the query and should be included in the query result.
|
||||
|
||||
The faster the query engine can identify what partitions to read and then read
|
||||
the data in those partitions, the more performant queries are.
|
||||
|
||||
_For more information about the query life cycle, see
|
||||
[InfluxDB v3 query life cycle](/influxdb/cloud-dedicated/reference/internals/storage-engine/#query-life-cycle)._
|
||||
|
||||
##### Query example
|
||||
|
||||
Consider the following query that selects everything in the `production` table
|
||||
where the `line` tag is `A` and the `station` tag is `cnc`:
|
||||
|
||||
```sql
|
||||
SELECT *
|
||||
FROM production
|
||||
WHERE
|
||||
time >= now() - INTERVAL '1 week'
|
||||
AND line = 'A'
|
||||
AND station = 'cnc'
|
||||
```
|
||||
|
||||
Using the default partitioning strategy (by day), the query engine
|
||||
reads eight separate partitions (one partition for today and one for each of the
|
||||
last seven days):
|
||||
|
||||
- {{< datetime/current-date trimTime=true >}}
|
||||
- {{< datetime/current-date offset=-1 trimTime=true >}}
|
||||
- {{< datetime/current-date offset=-2 trimTime=true >}}
|
||||
- {{< datetime/current-date offset=-3 trimTime=true >}}
|
||||
- {{< datetime/current-date offset=-4 trimTime=true >}}
|
||||
- {{< datetime/current-date offset=-5 trimTime=true >}}
|
||||
- {{< datetime/current-date offset=-6 trimTime=true >}}
|
||||
- {{< datetime/current-date offset=-7 trimTime=true >}}
|
||||
|
||||
The query engine must scan _all_ rows in the partitions to identify rows
|
||||
where `line` is `A` and `station` is `cnc`. This process takes valuable time
|
||||
and results in less performant queries.
|
||||
|
||||
However, including tags in your partitioning strategy allows the query engine to
|
||||
identify partitions containing only the required tag values.
|
||||
This avoids scanning rows for tag values.
|
||||
|
||||
For example, if you partition data by `line`, `station`, and day, although
|
||||
the number of files increases, the query engine can quickly identify and read
|
||||
only those with data relevant to the query:
|
||||
|
||||
{{% columns 4 %}}
|
||||
|
||||
- <strong class="req normal green">A | cnc | {{< datetime/current-date trimTime=true >}}</strong>
|
||||
- A | wld | {{< datetime/current-date trimTime=true >}}
|
||||
- B | cnc | {{< datetime/current-date trimTime=true >}}
|
||||
- B | wld | {{< datetime/current-date trimTime=true >}}
|
||||
- <strong class="req normal green">A | cnc | {{< datetime/current-date offset=-1 trimTime=true >}}</strong>
|
||||
- A | wld | {{< datetime/current-date offset=-1 trimTime=true >}}
|
||||
- B | cnc | {{< datetime/current-date offset=-1 trimTime=true >}}
|
||||
- B | wld | {{< datetime/current-date offset=-1 trimTime=true >}}
|
||||
- <strong class="req normal green">A | cnc | {{< datetime/current-date offset=-2 trimTime=true >}}</strong>
|
||||
- A | wld | {{< datetime/current-date offset=-2 trimTime=true >}}
|
||||
- B | cnc | {{< datetime/current-date offset=-2 trimTime=true >}}
|
||||
- B | wld | {{< datetime/current-date offset=-2 trimTime=true >}}
|
||||
- <strong class="req normal green">A | cnc | {{< datetime/current-date offset=-3 trimTime=true >}}</strong>
|
||||
- A | wld | {{< datetime/current-date offset=-3 trimTime=true >}}
|
||||
- B | cnc | {{< datetime/current-date offset=-3 trimTime=true >}}
|
||||
- B | wld | {{< datetime/current-date offset=-3 trimTime=true >}}
|
||||
- <strong class="req normal green">A | cnc | {{< datetime/current-date offset=-4 trimTime=true >}}</strong>
|
||||
- A | wld | {{< datetime/current-date offset=-4 trimTime=true >}}
|
||||
- B | cnc | {{< datetime/current-date offset=-4 trimTime=true >}}
|
||||
- B | wld | {{< datetime/current-date offset=-4 trimTime=true >}}
|
||||
- <strong class="req normal green">A | cnc | {{< datetime/current-date offset=-5 trimTime=true >}}</strong>
|
||||
- A | wld | {{< datetime/current-date offset=-5 trimTime=true >}}
|
||||
- B | cnc | {{< datetime/current-date offset=-5 trimTime=true >}}
|
||||
- B | wld | {{< datetime/current-date offset=-5 trimTime=true >}}
|
||||
- <strong class="req normal green">A | cnc | {{< datetime/current-date offset=-6 trimTime=true >}}</strong>
|
||||
- A | wld | {{< datetime/current-date offset=-6 trimTime=true >}}
|
||||
- B | cnc | {{< datetime/current-date offset=-6 trimTime=true >}}
|
||||
- B | wld | {{< datetime/current-date offset=-6 trimTime=true >}}
|
||||
- <strong class="req normal green">A | cnc | {{< datetime/current-date offset=-7 trimTime=true >}}</strong>
|
||||
- A | wld | {{< datetime/current-date offset=-7 trimTime=true >}}
|
||||
- B | cnc | {{< datetime/current-date offset=-7 trimTime=true >}}
|
||||
- B | wld | {{< datetime/current-date offset=-7 trimTime=true >}}
|
||||
|
||||
{{% /columns %}}
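The effect of partition pruning can be modeled with a few lines of shell. This is a toy model, not how the Querier is implemented: `list_partition_keys` and `prune_partitions` are hypothetical helpers, and the labels `day-0` through `day-7` stand in for the eight dates in the preceding list.

```sh
# Build the 32 partition keys from the example: 2 lines x 2 stations x 8 days.
list_partition_keys() {
  for day_offset in 0 1 2 3 4 5 6 7; do
    for line in A B; do
      for station in cnc wld; do
        printf '%s | %s | day-%s\n' "$line" "$station" "$day_offset"
      done
    done
  done
}

# Keep only the partitions whose tag parts match the WHERE clause values.
# prune_partitions LINE STATION
prune_partitions() {
  grep -F -- "$1 | $2 | "
}

list_partition_keys | wc -l                           # 32 partitions in total
list_partition_keys | prune_partitions A cnc | wc -l  # 8 partitions to read
```

Only a quarter of the partitions hold data for `line = 'A' AND station = 'cnc'`, and every row in those partitions already matches both tag values.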
|
||||
|
||||
---
|
||||
|
||||
## Partition guides
|
||||
|
||||
{{< children >}}
|
|
@ -0,0 +1,78 @@
|
|||
Use the following best practices when defining custom partitioning strategies
|
||||
for your data stored in {{< product-name >}}.
|
||||
|
||||
- [Partition by tags that you commonly query for a specific value](#partition-by-tags-that-you-commonly-query-for-a-specific-value)
|
||||
- [Only partition by tags that _always_ have a value](#only-partition-by-tags-that-always-have-a-value)
|
||||
- [Avoid over-partitioning](#avoid-over-partitioning)
|
||||
- [Limit the number of partition files](#limit-the-number-of-partition-files)
|
||||
- [Estimate the total partition count](#estimate-the-total-partition-count)
|
||||
|
||||
## Partition by tags that you commonly query for a specific value
|
||||
|
||||
Custom partitioning primarily benefits single-series queries that look for a specific tag
|
||||
value in the `WHERE` clause.
|
||||
For example, if you often query data related to a
|
||||
specific ID, partitioning by the tag that stores the ID helps the InfluxDB
|
||||
query engine to more quickly identify what partitions contain the relevant data.
|
||||
|
||||
{{% note %}}
|
||||
|
||||
#### Use tag buckets for high-cardinality tags
|
||||
|
||||
Partitioning using distinct values of tags with many (10K+) unique values can
|
||||
actually hurt query performance as partitions are created for each unique tag value.
|
||||
Instead, use [tag buckets](/influxdb/cloud-dedicated/admin/custom-partitions/partition-templates/#tag-bucket-part-templates)
|
||||
to partition by high-cardinality tags.
|
||||
This method of partitioning groups tag values into "buckets" and partitions by bucket.
|
||||
{{% /note %}}
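The idea behind bucketing can be sketched as "hash the tag value, then take the remainder by the bucket count". The sketch below is illustrative only: `tag_bucket` is a hypothetical helper, and `cksum` (a CRC) stands in for whatever hash function InfluxDB actually uses, so the bucket numbers it prints will not match real partition keys.

```sh
# tag_bucket TAG_VALUE BUCKET_COUNT
# Assumption: cksum is only a stand-in hash for illustration.
tag_bucket() {
  checksum=$(printf '%s' "$1" | cksum | cut -d ' ' -f 1)
  echo $(( checksum % $2 ))
}

# Two hypothetical customer IDs, each mapped to one of 500 buckets.
tag_bucket customer-0001 500
tag_bucket customer-0002 500
```

In this sketch, a given tag value always maps to the same bucket, and the number of partitions per time interval is capped by the bucket count rather than by the number of distinct tag values.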
|
||||
|
||||
## Only partition by tags that _always_ have a value
|
||||
|
||||
You should only partition by tags that _always_ have a value.
|
||||
If points don't have a value for the tag, InfluxDB can't store them in the correct partitions and, at query time, must read all the partitions.
|
||||
|
||||
## Avoid over-partitioning
|
||||
|
||||
As you plan your partitioning strategy, keep in mind that data can be
|
||||
"over-partitioned"--meaning partitions are so granular that queries end up
|
||||
having to retrieve and read many partitions from the object store, which
|
||||
hurts query performance.
|
||||
|
||||
- Balance the partition time interval with the actual amount of data written
|
||||
during each interval. If a single interval doesn't contain a lot of data,
|
||||
it is better to partition by larger time intervals.
|
||||
- Don't partition by tags that you typically don't use in your query workload.
|
||||
- Don't partition by distinct values of high-cardinality tags.
|
||||
Instead, [use tag buckets](#use-tag-buckets-for-high-cardinality-tags) to
|
||||
partition by these tags.
|
||||
|
||||
## Limit the number of partition files
|
||||
|
||||
Avoid exceeding **10,000** total partition files.
|
||||
Limiting the total partition count can help manage system performance and costs.
|
||||
|
||||
While planning your strategy, include the following steps to keep the total
|
||||
partition count below 10,000 files over the next few years:
|
||||
|
||||
- [Estimate the total partition count](#estimate-the-total-partition-count) for the lifespan of your data
|
||||
- Take the following steps to limit the total partition count:
|
||||
|
||||
- **Set a [database retention period](/influxdb/cloud-dedicated/admin/databases/#retention-period)**
|
||||
to prevent the number of files from growing unbounded.
|
||||
- **Partition by month or year** to [avoid over-partitioning](#avoid-over-partitioning)
|
||||
and creating too many partition files.
|
||||
- **Don't partition on high-cardinality tags** unless you also use [tag buckets](#use-tag-buckets-for-high-cardinality-tags).
|
||||
|
||||
### Estimate the total partition count
|
||||
|
||||
Use the following formula to estimate the total partition file count over the
|
||||
lifetime of the database (or retention period):
|
||||
|
||||
```text
|
||||
total_partition_count = (cardinality_of_partitioned_tag) * (data_lifespan / partition_duration)
|
||||
```
|
||||
|
||||
- `total_partition_count`: The number of partition files in [Object storage](/influxdb/cloud-dedicated/reference/internals/storage-engine/#object-storage)
|
||||
- `cardinality_of_partitioned_tag`: The number of distinct values for a tag
|
||||
- `data_lifespan`: The [database retention period](/influxdb/cloud-dedicated/admin/databases/#retention-period), if set, or the expected lifetime of the database
|
||||
- `partition_duration`: The partition time interval, defined by the [time part template](/influxdb/cloud-dedicated/admin/custom-partitions/partition-templates/#time-part-templates)
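As a worked example of the formula, the following sketch compares daily and monthly partitioning for a tag with 50 distinct values and a two-year retention period. `estimate_partition_count` is a hypothetical helper, durations are expressed in days, a month is approximated as 30 days, and the interval count is rounded up (an assumption that a partly filled interval still creates partitions).

```sh
# estimate_partition_count TAG_CARDINALITY DATA_LIFESPAN_DAYS PARTITION_DURATION_DAYS
estimate_partition_count() {
  tag_cardinality=$1
  data_lifespan_days=$2
  partition_duration_days=$3
  # Assumption: round the number of intervals up to the next whole interval.
  intervals=$(( (data_lifespan_days + partition_duration_days - 1) / partition_duration_days ))
  echo $(( tag_cardinality * intervals ))
}

# 50 distinct tag values, 730-day retention period:
estimate_partition_count 50 730 1    # by day:   36500
estimate_partition_count 50 730 30   # by month: 1250
```

With these inputs, partitioning by day exceeds the 10,000-file guideline, while partitioning by month stays well below it.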
|
|
@ -0,0 +1,156 @@
|
|||
Use the [`influxctl` CLI](/influxdb/cloud-dedicated/reference/cli/influxctl/)
|
||||
to define custom partition strategies when creating a database or table.
|
||||
By default, {{< product-name >}} partitions data by day.
|
||||
|
||||
The partitioning strategy of a database or table is determined by a
|
||||
[partition template](/influxdb/cloud-dedicated/admin/custom-partitions/#partition-templates)
|
||||
which defines the naming pattern for [partition keys](/influxdb/cloud-dedicated/admin/custom-partitions/#partition-keys).
|
||||
Partition keys uniquely identify each partition.
|
||||
When a partition template is applied to a database, it becomes the default template
|
||||
for all tables in that database, but can be overridden when creating a
|
||||
table.
|
||||
|
||||
- [Create a database with a custom partition template](#create-a-database-with-a-custom-partition-template)
|
||||
- [Create a table with a custom partition template](#create-a-table-with-a-custom-partition-template)
|
||||
- [Example partition templates](#example-partition-templates)
|
||||
|
||||
{{% warn %}}
|
||||
|
||||
#### Partition templates can only be applied on create
|
||||
|
||||
You can only apply a partition template when creating a database or table.
|
||||
You can't update a partition template on an existing resource.
|
||||
{{% /warn %}}
|
||||
|
||||
Use the following command flags to identify
|
||||
[partition template parts](/influxdb/cloud-dedicated/admin/custom-partitions/partition-templates/#tag-part-templates):
|
||||
|
||||
- `--template-tag`: An [InfluxDB tag](/influxdb/cloud-dedicated/reference/glossary/#tag)
|
||||
to use in the partition template.
|
||||
- `--template-tag-bucket`: An [InfluxDB tag](/influxdb/cloud-dedicated/reference/glossary/#tag)
|
||||
and number of "buckets" to group tag values into.
|
||||
Provide the tag key and the number of buckets,
|
||||
separated by a comma: `tagKey,N`.
|
||||
- `--template-timeformat`: A [Rust strftime date and time](/influxdb/cloud-dedicated/admin/custom-partitions/partition-templates/#time-part-templates)
|
||||
string that specifies the time format in the partition template and determines
|
||||
the time interval to partition by.
|
||||
|
||||
{{% note %}}
|
||||
A partition template can include up to 7 total tag and tag bucket parts
|
||||
and only 1 time part.
|
||||
{{% /note %}}
|
||||
|
||||
_View [partition template part restrictions](/influxdb/cloud-dedicated/admin/custom-partitions/partition-templates/#restrictions)._
|
||||
|
||||
{{% note %}}
|
||||
#### Always provide a time format when using custom partitioning
|
||||
|
||||
When defining a custom partition template for your database or table using any
|
||||
of the `influxctl` `--template-*` flags, always include the `--template-timeformat`
|
||||
flag with a time format to use in your partition template.
|
||||
Otherwise, InfluxDB omits time from the partition template and won't compact partitions.
|
||||
{{% /note %}}
|
||||
|
||||
## Create a database with a custom partition template
|
||||
|
||||
The following example creates a new `example-db` database and applies a partition
|
||||
template that partitions by distinct values of two tags (`room` and `sensor-type`),
|
||||
bucketed values of the `customerID` tag, and by day using the time format `%Y-%m-%d`:
|
||||
|
||||
<!--Skip database create and delete tests: namespaces aren't reusable-->
|
||||
<!--pytest.mark.skip-->
|
||||
|
||||
```sh
|
||||
influxctl database create \
|
||||
--template-tag room \
|
||||
--template-tag sensor-type \
|
||||
--template-tag-bucket customerID,500 \
|
||||
--template-timeformat '%Y-%m-%d' \
|
||||
example-db
|
||||
```
|
||||
|
||||
## Create a table with a custom partition template
|
||||
|
||||
The following example creates a new `example-table` table in the specified
|
||||
database and applies a partition template that partitions by distinct values of
|
||||
two tags (`room` and `sensor-type`), bucketed values of the `customerID` tag,
|
||||
and by month using the time format `%Y-%m`:
|
||||
|
||||
<!--Skip database create and delete tests: namespaces aren't reusable-->
|
||||
<!--pytest.mark.skip-->
|
||||
|
||||
{{% code-placeholders "DATABASE_NAME" %}}
|
||||
|
||||
```sh
|
||||
influxctl table create \
|
||||
--template-tag room \
|
||||
--template-tag sensor-type \
|
||||
--template-tag-bucket customerID,500 \
|
||||
--template-timeformat '%Y-%m' \
|
||||
DATABASE_NAME \
|
||||
example-table
|
||||
```
|
||||
|
||||
{{% /code-placeholders %}}
|
||||
|
||||
Replace the following in your command:
|
||||
|
||||
- {{% code-placeholder-key %}}`DATABASE_NAME`{{% /code-placeholder-key %}}: your {{% product-name %}} [database](/influxdb/cloud-dedicated/admin/databases/)
|
||||
|
||||
<!--actual test
|
||||
|
||||
```sh
|
||||
|
||||
# Test the preceding command outside of the code block.
|
||||
# influxctl authentication requires TTY interaction--
|
||||
# output the auth URL to a file that the host can open.
|
||||
|
||||
TABLE_NAME=table_TEST_RUN
|
||||
script -c "influxctl table create \
|
||||
--template-tag room \
|
||||
--template-tag sensor-type \
|
||||
--template-tag-bucket customerID,500 \
|
||||
--template-timeformat '%Y-%m' \
|
||||
DATABASE_NAME \
|
||||
$TABLE_NAME" \
|
||||
/dev/null > /shared/urls.txt
|
||||
|
||||
script -c "influxctl query \
|
||||
--database DATABASE_NAME \
|
||||
--token DATABASE_TOKEN \
|
||||
'SHOW TABLES'" > /shared/temp_tables.txt
|
||||
grep -q $TABLE_NAME /shared/temp_tables.txt
|
||||
rm /shared/temp_tables.txt
|
||||
```
|
||||
|
||||
-->
|
||||
|
||||
## Example partition templates
|
||||
|
||||
Given the following [line protocol](/influxdb/cloud-dedicated/reference/syntax/line-protocol/)
|
||||
with a `2024-01-01T00:00:00Z` timestamp:
|
||||
|
||||
```text
|
||||
prod,line=A,station=weld1 temp=81.9,qty=36i 1704067200000000000
|
||||
```
|
||||
|
||||
##### Partition by distinct tag values
|
||||
|
||||
| Description | Tag parts | Time part | Resulting partition key |
|
||||
| :---------------------- | :---------------- | :--------- | :----------------------- |
|
||||
| By day (default) | | `%Y-%m-%d` | 2024-01-01 |
|
||||
| By month | | `%Y-%m` | 2024-01 |
|
||||
| By year | | `%Y` | 2024 |
|
||||
| Single tag, by day | `line` | `%Y-%m-%d` | A \| 2024-01-01 |
|
||||
| Single tag, by month | `line` | `%Y-%m` | A \| 2024-01 |
|
||||
| Single tag, by year | `line` | `%Y` | A \| 2024 |
|
||||
| Multiple tags, by day | `line`, `station` | `%Y-%m-%d` | A \| weld1 \| 2024-01-01 |
|
||||
| Multiple tags, by month | `line`, `station` | `%Y-%m` | A \| weld1 \| 2024-01 |
|
||||
| Multiple tags, by year | `line`, `station` | `%Y` | A \| weld1 \| 2024 |
|
||||
|
||||
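As a rough illustration of how the keys in the preceding table come together, the following sketch joins tag values and a formatted timestamp in the same layout the table displays. `partition_key` is a hypothetical helper, not an `influxctl` command, and it assumes GNU `date` for the `-d @EPOCH` syntax:

```sh
# Hypothetical helper (not part of influxctl): print a partition key in the
# layout used by the table above.
# Usage: partition_key TIME_FORMAT EPOCH_SECONDS [TAG_VALUE...]
partition_key() {
  fmt=$1
  epoch=$2
  shift 2
  key=""
  # Tag values come first, in template order, each followed by the delimiter.
  for value in "$@"; do
    key="${key}${value} | "
  done
  # Assumes GNU date: format the timestamp in UTC with the time part format.
  printf '%s%s\n' "$key" "$(date -u -d "@$epoch" "+$fmt")"
}

# 1704067200 seconds is the 2024-01-01T00:00:00Z timestamp from the example.
partition_key '%Y-%m-%d' 1704067200 A weld1   # A | weld1 | 2024-01-01
partition_key '%Y-%m' 1704067200 A            # A | 2024-01
partition_key '%Y' 1704067200                 # 2024
```

The smallest unit in the time format decides how many distinct keys a given tag combination produces over time: one per day, month, or year.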
##### Partition by tag buckets
|
||||
|
||||
| Description | Tag part | Tag bucket part | Time part | Resulting partition key |
|
||||
| :---------------------------------- | :------- | :-------------- | :--------- | :---------------------- |
|
||||
| Distinct tag, tag buckets, by day | `line` | `station,100` | `%Y-%m-%d` | A \| 3 \| 2024-01-01 |
|
||||
| Distinct tag, tag buckets, by month | `line` | `station,500` | `%Y-%m` | A \| 303 \| 2024-01 |
|
|
@ -0,0 +1,124 @@
|
|||
Use partition templates to define the patterns used to generate partition keys.
|
||||
A partition key uniquely identifies a partition and is used to name the partition
|
||||
Parquet file in the [Object store](/influxdb/cloud-dedicated/reference/internals/storage-engine/#object-store).
|
||||
|
||||
A partition template consists of 1-8 _template parts_---dimensions to partition data by.
|
||||
Three types of template parts exist:
|
||||
|
||||
- **tag**: An [InfluxDB tag](/influxdb/cloud-dedicated/reference/glossary/#tag)
|
||||
to partition by.
|
||||
- **tag bucket**: An [InfluxDB tag](/influxdb/cloud-dedicated/reference/glossary/#tag)
|
||||
and number of "buckets" to group tag values into. Data is partitioned by the
|
||||
tag bucket rather than each distinct tag value.
|
||||
- {{< req type="key" >}} **time**: A Rust strftime date and time string that specifies the time interval
|
||||
to partition data by. The smallest unit of time included in the time part
|
||||
template is the interval used to partition data.
|
||||
|
||||
{{% note %}}
|
||||
A partition template must include 1 [time part](#time-part-templates)
|
||||
and can include up to 7 total [tag](#tag-part-templates) and [tag bucket](#tag-bucket-part-templates) parts.
|
||||
{{% /note %}}
|
||||
|
||||
<!-- TOC -->
|
||||
- [Restrictions](#restrictions)
|
||||
- [Template part size limit](#template-part-size-limit)
|
||||
- [Partition key size limit](#partition-key-size-limit)
|
||||
- [Reserved keywords](#reserved-keywords)
|
||||
- [Reserved characters](#reserved-characters)
|
||||
- [Tag part templates](#tag-part-templates)
|
||||
- [Tag bucket part templates](#tag-bucket-part-templates)
|
||||
- [Time part templates](#time-part-templates)
|
||||
<!-- /TOC -->
|
||||
|
||||
## Restrictions
|
||||
|
||||
### Template part size limit
|
||||
|
||||
Each template part is limited to 200 bytes in length.
|
||||
Anything longer will be truncated at 200 bytes and appended with `#`.
|
||||
|
||||
### Partition key size limit
|
||||
|
||||
With the truncation of template parts, the maximum length of a partition key is
|
||||
1,607 bytes (1.57 KiB).
|
||||
|
||||
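The source does not spell out how the 1,607-byte figure is derived. One breakdown that matches it, offered here as an assumption, is 8 template parts of at most 200 bytes each plus 7 single-byte delimiters between them:

```sh
# Assumed breakdown (not stated in the source): 8 parts of at most 200 bytes,
# joined by 7 one-byte `|` delimiters.
max_parts=8
max_part_bytes=200
delimiters=$(( max_parts - 1 ))
max_key_bytes=$(( max_parts * max_part_bytes + delimiters ))
echo "$max_key_bytes"   # 1607
```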
### Reserved keywords
|
||||
|
||||
The following reserved keywords cannot be used in partition templates:
|
||||
|
||||
- `time`
|
||||
|
||||
### Reserved characters
|
||||
|
||||
If used in template parts, non-ASCII characters and the following reserved
|
||||
characters must be [percent encoded](https://developer.mozilla.org/en-US/docs/Glossary/Percent-encoding):
|
||||
|
||||
- `|`: Partition key part delimiter
|
||||
- `!`: Null or missing partition key part
|
||||
- `^`: Empty string partition key part
|
||||
- `#`: Key part truncation marker
|
||||
- `%`: Required for unambiguous reversal of percent encoding
|
||||
|
||||
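The following sketch shows one way to percent encode the reserved characters listed above. `encode_part` is a hypothetical helper, not an InfluxDB or `influxctl` feature, and it handles only the five ASCII characters in the list. Non-ASCII characters would need encoding as well and are out of scope here:

```sh
# Hypothetical helper: percent-encode the reserved characters in one value.
# `%` is encoded first so the `%` signs added by later rules are left alone.
encode_part() {
  printf '%s\n' "$1" | sed \
    -e 's/%/%25/g' \
    -e 's/|/%7C/g' \
    -e 's/!/%21/g' \
    -e 's/\^/%5E/g' \
    -e 's/#/%23/g'
}

encode_part 'a|b#1'   # a%7Cb%231
encode_part '100%'    # 100%25
```

Encoding `%` before the other characters is what keeps the result unambiguous to decode.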
## Tag part templates
|
||||
|
||||
Tag part templates consist of a _tag key_ to partition by.
|
||||
Generated partition keys include the unique _tag value_ specific to each partition.
|
||||
|
||||
A partition template may include a given tag key only once in template parts
|
||||
that operate on tags (tag value and tag bucket)--for example:
|
||||
|
||||
If a template partitions on unique values of `tag_A`, then
|
||||
you can't use `tag_A` as a tag bucket part.
|
||||
|
||||
## Tag bucket part templates
|
||||
|
||||
Tag bucket part templates consist of a _tag key_ to partition by and the
|
||||
_number of "buckets" to partition tag values into_--for example:
|
||||
|
||||
```
|
||||
customerID,500
|
||||
```
|
||||
|
||||
Values of the `customerID` tag are bucketed into 500 distinct "buckets."
|
||||
Each bucket is identified by the remainder of the tag value hashed into a 32-bit
|
||||
integer divided by the specified number of buckets:
|
||||
|
||||
```rust
|
||||
hash(tagValue) % N
|
||||
```
|
||||
|
||||
Generated partition keys include the unique _tag bucket identifier_ specific to
|
||||
each partition.
|
||||
|
||||
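The source does not name the hash function InfluxDB uses, so bucket identifiers from the following sketch will not match real partition keys. It only illustrates the `hash(tagValue) % N` step, with the POSIX `cksum` CRC standing in as a 32-bit hash. `bucket_id` is a hypothetical helper:

```sh
# Hypothetical helper: map a tag value to a bucket number in 0..N-1.
# cksum's CRC is a stand-in for the real (unspecified) hash function.
# Usage: bucket_id TAG_VALUE NUMBER_OF_BUCKETS
bucket_id() {
  value=$1
  buckets=$2
  # cksum prints "<crc> <byte count>"; keep only the CRC.
  hash=$(printf '%s' "$value" | cksum | cut -d ' ' -f 1)
  echo $(( hash % buckets ))
}

# The same tag value always lands in the same bucket.
bucket_id customer-42 500
bucket_id customer-42 500
```

Because the bucket depends only on the tag value and the bucket count, every row with a given `customerID` lands in the same bucket, whatever its other tags.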
**Supported number of tag buckets**: 1-1,000
|
||||
|
||||
{{% note %}}
|
||||
Tag buckets should be used to partition by high cardinality tags or tags with an
|
||||
unknown number of distinct values.
|
||||
{{% /note %}}
|
||||
|
||||
A partition template may include a given tag key only once in template parts
|
||||
that operate on tags (tag value and tag bucket)--for example:
|
||||
|
||||
If a template partitions on unique values of `tag_A`, then
|
||||
you can't use `tag_A` as a tag bucket part.
|
||||
|
||||
## Time part templates
|
||||
|
||||
Time part templates use a limited subset of the
|
||||
[Rust strftime date and time formatting syntax](https://docs.rs/chrono/latest/chrono/format/strftime/index.html)
|
||||
to specify time format in partition keys.
|
||||
Time part templates can be daily (`%Y-%m-%d`), monthly (`%Y-%m`), or yearly (`%Y`).
|
||||
InfluxDB uses the smallest unit of time included in the time part template as
|
||||
the partition interval.
|
||||
|
||||
InfluxDB supports only [date specifiers](#date-specifiers) in time part templates.
|
||||
|
||||
### Date specifiers
|
||||
|
||||
Time part templates allow only the following date specifiers:
|
||||
|
||||
| Variable | Example | Description |
|
||||
| :------: | :----------- | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||
| `%Y`     | `2001`       | The full proleptic Gregorian year, zero-padded to 4 digits. chrono supports years from -262144 to 262143. Note: years before 1 BCE or after 9999 CE require an initial sign (+/-).  |
|
||||
| `%m` | `07` | Month number (01--12), zero-padded to 2 digits. |
|
||||
| `%d` | `08` | Day number (01--31), zero-padded to 2 digits. |
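
Before passing a format to `--template-timeformat`, you could check that it uses only the three specifiers in the table. The following is a minimal sketch with a hypothetical `valid_time_format` helper. It checks specifiers only and makes no claim about which separators or orderings InfluxDB accepts:

```sh
# Hypothetical helper: succeed only if every `%` specifier in the format is
# one of %Y, %m, or %d.
valid_time_format() {
  # Strip the allowed specifiers; any `%` left over is unsupported.
  rest=$(printf '%s' "$1" | sed -e 's/%Y//g' -e 's/%m//g' -e 's/%d//g')
  case $rest in
    *%*) return 1 ;;
    *) return 0 ;;
  esac
}

valid_time_format '%Y-%m-%d' && echo "ok: daily"
valid_time_format '%Y-%m' && echo "ok: monthly"
valid_time_format '%Y-%m-%d %H' || echo "unsupported specifier"
```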
|
|
@ -0,0 +1,169 @@
|
|||
<!--Allow shortcode-->
|
||||
{{< product-name >}} stores partition information in InfluxDB v3 system tables.
|
||||
Query partition information to view partition templates and verify partitions
|
||||
are working as intended.
|
||||
|
||||
- [Query partition information from system tables](#query-partition-information-from-system-tables)
|
||||
- [Partition-related queries](#partition-related-queries)
|
||||
|
||||
{{% warn %}}
|
||||
#### Querying system tables may impact overall cluster performance
|
||||
|
||||
Partition information is stored in InfluxDB v3 system tables.
|
||||
Querying system tables may impact the overall write and query performance of
|
||||
your {{< product-name omit=" Clustered" >}} cluster.
|
||||
|
||||
<!--------------- UPDATE THE DATE BELOW AS EXAMPLES ARE UPDATED --------------->
|
||||
|
||||
#### System tables are subject to change
|
||||
|
||||
System tables are not part of InfluxDB's stable API and may change with new releases.
|
||||
The provided schema information and query examples are valid as of **September 24, 2024**.
|
||||
If you detect a schema change or a non-functioning query example, please
|
||||
[submit an issue](https://github.com/influxdata/docs-v2/issues/new/choose).
|
||||
|
||||
<!--------------- UPDATE THE DATE ABOVE AS EXAMPLES ARE UPDATED --------------->
|
||||
|
||||
{{% /warn %}}
|
||||
|
||||
## Query partition information from system tables
|
||||
|
||||
Use the [`influxctl query` command](/influxdb/cloud-dedicated/reference/cli/influxctl/query/)
|
||||
and SQL to query partition-related information from InfluxDB system tables.
|
||||
Provide the following:
|
||||
|
||||
- **Enable system tables** with the `--enable-system-tables` command flag.
|
||||
- **Database token**: A [database token](/influxdb/cloud-dedicated/admin/tokens/#database-tokens)
|
||||
with read permissions on the specified database. Uses the `token` setting from
|
||||
the [`influxctl` connection profile](/influxdb/cloud-dedicated/reference/cli/influxctl/#configure-connection-profiles)
|
||||
or the `--token` command flag.
|
||||
- **Database name**: The name of the database to query information about.
|
||||
Uses the `database` setting from the
|
||||
[`influxctl` connection profile](/influxdb/cloud-dedicated/reference/cli/influxctl/#configure-connection-profiles)
|
||||
or the `--database` command flag.
|
||||
- **SQL query**: The SQL query to execute.
|
||||
Pass the query in one of the following ways:
|
||||
|
||||
- a string on the command line
|
||||
- a path to a file that contains the query
|
||||
- a single dash (`-`) to read the query from stdin
|
||||
|
||||
{{% code-placeholders "DATABASE_(TOKEN|NAME)|SQL_QUERY" %}}
|
||||
|
||||
```bash
|
||||
influxctl query \
|
||||
--enable-system-tables \
|
||||
--database DATABASE_NAME \
|
||||
--token DATABASE_TOKEN \
|
||||
"SQL_QUERY"
|
||||
```
|
||||
|
||||
{{% /code-placeholders %}}
|
||||
|
||||
Replace the following:
|
||||
|
||||
- {{% code-placeholder-key %}}`DATABASE_TOKEN`{{% /code-placeholder-key %}}:
|
||||
A database token with read access to the specified database
|
||||
- {{% code-placeholder-key %}}`DATABASE_NAME`{{% /code-placeholder-key %}}:
|
||||
The name of the database to query information about.
|
||||
- {{% code-placeholder-key %}}`SQL_QUERY`{{% /code-placeholder-key %}}:
|
||||
The SQL query to execute. For examples, see
|
||||
[Partition-related queries](#partition-related-queries).
|
||||
|
||||
When prompted, enter `y` to acknowledge the potential impact querying system
|
||||
tables may have on your cluster.
|
||||
|
||||
## Partition-related queries
|
||||
|
||||
Use the following queries to return information about partitions in your
|
||||
{{< product-name omit=" Clustered" >}} cluster.
|
||||
|
||||
- [View the partition template of a specific table](#view-the-partition-template-of-a-specific-table)
|
||||
- [View all partitions for a table](#view-all-partitions-for-a-table)
|
||||
- [View the number of partitions per table](#view-the-number-of-partitions-per-table)
|
||||
- [View the number of partitions for a specific table](#view-the-number-of-partitions-for-a-specific-table)
|
||||
|
||||
---
|
||||
|
||||
In the examples below, replace {{% code-placeholder-key %}}`TABLE_NAME`{{% /code-placeholder-key %}}
|
||||
with the name of the table you want to query information about.
|
||||
|
||||
---
|
||||
|
||||
{{% code-placeholders "TABLE_NAME_(1|2|3)|TABLE_NAME" %}}
|
||||
|
||||
### View the partition template of a specific table
|
||||
|
||||
```sql
|
||||
SELECT * FROM system.tables WHERE table_name = 'TABLE_NAME'
|
||||
```
|
||||
|
||||
#### Example results
|
||||
|
||||
| table_name | partition_template |
|
||||
| :--------- | :----------------------------------------------------------------------------------------- |
|
||||
| weather | `{"parts":[{"timeFormat":"%Y-%m-%d"},{"bucket":{"tagName":"location","numBuckets":250}}]}` |
|
||||
|
||||
{{% note %}}
|
||||
If a table doesn't include a partition template in the output of this command,
|
||||
the table uses the default (1 day) partition strategy and doesn't partition
|
||||
by tags or tag buckets.
|
||||
{{% /note %}}
|
||||
|
||||
### View all partitions for a table
|
||||
|
||||
```sql
|
||||
SELECT * FROM system.partitions WHERE table_name = 'TABLE_NAME'
|
||||
```
|
||||
|
||||
#### Example results
|
||||
|
||||
| partition_id | table_name | partition_key | last_new_file_created_at | num_files | total_size_mb |
|
||||
| -----------: | :--------- | :---------------- | -----------------------: | --------: | ------------: |
|
||||
| 1362 | weather | 43 \| 2020-05-27 | 1683747418763813713 | 1 | 0 |
|
||||
| 800 | weather | 234 \| 2021-08-02 | 1683747421899400796 | 1 | 0 |
|
||||
| 630 | weather | 325 \| 2022-03-17 | 1683747417616689036 | 1 | 0 |
|
||||
| 1401 | weather | 12 \| 2021-01-09 | 1683747417786122295 | 1 | 0 |
|
||||
| 1012 | weather | 115 \| 2022-07-04 | 1683747417614219148 | 1 | 0 |
|
||||
|
||||
### View the number of partitions per table
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
table_name,
|
||||
COUNT(*) AS partition_count
|
||||
FROM
|
||||
system.partitions
|
||||
WHERE
|
||||
table_name IN ('TABLE_NAME_1', 'TABLE_NAME_2', 'TABLE_NAME_3')
|
||||
GROUP BY
|
||||
table_name
|
||||
```
|
||||
|
||||
#### Example results
|
||||
|
||||
| table_name | partition_count |
|
||||
| :--------- | --------------: |
|
||||
| weather | 1096 |
|
||||
| home | 24 |
|
||||
| numbers | 1 |
|
||||
|
||||
### View the number of partitions for a specific table
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
COUNT(*) AS partition_count
|
||||
FROM
|
||||
system.partitions
|
||||
WHERE
|
||||
table_name = 'TABLE_NAME'
|
||||
```
|
||||
|
||||
#### Example results
|
||||
|
||||
| partition_count |
|
||||
| --------------: |
|
||||
|            1096 |
|
||||
|
||||
{{% /code-placeholders %}}
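To run one of these queries from a script, a small wrapper can build the SQL and pass it to the `influxctl query` command described earlier. The following is a sketch only: `partition_count_query` and `count_partitions` are hypothetical helper names, and the wrapper uses only the `--enable-system-tables`, `--database`, and `--token` flags shown in this guide:

```sh
# Hypothetical helper: print the partition-count query for one table.
# Single quotes in the table name are doubled so the SQL literal stays valid.
partition_count_query() {
  table=$(printf '%s' "$1" | sed "s/'/''/g")
  printf "SELECT COUNT(*) AS partition_count FROM system.partitions WHERE table_name = '%s'\n" "$table"
}

# Hypothetical wrapper around the documented command; call it with real values.
# Usage: count_partitions DATABASE_NAME DATABASE_TOKEN TABLE_NAME
count_partitions() {
  influxctl query \
    --enable-system-tables \
    --database "$1" \
    --token "$2" \
    "$(partition_count_query "$3")"
}

# Print the generated SQL without contacting the cluster.
partition_count_query weather
```

Printing the query first lets you review the SQL before running it against system tables, which matters given the performance warning above.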
|