fix(partitioning): improve clarity and consistency in partitioning, apply suggestions from @reidkaufmann

pull/5741/head
Jason Stirnaman 2025-01-09 15:26:43 -06:00
parent 37183896f4
commit d961c8eeae
2 changed files with 17 additions and 26 deletions

@ -33,39 +33,30 @@ If points don't have a value for the tag, InfluxDB can't store them in the corre
## Avoid over-partitioning
As you plan your partitioning strategy, keep in mind that data can be
"over-partitioned"--meaning partitions are so granular that queries end up
having to retrieve and read many partitions from the object store, which
hurts query performance.
As you plan your partitioning strategy, keep in mind that over-partitioning your data can hurt query performance. If partitions are too granular, queries may need to retrieve and read many partitions from the [Object store](/influxdb/cloud-dedicated/reference/internals/storage-engine/#object-store).
- Balance the partition time interval with the actual amount of data written
during each interval. If a single interval doesn't contain a lot of data,
it is better to partition by larger time intervals.
- Don't partition by tags that you typically don't use in your query workload.
- Don't partition by distinct values of high-cardinality tags.
Instead, [use tag buckets](#use-tag-buckets-for-high-cardinality-tags) to
partition by these tags.
- Balance the partition time interval with the actual amount of data written during each interval. If a single interval doesn't contain a lot of data, partition by larger time intervals.
- Avoid partitioning by tags that you typically don't use in your query workload.
- Avoid partitioning by distinct values of high-cardinality tags. Instead, [use tag buckets](#use-tag-buckets-for-high-cardinality-tags) to partition by these tags.
## Limit the number of partition files
Avoid exceeding **10,000** total partition files.
Avoid exceeding **10,000** total partitions.
Limiting the total partition count can help manage system performance and costs.
While planning your strategy include the following steps to keep the total
partition count below 10,000 files over the next few years:
While planning your strategy, take the following steps to limit your total
partition count.
We currently recommend planning to keep the total partition count below 10,000.
- [Estimate the total partition count](#estimate-the-total-partition-count) for the lifespan of your data
- Take the following steps to limit the total partition count:
- **Set a [database retention period](/influxdb/cloud-dedicated/admin/databases/#retention-period)**
to prevent the number of files from growing unbounded.
- **Partition by month or year** to [avoid over-partitioning](#avoid-over-partitioning)
and creating too many partition files.
- **Don't partition on high cardinality tags** unless you also use [tag buckets](#use-tag-buckets-for-high-cardinality-tags)
- **Set a [database retention period](/influxdb/cloud-dedicated/admin/databases/#retention-period)**
to prevent the number of partitions from growing unbounded
- **Partition by month or year** to [avoid over-partitioning](#avoid-over-partitioning)
- **Don't partition on high cardinality tags** unless you also use [tag buckets](#use-tag-buckets-for-high-cardinality-tags)
### Estimate the total partition count
Use the following formula to estimate the total partition file count over the
Use the following formula to estimate the total partition count over the
lifetime of the database (or retention period):
```text
@ -75,4 +66,4 @@ total_partition_count = (cardinality_of_partitioned_tag) * (data_lifespan / part
- `total_partition_count`: The number of partition files in [Object storage](/influxdb/cloud-dedicated/reference/internals/storage-engine/#object-storage)
- `cardinality_of_partitioned_tag`: The number of distinct values for a tag
- `data_lifespan`: The [database retention period](/influxdb/cloud-dedicated/admin/databases/#retention-period), if set, or the expected lifetime of the database
- `partition_duration`: The partition time interval, defined by the [tine part template](/influxdb/cloud-dedicated/admin/custom-partitions/partition-templates/#time-part-templates)
- `partition_duration`: The partition time interval, defined by the [time part template](/influxdb/cloud-dedicated/admin/custom-partitions/partition-templates/#time-part-templates)
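
As a rough illustration of the estimate described above, the following sketch multiplies tag cardinality by the number of partition intervals in the data lifespan. It assumes the formula (truncated in the hunk header) divides `data_lifespan` by `partition_duration`, as the variable list suggests. The function name, the use of days as the unit, and rounding partial intervals up are assumptions of this sketch, not part of InfluxDB.

```rust
/// Recommended upper bound on total partitions, taken from the guidance above.
const RECOMMENDED_MAX_PARTITIONS: u64 = 10_000;

/// Estimate the total partition count over the lifetime of the data.
///
/// Assumed formula (hypothetical helper, not an InfluxDB API):
///   cardinality_of_partitioned_tag * (data_lifespan / partition_duration)
/// Both durations are expressed in days here for simplicity.
/// Returns `None` if the partition duration is zero or the result overflows.
fn estimate_total_partition_count(
    cardinality_of_partitioned_tag: u64,
    data_lifespan_days: u64,
    partition_duration_days: u64,
) -> Option<u64> {
    if partition_duration_days == 0 {
        return None;
    }
    // Round up so a partially filled final interval still counts as a partition.
    let intervals = data_lifespan_days / partition_duration_days
        + u64::from(data_lifespan_days % partition_duration_days != 0);
    cardinality_of_partitioned_tag.checked_mul(intervals)
}

/// Check an estimate against the recommended limit.
fn exceeds_recommended_limit(total_partition_count: u64) -> bool {
    total_partition_count > RECOMMENDED_MAX_PARTITIONS
}
```

For example, with 500 tag buckets, a 730-day retention period, and roughly monthly (30-day) partitions, `estimate_total_partition_count(500, 730, 30)` returns `Some(12500)`, which is over the recommended limit. Switching to yearly (365-day) partitions gives `Some(1000)`.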

@ -79,7 +79,7 @@ customerID,500
```
Values of the `customerID` tag are bucketed into 500 distinct "buckets."
Each bucket is identified by the remainder of the tag value hashed into a 32bit
Each bucket is identified by the remainder of the tag value hashed into a 32-bit
integer divided by the specified number of buckets:
```rust
@ -108,8 +108,8 @@ Time part templates use a limited subset of the
[Rust strftime date and time formatting syntax](https://docs.rs/chrono/latest/chrono/format/strftime/index.html)
to specify time format in partition keys.
Time part templates can be daily (`%Y-%m-%d`), monthly (`%Y-%m`), or yearly (`%Y`).
InfluxDB uses the smallest unit of time included in the time part template as
the partition interval.
InfluxDB partitions data by the smallest unit of time included in the time part
template.
InfluxDB supports only [date specifiers](#date-specifiers) in time part templates.
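
To illustrate the "smallest unit of time" rule, the following sketch maps a time part template to the interval it implies. The `PartitionInterval` enum and `partition_interval` function are hypothetical names for this example. The sketch only looks for the `%d`, `%m`, and `%Y` specifiers used in the daily, monthly, and yearly examples above. It does not validate the template the way InfluxDB would.

```rust
/// Partition intervals implied by the daily, monthly, and yearly templates.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum PartitionInterval {
    Day,
    Month,
    Year,
}

/// Return the smallest unit of time present in a time part template.
///
/// Hypothetical helper: checks specifiers from smallest to largest, so
/// `%Y-%m-%d` resolves to `Day` even though it also contains `%m` and `%Y`.
/// Returns `None` if none of the three specifiers is present.
fn partition_interval(template: &str) -> Option<PartitionInterval> {
    if template.contains("%d") {
        Some(PartitionInterval::Day)
    } else if template.contains("%m") {
        Some(PartitionInterval::Month)
    } else if template.contains("%Y") {
        Some(PartitionInterval::Year)
    } else {
        None
    }
}
```

Under these assumptions, `partition_interval("%Y-%m")` returns `Some(PartitionInterval::Month)`, matching the monthly template described above.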