fix(partitioning): improve clarity and consistency in partitioning, apply suggestions from @reidkaufmann
parent 37183896f4
commit d961c8eeae
@@ -33,39 +33,30 @@ If points don't have a value for the tag, InfluxDB can't store them in the corre

## Avoid over-partitioning

As you plan your partitioning strategy, keep in mind that data can be
"over-partitioned"--meaning partitions are so granular that queries end up
having to retrieve and read many partitions from the object store, which
hurts query performance.
As you plan your partitioning strategy, keep in mind that over-partitioning your data can hurt query performance. If partitions are too granular, queries may need to retrieve and read many partitions from the [Object store](/influxdb/cloud-dedicated/reference/internals/storage-engine/#object-store).

- Balance the partition time interval with the actual amount of data written
during each interval. If a single interval doesn't contain a lot of data,
it is better to partition by larger time intervals.
- Don't partition by tags that you typically don't use in your query workload.
- Don't partition by distinct values of high-cardinality tags.
Instead, [use tag buckets](#use-tag-buckets-for-high-cardinality-tags) to
partition by these tags.
- Balance the partition time interval with the actual amount of data written during each interval. If a single interval doesn't contain a lot of data, partition by larger time intervals.
- Avoid partitioning by tags that you typically don't use in your query workload.
- Avoid partitioning by distinct values of high-cardinality tags. Instead, [use tag buckets](#use-tag-buckets-for-high-cardinality-tags) to partition by these tags.

## Limit the number of partition files

Avoid exceeding **10,000** total partition files.
Avoid exceeding **10,000** total partitions.
Limiting the total partition count can help manage system performance and costs.

While planning your strategy include the following steps to keep the total
partition count below 10,000 files over the next few years:
While planning your strategy, take the following steps to limit your total
partition count.
We currently recommend planning to keep the total partition count below 10,000.

- [Estimate the total partition count](#estimate-the-total-partition-count) for the lifespan of your data
- Take the following steps to limit the total partition count:

- **Set a [database retention period](/influxdb/cloud-dedicated/admin/databases/#retention-period)**
to prevent the number of files from growing unbounded.
- **Partition by month or year** to [avoid over-partitioning](#avoid-over-partitioning)
and creating too many partition files.
- **Don't partition on high cardinality tags** unless you also use [tag buckets](#use-tag-buckets-for-high-cardinality-tags)
- **Set a [database retention period](/influxdb/cloud-dedicated/admin/databases/#retention-period)**
to prevent the number of partitions from growing unbounded
- **Partition by month or year** to [avoid over-partitioning](#avoid-over-partitioning)
- **Don't partition on high cardinality tags** unless you also use [tag buckets](#use-tag-buckets-for-high-cardinality-tags)

### Estimate the total partition count

Use the following formula to estimate the total partition file count over the
Use the following formula to estimate the total partition count over the
lifetime of the database (or retention period):

```text
@@ -75,4 +66,4 @@ total_partition_count = (cardinality_of_partitioned_tag) * (data_lifespan / part
- `total_partition_count`: The number of partition files in [Object storage](/influxdb/cloud-dedicated/reference/internals/storage-engine/#object-storage)
- `cardinality_of_partitioned_tag`: The number of distinct values for a tag
- `data_lifespan`: The [database retention period](/influxdb/cloud-dedicated/admin/databases/#retention-period), if set, or the expected lifetime of the database
- `partition_duration`: The partition time interval, defined by the [tine part template](/influxdb/cloud-dedicated/admin/custom-partitions/partition-templates/#time-part-templates)
- `partition_duration`: The partition time interval, defined by the [time part template](/influxdb/cloud-dedicated/admin/custom-partitions/partition-templates/#time-part-templates)
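
As a rough illustration of the formula above, the following Rust sketch computes the estimate and compares it with the recommended 10,000-partition limit. The function names are hypothetical and not part of any InfluxDB API. Expressing both durations in days and rounding a partial interval up to a whole partition are assumptions made for this example.

```rust
/// Recommended upper bound on total partitions, taken from the guidance above.
const RECOMMENDED_MAX_PARTITIONS: u64 = 10_000;

/// Estimates the total partition count as
/// cardinality_of_partitioned_tag * (data_lifespan / partition_duration).
///
/// Assumption: both durations are given in days, and a partial interval
/// still occupies its own partition, so the ratio is rounded up.
fn estimate_total_partition_count(
    cardinality_of_partitioned_tag: u64,
    data_lifespan_days: u64,
    partition_duration_days: u64,
) -> u64 {
    assert!(partition_duration_days > 0, "partition duration must be non-zero");
    // Integer ceiling division: number of partition intervals in the lifespan.
    let intervals =
        (data_lifespan_days + partition_duration_days - 1) / partition_duration_days;
    cardinality_of_partitioned_tag * intervals
}

/// Returns true when the estimate is above the recommended limit.
fn exceeds_recommended_limit(total_partition_count: u64) -> bool {
    total_partition_count > RECOMMENDED_MAX_PARTITIONS
}
```

For example, with a hypothetical tag that has 50 distinct values and a 730-day retention period, `estimate_total_partition_count(50, 730, 1)` (daily partitions) gives 36,500, which exceeds the limit, while `estimate_total_partition_count(50, 730, 30)` (months approximated as 30 days) gives 1,250.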
@@ -79,7 +79,7 @@ customerID,500
```

Values of the `customerID` tag are bucketed into 500 distinct "buckets."
Each bucket is identified by the remainder of the tag value hashed into a 32bit
Each bucket is identified by the remainder of the tag value hashed into a 32-bit
integer divided by the specified number of buckets:

```rust
@@ -108,8 +108,8 @@ Time part templates use a limited subset of the
[Rust strftime date and time formatting syntax](https://docs.rs/chrono/latest/chrono/format/strftime/index.html)
to specify time format in partition keys.
Time part templates can be daily (`%Y-%m-%d`), monthly (`%Y-%m`), or yearly (`%Y`).
InfluxDB uses the smallest unit of time included in the time part template as
the partition interval.
InfluxDB partitions data by the smallest unit of time included in the time part
template.

InfluxDB supports only [date specifiers](#date-specifiers) in time part templates.
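
To illustrate how the smallest unit of time in a time part template determines the partition interval, here is a minimal sketch. The `PartitionInterval` enum and `partition_interval` function are hypothetical names used only for illustration; this is not InfluxDB's implementation. It checks only the three specifiers that appear in the examples above (`%Y`, `%m`, `%d`).

```rust
/// Partition intervals named in the text: daily, monthly, or yearly.
#[derive(Debug, PartialEq)]
enum PartitionInterval {
    Day,
    Month,
    Year,
}

/// Returns the smallest unit of time found in a time part template,
/// or `None` if none of the three specifiers checked here is present.
///
/// Assumption: only `%d`, `%m`, and `%Y` are considered in this sketch.
fn partition_interval(template: &str) -> Option<PartitionInterval> {
    // Check from the smallest unit to the largest so the smallest one wins.
    if template.contains("%d") {
        Some(PartitionInterval::Day)
    } else if template.contains("%m") {
        Some(PartitionInterval::Month)
    } else if template.contains("%Y") {
        Some(PartitionInterval::Year)
    } else {
        None
    }
}
```

Under these assumptions, `partition_interval("%Y-%m")` returns `Some(PartitionInterval::Month)`, matching the monthly template described above.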