fix(partitioning): improve clarity and consistency in partitioning, apply suggestions from @reidkaufmann

2025-01-09 15:26:43 -06:00 · 2025-01-09 15:26:43 -06:00 · d961c8eeae
parent 37183896f4
commit d961c8eeae
2 changed files with 17 additions and 26 deletions
--- a/content/shared/v3-distributed-admin-custom-partitions/best-practices.md
+++ b/content/shared/v3-distributed-admin-custom-partitions/best-practices.md
@@ -33,39 +33,30 @@ If points don't have a value for the tag, InfluxDB can't store them in the corre

 ## Avoid over-partitioning

-As you plan your partitioning strategy, keep in mind that data can be
-"over-partitioned"--meaning partitions are so granular that queries end up
-having to retrieve and read many partitions from the object store, which
-hurts query performance.
+As you plan your partitioning strategy, keep in mind that over-partitioning your data can hurt query performance. If partitions are too granular, queries may need to retrieve and read many partitions from the [Object store](/influxdb/cloud-dedicated/reference/internals/storage-engine/#object-store).

-- Balance the partition time interval with the actual amount of data written
-  during each interval. If a single interval doesn't contain a lot of data,
-  it is better to partition by larger time intervals.
-- Don't partition by tags that you typically don't use in your query workload.
-- Don't partition by distinct values of high-cardinality tags.
-  Instead, [use tag buckets](#use-tag-buckets-for-high-cardinality-tags) to
-  partition by these tags.
+- Balance the partition time interval with the actual amount of data written during each interval. If a single interval doesn't contain a lot of data, partition by larger time intervals.
+- Avoid partitioning by tags that you typically don't use in your query workload.
+- Avoid partitioning by distinct values of high-cardinality tags. Instead, [use tag buckets](#use-tag-buckets-for-high-cardinality-tags) to partition by these tags.

 ## Limit the number of partition files

-Avoid exceeding **10,000** total partition files.
+Avoid exceeding **10,000** total partitions.
 Limiting the total partition count can help manage system performance and costs.

-While planning your strategy include the following steps to keep the total
-partition count below 10,000 files over the next few years:
+While planning your strategy, take the following steps to limit your total
+partition count.
+We currently recommend planning to keep the total partition count below 10,000.

 - [Estimate the total partition count](#estimate-the-total-partition-count) for the lifespan of your data
-- Take the following steps to limit the total partition count:
-
-  - **Set a [database retention period](/influxdb/cloud-dedicated/admin/databases/#retention-period)**
-    to prevent the number of files from growing unbounded.
-  - **Partition by month or year** to [avoid over-partitioning](#avoid-over-partitioning)
-and creating too many partition files.
-  - **Don't partition on high cardinality tags** unless you also use [tag buckets](#use-tag-buckets-for-high-cardinality-tags)
+- **Set a [database retention period](/influxdb/cloud-dedicated/admin/databases/#retention-period)**
+  to prevent the number of partitions from growing unbounded
+- **Partition by month or year** to [avoid over-partitioning](#avoid-over-partitioning)
+- **Don't partition on high cardinality tags** unless you also use [tag buckets](#use-tag-buckets-for-high-cardinality-tags)

 ### Estimate the total partition count

-Use the following formula to estimate the total partition file count over the
+Use the following formula to estimate the total partition count over the
 lifetime of the database (or retention period):

 ```text
@@ -75,4 +66,4 @@ total_partition_count = (cardinality_of_partitioned_tag) * (data_lifespan / part
 - `total_partition_count`: The number of partition files in [Object storage](/influxdb/cloud-dedicated/reference/internals/storage-engine/#object-storage)
 - `cardinality_of_partitioned_tag`: The number of distinct values for a tag
 - `data_lifespan`: The [database retention period](/influxdb/cloud-dedicated/admin/databases/#retention-period), if set, or the expected lifetime of the database
-- `partition_duration`: The partition time interval, defined by the [tine part template](/influxdb/cloud-dedicated/admin/custom-partitions/partition-templates/#time-part-templates)
+- `partition_duration`: The partition time interval, defined by the [time part template](/influxdb/cloud-dedicated/admin/custom-partitions/partition-templates/#time-part-templates)
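
As a rough illustration of the formula in this hunk, the following Python sketch estimates the partition count for a hypothetical workload. The function name `estimate_partition_count`, the constant name, and the example numbers (a tag with 50 distinct values, a 730-day retention period, 30-day and 1-day partition intervals) are assumptions for illustration only. Rounding the interval count up is also an assumption, made because a partially filled interval would still need a partition.

```python
import math

# Threshold taken from the recommendation in the docs text above.
RECOMMENDED_MAX_PARTITIONS = 10_000


def estimate_partition_count(
    tag_cardinality: int,
    data_lifespan_days: float,
    partition_duration_days: float,
) -> int:
    """Apply: cardinality_of_partitioned_tag * (data_lifespan / partition_duration)."""
    if tag_cardinality < 1:
        raise ValueError("tag_cardinality must be at least 1")
    if partition_duration_days <= 0:
        raise ValueError("partition_duration_days must be positive")
    # Assumption: a partially filled interval still counts as one partition.
    intervals = math.ceil(data_lifespan_days / partition_duration_days)
    return tag_cardinality * intervals


# Hypothetical workload: 50 distinct tag values kept for 730 days.
monthly = estimate_partition_count(50, 730, 30)  # ~monthly intervals
daily = estimate_partition_count(50, 730, 1)     # daily intervals

print(f"monthly: {monthly}, within limit: {monthly <= RECOMMENDED_MAX_PARTITIONS}")
print(f"daily:   {daily}, within limit: {daily <= RECOMMENDED_MAX_PARTITIONS}")
```

Under these assumed numbers, the daily interval lands well above the recommended 10,000 while the monthly interval stays below it, which is the trade-off the "Partition by month or year" bullet describes.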
--- a/content/shared/v3-distributed-admin-custom-partitions/partition-templates.md
+++ b/content/shared/v3-distributed-admin-custom-partitions/partition-templates.md
@@ -79,7 +79,7 @@ customerID,500
 ```

 Values of the `customerID` tag are bucketed into 500 distinct "buckets." 
-Each bucket is identified by the remainder of the tag value hashed into a 32bit
+Each bucket is identified by the remainder of the tag value hashed into a 32-bit
 integer divided by the specified number of buckets:

 ```rust
@@ -108,8 +108,8 @@ Time part templates use a limited subset of the
 [Rust strftime date and time formatting syntax](https://docs.rs/chrono/latest/chrono/format/strftime/index.html)
 to specify time format in partition keys.
 Time part templates can be daily (`%Y-%m-%d`), monthly (`%Y-%m`), or yearly (`%Y`).
-InfluxDB uses the smallest unit of time included in the time part template as
-the partition interval.
+InfluxDB partitions data by the smallest unit of time included in the time part
+template.

 InfluxDB supports only [date specifiers](#date-specifiers) in time part templates.
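
To make the effect of each time part template concrete, here is a small Python sketch that formats one timestamp with the daily, monthly, and yearly templates named above. Python's `datetime.strftime` is used as a stand-in for the Rust chrono formatter; the three date specifiers shown are common to both. The helper `time_part`, the `TEMPLATES` mapping, and the sample timestamp are hypothetical, and this is not a description of how InfluxDB builds partition keys internally.

```python
from datetime import datetime, timezone

# Templates quoted in the docs text: daily, monthly, yearly.
TEMPLATES = {
    "daily": "%Y-%m-%d",
    "monthly": "%Y-%m",
    "yearly": "%Y",
}


def time_part(ts: datetime, template: str) -> str:
    """Format a timestamp with a time part template (illustrative helper)."""
    return ts.strftime(template)


# Sample timestamp chosen for illustration.
ts = datetime(2025, 1, 9, 15, 26, 43, tzinfo=timezone.utc)
parts = {name: time_part(ts, fmt) for name, fmt in TEMPLATES.items()}

for name, value in parts.items():
    print(f"{name}: {value}")
```

The smallest unit present in each template (day, month, or year) is what separates one formatted value from the next, which matches the statement that data is partitioned by the smallest unit of time in the template.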