diff --git a/content/shared/v3-distributed-admin-custom-partitions/best-practices.md b/content/shared/v3-distributed-admin-custom-partitions/best-practices.md
index f2c49ff43..7d1ae30f1 100644
--- a/content/shared/v3-distributed-admin-custom-partitions/best-practices.md
+++ b/content/shared/v3-distributed-admin-custom-partitions/best-practices.md
@@ -33,39 +33,30 @@ If points don't have a value for the tag, InfluxDB can't store them in the corre

 ## Avoid over-partitioning

-As you plan your partitioning strategy, keep in mind that data can be
-"over-partitioned"--meaning partitions are so granular that queries end up
-having to retrieve and read many partitions from the object store, which
-hurts query performance.
+As you plan your partitioning strategy, keep in mind that over-partitioning your data can hurt query performance. If partitions are too granular, queries may need to retrieve and read many partitions from the [Object store](/influxdb/cloud-dedicated/reference/internals/storage-engine/#object-store).

-- Balance the partition time interval with the actual amount of data written
-  during each interval. If a single interval doesn't contain a lot of data,
-  it is better to partition by larger time intervals.
-- Don't partition by tags that you typically don't use in your query workload.
-- Don't partition by distinct values of high-cardinality tags.
-  Instead, [use tag buckets](#use-tag-buckets-for-high-cardinality-tags) to
-  partition by these tags.
+- Balance the partition time interval with the actual amount of data written during each interval. If a single interval doesn't contain a lot of data, partition by larger time intervals.
+- Avoid partitioning by tags that you typically don't use in your query workload.
+- Avoid partitioning by distinct values of high-cardinality tags. Instead, [use tag buckets](#use-tag-buckets-for-high-cardinality-tags) to partition by these tags.

 ## Limit the number of partition files

-Avoid exceeding **10,000** total partition files.
+Avoid exceeding **10,000** total partitions.
 Limiting the total partition count can help manage system performance and costs.

-While planning your strategy include the following steps to keep the total
-partition count below 10,000 files over the next few years:
+While planning your strategy, take the following steps to limit your total
+partition count.
+We currently recommend planning to keep the total partition count below 10,000.

 - [Estimate the total partition count](#estimate-the-total-partition-count)
   for the lifespan of your data
-- Take the following steps to limit the total partition count:
-  - **Set a [database retention period](/influxdb/cloud-dedicated/admin/databases/#retention-period)**
-    to prevent the number of files from growing unbounded.
-  - **Partition by month or year** to [avoid over-partitioning](#avoid-over-partitioning)
-and creating too many partition files.
-  - **Don't partition on high cardinality tags** unless you also use [tag buckets](#use-tag-buckets-for-high-cardinality-tags)
+- **Set a [database retention period](/influxdb/cloud-dedicated/admin/databases/#retention-period)**
+  to prevent the number of partitions from growing unbounded
+- **Partition by month or year** to [avoid over-partitioning](#avoid-over-partitioning)
+- **Don't partition on high cardinality tags** unless you also use [tag buckets](#use-tag-buckets-for-high-cardinality-tags)

 ### Estimate the total partition count

-Use the following formula to estimate the total partition file count over the
+Use the following formula to estimate the total partition count over the
 lifetime of the database (or retention period):

 ```text
@@ -75,4 +66,4 @@ total_partition_count = (cardinality_of_partitioned_tag) * (data_lifespan / part
 - `total_partition_count`: The number of partition files in [Object storage](/influxdb/cloud-dedicated/reference/internals/storage-engine/#object-storage)
 - `cardinality_of_partitioned_tag`: The number of distinct values for a tag
 - `data_lifespan`: The [database retention period](/influxdb/cloud-dedicated/admin/databases/#retention-period), if set, or the expected lifetime of the database
-- `partition_duration`: The partition time interval, defined by the [tine part template](/influxdb/cloud-dedicated/admin/custom-partitions/partition-templates/#time-part-templates)
+- `partition_duration`: The partition time interval, defined by the [time part template](/influxdb/cloud-dedicated/admin/custom-partitions/partition-templates/#time-part-templates)
diff --git a/content/shared/v3-distributed-admin-custom-partitions/partition-templates.md b/content/shared/v3-distributed-admin-custom-partitions/partition-templates.md
index 33130e57d..6742c2547 100644
--- a/content/shared/v3-distributed-admin-custom-partitions/partition-templates.md
+++ b/content/shared/v3-distributed-admin-custom-partitions/partition-templates.md
@@ -79,7 +79,7 @@ customerID,500
 ```

 Values of the `customerID` tag are bucketed into 500 distinct "buckets."
-Each bucket is identified by the remainder of the tag value hashed into a 32bit
+Each bucket is identified by the remainder of the tag value hashed into a 32-bit
 integer divided by the specified number of buckets:

 ```rust
@@ -108,8 +108,8 @@ Time part templates use a limited subset of the
 [Rust strftime date and time formatting syntax](https://docs.rs/chrono/latest/chrono/format/strftime/index.html)
 to specify time format in partition keys.
 Time part templates can be daily (`%Y-%m-%d`), monthly (`%Y-%m`), or yearly (`%Y`).
-InfluxDB uses the smallest unit of time included in the time part template as
-the partition interval.
+InfluxDB partitions data by the smallest unit of time included in the time part
+template.

 InfluxDB supports only [date specifiers](#date-specifiers) in time part
 templates.
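
The partition-count formula in `best-practices.md` and the smallest-unit rule in `partition-templates.md` can be combined into a quick sanity check. The following is a minimal sketch, not part of InfluxDB: the function names are hypothetical, the 30-day month and 365-day year are nominal approximations, and rounding a partial interval up to a whole partition is an assumption. Treating the bucket count (500) as the cardinality of a bucketed tag is also an assumption made for illustration.

```python
import math

# Nominal interval lengths in days (assumed approximations for this sketch).
INTERVAL_DAYS = {"%d": 1, "%m": 30, "%Y": 365}


def partition_interval_days(time_template: str) -> int:
    """Return the nominal length, in days, of the smallest time unit
    that appears in a time part template such as "%Y-%m"."""
    # Check specifiers from smallest unit to largest.
    for spec in ("%d", "%m", "%Y"):
        if spec in time_template:
            return INTERVAL_DAYS[spec]
    raise ValueError(f"no supported date specifier in {time_template!r}")


def estimate_partition_count(
    tag_cardinality: int, lifespan_days: int, time_template: str
) -> int:
    """Estimate cardinality * (data_lifespan / partition_duration).

    A partial interval is counted as one partition (assumption).
    """
    intervals = math.ceil(lifespan_days / partition_interval_days(time_template))
    return tag_cardinality * intervals


if __name__ == "__main__":
    # 500 tag buckets with a two-year (730-day) retention period.
    for template in ("%Y-%m-%d", "%Y-%m", "%Y"):
        print(template, estimate_partition_count(500, 730, template))
```

Under these assumptions, 500 buckets with a two-year retention period come to 12,500 partitions when partitioned by month, which is above the recommended 10,000, and 1,000 partitions when partitioned by year.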