Update dra.md
pull/51878/head
xin gu 2025-08-11 11:09:48 +08:00
parent e9eb5cc327
commit 4c7abf0dfb
1 changed file with 60 additions and 51 deletions


@@ -57,7 +57,7 @@ DRA 驱动是运行在集群的每个节点上的第三方应用,对接节点
DRA drivers implement the [`kubeletplugin` package
interface](https://pkg.go.dev/k8s.io/dynamic-resource-allocation/kubeletplugin).
Your driver may support seamless upgrades by implementing a property of this
Your driver may support _seamless upgrades_ by implementing a property of this
interface that allows two versions of the same DRA driver to coexist for a short
time. This is only available for kubelet versions 1.33 and above and may not be
supported by your driver for heterogeneous clusters with attached nodes running
@@ -67,7 +67,7 @@ older versions of Kubernetes - check your driver's documentation to be sure.
DRA 驱动实现
[`kubeletplugin` 包接口](https://pkg.go.dev/k8s.io/dynamic-resource-allocation/kubeletplugin)。
你的驱动可能通过实现此接口的一个属性,支持两个版本共存一段时间,从而实现无缝升级。
你的驱动可能通过实现此接口的一个属性,支持两个版本共存一段时间,从而实现**无缝升级**。
该功能仅适用于 kubelet v1.33 及更高版本,对于运行旧版 Kubernetes 的节点所组成的异构集群,
可能不支持这种功能。请查阅你的驱动文档予以确认。
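<!--
As an illustrative sketch only (the driver name, namespace, and image below are
hypothetical), a DaemonSet for a driver that supports seamless upgrades can use
a surge rolling update so that the old and new driver Pods briefly coexist:
-->
下面是一个仅作示意的片段(其中的驱动名称、命名空间和镜像均为假设):
若驱动文档确认支持无缝升级,可以为部署驱动的 DaemonSet 配置允许新旧 Pod 短暂共存的滚动更新策略:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: example-dra-driver            # 假设的驱动名称
  namespace: dra-example              # 假设的命名空间
spec:
  selector:
    matchLabels:
      app: example-dra-driver
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1                     # 升级期间允许新旧两个驱动 Pod 短暂共存
      maxUnavailable: 0
  template:
    metadata:
      labels:
        app: example-dra-driver
    spec:
      containers:
      - name: driver
        image: registry.example/dra-driver:v1.1.0   # 假设的镜像
```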
@@ -98,7 +98,7 @@ observe that:
<!--
### Confirm your DRA driver exposes a liveness probe and utilize it
Your DRA driver likely implements a grpc socket for healthchecks as part of DRA
Your DRA driver likely implements a gRPC socket for healthchecks as part of DRA
driver good practices. The easiest way to utilize this gRPC socket is to
configure it as a liveness probe for the DaemonSet deploying your DRA driver.
Your driver's documentation or deployment tooling may already include this, but
@@ -110,7 +110,7 @@ heal, reducing scheduling delays or troubleshooting time.
-->
### 确认你的 DRA 驱动暴露了存活探针并加以利用 {#confirm-your-dra-driver-exposes-a-liveness-probe-and-utilize-it}
你的 DRA 驱动可能已实现用于健康检查的 grpc 套接字,这是 DRA 驱动的良好实践之一。
你的 DRA 驱动可能已实现用于健康检查的 gRPC 套接字,这是 DRA 驱动的良好实践之一。
最简单的利用方式是将该 gRPC 套接字配置为部署 DRA 驱动的 DaemonSet 的存活探针。
驱动文档或部署工具可能已包括此项配置,但如果你是自行配置或未以 Kubernetes Pod 方式运行 DRA 驱动,
确保你的编排工具在该 gRPC 套接字健康检查失败时能重启驱动。这样可以最大程度地减少 DRA 驱动的意外停机,
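<!--
For example (the port number and image are hypothetical), if the driver serves
its health check on a TCP port you can use the built-in gRPC probe; if it only
exposes a Unix socket, you may need an exec probe with a health-check binary
instead:
-->
例如(下面的端口号与镜像均为假设):如果驱动通过 TCP 端口提供健康检查,可以直接使用内置的 gRPC 探针;
如果驱动只暴露 Unix 套接字,则可能需要改用 exec 探针配合健康检查二进制文件:

```yaml
# DaemonSet Pod 模板中的容器片段
containers:
- name: driver
  image: registry.example/dra-driver:v1.1.0   # 假设的镜像
  livenessProbe:
    grpc:
      port: 51515                     # 假设的健康检查端口,请以驱动文档为准
    initialDelaySeconds: 10
    periodSeconds: 10
    failureThreshold: 3
```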
@@ -136,13 +136,15 @@ ResourceClaim 或 ResourceClaimTemplate。
<!--
## Monitor and tune components for higher load, especially in high scale environments
Control plane component `kube-scheduler` and the internal ResourceClaim
controller orchestrated by the component `kube-controller-manager` do the heavy
lifting during scheduling of Pods with claims based on metadata stored in the
DRA APIs. Compared to non-DRA scheduled Pods, the number of API server calls,
memory, and CPU utilization needed by these components is increased for Pods
using DRA claims. In addition, node local components like the DRA driver and
kubelet utilize DRA APIs to allocated the hardware request at Pod sandbox
Control plane component {{< glossary_tooltip text="kube-scheduler"
term_id="kube-scheduler" >}} and the internal ResourceClaim controller
orchestrated by the component {{< glossary_tooltip
text="kube-controller-manager" term_id="kube-controller-manager" >}} do the
heavy lifting during scheduling of Pods with claims based on metadata stored in
the DRA APIs. Compared to non-DRA scheduled Pods, the number of API server
calls, memory, and CPU utilization needed by these components is increased for
Pods using DRA claims. In addition, node local components like the DRA driver
and kubelet utilize DRA APIs to allocate the hardware request at Pod sandbox
creation time. Especially in high scale environments where clusters have many
nodes, and/or deploy many workloads that heavily utilize DRA defined resource
claims, the cluster administrator should configure the relevant components to
@@ -150,11 +152,13 @@ anticipate the increased load.
-->
## 针对更高负载监控和调优组件(尤其是在大规模环境中) {#monitor-and-tune-components-for-higher-load-especially-in-high-scale-environments}
控制面组件 `kube-scheduler` 以及 `kube-controller-manager` 中的内部 ResourceClaim
控制器在调度使用 DRA 申领的 Pod 时承担了大量任务。与不使用 DRA 的 Pod 相比,这些组件所需的
API 服务器调用次数、内存和 CPU 使用率都更高。此外,节点本地组件(如 DRA 驱动和 kubelet也在创建
Pod 沙箱时使用 DRA API 分配硬件请求资源。
尤其在集群节点数量众多或大量工作负载依赖 DRA 定义的资源申领时,集群管理员应当预先为相关组件配置合理参数以应对增加的负载。
控制面组件 {{< glossary_tooltip text="kube-scheduler" term_id="kube-scheduler" >}}
以及 {{< glossary_tooltip text="kube-controller-manager" term_id="kube-controller-manager" >}}
中的内部 ResourceClaim 控制器在调度使用 DRA 申领的 Pod 时承担了大量任务。与不使用 DRA 的 Pod 相比,
这些组件所需的 API 服务器调用次数、内存和 CPU 使用率都更高。此外,
节点本地组件(如 DRA 驱动和 kubelet)也在创建 Pod 沙箱时使用 DRA API 分配硬件请求资源。
尤其在集群节点数量众多或大量工作负载依赖 DRA 定义的资源申领时,
集群管理员应当预先为相关组件配置合理参数以应对增加的负载。
<!--
The effects of mistuned components can have direct or snowballing effects
@@ -171,26 +175,29 @@ client-go configuration within `kube-controller-manager` are critical.
<!--
The specific values to tune your cluster to depend on a variety of factors like
number of nodes/pods, rate of pod creation, churn, even in non-DRA environments;
see the [SIG-Scalability README on Kubernetes scalability
see the [SIG Scalability README on Kubernetes scalability
thresholds](https://github.com/kubernetes/community/blob/master/sig-scalability/configs-and-limits/thresholds.md)
for more information. In scale tests performed against a DRA enabled cluster
with 100 nodes, involving 720 long-lived pods (90% saturation) and 80 churn pods
(10% churn, 10 times), with a job creation QPS of 10, `kube-controller-manager`
QPS could be set to as low as 75 and Burst to 150 to meet equivalent metric
targets for non-DRA deployments. At this lower bound, it was observed that the
client side rate limiter was triggered enough to protect apiserver from
explosive burst but was is high enough that pod startup SLOs were not impacted.
client side rate limiter was triggered enough to protect the API server from
explosive burst but was high enough that pod startup SLOs were not impacted.
While this is a good starting point, you can get a better idea of how to tune
the different components that have the biggest effect on DRA performance for
your deployment by monitoring the following metrics.
your deployment by monitoring the following metrics. For more information on all
the stable metrics in Kubernetes, see the [Kubernetes Metrics
Reference](/docs/reference/generated/metrics/).
-->
集群调优所需的具体数值取决于多个因素,如节点/Pod 数量、Pod 创建速率、变化频率等,这一点即使在不使用 DRA 的环境中也是如此。更多信息请参考
[SIG-Scalability README 中的可扩缩性阈值](https://github.com/kubernetes/community/blob/master/sig-scalability/configs-and-limits/thresholds.md)。
[SIG Scalability README 中的可扩缩性阈值](https://github.com/kubernetes/community/blob/master/sig-scalability/configs-and-limits/thresholds.md)。
在一项针对启用了 DRA 的 100 节点集群的规模测试中,部署了 720 个长生命周期 Pod(90% 饱和度)和 80
个短周期 Pod(10% 流失,重复 10 次),作业创建 QPS 为 10。将 `kube-controller-manager` 的 QPS
设置为 75、Burst 设置为 150能达到与非 DRA 部署中相同的性能指标。在这个下限设置下,
客户端速率限制器能有效保护 API 服务器避免突发请求,同时不影响 Pod 启动 SLO。
这可作为一个良好的起点。你可以通过监控下列指标,进一步判断对 DRA 性能影响最大的组件,从而优化其配置。
有关 Kubernetes 中所有稳定指标的更多信息,请参阅 [Kubernetes 指标参考](/zh-cn/docs/reference/generated/metrics/)。
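<!--
As a starting point only (the values come from the scale test above; the file
path assumes a kubeadm cluster), you can raise the client-side rate limits via
kube-controller-manager flags:
-->
仅作为起点示例(数值取自上文的规模测试,文件路径假设为 kubeadm 集群),
可以通过 kube-controller-manager 的参数调高客户端限速:

```yaml
# 片段:/etc/kubernetes/manifests/kube-controller-manager.yaml
spec:
  containers:
  - name: kube-controller-manager
    command:
    - kube-controller-manager
    - --kube-api-qps=75               # 访问 API 服务器的客户端 QPS
    - --kube-api-burst=150            # 客户端突发上限
    # ……其余参数保持不变
```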
<!--
### `kube-controller-manager` metrics
@@ -203,24 +210,22 @@ managed by the `kube-controller-manager` component.
以下指标聚焦于由 `kube-controller-manager` 组件管理的内部 ResourceClaim 控制器:
<!--
* Workqueue Add Rate: Monitor
`sum(rate(workqueue_adds_total{name="resource_claim"}[5m]))` to gauge how
quickly items are added to the ResourceClaim controller.
* Workqueue Add Rate: Monitor {{< highlight promql "hl_inline=true" >}} sum(rate(workqueue_adds_total{name="resource_claim"}[5m])) {{< /highlight >}} to gauge how quickly items are added to the ResourceClaim controller.
* Workqueue Depth: Track
`sum(workqueue_depth{endpoint="kube-controller-manager",
name="resource_claim"})` to identify any backlogs in the ResourceClaim
{{< highlight promql "hl_inline=true" >}}sum(workqueue_depth{endpoint="kube-controller-manager",
name="resource_claim"}){{< /highlight >}} to identify any backlogs in the ResourceClaim
controller.
* Workqueue Work Duration: Observe `histogram_quantile(0.99,
* Workqueue Work Duration: Observe {{< highlight promql "hl_inline=true">}}histogram_quantile(0.99,
sum(rate(workqueue_work_duration_seconds_bucket{name="resource_claim"}[5m]))
by (le))` to understand the speed at which the ResourceClaim controller
by (le)){{< /highlight >}} to understand the speed at which the ResourceClaim controller
processes work.
-->
* 工作队列添加速率:监控 `sum(rate(workqueue_adds_total{name="resource_claim"}[5m]))`
* 工作队列添加速率:监控 {{< highlight promql "hl_inline=true" >}}sum(rate(workqueue_adds_total{name="resource_claim"}[5m])){{< /highlight >}}
以衡量任务加入 ResourceClaim 控制器的速度。
* 工作队列深度:跟踪 `sum(workqueue_depth{endpoint="kube-controller-manager", name="resource_claim"})`
* 工作队列深度:跟踪 {{< highlight promql "hl_inline=true" >}}sum(workqueue_depth{endpoint="kube-controller-manager", name="resource_claim"}){{< /highlight >}}
识别 ResourceClaim 控制器中是否存在积压。
* 工作队列处理时长:观察
`histogram_quantile(0.99, sum(rate(workqueue_work_duration_seconds_bucket{name="resource_claim"}[5m])) by (le))`
{{< highlight promql "hl_inline=true">}}histogram_quantile(0.99, sum(rate(workqueue_work_duration_seconds_bucket{name="resource_claim"}[5m])) by (le)){{< /highlight >}}
以了解 ResourceClaim 控制器的处理速度。
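<!--
For example (the threshold is an assumption; tune it against your own
baseline), the workqueue depth query above can back a Prometheus alerting rule:
-->
例如(其中的阈值仅为假设,应根据你自己的基线调整),
可以基于上述工作队列深度查询配置一条 Prometheus 告警规则:

```yaml
groups:
- name: dra-resourceclaim-controller
  rules:
  - alert: ResourceClaimWorkqueueBacklog
    expr: |
      sum(workqueue_depth{endpoint="kube-controller-manager", name="resource_claim"}) > 100
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "ResourceClaim 控制器工作队列持续积压,可考虑调高 kube-controller-manager 的 QPS/Burst"
```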
<!--
@@ -249,7 +254,7 @@ manageable.
The following scheduler metrics are high level metrics aggregating performance
across all Pods scheduled, not just those using DRA. It is important to note
that the end-to-end metrics are ultimately influenced by the
kube-controller-manager's performance in creating ResourceClaims from
`kube-controller-manager`'s performance in creating ResourceClaims from
ResourceClaimTemplates in deployments that heavily use ResourceClaimTemplates.
-->
### `kube-scheduler` 指标 {#kube-scheduler-metrics}
@@ -259,17 +264,17 @@ ResourceClaimTemplates in deployments that heavily use ResourceClaimTemplates.
的性能影响,尤其在广泛使用 ResourceClaimTemplate 的部署中。
<!--
* Scheduler End-to-End Duration: Monitor `histogram_quantile(0.99,
* Scheduler End-to-End Duration: Monitor {{< highlight promql "hl_inline=true" >}}histogram_quantile(0.99,
sum(increase(scheduler_pod_scheduling_sli_duration_seconds_bucket[5m])) by
(le))`.
* Scheduler Algorithm Latency: Track `histogram_quantile(0.99,
(le)){{< /highlight >}}.
* Scheduler Algorithm Latency: Track {{< highlight promql "hl_inline=true" >}}histogram_quantile(0.99,
sum(increase(scheduler_scheduling_algorithm_duration_seconds_bucket[5m])) by
(le))`.
(le)){{< /highlight >}}.
-->
* 调度器端到端耗时:监控
`histogram_quantile(0.99, sum(increase(scheduler_pod_scheduling_sli_duration_seconds_bucket[5m])) by (le))`
{{< highlight promql "hl_inline=true" >}}histogram_quantile(0.99, sum(increase(scheduler_pod_scheduling_sli_duration_seconds_bucket[5m])) by (le)){{< /highlight >}}。
* 调度器算法延迟:跟踪
`histogram_quantile(0.99, sum(increase(scheduler_scheduling_algorithm_duration_seconds_bucket[5m])) by (le))`
{{< highlight promql "hl_inline=true" >}}histogram_quantile(0.99, sum(increase(scheduler_scheduling_algorithm_duration_seconds_bucket[5m])) by (le)){{< /highlight >}}。
<!--
### `kubelet` metrics
@@ -285,17 +290,17 @@ following metrics.
`NodePrepareResources` 和 `NodeUnprepareResources` 方法。你可以通过以下指标从 kubelet 的角度观察其行为。
<!--
* Kubelet NodePrepareResources: Monitor `histogram_quantile(0.99,
* Kubelet NodePrepareResources: Monitor {{< highlight promql "hl_inline=true" >}}histogram_quantile(0.99,
sum(rate(dra_operations_duration_seconds_bucket{operation_name="PrepareResources"}[5m]))
by (le))`.
* Kubelet NodeUnprepareResources: Track `histogram_quantile(0.99,
by (le)){{< /highlight >}}.
* Kubelet NodeUnprepareResources: Track {{< highlight promql "hl_inline=true" >}}histogram_quantile(0.99,
sum(rate(dra_operations_duration_seconds_bucket{operation_name="UnprepareResources"}[5m]))
by (le))`.
by (le)){{< /highlight >}}.
-->
* kubelet 调用 PrepareResources:监控
`histogram_quantile(0.99, sum(rate(dra_operations_duration_seconds_bucket{operation_name="PrepareResources"}[5m])) by (le))`
{{< highlight promql "hl_inline=true" >}}histogram_quantile(0.99, sum(rate(dra_operations_duration_seconds_bucket{operation_name="PrepareResources"}[5m])) by (le)){{< /highlight >}}。
* kubelet 调用 UnprepareResources:跟踪
`histogram_quantile(0.99, sum(rate(dra_operations_duration_seconds_bucket{operation_name="UnprepareResources"}[5m])) by (le))`
{{< highlight promql "hl_inline=true" >}}histogram_quantile(0.99, sum(rate(dra_operations_duration_seconds_bucket{operation_name="UnprepareResources"}[5m])) by (le)){{< /highlight >}}。
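<!--
These metrics are exposed on the kubelet's /metrics endpoint. A minimal
Prometheus scrape job might look like the sketch below (it assumes Prometheus
runs in-cluster with RBAC access to kubelet metrics; the TLS setting is
simplified):
-->
这些指标通过 kubelet 的 /metrics 端点暴露。下面是一个最小化的 Prometheus 抓取任务示意
(假设 Prometheus 运行在集群内,并具有访问 kubelet 指标的 RBAC 权限TLS 设置做了简化):

```yaml
scrape_configs:
- job_name: kubelet
  scheme: https
  kubernetes_sd_configs:
  - role: node                        # 以各节点上的 kubelet 作为抓取目标
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    insecure_skip_verify: true        # 仅为示例;生产环境应配置正确的 CA
```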
<!--
### DRA kubeletplugin operations
@@ -313,21 +318,25 @@ DRA 驱动实现 [`kubeletplugin` 包接口](https://pkg.go.dev/k8s.io/dynamic-r
你可以从内部 kubeletplugin 的角度通过以下指标观察其行为:
<!--
* DRA kubeletplugin gRPC NodePrepareResources operation: Observe `histogram_quantile(0.99,
* DRA kubeletplugin gRPC NodePrepareResources operation: Observe {{< highlight promql "hl_inline=true" >}}histogram_quantile(0.99,
sum(rate(dra_grpc_operations_duration_seconds_bucket{method_name=~".*NodePrepareResources"}[5m]))
by (le))`
* DRA kubeletplugin gRPC NodeUnprepareResources operation: Observe `histogram_quantile(0.99,
by (le)){{< /highlight >}}.
* DRA kubeletplugin gRPC NodeUnprepareResources operation: Observe {{< highlight promql "hl_inline=true" >}} histogram_quantile(0.99,
sum(rate(dra_grpc_operations_duration_seconds_bucket{method_name=~".*NodeUnprepareResources"}[5m]))
by (le))`.
by (le)){{< /highlight >}}.
-->
* DRA kubeletplugin 的 NodePrepareResources 操作:观察
`histogram_quantile(0.99, sum(rate(dra_grpc_operations_duration_seconds_bucket{method_name=~".*NodePrepareResources"}[5m])) by (le))`
{{< highlight promql "hl_inline=true" >}}histogram_quantile(0.99, sum(rate(dra_grpc_operations_duration_seconds_bucket{method_name=~".*NodePrepareResources"}[5m])) by (le)){{< /highlight >}}。
* DRA kubeletplugin 的 NodeUnprepareResources 操作:观察
`histogram_quantile(0.99, sum(rate(dra_grpc_operations_duration_seconds_bucket{method_name=~".*NodeUnprepareResources"}[5m])) by (le))`
{{< highlight promql "hl_inline=true" >}}histogram_quantile(0.99, sum(rate(dra_grpc_operations_duration_seconds_bucket{method_name=~".*NodeUnprepareResources"}[5m])) by (le)){{< /highlight >}}。
## {{% heading "whatsnext" %}}
<!--
* [Learn more about DRA](/docs/concepts/scheduling-eviction/dynamic-resource-allocation)
* [Learn more about
DRA](/docs/concepts/scheduling-eviction/dynamic-resource-allocation/)
* Read the [Kubernetes Metrics
Reference](/docs/reference/generated/metrics/)
-->
* [进一步了解 DRA](/zh-cn/docs/concepts/scheduling-eviction/dynamic-resource-allocation)
* 阅读 [Kubernetes 指标参考](/zh-cn/docs/reference/generated/metrics/)