commit 4c7abf0dfb (parent e9eb5cc327)
@@ -57,7 +57,7 @@ DRA 驱动是运行在集群的每个节点上的第三方应用,对接节点
 DRA drivers implement the [`kubeletplugin` package
 interface](https://pkg.go.dev/k8s.io/dynamic-resource-allocation/kubeletplugin).
-Your driver may support seamless upgrades by implementing a property of this
+Your driver may support _seamless upgrades_ by implementing a property of this
 interface that allows two versions of the same DRA driver to coexist for a short
 time. This is only available for kubelet versions 1.33 and above and may not be
 supported by your driver for heterogeneous clusters with attached nodes running
@@ -67,7 +67,7 @@ older versions of Kubernetes - check your driver's documentation to be sure.
 DRA 驱动实现
 [`kubeletplugin` 包接口](https://pkg.go.dev/k8s.io/dynamic-resource-allocation/kubeletplugin)。
-你的驱动可能通过实现此接口的一个属性,支持两个版本共存一段时间,从而实现无缝升级。
+你的驱动可能通过实现此接口的一个属性,支持两个版本共存一段时间,从而实现**无缝升级**。
 该功能仅适用于 kubelet v1.33 及更高版本,对于运行旧版 Kubernetes 的节点所组成的异构集群,
 可能不支持这种功能。请查阅你的驱动文档予以确认。
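If your driver does support seamless upgrades, one way to take advantage of it is to roll the driver DaemonSet with a surge, so that the incoming and outgoing driver Pods briefly coexist on each node. The manifest below is a minimal sketch rather than any particular driver's deployment: the name, namespace, labels, and image are placeholders, and `maxSurge` should stay at its default of 0 if your driver does not support coexistence.

```yaml
# Sketch only: names, namespace, labels, and image are placeholders for your DRA driver.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: example-dra-driver
  namespace: dra-driver
spec:
  selector:
    matchLabels:
      app: example-dra-driver
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      # Allow the new driver Pod to start before the old one is removed.
      # Only safe if the driver implements seamless upgrades (kubelet >= 1.33).
      maxSurge: 1
      maxUnavailable: 0
  template:
    metadata:
      labels:
        app: example-dra-driver
    spec:
      containers:
      - name: driver
        image: registry.example/dra-driver:v0.2.0  # placeholder image
```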
@@ -98,7 +98,7 @@ observe that:
 <!--
 ### Confirm your DRA driver exposes a liveness probe and utilize it

-Your DRA driver likely implements a grpc socket for healthchecks as part of DRA
+Your DRA driver likely implements a gRPC socket for healthchecks as part of DRA
 driver good practices. The easiest way to utilize this grpc socket is to
 configure it as a liveness probe for the DaemonSet deploying your DRA driver.
 Your driver's documentation or deployment tooling may already include this, but
@@ -110,7 +110,7 @@ heal, reducing scheduling delays or troubleshooting time.
 -->
 ### 确认你的 DRA 驱动暴露了存活探针并加以利用 {#confirm-your-dra-driver-exposes-a-liveness-probe-and-utilize-it}

-你的 DRA 驱动可能已实现用于健康检查的 grpc 套接字,这是 DRA 驱动的良好实践之一。
+你的 DRA 驱动可能已实现用于健康检查的 gRPC 套接字,这是 DRA 驱动的良好实践之一。
 最简单的利用方式是将该 grpc 套接字配置为部署 DRA 驱动 DaemonSet 的存活探针。
 驱动文档或部署工具可能已包括此项配置,但如果你是自行配置或未以 Kubernetes Pod 方式运行 DRA 驱动,
 确保你的编排工具在该 grpc 套接字健康检查失败时能重启驱动。这样可以最大程度地减少 DRA 驱动的意外停机,
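To make the probe concrete, the container snippet below wires a health endpoint into Kubernetes' built-in gRPC liveness probe. It is a sketch under assumptions this page does not make: the container name, image, TCP port 51515, and timings are placeholders, and some drivers expose their health service only on a Unix socket, in which case an exec-based health check is needed instead; consult your driver's documentation for the actual endpoint.

```yaml
# Sketch only: port, container name, image, and timings are placeholders.
containers:
- name: dra-driver
  image: registry.example/dra-driver:v0.2.0   # placeholder image
  livenessProbe:
    grpc:
      # Assumes the driver serves the standard gRPC Health Checking Protocol
      # on this TCP port; substitute your driver's actual health port.
      port: 51515
    initialDelaySeconds: 10
    periodSeconds: 10
    failureThreshold: 3
```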
@@ -136,13 +136,15 @@ ResourceClaim 或 ResourceClaimTemplate。
 <!--
 ## Monitor and tune components for higher load, especially in high scale environments

-Control plane component `kube-scheduler` and the internal ResourceClaim
-controller orchestrated by the component `kube-controller-manager` do the heavy
-lifting during scheduling of Pods with claims based on metadata stored in the
-DRA APIs. Compared to non-DRA scheduled Pods, the number of API server calls,
-memory, and CPU utilization needed by these components is increased for Pods
-using DRA claims. In addition, node local components like the DRA driver and
-kubelet utilize DRA APIs to allocated the hardware request at Pod sandbox
+Control plane component {{< glossary_tooltip text="kube-scheduler"
+term_id="kube-scheduler" >}} and the internal ResourceClaim controller
+orchestrated by the component {{< glossary_tooltip
+text="kube-controller-manager" term_id="kube-controller-manager" >}} do the
+heavy lifting during scheduling of Pods with claims based on metadata stored in
+the DRA APIs. Compared to non-DRA scheduled Pods, the number of API server
+calls, memory, and CPU utilization needed by these components is increased for
+Pods using DRA claims. In addition, node local components like the DRA driver
+and kubelet utilize DRA APIs to allocate the hardware request at Pod sandbox
 creation time. Especially in high scale environments where clusters have many
 nodes, and/or deploy many workloads that heavily utilize DRA defined resource
 claims, the cluster administrator should configure the relevant components to
@@ -150,11 +152,13 @@ anticipate the increased load.
 -->
 ## 在大规模环境中在高负载场景下监控和调优组件 {#monitor-and-tune-components-for-higher-load-especially-in-high-scale-environments}

-控制面组件 `kube-scheduler` 以及 `kube-controller-manager` 中的内部 ResourceClaim
-控制器在调度使用 DRA 申领的 Pod 时承担了大量任务。与不使用 DRA 的 Pod 相比,这些组件所需的
-API 服务器调用次数、内存和 CPU 使用率都更高。此外,节点本地组件(如 DRA 驱动和 kubelet)也在创建
-Pod 沙箱时使用 DRA API 分配硬件请求资源。
-尤其在集群节点数量众多或大量工作负载依赖 DRA 定义的资源申领时,集群管理员应当预先为相关组件配置合理参数以应对增加的负载。
+控制面组件 {{< glossary_tooltip text="kube-scheduler" term_id="kube-scheduler" >}}
+以及 {{< glossary_tooltip text="kube-controller-manager" term_id="kube-controller-manager" >}}
+中的内部 ResourceClaim 控制器在调度使用 DRA 申领的 Pod 时承担了大量任务。与不使用 DRA 的 Pod 相比,
+这些组件所需的 API 服务器调用次数、内存和 CPU 使用率都更高。此外,
+节点本地组件(如 DRA 驱动和 kubelet)也在创建 Pod 沙箱时使用 DRA API 分配硬件请求资源。
+尤其在集群节点数量众多或大量工作负载依赖 DRA 定义的资源申领时,
+集群管理员应当预先为相关组件配置合理参数以应对增加的负载。

 <!--
 The effects of mistuned components can have direct or snowballing effects
@@ -171,26 +175,29 @@ client-go configuration within `kube-controller-manager` are critical.
 <!--
 The specific values to tune your cluster to depend on a variety of factors like
 number of nodes/pods, rate of pod creation, churn, even in non-DRA environments;
-see the [SIG-Scalability README on Kubernetes scalability
+see the [SIG Scalability README on Kubernetes scalability
 thresholds](https://github.com/kubernetes/community/blob/master/sig-scalability/configs-and-limits/thresholds.md)
 for more information. In scale tests performed against a DRA enabled cluster
 with 100 nodes, involving 720 long-lived pods (90% saturation) and 80 churn pods
 (10% churn, 10 times), with a job creation QPS of 10, `kube-controller-manager`
 QPS could be set to as low as 75 and Burst to 150 to meet equivalent metric
 targets for non-DRA deployments. At this lower bound, it was observed that the
-client side rate limiter was triggered enough to protect apiserver from
-explosive burst but was is high enough that pod startup SLOs were not impacted.
+client side rate limiter was triggered enough to protect the API server from
+explosive burst but was high enough that pod startup SLOs were not impacted.
 While this is a good starting point, you can get a better idea of how to tune
 the different components that have the biggest effect on DRA performance for
-your deployment by monitoring the following metrics.
+your deployment by monitoring the following metrics. For more information on all
+the stable metrics in Kubernetes, see the [Kubernetes Metrics
+Reference](/docs/reference/generated/metrics/).
 -->
 集群调优所需的具体数值取决于多个因素,如节点/Pod 数量、Pod 创建速率、变化频率,甚至与是否使用 DRA 无关。更多信息请参考
-[SIG-Scalability README 中的可扩缩性阈值](https://github.com/kubernetes/community/blob/master/sig-scalability/configs-and-limits/thresholds.md)。
+[SIG Scalability README 中的可扩缩性阈值](https://github.com/kubernetes/community/blob/master/sig-scalability/configs-and-limits/thresholds.md)。
 在一项针对启用了 DRA 的 100 节点集群的规模测试中,部署了 720 个长生命周期 Pod(90% 饱和度)和 80
 个短周期 Pod(10% 流失,重复 10 次),作业创建 QPS 为 10。将 `kube-controller-manager` 的 QPS
 设置为 75、Burst 设置为 150,能达到与非 DRA 部署中相同的性能指标。在这个下限设置下,
 客户端速率限制器能有效保护 API 服务器避免突发请求,同时不影响 Pod 启动 SLO。
 这可作为一个良好的起点。你可以通过监控下列指标,进一步判断对 DRA 性能影响最大的组件,从而优化其配置。
+有关 Kubernetes 中所有稳定指标的更多信息,请参阅 [Kubernetes 指标参考](/zh-cn/docs/reference/generated/metrics/)。

 <!--
 ### `kube-controller-manager` metrics
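As one concrete way to apply the tuning above on a kubeadm-managed control plane, the client-side limits can be raised through `kube-controller-manager` extra arguments. This is a sketch, not a recommendation: 75/150 are the values from the 100-node test described above and should be adjusted against your own cluster's metrics.

```yaml
# Sketch only: tune these values against your own cluster's metrics.
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
controllerManager:
  extraArgs:
    kube-api-qps: "75"    # client-go QPS toward the API server
    kube-api-burst: "150" # client-go burst toward the API server
```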
@@ -203,24 +210,22 @@ managed by the `kube-controller-manager` component.
 以下指标聚焦于由 `kube-controller-manager` 组件管理的内部 ResourceClaim 控制器:

 <!--
-* Workqueue Add Rate: Monitor
-  `sum(rate(workqueue_adds_total{name="resource_claim"}[5m]))` to gauge how
-  quickly items are added to the ResourceClaim controller.
+* Workqueue Add Rate: Monitor {{< highlight promql "hl_inline=true" >}}sum(rate(workqueue_adds_total{name="resource_claim"}[5m])){{< /highlight >}} to gauge how quickly items are added to the ResourceClaim controller.
 * Workqueue Depth: Track
-  `sum(workqueue_depth{endpoint="kube-controller-manager",
-  name="resource_claim"})` to identify any backlogs in the ResourceClaim
+  {{< highlight promql "hl_inline=true" >}}sum(workqueue_depth{endpoint="kube-controller-manager",
+  name="resource_claim"}){{< /highlight >}} to identify any backlogs in the ResourceClaim
   controller.
-* Workqueue Work Duration: Observe `histogram_quantile(0.99,
+* Workqueue Work Duration: Observe {{< highlight promql "hl_inline=true" >}}histogram_quantile(0.99,
   sum(rate(workqueue_work_duration_seconds_bucket{name="resource_claim"}[5m]))
-  by (le))` to understand the speed at which the ResourceClaim controller
+  by (le)){{< /highlight >}} to understand the speed at which the ResourceClaim controller
   processes work.
 -->
-* 工作队列添加速率:监控 `sum(rate(workqueue_adds_total{name="resource_claim"}[5m]))`,
+* 工作队列添加速率:监控 {{< highlight promql "hl_inline=true" >}}sum(rate(workqueue_adds_total{name="resource_claim"}[5m])){{< /highlight >}},
  以衡量任务加入 ResourceClaim 控制器的速度。
-* 工作队列深度:跟踪 `sum(workqueue_depth{endpoint="kube-controller-manager", name="resource_claim"})`,
+* 工作队列深度:跟踪 {{< highlight promql "hl_inline=true" >}}sum(workqueue_depth{endpoint="kube-controller-manager", name="resource_claim"}){{< /highlight >}},
  识别 ResourceClaim 控制器中是否存在积压。
 * 工作队列处理时长:观察
-  `histogram_quantile(0.99, sum(rate(workqueue_work_duration_seconds_bucket{name="resource_claim"}[5m])) by (le))`,
+  {{< highlight promql "hl_inline=true" >}}histogram_quantile(0.99, sum(rate(workqueue_work_duration_seconds_bucket{name="resource_claim"}[5m])) by (le)){{< /highlight >}},
  以了解 ResourceClaim 控制器的处理速度。

 <!--
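If the cluster already runs the Prometheus Operator, the workqueue depth query above can double as an alert so that a backlog surfaces before Pod scheduling latency does. The rule below is a sketch: the names, namespace, threshold (50), and duration (10m) are illustrative, not values taken from this page.

```yaml
# Sketch only: threshold and timing are illustrative; assumes the Prometheus Operator is installed.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: dra-resourceclaim-controller
  namespace: monitoring
spec:
  groups:
  - name: dra-controller
    rules:
    - alert: ResourceClaimWorkqueueBacklog
      expr: sum(workqueue_depth{endpoint="kube-controller-manager", name="resource_claim"}) > 50
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: ResourceClaim controller workqueue is backing up
```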
@@ -249,7 +254,7 @@ manageable.
 The following scheduler metrics are high level metrics aggregating performance
 across all Pods scheduled, not just those using DRA. It is important to note
 that the end-to-end metrics are ultimately influenced by the
-kube-controller-manager's performance in creating ResourceClaims from
+`kube-controller-manager`'s performance in creating ResourceClaims from
 ResourceClaimTemplates in deployments that heavily use ResourceClaimTemplates.
 -->
 ### `kube-scheduler` 指标 {#kube-scheduler-metrics}
@@ -259,17 +264,17 @@ ResourceClaimTemplates in deployments that heavily use ResourceClaimTemplates.
 的性能影响,尤其在广泛使用 ResourceClaimTemplate 的部署中。

 <!--
-* Scheduler End-to-End Duration: Monitor `histogram_quantile(0.99,
+* Scheduler End-to-End Duration: Monitor {{< highlight promql "hl_inline=true" >}}histogram_quantile(0.99,
   sum(increase(scheduler_pod_scheduling_sli_duration_seconds_bucket[5m])) by
-  (le))`.
-* Scheduler Algorithm Latency: Track `histogram_quantile(0.99,
+  (le)){{< /highlight >}}.
+* Scheduler Algorithm Latency: Track {{< highlight promql "hl_inline=true" >}}histogram_quantile(0.99,
   sum(increase(scheduler_scheduling_algorithm_duration_seconds_bucket[5m])) by
-  (le))`.
+  (le)){{< /highlight >}}.
 -->
 * 调度器端到端耗时:监控
-  `histogram_quantile(0.99, sum(increase(scheduler_pod_scheduling_sli_duration_seconds_bucket[5m])) by (le))`
+  {{< highlight promql "hl_inline=true" >}}histogram_quantile(0.99, sum(increase(scheduler_pod_scheduling_sli_duration_seconds_bucket[5m])) by (le)){{< /highlight >}}。
 * 调度器算法延迟:跟踪
-  `histogram_quantile(0.99, sum(increase(scheduler_scheduling_algorithm_duration_seconds_bucket[5m])) by (le))`
+  {{< highlight promql "hl_inline=true" >}}histogram_quantile(0.99, sum(increase(scheduler_scheduling_algorithm_duration_seconds_bucket[5m])) by (le)){{< /highlight >}}。

 <!--
 ### `kubelet` metrics
@@ -285,17 +290,17 @@ following metrics.
 `NodePrepareResources` 和 `NodeUnprepareResources` 方法。你可以通过以下指标从 kubelet 的角度观察其行为。

 <!--
-* Kubelet NodePrepareResources: Monitor `histogram_quantile(0.99,
+* Kubelet NodePrepareResources: Monitor {{< highlight promql "hl_inline=true" >}}histogram_quantile(0.99,
   sum(rate(dra_operations_duration_seconds_bucket{operation_name="PrepareResources"}[5m]))
-  by (le))`.
-* Kubelet NodeUnprepareResources: Track `histogram_quantile(0.99,
+  by (le)){{< /highlight >}}.
+* Kubelet NodeUnprepareResources: Track {{< highlight promql "hl_inline=true" >}}histogram_quantile(0.99,
   sum(rate(dra_operations_duration_seconds_bucket{operation_name="UnprepareResources"}[5m]))
-  by (le))`.
+  by (le)){{< /highlight >}}.
 -->
 * kubelet 调用 PrepareResources:监控
-  `histogram_quantile(0.99, sum(rate(dra_operations_duration_seconds_bucket{operation_name="PrepareResources"}[5m])) by (le))`
+  {{< highlight promql "hl_inline=true" >}}histogram_quantile(0.99, sum(rate(dra_operations_duration_seconds_bucket{operation_name="PrepareResources"}[5m])) by (le)){{< /highlight >}}。
 * kubelet 调用 UnprepareResources:跟踪
-  `histogram_quantile(0.99, sum(rate(dra_operations_duration_seconds_bucket{operation_name="UnprepareResources"}[5m])) by (le))`
+  {{< highlight promql "hl_inline=true" >}}histogram_quantile(0.99, sum(rate(dra_operations_duration_seconds_bucket{operation_name="UnprepareResources"}[5m])) by (le)){{< /highlight >}}。

 <!--
 ### DRA kubeletplugin operations
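Because these p99 expressions are verbose, a recording rule can pre-compute them for dashboards. The sketch below does this for the kubelet-side PrepareResources latency, again assuming the Prometheus Operator is available; the rule and record names are placeholders.

```yaml
# Sketch only: record name and grouping are placeholders; assumes the Prometheus Operator.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: dra-kubelet-latency
  namespace: monitoring
spec:
  groups:
  - name: dra-kubelet
    rules:
    - record: dra:prepare_resources_duration_seconds:p99
      expr: |
        histogram_quantile(0.99,
          sum(rate(dra_operations_duration_seconds_bucket{operation_name="PrepareResources"}[5m])) by (le))
```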
@@ -313,21 +318,25 @@ DRA 驱动实现 [`kubeletplugin` 包接口](https://pkg.go.dev/k8s.io/dynamic-r
 你可以从内部 kubeletplugin 的角度通过以下指标观察其行为:

 <!--
-* DRA kubeletplugin gRPC NodePrepareResources operation: Observe `histogram_quantile(0.99,
+* DRA kubeletplugin gRPC NodePrepareResources operation: Observe {{< highlight promql "hl_inline=true" >}}histogram_quantile(0.99,
   sum(rate(dra_grpc_operations_duration_seconds_bucket{method_name=~".*NodePrepareResources"}[5m]))
-  by (le))`
-* DRA kubeletplugin gRPC NodeUnprepareResources operation: Observe `histogram_quantile(0.99,
+  by (le)){{< /highlight >}}.
+* DRA kubeletplugin gRPC NodeUnprepareResources operation: Observe {{< highlight promql "hl_inline=true" >}}histogram_quantile(0.99,
   sum(rate(dra_grpc_operations_duration_seconds_bucket{method_name=~".*NodeUnprepareResources"}[5m]))
-  by (le))`.
+  by (le)){{< /highlight >}}.
 -->
 * DRA kubeletplugin 的 NodePrepareResources 操作:观察
-  `histogram_quantile(0.99, sum(rate(dra_grpc_operations_duration_seconds_bucket{method_name=~".*NodePrepareResources"}[5m])) by (le))`
+  {{< highlight promql "hl_inline=true" >}}histogram_quantile(0.99, sum(rate(dra_grpc_operations_duration_seconds_bucket{method_name=~".*NodePrepareResources"}[5m])) by (le)){{< /highlight >}}。
 * DRA kubeletplugin 的 NodeUnprepareResources 操作:观察
-  `histogram_quantile(0.99, sum(rate(dra_grpc_operations_duration_seconds_bucket{method_name=~".*NodeUnprepareResources"}[5m])) by (le))`
+  {{< highlight promql "hl_inline=true" >}}histogram_quantile(0.99, sum(rate(dra_grpc_operations_duration_seconds_bucket{method_name=~".*NodeUnprepareResources"}[5m])) by (le)){{< /highlight >}}。

 ## {{% heading "whatsnext" %}}

 <!--
-* [Learn more about DRA](/docs/concepts/scheduling-eviction/dynamic-resource-allocation)
+* [Learn more about
+  DRA](/docs/concepts/scheduling-eviction/dynamic-resource-allocation/)
+* Read the [Kubernetes Metrics
+  Reference](/docs/reference/generated/metrics/)
 -->
 * [进一步了解 DRA](/zh-cn/docs/concepts/scheduling-eviction/dynamic-resource-allocation)
+* 阅读 [Kubernetes 指标参考](/zh-cn/docs/reference/generated/metrics/)