Merge pull request #32671 from zaunist/docs/concepts

[zh] Resync device-plugins
pull/32698/head
Kubernetes Prow Robot 2022-03-31 17:40:38 -07:00 committed by GitHub
commit e94ce37cc2
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
1 changed file with 82 additions and 11 deletions

@@ -92,12 +92,14 @@ specification as they request other types of resources, with the following limit
 * Extended resources are only supported as integer resources and cannot be overcommitted.
 * Devices cannot be shared between containers.
+### Example {#example-pod}
 <!--
 Suppose a Kubernetes cluster is running a device plugin that advertises resource `hardware-vendor.example/foo`
 on certain nodes. Here is an example of a pod requesting this resource to run a demo workload:
 -->
 Suppose a Kubernetes cluster is running a device plugin that advertises the resource `hardware-vendor.example/foo` on some nodes.
-Here is an example Pod that requests this resource to run a demo workload:
+Here is an example Pod that requests this resource to run an example workload:
 ```yaml
 ---
@@ -140,8 +142,12 @@ The general workflow of a device plugin includes the following steps:
 a gRPC service that implements the following interface:
 <!--
 ```gRPC
 service DevicePlugin {
+// GetDevicePluginOptions returns options to be communicated with Device Manager.
+rpc GetDevicePluginOptions(Empty) returns (DevicePluginOptions) {}
 // ListAndWatch returns a stream of List of Devices
 // Whenever a Device state change or a Device disappears, ListAndWatch
 // returns the new list
@@ -168,6 +174,9 @@ The general workflow of a device plugin includes the following steps:
 -->
 ```gRPC
 service DevicePlugin {
+// GetDevicePluginOptions returns the options to be communicated to the Device Manager.
+rpc GetDevicePluginOptions(Empty) returns (DevicePluginOptions) {}
 // ListAndWatch returns a stream of Device lists.
 // Whenever a Device's state changes or a Device disappears, ListAndWatch
 // returns the new list.
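The `ListAndWatch` contract described in the comments above (send the device list, then send the full list again whenever a device changes state or disappears) can be illustrated without gRPC. The following is a minimal sketch in plain Python, not the kubelet or device plugin implementation: the `Device` class, the `events` feed, and the convention that an empty health string means "device disappeared" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, Tuple


@dataclass(frozen=True)
class Device:
    """Hypothetical stand-in for the Device message (ID plus health)."""
    id: str
    health: str  # assumed values: "Healthy" or "Unhealthy"


def list_and_watch(events: Iterable[Tuple[str, str]]) -> Iterator[List[Device]]:
    """Yield the full device list once, then again after every real change.

    `events` is a hypothetical feed of (device_id, health) updates.
    An empty health string is this sketch's way of saying the device is gone.
    """
    devices: Dict[str, str] = {}
    yield []  # initial list (empty in this sketch)
    for dev_id, health in events:
        before = dict(devices)
        if health == "":
            devices.pop(dev_id, None)  # device disappeared
        else:
            devices[dev_id] = health   # device added or health updated
        if devices != before:
            # Always resend the complete list, never a delta.
            yield [Device(i, h) for i, h in sorted(devices.items())]
```

Iterating over `list_and_watch(...)` plays the role of reading the response stream; an update that does not change anything produces no new message.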
@@ -331,6 +340,8 @@ service PodResourcesLister {
 }
 ```
+### `List` gRPC endpoint {#grpc-endpoint-list}
 <!--
 The `List` endpoint provides information on resources of running pods, with details such as the
 id of exclusively allocated CPUs, device id as it was reported by device plugins and id of
@@ -387,6 +398,51 @@ message ContainerDevices {
 }
 ```
+<!--
+{{< note >}}
+cpu_ids in the `ContainerResources` in the `List` endpoint correspond to exclusive CPUs allocated
+to a particular container. If the goal is to evaluate CPUs that belong to the shared pool, the `List`
+endpoint needs to be used in conjunction with the `GetAllocatableResources` endpoint as explained
+below:
+1. Call `GetAllocatableResources` to get a list of all the allocatable CPUs
+2. Call `GetCpuIds` on all `ContainerResources` in the system
+3. Subtract out all of the CPUs from the `GetCpuIds` calls from the `GetAllocatableResources` call
+{{< /note >}}
+-->
+{{< note >}}
+The `cpu_ids` in the `ContainerResources` returned by the `List` endpoint are the exclusive CPUs allocated to a particular container.
+To count the CPUs in the shared pool, use the `List` endpoint together with the `GetAllocatableResources` endpoint, as follows:
+1. Call `GetAllocatableResources` to get all allocatable CPUs.
+2. Call `GetCpuIds` on every `ContainerResources` in the system.
+3. Subtract the CPUs returned by the `GetCpuIds` calls from the CPUs returned by `GetAllocatableResources`.
+{{< /note >}}
+### `GetAllocatableResources` gRPC endpoint {#grpc-endpoint-getallocatableresources}
+{{< feature-state state="beta" for_k8s_version="v1.23" >}}
+<!--
+{{< note >}}
+`GetAllocatableResources` should only be used to evaluate [allocatable](/docs/tasks/administer-cluster/reserve-compute-resources/#node-allocatable)
+resources on a node. If the goal is to evaluate free/unallocated resources it should be used in
+conjunction with the List() endpoint. The result obtained by `GetAllocatableResources` would remain
+the same unless the underlying resources exposed to kubelet change. This happens rarely but when
+it does (for example: hotplug/hotunplug, device health changes), client is expected to call
+`GetAllocatableResources` endpoint.
+However, calling `GetAllocatableResources` endpoint is not sufficient in case of cpu and/or memory
+update and Kubelet needs to be restarted to reflect the correct resource capacity and allocatable.
+{{< /note >}}
+-->
+{{< note >}}
+`GetAllocatableResources` should only be used to evaluate the [allocatable](/zh/docs/tasks/administer-cluster/reserve-compute-resources/#node-allocatable)
+resources on a node. To evaluate free/unallocated resources, use it together with the List() endpoint.
+The result of `GetAllocatableResources` stays the same unless the underlying resources exposed to the kubelet change.
+This happens rarely, but when it does (for example: hotplug/hotunplug, device health changes), the client is expected to call the `GetAllocatableResources` endpoint.
+However, calling the `GetAllocatableResources` endpoint is not sufficient when CPU and/or memory are updated;
+the kubelet needs to be restarted to reflect the correct resource capacity and allocatable resources.
+{{< /note >}}
 <!--
 GetAllocatableResources provides information on resources initially available on the worker node.
 It provides more information than kubelet exports to APIServer.
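The three-step recipe in the first note of this hunk (get all allocatable CPUs, collect the exclusive CPU IDs of every container, subtract) is a set difference. Here is a minimal sketch of only that arithmetic, assuming the two gRPC responses have already been fetched and reduced to plain integer CPU IDs; the function name and its parameters are hypothetical and not part of the PodResources API.

```python
from typing import Iterable, List, Set


def shared_pool_cpus(
    allocatable_cpu_ids: Iterable[int],
    exclusive_cpu_ids_per_container: Iterable[Iterable[int]],
) -> List[int]:
    """Return the CPU IDs that are allocatable but not exclusively assigned.

    allocatable_cpu_ids: CPU IDs as reported by GetAllocatableResources.
    exclusive_cpu_ids_per_container: one cpu_ids list per ContainerResources
        entry returned by List (empty for containers without exclusive CPUs).
    """
    exclusive: Set[int] = set()
    for cpu_ids in exclusive_cpu_ids_per_container:
        exclusive.update(cpu_ids)
    # Step 3 of the note: allocatable minus everything exclusively allocated.
    return sorted(set(allocatable_cpu_ids) - exclusive)
```

For example, on a node with eight allocatable CPUs where one container holds CPUs 2 and 3 and another holds CPU 6, `shared_pool_cpus(range(8), [[2, 3], [6], []])` leaves CPUs 0, 1, 4, 5 and 7 in the shared pool.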
@@ -394,7 +450,6 @@ It provides more information than kubelet exports to APIServer.
 The `GetAllocatableResources` endpoint provides information on resources initially available on the worker node.
 It provides more information than what is exported to the API server.
 ```gRPC
 // AllocatableResourcesResponses contains information about all the devices known by the kubelet
 message AllocatableResourcesResponse {
@@ -405,6 +460,23 @@ message AllocatableResourcesResponse {
 ```
+<!--
+Starting from Kubernetes v1.23, the `GetAllocatableResources` is enabled by default.
+You can disable it by turning off the
+`KubeletPodResourcesGetAllocatable` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/).
+Preceding Kubernetes v1.23, to enable this feature `kubelet` must be started with the following flag:
+`--feature-gates=KubeletPodResourcesGetAllocatable=true`
+-->
+Starting from Kubernetes v1.23, `GetAllocatableResources` is enabled by default.
+You can disable it by turning off the `KubeletPodResourcesGetAllocatable`
+[feature gate](/zh/docs/reference/command-line-tools-reference/feature-gates/).
+Before Kubernetes v1.23, to enable this feature, `kubelet` must be started with the following flag:
+`--feature-gates=KubeletPodResourcesGetAllocatable=true`
 <!--
 `ContainerDevices` do expose the topology information declaring to which NUMA cells the device is affine.
 The NUMA cells are identified using an opaque integer ID, which value is consistent to what device
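The version rule added in this hunk (on by default from v1.23, opt-in via `--feature-gates=KubeletPodResourcesGetAllocatable=true` before that) can be captured in a small helper. This is only a sketch: the helper name and its `(major, minor)` version tuple are hypothetical, and it assumes a kubelet version in which the `KubeletPodResourcesGetAllocatable` gate exists at all.

```python
from typing import List, Tuple

GATE = "KubeletPodResourcesGetAllocatable"


def get_allocatable_kubelet_flags(
    version: Tuple[int, int], enabled: bool = True
) -> List[str]:
    """Return the extra kubelet flags needed for the desired gate state.

    version: (major, minor) of the kubelet, for example (1, 22).
    enabled: whether GetAllocatableResources should be available.
    """
    default_on = version >= (1, 23)  # enabled by default from v1.23
    if enabled == default_on:
        return []  # the default already matches, no flag needed
    return [f"--feature-gates={GATE}={str(enabled).lower()}"]
```

With this sketch, a v1.22 kubelet needs the explicit `=true` flag, a v1.23 kubelet needs nothing, and turning the endpoint off on v1.23 yields the `=false` form of the same flag.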
@@ -515,6 +587,7 @@ Here are some examples of device plugin implementations:
 * Requires [nvidia-docker](https://github.com/NVIDIA/nvidia-docker) 2.0, which allows you to run GPU-enabled Docker containers.
 * [NVIDIA GPU device plugin for Container-Optimized OS](https://github.com/GoogleCloudPlatform/container-engine-accelerators/tree/master/cmd/nvidia_gpu)
 * [RDMA device plugin](https://github.com/hustcat/k8s-rdma-device-plugin)
+* [SocketCAN device plugin](https://github.com/collabora/k8s-socketcan)
 * [Solarflare device plugin](https://github.com/vikaschoudhary16/sfc-device-plugin)
 * [SR-IOV Network device plugin](https://github.com/intel/sriov-network-device-plugin)
 * [Xilinx FPGA device plugin](https://github.com/Xilinx/FPGA_as_a_Service/tree/master/k8s-fpga-device-plugin)
@@ -531,5 +604,3 @@ Here are some examples of device plugin implementations:
 * Learn how to [advertise extended resources on a node](/zh/docs/tasks/administer-cluster/extended-resource-node/)
 * Read about using [hardware acceleration for TLS ingress](https://kubernetes.io/blog/2019/04/24/hardware-accelerated-ssl/tls-termination-in-ingress-controllers-using-kubernetes-device-plugins-and-runtimeclass/) with Kubernetes
 * Learn about the [Topology Manager](/zh/docs/tasks/administer-cluster/topology-manager/)