Merge pull request #32671 from zaunist/docs/concepts

[zh] Resync device-plugins
2022-03-31 17:40:38 -07:00 · 2022-03-31 17:40:38 -07:00 · e94ce37cc2
parent b53955eed4 fd9b3076be
commit e94ce37cc2
1 changed file with 82 additions and 11 deletions
--- a/content/zh/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins.md
+++ b/content/zh/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins.md
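@@ -92,12 +92,14 @@ specification as they request other types of resources, with the following limit
 * Extended resources are only supported as integer resources and cannot be overcommitted.
 * Devices cannot be shared between containers.
 ### Example {#example-pod}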
 <!--
 Suppose a Kubernetes cluster is running a device plugin that advertises resource `hardware-vendor.example/foo`
 on certain nodes. Here is an example of a pod requesting this resource to run a demo workload:
 -->
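 Suppose a Kubernetes cluster is running a device plugin that advertises the resource `hardware-vendor.example/foo` on certain nodes.
-Here is an example Pod that requests this resource to run a demo workload:
+Here is an example Pod that requests this resource to run a sample workload: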
 ```yaml
 ---
@@ -140,8 +142,12 @@ The general workflow of a device plugin includes the following steps:
  A gRPC service that implements the following interface:
  <!--
  ```gRPC
  service DevicePlugin {
        // GetDevicePluginOptions returns options to be communicated with Device Manager.
        rpc GetDevicePluginOptions(Empty) returns (DevicePluginOptions) {}
        // ListAndWatch returns a stream of List of Devices
        // Whenever a Device state changes or a Device disappears, ListAndWatch
        // returns the new list
@@ -168,6 +174,9 @@ The general workflow of a device plugin includes the following steps:
  -->
  ```gRPC
  service DevicePlugin {
        // GetDevicePluginOptions returns options to be communicated with Device Manager.
        rpc GetDevicePluginOptions(Empty) returns (DevicePluginOptions) {}
        // ListAndWatch returns a stream of List of Devices.
        // Whenever a Device state changes or a Device disappears, ListAndWatch
        // returns the new list.
@@ -331,6 +340,8 @@ service PodResourcesLister {
 }
 ```
 ### `List` gRPC endpoint {#grpc-endpoint-list}
 <!--
 The `List` endpoint provides information on resources of running pods, with details such as the
 id of exclusively allocated CPUs, device id as it was reported by device plugins and id of
@@ -387,6 +398,51 @@ message ContainerDevices {
 }
 ```
 <!--
 {{< note >}}
 cpu_ids in the `ContainerResources` in the `List` endpoint correspond to exclusive CPUs allocated
 to a particular container. If the goal is to evaluate CPUs that belong to the shared pool, the `List`
 endpoint needs to be used in conjunction with the `GetAllocatableResources` endpoint as explained
 below:
 1. Call `GetAllocatableResources` to get a list of all the allocatable CPUs
 2. Call `GetCpuIds` on all `ContainerResources` in the system
 3. Subtract out all of the CPUs from the `GetCpuIds` calls from the `GetAllocatableResources` call
 {{< /note >}}
 -->
 {{< note >}}
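 `cpu_ids` in the `ContainerResources` in the `List` endpoint correspond to exclusive CPUs allocated
 to a particular container. If the goal is to evaluate CPUs that belong to the shared pool, the `List`
 endpoint needs to be used in conjunction with the `GetAllocatableResources` endpoint as explained below:
 1. Call `GetAllocatableResources` to get a list of all the allocatable CPUs.
 2. Call `GetCpuIds` on all `ContainerResources` in the system.
 3. Subtract all of the CPUs returned by the `GetCpuIds` calls from the CPUs returned by `GetAllocatableResources`.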
 {{< /note >}}
 ### `GetAllocatableResources` gRPC endpoint {#grpc-endpoint-getallocatableresources}
 {{< feature-state state="beta" for_k8s_version="v1.23" >}}
 <!--
 {{< note >}}
 `GetAllocatableResources` should only be used to evaluate [allocatable](/docs/tasks/administer-cluster/reserve-compute-resources/#node-allocatable)
 resources on a node. If the goal is to evaluate free/unallocated resources it should be used in
 conjunction with the List() endpoint. The result obtained by `GetAllocatableResources` would remain
 the same unless the underlying resources exposed to kubelet change. This happens rarely but when
 it does (for example: hotplug/hotunplug, device health changes), client is expected to call
 `GetAllocatableResources` endpoint.
 However, calling `GetAllocatableResources` endpoint is not sufficient in case of cpu and/or memory
 update and Kubelet needs to be restarted to reflect the correct resource capacity and allocatable.
 {{< /note >}}
 -->
 {{< note >}}
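 `GetAllocatableResources` should only be used to evaluate [allocatable](/zh/docs/tasks/administer-cluster/reserve-compute-resources/#node-allocatable)
 resources on a node. If the goal is to evaluate free/unallocated resources, it should be used in conjunction with the `List()` endpoint.
 The result obtained by `GetAllocatableResources` remains the same unless the underlying resources exposed to the kubelet change.
 This happens rarely, but when it does (for example: hotplug/hotunplug, device health changes), the client is expected to call the `GetAllocatableResources` endpoint.
 However, calling the `GetAllocatableResources` endpoint is not sufficient when CPU and/or memory is updated;
 the kubelet needs to be restarted to reflect the correct resource capacity and allocatable.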
 {{< /note >}}
 <!--
 GetAllocatableResources provides information on resources initially available on the worker node.
 It provides more information than kubelet exports to APIServer.
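@@ -394,7 +450,6 @@ It provides more information than kubelet exports to APIServer.
 The `GetAllocatableResources` endpoint provides information on resources initially available on the worker node.
 It provides more information than the kubelet exports to the API server.
 ```gRPC
 // AllocatableResourcesResponse contains information about all the devices known by the kubelet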
 message AllocatableResourcesResponse {
@@ -405,6 +460,23 @@ message AllocatableResourcesResponse {
 ```
 <!--
 Starting from Kubernetes v1.23, the `GetAllocatableResources` is enabled by default.
 You can disable it by turning off the
 `KubeletPodResourcesGetAllocatable` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/).
 Before Kubernetes v1.23, to enable this feature the `kubelet` must be started with the following flag:
 `--feature-gates=KubeletPodResourcesGetAllocatable=true`
 -->
 Starting from Kubernetes v1.23, `GetAllocatableResources` is enabled by default.
 You can disable it by turning off the `KubeletPodResourcesGetAllocatable`
 [feature gate](/zh/docs/reference/command-line-tools-reference/feature-gates/).
 Before Kubernetes v1.23, to enable this feature the `kubelet` must be started with the following flag:
 `--feature-gates=KubeletPodResourcesGetAllocatable=true`
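To make the default-versus-override behaviour concrete, here is a minimal sketch that resolves a gate from a `--feature-gates` value of the form `Name=bool,Name=bool`. It is a simplified illustration, not the kubelet's own parsing code: `gateEnabled` is a hypothetical helper, and the caller supplies the default for its Kubernetes version (on from v1.23, off before).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// gate is the feature gate named in the text above.
const gate = "KubeletPodResourcesGetAllocatable"

// gateEnabled reports whether the named gate is on, given the value passed
// to --feature-gates (for example "A=true,B=false") and the default for the
// running version. Entries that are malformed or name other gates are ignored.
func gateEnabled(flagValue, name string, defaultOn bool) bool {
	enabled := defaultOn
	for _, pair := range strings.Split(flagValue, ",") {
		parts := strings.SplitN(strings.TrimSpace(pair), "=", 2)
		if len(parts) != 2 || parts[0] != name {
			continue
		}
		if b, err := strconv.ParseBool(parts[1]); err == nil {
			enabled = b
		}
	}
	return enabled
}

func main() {
	// v1.23 and later: on by default, can be turned off explicitly.
	fmt.Println(gateEnabled("", gate, true))            // prints true
	fmt.Println(gateEnabled(gate+"=false", gate, true)) // prints false
	// Before v1.23: off by default, must be turned on explicitly.
	fmt.Println(gateEnabled(gate+"=true", gate, false)) // prints true
}
```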
 <!--
 `ContainerDevices` exposes the topology information declaring which NUMA cells the device is affine to.
 The NUMA cells are identified using an opaque integer ID, whose value is consistent with what device
@@ -515,6 +587,7 @@ Here are some examples of device plugin implementations:
 * Requires [nvidia-docker](https://github.com/NVIDIA/nvidia-docker) 2.0, which allows you to run GPU-enabled Docker containers.
 * [NVIDIA GPU device plugin for Container-Optimized OS](https://github.com/GoogleCloudPlatform/container-engine-accelerators/tree/master/cmd/nvidia_gpu)
 * [RDMA device plugin](https://github.com/hustcat/k8s-rdma-device-plugin)
 * [SocketCAN device plugin](https://github.com/collabora/k8s-socketcan)
 * [Solarflare device plugin](https://github.com/vikaschoudhary16/sfc-device-plugin)
 * [SR-IOV network device plugin](https://github.com/intel/sriov-network-device-plugin)
 * [Xilinx FPGA device plugin](https://github.com/Xilinx/FPGA_as_a_Service/tree/master/k8s-fpga-device-plugin)
@@ -531,5 +604,3 @@ Here are some examples of device plugin implementations:
 * Learn how to [advertise extended resources on a node](/zh/docs/tasks/administer-cluster/extended-resource-node/)
 * Read about using [hardware acceleration for TLS ingress](https://kubernetes.io/blog/2019/04/24/hardware-accelerated-ssl/tls-termination-in-ingress-controllers-using-kubernetes-device-plugins-and-runtimeclass/) with Kubernetes
 * Learn about the [Topology Manager](/zh/docs/tasks/administer-cluster/topology-manager/)