fix some layout

pull/36708/head
yanrongshi 2022-09-09 23:22:21 +08:00
parent 104f34049d
commit b174bfed02
1 changed file with 33 additions and 25 deletions


@@ -1,6 +1,6 @@
---
content_type: concept
title: Schedule GPUs
description: Configure and schedule GPUs for use as a resource by nodes in a cluster.
---

<!--
@@ -76,7 +76,7 @@ when using GPUs:
- Each container can request one or more GPUs. It is not possible to request a
  fraction of a GPU.
-->
- GPUs are only supposed to be specified in the `limits` section, which means:
  * You can specify GPU `limits` without specifying `requests`, because
    Kubernetes will use the limit as the request value by default;
  * You can specify GPU in both `limits` and `requests`, but these two values
    must be equal.
@@ -87,6 +87,8 @@ when using GPUs:
<!--
Here's an example:
-->
Here's an example:

```yaml
apiVersion: v1
kind: Pod
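# The diff hunk ends at this point; the lines below are an illustrative sketch that
# completes the example, assuming the NVIDIA device plugin's `nvidia.com/gpu` resource
# name and an example image — they are not part of the original excerpt.
metadata:
  name: example-gpu-pod            # hypothetical name
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-example           # hypothetical container name
      image: "registry.k8s.io/cuda-vector-add:v0.1"   # example CUDA test image
      resources:
        limits:
          nvidia.com/gpu: 1        # request one whole GPU via limits
```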
@@ -111,27 +113,20 @@ has the following requirements:
-->

### Deploying AMD GPU device plugin {#deploying-amd-gpu-device-plugin}

The [official AMD GPU device plugin](https://github.com/RadeonOpenCompute/k8s-device-plugin) has the following requirements:

<!--
- Kubernetes nodes have to be pre-installed with AMD GPU Linux driver.

To deploy the AMD device plugin once your cluster is running and the above
requirements are satisfied:
-->
- Kubernetes nodes have to be pre-installed with AMD GPU Linux driver.

To deploy the AMD device plugin once your cluster is running and the above
requirements are satisfied:

```shell
kubectl create -f https://raw.githubusercontent.com/RadeonOpenCompute/k8s-device-plugin/v1.10/k8s-ds-amdgpu-dp.yaml
```
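Once the plugin is running, it advertises GPUs to the kubelet as the `amd.com/gpu` resource. As a minimal sketch (not part of this page; the Pod and container names are hypothetical and the image is only an example), a Pod could then request one of those GPUs like this:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: amd-gpu-example            # hypothetical name
spec:
  restartPolicy: OnFailure
  containers:
    - name: rocm-test              # hypothetical container name
      image: "rocm/rocm-terminal"  # example ROCm image; substitute your own workload
      resources:
        limits:
          amd.com/gpu: 1           # resource name advertised by the AMD device plugin
```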
<!--
@@ -148,7 +143,7 @@ There are currently two device plugin implementations for NVIDIA GPUs:
-->

### Deploying NVIDIA GPU device plugin {#deploying-nvidia-gpu-device-plugin}

There are currently two device plugin implementations for NVIDIA GPUs:

<!--
#### Official NVIDIA GPU device plugin
@@ -163,24 +158,32 @@ has the following requirements:
<!--
- Kubernetes nodes have to be pre-installed with NVIDIA drivers.
- Kubernetes nodes have to be pre-installed with [nvidia-docker 2.0](https://github.com/NVIDIA/nvidia-docker)
- Kubelet must use Docker as its container runtime
- `nvidia-container-runtime` must be configured as the [default runtime](https://github.com/NVIDIA/k8s-device-plugin#preparing-your-gpu-nodes)
  for Docker, instead of runc.
- The version of the NVIDIA drivers must match the constraint ~= 384.81.

To deploy the NVIDIA device plugin once your cluster is running and the above
requirements are satisfied:
-->
- Kubernetes nodes have to be pre-installed with NVIDIA drivers.
- Kubernetes nodes have to be pre-installed with [nvidia-docker 2.0](https://github.com/NVIDIA/nvidia-docker)
- Kubelet must use Docker as its container runtime
- The [default runtime](https://github.com/NVIDIA/k8s-device-plugin#preparing-your-gpu-nodes) for Docker must be set to
  `nvidia-container-runtime`, instead of `runc`
- The version of the NVIDIA drivers must match the constraint ~= 384.81

To deploy the NVIDIA device plugin once your cluster is running and the above
requirements are satisfied:

```shell
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/1.0.0-beta4/nvidia-device-plugin.yml
```
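Once the plugin is running, one quick way to confirm that GPUs are being advertised is to list each node's allocatable `nvidia.com/gpu` count. This is a sketch using standard `kubectl` output formatting, assuming the plugin registers the `nvidia.com/gpu` resource name:

```shell
# List every node and how many NVIDIA GPUs it reports as allocatable
kubectl get nodes "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"
```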
<!--
You can report issues with this third-party device plugin by logging an issue in
[NVIDIA/k8s-device-plugin](https://github.com/NVIDIA/k8s-device-plugin).
-->
You can report issues with this third-party device plugin by logging an issue in
[NVIDIA/k8s-device-plugin](https://github.com/NVIDIA/k8s-device-plugin).

<!--
#### NVIDIA GPU device plugin used by GCE
@@ -195,6 +198,9 @@ and has experimental code for Ubuntu from 1.9 onwards.
The [NVIDIA GPU device plugin used by GCE](https://github.com/GoogleCloudPlatform/container-engine-accelerators/tree/master/cmd/nvidia_gpu)
doesn't require using nvidia-docker and should work with any container runtime
that is compatible with the Kubernetes Container Runtime Interface (CRI). It's tested
on [Container-Optimized OS](https://cloud.google.com/container-optimized-os/) and has
experimental code for Ubuntu from 1.9 onwards.

<!--
You can use the following commands to install the NVIDIA drivers and device plugin:
-->
You can use the following commands to install the NVIDIA drivers and device plugin:

```
@@ -209,13 +215,15 @@ kubectl create -f https://raw.githubusercontent.com/kubernetes/kubernetes/releas
```
<!--
You can report issues with using or deploying this third-party device plugin by logging an issue in
[GoogleCloudPlatform/container-engine-accelerators](https://github.com/GoogleCloudPlatform/container-engine-accelerators).

Google publishes its own [instructions](https://cloud.google.com/kubernetes-engine/docs/how-to/gpus) for using NVIDIA GPUs on GKE.
-->
You can report issues with using or deploying this third-party device plugin by logging an issue in
[GoogleCloudPlatform/container-engine-accelerators](https://github.com/GoogleCloudPlatform/container-engine-accelerators).

Google publishes its own [instructions](https://cloud.google.com/kubernetes-engine/docs/how-to/gpus) for using NVIDIA GPUs on GKE.
<!--
## Clusters containing different types of GPUs
@@ -249,14 +257,14 @@ kubectl label nodes <node-with-p100> accelerator=nvidia-tesla-p100
If you're using AMD GPU devices, you can deploy
[Node Labeller](https://github.com/RadeonOpenCompute/k8s-device-plugin/tree/master/cmd/k8s-node-labeller).
Node Labeller is a {{< glossary_tooltip text="controller" term_id="controller" >}} that automatically
labels your nodes with GPU device properties.

At the moment, that controller can add labels for:
-->
If you're using AMD GPU devices, you can deploy
[Node Labeller](https://github.com/RadeonOpenCompute/k8s-device-plugin/tree/master/cmd/k8s-node-labeller).
It is a {{< glossary_tooltip text="controller" term_id="controller" >}} that automatically
labels your nodes with GPU device properties. At the moment, that controller can add labels for:

<!--
* Device ID (-device-id)
@@ -307,7 +315,7 @@ kubectl describe node cluster-node-23
             kubernetes.io/hostname=cluster-node-23
Annotations: kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
             node.alpha.kubernetes.io/ttl: 0
...
```
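With such labels in place (for example the `accelerator=nvidia-tesla-p100` label applied with `kubectl label nodes` earlier on this page), a Pod can be steered to a particular GPU type with a `nodeSelector`. This is a minimal sketch, not part of the original excerpt; the Pod and container names are hypothetical and the image is only an example:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cuda-on-p100                  # hypothetical name
spec:
  restartPolicy: OnFailure
  nodeSelector:
    accelerator: nvidia-tesla-p100    # schedule only onto nodes carrying this label
  containers:
    - name: cuda-vector-add           # hypothetical container name
      image: "registry.k8s.io/cuda-vector-add:v0.1"   # example image
      resources:
        limits:
          nvidia.com/gpu: 1           # still request the GPU itself via limits
```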
<!--