Merge pull request #39898 from garymm/patch-1

Reference nvidia gpu feature discovery
pull/39911/head
Kubernetes Prow Robot 2023-03-10 03:58:40 -08:00 committed by GitHub
commit 703f24a719
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
1 changed files with 2 additions and 62 deletions

View File

@ -88,65 +88,5 @@ If you're using AMD GPU devices, you can deploy
Node Labeller is a {{< glossary_tooltip text="controller" term_id="controller" >}} that automatically
labels your nodes with GPU device properties.
At the moment, that controller can add labels for:
* Device ID (-device-id)
* VRAM Size (-vram)
* Number of SIMD (-simd-count)
* Number of Compute Unit (-cu-count)
* Firmware and Feature Versions (-firmware)
* GPU Family, in two letters acronym (-family)
* SI - Southern Islands
* CI - Sea Islands
* KV - Kaveri
* VI - Volcanic Islands
* CZ - Carrizo
* AI - Arctic Islands
* RV - Raven
```shell
kubectl describe node cluster-node-23
```
```
Name: cluster-node-23
Roles: <none>
Labels: beta.amd.com/gpu.cu-count.64=1
beta.amd.com/gpu.device-id.6860=1
beta.amd.com/gpu.family.AI=1
beta.amd.com/gpu.simd-count.256=1
beta.amd.com/gpu.vram.16G=1
kubernetes.io/arch=amd64
kubernetes.io/os=linux
kubernetes.io/hostname=cluster-node-23
Annotations: node.alpha.kubernetes.io/ttl: 0
```
With the Node Labeller in use, you can specify the GPU type in the Pod spec:
```yaml
apiVersion: v1
kind: Pod
metadata:
name: cuda-vector-add
spec:
restartPolicy: OnFailure
containers:
- name: cuda-vector-add
# https://github.com/kubernetes/kubernetes/blob/v1.7.11/test/images/nvidia-cuda/Dockerfile
image: "registry.k8s.io/cuda-vector-add:v0.1"
resources:
limits:
nvidia.com/gpu: 1
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
matchExpressions:
key: beta.amd.com/gpu.family.AI # Arctic Islands GPU family
operator: Exist
```
This ensures that the Pod will be scheduled to a node that has the GPU type
you specified.
Similar functionality for NVIDIA is provied by
[GPU feature discovery](https://github.com/NVIDIA/gpu-feature-discovery/blob/main/README.md).