Merge pull request #39898 from garymm/patch-1
Reference nvidia gpu feature discovery
If you're using AMD GPU devices, you can deploy Node Labeller.
Node Labeller is a {{< glossary_tooltip text="controller" term_id="controller" >}} that automatically
labels your nodes with GPU device properties.

At the moment, that controller can add labels for:

* Device ID (-device-id)
* VRAM Size (-vram)
* Number of SIMDs (-simd-count)
* Number of Compute Units (-cu-count)
* Firmware and Feature Versions (-firmware)
* GPU Family, as a two-letter acronym (-family)
  * SI - Southern Islands
  * CI - Sea Islands
  * KV - Kaveri
  * VI - Volcanic Islands
  * CZ - Carrizo
  * AI - Arctic Islands
  * RV - Raven

```shell
kubectl describe node cluster-node-23
```

```
Name:               cluster-node-23
Roles:              <none>
Labels:             beta.amd.com/gpu.cu-count.64=1
                    beta.amd.com/gpu.device-id.6860=1
                    beta.amd.com/gpu.family.AI=1
                    beta.amd.com/gpu.simd-count.256=1
                    beta.amd.com/gpu.vram.16G=1
                    kubernetes.io/arch=amd64
                    kubernetes.io/os=linux
                    kubernetes.io/hostname=cluster-node-23
Annotations:        node.alpha.kubernetes.io/ttl: 0
…
```

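Because these are ordinary node labels, they work with any label selector. As a minimal sketch (it assumes a running cluster in which Node Labeller has already tagged at least one node, as in the output above):

```shell
# List only the nodes that Node Labeller tagged with the Arctic Islands GPU family.
# The label key and its "=1" value match the `kubectl describe node` output above.
kubectl get nodes -l beta.amd.com/gpu.family.AI=1
```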
With the Node Labeller in use, you can specify the GPU type in the Pod spec:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vector-add
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-vector-add
      # https://github.com/kubernetes/kubernetes/blob/v1.7.11/test/images/nvidia-cuda/Dockerfile
      image: "registry.k8s.io/cuda-vector-add:v0.1"
      resources:
        limits:
          nvidia.com/gpu: 1
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: beta.amd.com/gpu.family.AI # Arctic Islands GPU family
                operator: Exists
```

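When the constraint is a single required label, the same scheduling rule can be written more compactly with `nodeSelector` instead of `nodeAffinity`. A sketch of that equivalent form, assuming the same label as above (note that `nodeSelector` needs an exact key/value match, so it uses the `"1"` value that Node Labeller sets):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vector-add
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-vector-add
      image: "registry.k8s.io/cuda-vector-add:v0.1"
      resources:
        limits:
          nvidia.com/gpu: 1
  # Equivalent to the Exists-based nodeAffinity rule above, since Node Labeller
  # always assigns the value "1" to the labels it creates.
  nodeSelector:
    beta.amd.com/gpu.family.AI: "1"
```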
This ensures that the Pod will be scheduled to a node that has the GPU type
you specified.

Similar functionality for NVIDIA GPUs is provided by
[GPU feature discovery](https://github.com/NVIDIA/gpu-feature-discovery/blob/main/README.md).
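GPU feature discovery publishes node labels such as `nvidia.com/gpu.product`, so the same label-based scheduling pattern applies on NVIDIA nodes. As a sketch (the product string below is an example value; check the labels on your own nodes with `kubectl describe node`):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vector-add-nvidia
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-vector-add
      image: "registry.k8s.io/cuda-vector-add:v0.1"
      resources:
        limits:
          nvidia.com/gpu: 1
  nodeSelector:
    # Example value only; GPU feature discovery derives it from the detected hardware.
    nvidia.com/gpu.product: "Tesla-V100-SXM2-16GB"
```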