Documentation for the DRA Partitionable Devices feature

pull/49868/head
Morten Torkildsen 2025-02-23 16:21:03 +00:00
parent 23f39cdd4b
commit 490f3bb2b1
2 changed files with 78 additions and 1 deletions


@ -258,7 +258,7 @@ real time changes of the state of the device.
When the feature is disabled, that field automatically gets cleared when storing the ResourceClaim.
A ResourceClaim device status is supported when it is possible, from a DRA driver, to update an
existing ResourceClaim where the `status.devices` field is set.
## Prioritized List
@ -304,6 +304,59 @@ spec:
count: 2
```
## Partitionable Devices
{{< feature-state feature_gate_name="DRAPartitionableDevices" >}}
Devices represented in DRA don't necessarily have to be a single unit connected to a single machine,
but can also be a logical device composed of multiple devices connected to multiple machines. These
devices might consume overlapping resources of the underlying physical devices, meaning that when one
logical device is allocated, other devices that depend on the same resources are no longer available.
In the ResourceSlice API, this is represented as a list of named CounterSets, each of which
contains a set of named counters. The counters represent the resources available on the physical
device that are used by the logical devices advertised through DRA.
Logical devices can specify the ConsumesCounters list. Each entry contains a reference to a CounterSet
and a set of named counters with the amounts they will consume. For a device to be allocatable,
the referenced counter sets must have sufficient quantity remaining for every counter the device consumes.
Here is an example of two devices, each consuming 6Gi of memory from a shared counter with
8Gi of memory. Thus, only one of the devices can be allocated at any point in time. The scheduler
handles this, and it is transparent to the consumer because the ResourceClaim API is not affected.
```yaml
kind: ResourceSlice
apiVersion: resource.k8s.io/v1beta1
metadata:
  name: resourceslice
spec:
  nodeName: worker-1
  pool:
    name: pool
    generation: 1
    resourceSliceCount: 1
  driver: dra.example.com
  sharedCounters:
  - name: gpu-1-counters
    counters:
      memory:
        value: 8Gi
  devices:
  - name: device-1
    consumesCounters:
    - counterSet: gpu-1-counters
      counters:
        memory:
          value: 6Gi
  - name: device-2
    consumesCounters:
    - counterSet: gpu-1-counters
      counters:
        memory:
          value: 6Gi
```
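The counter accounting in this example can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the kube-scheduler's implementation: the helper names `can_allocate` and `allocate` are hypothetical, and quantities are simplified to whole numbers of Gi instead of Kubernetes quantity strings.

```python
# Illustrative sketch only; not the kube-scheduler's actual allocation code.
# Quantities are plain integers (Gi) to keep the example self-contained.

def can_allocate(device, remaining):
    """Return True if every counter the device consumes is still available."""
    for entry in device["consumesCounters"]:
        available = remaining.get(entry["counterSet"], {})
        for name, amount in entry["counters"].items():
            if available.get(name, 0) < amount:
                return False
    return True


def allocate(device, remaining):
    """Deduct the device's counters from the shared sets if they all fit."""
    if not can_allocate(device, remaining):
        return False
    for entry in device["consumesCounters"]:
        for name, amount in entry["counters"].items():
            remaining[entry["counterSet"]][name] -= amount
    return True


# Mirrors the ResourceSlice above: one shared counter set with 8Gi of memory.
shared = {"gpu-1-counters": {"memory": 8}}

device_1 = {
    "name": "device-1",
    "consumesCounters": [
        {"counterSet": "gpu-1-counters", "counters": {"memory": 6}},
    ],
}
device_2 = {
    "name": "device-2",
    "consumesCounters": [
        {"counterSet": "gpu-1-counters", "counters": {"memory": 6}},
    ],
}

first = allocate(device_1, shared)   # fits: 6 <= 8
second = allocate(device_2, shared)  # does not fit: only 2 left
```

After `device-1` is allocated, only 2Gi remains in `gpu-1-counters`, so the request for `device-2` is rejected until `device-1` is released.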
## Enabling dynamic resource allocation
Dynamic resource allocation is a *beta feature* which is off by default and only enabled when the
@ -366,6 +419,13 @@ is enabled in the kube-apiserver and kube-scheduler. It also requires that the
`DynamicResourceAllocation` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
is enabled.
### Enabling Partitionable Devices
[Partitionable Devices](#partitionable-devices) is an *alpha feature*
and is only available when the `DRAPartitionableDevices`
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
is enabled in the kube-apiserver and kube-scheduler.
## {{% heading "whatsnext" %}}
- For more information on the design, see the


@ -0,0 +1,17 @@
---
title: DRAPartitionableDevices
content_type: feature_gate
_build:
  list: never
  render: false
stages:
  - stage: alpha
    defaultValue: false
    fromVersion: "1.33"
---
Enables support for requesting [Partitionable Devices](/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#partitionable-devices)
for DRA. This lets drivers advertise multiple devices that map to the same resources
of a physical device.
This feature gate has no effect unless you also enable the `DynamicResourceAllocation` feature gate.