diff --git a/content/en/docs/concepts/scheduling-eviction/dynamic-resource-allocation.md b/content/en/docs/concepts/scheduling-eviction/dynamic-resource-allocation.md
index 9f72593f62..0f06380750 100644
--- a/content/en/docs/concepts/scheduling-eviction/dynamic-resource-allocation.md
+++ b/content/en/docs/concepts/scheduling-eviction/dynamic-resource-allocation.md
@@ -258,7 +258,7 @@ real time changes of the state of the device.
 When the feature is disabled, that field automatically gets cleared when storing the ResourceClaim.
 
 A ResourceClaim device status is supported when it is possible, from a DRA driver, to update an
-existing ResourceClaim where the `status.devices` field is set.
+existing ResourceClaim where the `status.devices` field is set.
 
 ## Prioritized List
 
@@ -304,6 +304,59 @@ spec:
 count: 2
 ```
 
+## Partitionable Devices
+
+{{< feature-state feature_gate_name="DRAPartitionableDevices" >}}
+
+Devices represented in DRA don't necessarily have to be a single unit connected to a single machine,
+but can also be a logical device composed of multiple devices connected to multiple machines. These
+devices might consume overlapping resources of the underlying physical devices, meaning that when one
+logical device is allocated, other devices will no longer be available.
+
+In the ResourceSlice API, this is represented as a list of named CounterSets, each of which
+contains a set of named counters. The counters represent the resources available on the physical
+device that are used by the logical devices advertised through DRA.
+
+Logical devices can specify the ConsumesCounters list. Each entry contains a reference to a CounterSet
+and a set of named counters with the amounts they will consume. So for a device to be allocatable,
+the referenced counter sets must have sufficient quantity for the counters referenced by the device.
+
+Here is an example of two devices, each consuming 6Gi of memory from a shared counter with
+8Gi of memory. Thus, only one of the devices can be allocated at any point in time. The scheduler
+handles this and it is transparent to the consumer as the ResourceClaim API is not affected.
+
+```yaml
+kind: ResourceSlice
+apiVersion: resource.k8s.io/v1beta1
+metadata:
+  name: resourceslice
+spec:
+  nodeName: worker-1
+  pool:
+    name: pool
+    generation: 1
+    resourceSliceCount: 1
+  driver: dra.example.com
+  sharedCounters:
+  - name: gpu-1-counters
+    counters:
+      memory:
+        value: 8Gi
+  devices:
+  - name: device-1
+    consumesCounters:
+    - counterSet: gpu-1-counters
+      counters:
+        memory:
+          value: 6Gi
+  - name: device-2
+    consumesCounters:
+    - counterSet: gpu-1-counters
+      counters:
+        memory:
+          value: 6Gi
+```
+
 ## Enabling dynamic resource allocation
 
 Dynamic resource allocation is a *beta feature* which is off by default and only enabled when the
@@ -366,6 +419,13 @@ is enabled in the kube-apiserver and kube-scheduler. It also requires that the
 `DynamicResourceAllocation` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
 is enabled.
 
+### Enabling Partitionable Devices
+
+[Partitionable Devices](#partitionable-devices) is an *alpha feature*
+and is only enabled when the `DRAPartitionableDevices`
+[feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
+is enabled in the kube-apiserver and kube-scheduler.
+
 ## {{% heading "whatsnext" %}}
 
 - For more information on the design, see the
diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates/DRAPartitionableDevices.md b/content/en/docs/reference/command-line-tools-reference/feature-gates/DRAPartitionableDevices.md
new file mode 100644
index 0000000000..7da6697e30
--- /dev/null
+++ b/content/en/docs/reference/command-line-tools-reference/feature-gates/DRAPartitionableDevices.md
@@ -0,0 +1,17 @@
+---
+title: DRAPartitionableDevices
+content_type: feature_gate
+_build:
+  list: never
+  render: false
+
+stages:
+  - stage: alpha
+    defaultValue: false
+    fromVersion: "1.33"
+---
+Enables support for requesting [Partitionable Devices](/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#partitionable-devices)
+for DRA. This lets drivers advertise multiple devices that map to the same resources
+of a physical device.
+
+This feature gate has no effect unless you also enable the `DynamicResourceAllocation` feature gate.
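
Note on the Partitionable Devices text in the diff above: the rule that a device is allocatable only while the referenced counter sets have sufficient quantity can be illustrated with a small Go sketch. This is not the scheduler's actual implementation. The types `counterSets` and `device` and the functions `canAllocate` and `allocate` are hypothetical names invented for illustration, and quantities are simplified to plain byte counts instead of Kubernetes resource quantities.

```go
package main

import "fmt"

// counterSets maps a counter set name to its named counters.
// Hypothetical simplification: amounts are plain int64 byte counts.
type counterSets map[string]map[string]int64

// device is a hypothetical stand-in for a logical device in a ResourceSlice.
type device struct {
	name     string
	consumes counterSets // what the device would take from each counter set
}

const gi = int64(1) << 30 // one gibibyte in bytes

// canAllocate reports whether every counter the device consumes
// is still available in sufficient quantity.
func canAllocate(remaining counterSets, d device) bool {
	for set, counters := range d.consumes {
		for name, need := range counters {
			have, ok := remaining[set][name]
			if !ok || have < need {
				return false
			}
		}
	}
	return true
}

// allocate deducts the device's consumption from the remaining counters.
// It returns false and leaves remaining unchanged if the device does not fit.
func allocate(remaining counterSets, d device) bool {
	if !canAllocate(remaining, d) {
		return false
	}
	for set, counters := range d.consumes {
		for name, need := range counters {
			remaining[set][name] -= need
		}
	}
	return true
}

func main() {
	// Mirrors the example ResourceSlice: one 8Gi counter, two 6Gi devices.
	remaining := counterSets{"gpu-1-counters": {"memory": 8 * gi}}
	d1 := device{name: "device-1", consumes: counterSets{"gpu-1-counters": {"memory": 6 * gi}}}
	d2 := device{name: "device-2", consumes: counterSets{"gpu-1-counters": {"memory": 6 * gi}}}

	fmt.Println(allocate(remaining, d1)) // true
	fmt.Println(allocate(remaining, d2)) // false
}
```

With the values from the example ResourceSlice, the first allocation leaves 2Gi in `gpu-1-counters`, so the second 6Gi device no longer fits. This matches the statement that only one of the two devices can be allocated at any point in time.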