From d0eca44a5123a6c35140aef275ea0fa7799a4f4d Mon Sep 17 00:00:00 2001
From: Lionel Jouin
Date: Tue, 12 Nov 2024 13:53:57 -0700
Subject: [PATCH 1/2] [KEP-4817] DRAResourceClaimDeviceStatus documentation

Signed-off-by: Lionel Jouin
---
 .../dynamic-resource-allocation.md      | 19 ++++++++++++++++++-
 .../dra-resource-claim-device-status.md | 14 ++++++++++++++
 2 files changed, 32 insertions(+), 1 deletion(-)
 create mode 100644 content/en/docs/reference/command-line-tools-reference/feature-gates/dra-resource-claim-device-status.md

diff --git a/content/en/docs/concepts/scheduling-eviction/dynamic-resource-allocation.md b/content/en/docs/concepts/scheduling-eviction/dynamic-resource-allocation.md
index 30b940c7fb..2a54a88ef9 100644
--- a/content/en/docs/concepts/scheduling-eviction/dynamic-resource-allocation.md
+++ b/content/en/docs/concepts/scheduling-eviction/dynamic-resource-allocation.md
@@ -11,6 +11,10 @@ weight: 65
 
 {{< feature-state feature_gate_name="DynamicResourceAllocation" >}}
 
+Dynamic Resource Allocation with ResourceClaim device status:
+
+{{< feature-state feature_gate_name="DRAResourceClaimDeviceStatus" >}}
+
 Dynamic resource allocation is an API for requesting and sharing resources
 between pods and containers inside a pod. It is a generalization of the
 persistent volumes API for generic resources. Typically those resources
@@ -47,7 +51,11 @@ ResourceClaim
   for use by workloads. For example, if a workload needs an accelerator device
   with specific properties, this is how that request is expressed. The status
   stanza tracks whether this claim has been satisfied and what specific
-  resources have been allocated.
+  resources have been allocated. When the `DRAResourceClaimDeviceStatus`
+  [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) is
+  enabled, drivers can report driver-specific device status data for each allocated
+  device in a resource claim. For example, IPs assigned to a network interface device
+  can be reported in the ResourceClaim status.
 
 ResourceClaimTemplate
 : Defines the spec and some metadata for creating
@@ -209,6 +217,10 @@ are enabled. For details on that, see the `--feature-gates` and `--runtime-confi
 [kube-apiserver parameters](/docs/reference/command-line-tools-reference/kube-apiserver/).
 kube-scheduler, kube-controller-manager and kubelet also need the feature gate.
 
+If a resource driver reports the status of devices, the
+`DRAResourceClaimDeviceStatus` feature gate must be enabled in addition to
+`DynamicResourceAllocation`.
+
 A quick check whether a Kubernetes cluster supports the feature is to list
 DeviceClass objects with:
 
@@ -229,6 +241,11 @@ If not supported, this error is printed instead:
 error: the server doesn't have a resource type "deviceclasses"
 ```
 
+A ResourceClaim device status is supported when it is possible, from a DRA driver,
+to update an existing ResourceClaim where the `status.devices` field is set. When the
+`DRAResourceClaimDeviceStatus` feature is disabled, that field automatically
+gets cleared when storing the ResourceClaim.
+
 The default configuration of kube-scheduler enables the "DynamicResources"
 plugin if and only if the feature gate is enabled and when using
 the v1 configuration API. Custom configurations may have to be modified to
diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates/dra-resource-claim-device-status.md b/content/en/docs/reference/command-line-tools-reference/feature-gates/dra-resource-claim-device-status.md
new file mode 100644
index 0000000000..ac33e12a3a
--- /dev/null
+++ b/content/en/docs/reference/command-line-tools-reference/feature-gates/dra-resource-claim-device-status.md
@@ -0,0 +1,14 @@
+---
+title: DRAResourceClaimDeviceStatus
+content_type: feature_gate
+_build:
+  list: never
+  render: false
+
+stages:
+  - stage: alpha
+    defaultValue: false
+    fromVersion: "1.32"
+---
+Enables support for the ResourceClaim.status.devices field and for setting this
+status from DRA drivers.
\ No newline at end of file

From 816efb776e0b280293367d9890b97efdb4aa11de Mon Sep 17 00:00:00 2001
From: Lionel Jouin
Date: Mon, 25 Nov 2024 23:12:50 -0500
Subject: [PATCH 2/2] [KEP-4817] Dedicated sub-section

Signed-off-by: Lionel Jouin
---
 .../dynamic-resource-allocation.md | 39 ++++++++++++++----------
 1 file changed, 25 insertions(+), 14 deletions(-)

diff --git a/content/en/docs/concepts/scheduling-eviction/dynamic-resource-allocation.md b/content/en/docs/concepts/scheduling-eviction/dynamic-resource-allocation.md
index 2a54a88ef9..08df169045 100644
--- a/content/en/docs/concepts/scheduling-eviction/dynamic-resource-allocation.md
+++ b/content/en/docs/concepts/scheduling-eviction/dynamic-resource-allocation.md
@@ -11,10 +11,6 @@ weight: 65
 
 {{< feature-state feature_gate_name="DynamicResourceAllocation" >}}
 
-Dynamic Resource Allocation with ResourceClaim device status:
-
-{{< feature-state feature_gate_name="DRAResourceClaimDeviceStatus" >}}
-
 Dynamic resource allocation is an API for requesting and sharing resources
 between pods and containers inside a pod. It is a generalization of the
 persistent volumes API for generic resources. Typically those resources
@@ -51,11 +47,7 @@ ResourceClaim
   for use by workloads. For example, if a workload needs an accelerator device
   with specific properties, this is how that request is expressed. The status
   stanza tracks whether this claim has been satisfied and what specific
-  resources have been allocated. When the `DRAResourceClaimDeviceStatus`
-  [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) is
-  enabled, drivers can report driver-specific device status data for each allocated
-  device in a resource claim. For example, IPs assigned to a network interface device
-  can be reported in the ResourceClaim status.
+  resources have been allocated.
 
 ResourceClaimTemplate
 : Defines the spec and some metadata for creating
@@ -208,6 +200,23 @@ spec:
 You may also be able to mutate the incoming Pod, at admission time, to unset
 the `.spec.nodeName` field and to use a node selector instead.
 
+## ResourceClaim Device Status
+
+{{< feature-state feature_gate_name="DRAResourceClaimDeviceStatus" >}}
+
+Drivers can report driver-specific device status data for each allocated device
+in a resource claim. For example, IPs assigned to a network interface device can be
+reported in the ResourceClaim status.
+
+Because the drivers set the status, the accuracy of the information depends on the
+implementation of those DRA drivers. Therefore, the reported status of the device may
+not always reflect real-time changes in the state of the device.
+
+ResourceClaim device status is supported when a DRA driver is able to update
+an existing ResourceClaim with the `status.devices` field set. When the
+`DRAResourceClaimDeviceStatus` feature gate is disabled, that field is
+automatically cleared when the ResourceClaim is stored.
+
 ## Enabling dynamic resource allocation
 
 Dynamic resource allocation is an *alpha feature* and only enabled when the
+
 ## Enabling dynamic resource allocation
 
 Dynamic resource allocation is an *alpha feature* and only enabled when the
@@ -241,11 +250,6 @@ If not supported, this error is printed instead:
 error: the server doesn't have a resource type "deviceclasses"
 ```
 
-A ResourceClaim device status is supported when it is possible, from a DRA driver,
-to update an existing ResourceClaim where the `status.devices` field is set. When the
-`DRAResourceClaimDeviceStatus` feature is disabled, that field automatically
-gets cleared when storing the ResourceClaim.
-
 The default configuration of kube-scheduler enables the "DynamicResources"
 plugin if and only if the feature gate is enabled and when using
 the v1 configuration API. Custom configurations may have to be modified to
@@ -254,6 +258,13 @@ include it.
 In addition to enabling the feature in the cluster, a resource driver also has to
 be installed. Please refer to the driver's documentation for details.
 
+### Enabling Device Status
+
+[ResourceClaim Device Status](#resourceclaim-device-status) is an *alpha feature*
+and only enabled when the `DRAResourceClaimDeviceStatus`
+[feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
+is enabled in the kube-apiserver.
+
 ## {{% heading "whatsnext" %}}
 
 - For more information on the design, see the
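
For illustration, a ResourceClaim whose network driver has reported device status might look roughly like the sketch below. It assumes the `resource.k8s.io/v1beta1` API served by Kubernetes v1.32 with both `DynamicResourceAllocation` and `DRAResourceClaimDeviceStatus` enabled. The claim, request, DeviceClass, driver, pool, and device names, the MAC address, and the IP addresses are all hypothetical, and the `allocation` stanza is abbreviated.

```yaml
# Sketch of a ResourceClaim after allocation, with device status reported
# by a DRA driver. All names and addresses below are hypothetical.
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaim
metadata:
  name: example-nic-claim
  namespace: default
spec:
  devices:
    requests:
    - name: nic                          # hypothetical request name
      deviceClassName: nic.example.com   # hypothetical DeviceClass
status:
  allocation:                            # abbreviated; written by the scheduler
    devices:
      results:
      - request: nic
        driver: nic.example.com          # hypothetical DRA driver name
        pool: worker-1                   # hypothetical pool name
        device: eth1                     # hypothetical device name
  devices:                               # written by the DRA driver
  - driver: nic.example.com              # driver, pool, and device identify
    pool: worker-1                       # which allocated device this
    device: eth1                         # status entry describes
    networkData:
      interfaceName: eth1
      ips:                               # CIDR notation: address/prefix length
      - "192.0.2.5/24"
      - "2001:db8::5/64"
      hardwareAddress: "00:11:22:33:44:55"
```

A driver would typically write `status.devices` through the ResourceClaim `status` subresource after allocation, and each entry's `driver`, `pool`, and `device` are expected to match one of the allocated results. If the `DRAResourceClaimDeviceStatus` feature gate is disabled, the field is cleared when the ResourceClaim is stored, as the patch describes.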