Merge pull request #37766 from pohly/dynamic-resource-allocation-concepts

dynamic resource allocation concepts

commit be09333a58

@@ -26,6 +26,7 @@ of terminating one or more Pods on Nodes.

 * [Pod Topology Spread Constraints](/docs/concepts/scheduling-eviction/topology-spread-constraints/)
 * [Taints and Tolerations](/docs/concepts/scheduling-eviction/taint-and-toleration/)
 * [Scheduling Framework](/docs/concepts/scheduling-eviction/scheduling-framework)
+* [Dynamic Resource Allocation](/docs/concepts/scheduling-eviction/dynamic-resource-allocation)
 * [Scheduler Performance Tuning](/docs/concepts/scheduling-eviction/scheduler-perf-tuning/)
 * [Resource Bin Packing for Extended Resources](/docs/concepts/scheduling-eviction/resource-bin-packing/)
 * [Pod Scheduling Readiness](/docs/concepts/scheduling-eviction/pod-scheduling-readiness/)

@@ -0,0 +1,215 @@
---
reviewers:
- klueska
- pohly
title: Dynamic Resource Allocation
content_type: concept
weight: 65
---

<!-- overview -->

{{< feature-state for_k8s_version="v1.26" state="alpha" >}}

Dynamic resource allocation is a new API for requesting and sharing resources
between pods and containers inside a pod. It is a generalization of the
persistent volumes API for generic resources. Third-party resource drivers are
responsible for tracking and allocating resources. Different kinds of
resources support arbitrary parameters for defining requirements and
initialization.

## {{% heading "prerequisites" %}}

Kubernetes v{{< skew currentVersion >}} includes cluster-level API support for
dynamic resource allocation, but it [needs to be
enabled](#enabling-dynamic-resource-allocation) explicitly. You also must
install a resource driver for specific resources that are meant to be managed
using this API. If you are not running Kubernetes v{{< skew currentVersion >}},
check the documentation for that version of Kubernetes.

<!-- body -->

## API

The new `resource.k8s.io/v1alpha1` {{< glossary_tooltip text="API group"
term_id="api-group" >}} provides four new types:

ResourceClass
: Defines which resource driver handles a certain kind of
  resource and provides common parameters for it. ResourceClasses
  are created by a cluster administrator when installing a resource
  driver.

ResourceClaim
: Defines a particular resource instance that is required by a
  workload. Created by a user (lifecycle managed manually, can be shared
  between different Pods) or for individual Pods by the control plane based on
  a ResourceClaimTemplate (automatic lifecycle, typically used by just one
  Pod).

ResourceClaimTemplate
: Defines the spec and some metadata for creating
  ResourceClaims. Created by a user when deploying a workload.

PodScheduling
: Used internally by the control plane and resource drivers
  to coordinate pod scheduling when ResourceClaims need to be allocated
  for a Pod.

Parameters for ResourceClass and ResourceClaim are stored in separate objects,
typically using the type defined by a {{< glossary_tooltip
term_id="CustomResourceDefinition" text="CRD" >}} that was created when
installing a resource driver.

The `core/v1` `PodSpec` defines ResourceClaims that are needed for a Pod in a new
`resourceClaims` field. Entries in that list reference either a ResourceClaim
or a ResourceClaimTemplate. When referencing a ResourceClaim, all Pods using
this PodSpec (for example, inside a Deployment or StatefulSet) share the same
ResourceClaim instance. When referencing a ResourceClaimTemplate, each Pod gets
its own instance.

The `resources.claims` list for container resources defines whether a container gets
access to these resource instances, which makes it possible to share resources
between one or more containers.

Here is an example for a fictional resource driver. Two ResourceClaim objects
will be created for this Pod and each container gets access to one of them.

```yaml
apiVersion: resource.k8s.io/v1alpha1
kind: ResourceClass
metadata:
  name: resource.example.com
driverName: resource-driver.example.com
---
apiVersion: cats.resource.example.com/v1
kind: ClaimParameters
metadata:
  name: large-black-cat-claim-parameters
spec:
  color: black
  size: large
---
apiVersion: resource.k8s.io/v1alpha1
kind: ResourceClaimTemplate
metadata:
  name: large-black-cat-claim-template
spec:
  spec:
    resourceClassName: resource.example.com
    parametersRef:
      apiGroup: cats.resource.example.com
      kind: ClaimParameters
      name: large-black-cat-claim-parameters
---
apiVersion: v1
kind: Pod
metadata:
  name: pod-with-cats
spec:
  containers:
  - name: container0
    image: ubuntu:20.04
    command: ["sleep", "9999"]
    resources:
      claims:
      - name: cat-0
  - name: container1
    image: ubuntu:20.04
    command: ["sleep", "9999"]
    resources:
      claims:
      - name: cat-1
  resourceClaims:
  - name: cat-0
    source:
      resourceClaimTemplateName: large-black-cat-claim-template
  - name: cat-1
    source:
      resourceClaimTemplateName: large-black-cat-claim-template
```
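
For comparison, below is a minimal sketch of a Pod that references a manually
created ResourceClaim instead of a template; every Pod that names this claim
shares the same instance. The `shared-cat` claim and the Pod name are invented
for illustration.

```yaml
# Hypothetical example: a manually managed ResourceClaim shared by all Pods
# that reference it by name.
apiVersion: resource.k8s.io/v1alpha1
kind: ResourceClaim
metadata:
  name: shared-cat
spec:
  resourceClassName: resource.example.com
  parametersRef:
    apiGroup: cats.resource.example.com
    kind: ClaimParameters
    name: large-black-cat-claim-parameters
---
apiVersion: v1
kind: Pod
metadata:
  name: pod-with-shared-cat
spec:
  containers:
  - name: container0
    image: ubuntu:20.04
    command: ["sleep", "9999"]
    resources:
      claims:
      - name: cat            # refers to the entry below in spec.resourceClaims
  resourceClaims:
  - name: cat
    source:
      resourceClaimName: shared-cat   # existing claim, not a template
```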

## Scheduling

In contrast to native resources (CPU, RAM) and extended resources (managed by a
device plugin, advertised by kubelet), the scheduler has no knowledge of what
dynamic resources are available in a cluster or how they could be split up to
satisfy the requirements of a specific ResourceClaim. Resource drivers are
responsible for that. They mark ResourceClaims as "allocated" once the
resources for them are reserved. This also then tells the scheduler where in
the cluster a ResourceClaim is available.

ResourceClaims can get allocated as soon as they are created ("immediate
allocation"), without considering which Pods will use them. The default is to
delay allocation until a Pod that needs the ResourceClaim gets scheduled
("wait for first consumer").
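
As a minimal sketch, a manually created ResourceClaim can request immediate
allocation through its `allocationMode` field; the claim name below is
invented, and `WaitForFirstConsumer` is the default when the field is omitted.

```yaml
# Hypothetical example: allocate as soon as the claim is created,
# instead of waiting for the first consuming Pod.
apiVersion: resource.k8s.io/v1alpha1
kind: ResourceClaim
metadata:
  name: eager-cat
spec:
  resourceClassName: resource.example.com
  allocationMode: Immediate   # default is WaitForFirstConsumer
```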

In that mode, the scheduler checks all ResourceClaims needed by a Pod and
creates a PodScheduling object where it informs the resource drivers
responsible for those ResourceClaims about nodes that the scheduler considers
suitable for the Pod. The resource drivers respond by excluding nodes that
don't have enough of the driver's resources left. Once the scheduler has that
information, it selects one node and stores that choice in the PodScheduling
object. The resource drivers then allocate their ResourceClaims so that the
resources will be available on that node. Once that is complete, the Pod
gets scheduled.
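
PodScheduling objects are created and updated by the control plane and the
resource drivers, not by users. The following is only a rough sketch of what
such an object might look like during this negotiation, with invented node
names.

```yaml
# Illustrative only: users do not create PodScheduling objects themselves.
apiVersion: resource.k8s.io/v1alpha1
kind: PodScheduling
metadata:
  name: pod-with-cats        # matches the Pod's name and namespace
spec:
  potentialNodes:            # candidates suggested by the scheduler
  - node-1
  - node-2
  selectedNode: node-1       # set once the scheduler has picked a node
status:
  resourceClaims:
  - name: cat-0
    unsuitableNodes:         # nodes the driver ruled out for this claim
    - node-2
```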

As part of this process, ResourceClaims also get reserved for the
Pod. Currently, ResourceClaims can be used either exclusively by a single Pod
or by an unlimited number of Pods.
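
For illustration, the reservation shows up in the claim's status; the sketch
below assumes the `reservedFor` field of the `resource.k8s.io/v1alpha1`
ResourceClaim status, with invented object names and UID.

```yaml
# Illustrative only: the status is maintained by the scheduler and the driver.
apiVersion: resource.k8s.io/v1alpha1
kind: ResourceClaim
metadata:
  name: pod-with-cats-cat-0   # hypothetical claim generated from the template
status:
  driverName: resource-driver.example.com
  reservedFor:                # Pods that may use this claim
  - resource: pods
    name: pod-with-cats
    uid: 00000000-0000-0000-0000-000000000000
```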

One key feature is that Pods do not get scheduled to a node unless all of
their resources are allocated and reserved. This avoids the scenario where a Pod
gets scheduled onto one node and then cannot run there, which is bad because
such a pending Pod also blocks all other resources like RAM or CPU that were
set aside for it.

## Limitations

The scheduler plugin must be involved in scheduling Pods that use
ResourceClaims. Bypassing the scheduler by setting the `nodeName` field leads
to Pods that the kubelet refuses to start because the ResourceClaims are not
reserved or not even allocated. It may be possible to [remove this
limitation](https://github.com/kubernetes/kubernetes/issues/114005) in the
future.

## Enabling dynamic resource allocation

Dynamic resource allocation is an *alpha feature* and only enabled when the
`DynamicResourceAllocation` [feature
gate](/docs/reference/command-line-tools-reference/feature-gates/) and the
`resource.k8s.io/v1alpha1` {{< glossary_tooltip text="API group"
term_id="api-group" >}} are enabled. For details on that, see the
`--feature-gates` and `--runtime-config` [kube-apiserver
parameters](/docs/reference/command-line-tools-reference/kube-apiserver/).
kube-scheduler, kube-controller-manager and kubelet also need the feature gate.

A quick check whether a Kubernetes cluster supports the feature is to list
ResourceClass objects with:

```shell
kubectl get resourceclasses
```

If your cluster supports dynamic resource allocation, the response is either a
list of ResourceClass objects or:

```
No resources found
```

If not supported, this error is printed instead:

```
error: the server doesn't have a resource type "resourceclasses"
```

The default configuration of kube-scheduler enables the "DynamicResources"
plugin if and only if the feature gate is enabled. Custom configurations may
have to be modified to include it.
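
A minimal sketch of what explicitly enabling the plugin might look like in a
custom configuration, assuming the `kubescheduler.config.k8s.io/v1` API; check
the scheduler configuration reference for the authoritative syntax.

```yaml
# Sketch of a custom kube-scheduler configuration that enables the plugin.
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
- schedulerName: default-scheduler
  plugins:
    multiPoint:
      enabled:
      - name: DynamicResources   # enabled for all extension points it implements
```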

In addition to enabling the feature in the cluster, a resource driver also has to
be installed. Please refer to the driver's documentation for details.

## {{% heading "whatsnext" %}}

- For more information on the design, see the
  [Dynamic Resource Allocation KEP](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/3063-dynamic-resource-allocation/README.md).

@@ -88,6 +88,7 @@ For a reference to old feature gates that are removed, please refer to

 | `DownwardAPIHugePages` | `false` | Alpha | 1.20 | 1.20 |
 | `DownwardAPIHugePages` | `false` | Beta | 1.21 | 1.21 |
 | `DownwardAPIHugePages` | `true` | Beta | 1.22 | |
+| `DynamicResourceAllocation` | `false` | Alpha | 1.26 | |
 | `EndpointSliceTerminatingCondition` | `false` | Alpha | 1.20 | 1.21 |
 | `EndpointSliceTerminatingCondition` | `true` | Beta | 1.22 | |
 | `ExpandedDNSConfig` | `false` | Alpha | 1.22 | |