Merge pull request #21634 from pohly/kubernetes-1-19-features
storage: CSIStorageCapacity
commit c7285443ee
@@ -95,3 +95,7 @@ of the scheduler:
* Learn about [configuring multiple schedulers](/docs/tasks/administer-cluster/configure-multiple-schedulers/)
* Learn about [topology management policies](/docs/tasks/administer-cluster/topology-manager/)
* Learn about [Pod Overhead](/docs/concepts/configuration/pod-overhead/)
* Learn about scheduling of Pods that use volumes in:
  * [Volume Topology Support](/docs/concepts/storage/storage-classes/#volume-binding-mode)
  * [Storage Capacity Tracking](/docs/concepts/storage/storage-capacity/)
  * [Node-specific Volume Limits](/docs/concepts/storage/storage-limits/)
@@ -0,0 +1,134 @@
---
reviewers:
- jsafrane
- saad-ali
- msau42
- xing-yang
- pohly
title: Storage Capacity
content_type: concept
weight: 45
---

<!-- overview -->

Storage capacity is limited and may vary depending on the node on
which a pod runs: network-attached storage might not be accessible by
all nodes, or storage is local to a node to begin with.

{{< feature-state for_k8s_version="v1.19" state="alpha" >}}

This page describes how Kubernetes keeps track of storage capacity and
how the scheduler uses that information to schedule Pods onto nodes
that have access to enough storage capacity for the volumes that
still need to be provisioned. Without storage capacity tracking, the
scheduler may choose a node that doesn't have enough capacity to
provision a volume, and multiple scheduling retries will be needed.

Tracking storage capacity is supported for {{< glossary_tooltip
text="Container Storage Interface" term_id="csi" >}} (CSI) drivers and
[needs to be enabled](#enabling-storage-capacity-tracking) when installing a CSI driver.

<!-- body -->

## API

There are two API extensions for this feature:
- [CSIStorageCapacity](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#csistoragecapacity-v1alpha1-storage-k8s-io) objects:
  these get produced by a CSI driver in the namespace
  where the driver is installed. Each object contains capacity
  information for one storage class and defines which nodes have
  access to that storage.
- [The `CSIDriverSpec.StorageCapacity` field](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#csidriverspec-v1-storage-k8s-io):
  when set to `true`, the Kubernetes scheduler will consider storage
  capacity for volumes that use the CSI driver.

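As an illustration, here is a minimal sketch of a CSIStorageCapacity object as a driver deployment might publish it; the object name, namespace, storage class name, and topology label are placeholders, not values defined by Kubernetes:

```yaml
apiVersion: storage.k8s.io/v1alpha1
kind: CSIStorageCapacity
metadata:
  # Typically created and kept up to date by the CSI driver's deployment,
  # in the namespace where the driver is installed.
  name: csisc-example                     # placeholder name
  namespace: csi-driver-ns                # placeholder namespace
storageClassName: example-sc              # placeholder StorageClass name
capacity: 100Gi                           # capacity available for this class/topology
nodeTopology:
  matchLabels:
    topology.example.com/zone: zone-1     # placeholder topology label
```

A driver deployment typically publishes one such object per combination of storage class and topology segment for which it can report capacity.
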
## Scheduling

Storage capacity information is used by the Kubernetes scheduler if:

- the `CSIStorageCapacity` feature gate is true,
- a Pod uses a volume that has not been created yet,
- that volume uses a {{< glossary_tooltip text="StorageClass" term_id="storage-class" >}} which references a CSI driver and
  uses `WaitForFirstConsumer` [volume binding
  mode](/docs/concepts/storage/storage-classes/#volume-binding-mode),
  and
- the `CSIDriver` object for the driver has `StorageCapacity` set to
  true.

In that case, the scheduler only considers nodes for the Pod which
have enough storage available to them. This check is very
simplistic and only compares the size of the volume against the
capacity listed in `CSIStorageCapacity` objects with a topology that
includes the node.

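A sketch of objects that would satisfy the `WaitForFirstConsumer` and `CSIDriver` conditions above; the driver and class names are placeholders:

```yaml
apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  name: example.csi.driver.io         # placeholder CSI driver name
spec:
  # Tell the scheduler to consult CSIStorageCapacity objects
  # for volumes provisioned by this driver.
  storageCapacity: true
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: example-sc                    # placeholder StorageClass name
provisioner: example.csi.driver.io    # must match the CSIDriver name
volumeBindingMode: WaitForFirstConsumer
```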

For volumes with `Immediate` volume binding mode, the storage driver
decides where to create the volume, independently of Pods that will
use the volume. The scheduler then schedules Pods onto nodes where the
volume is available after the volume has been created.
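
For contrast, a StorageClass sketch using `Immediate` binding (again with placeholder names):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: example-immediate-sc          # placeholder name
provisioner: example.csi.driver.io    # placeholder CSI driver name
# Volumes are provisioned as soon as the PVC is created, before any Pod
# is scheduled, so storage capacity information is not consulted.
volumeBindingMode: Immediate
```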

For [CSI ephemeral volumes](/docs/concepts/storage/volumes/#csi),
scheduling always happens without considering storage capacity. This
is based on the assumption that this volume type is only used by
special CSI drivers which are local to a node and do not need
significant resources there.

## Rescheduling

When a node has been selected for a Pod with `WaitForFirstConsumer`
volumes, that decision is still tentative. The next step is that the
CSI storage driver gets asked to create the volume with a hint that the
volume is supposed to be available on the selected node.

Because Kubernetes might have chosen a node based on outdated
capacity information, it is possible that the volume cannot really be
created. The node selection is then reset and the Kubernetes scheduler
tries again to find a node for the Pod.

## Limitations

Storage capacity tracking increases the chance that scheduling works
on the first try, but cannot guarantee this because the scheduler has
to decide based on potentially outdated information. Usually, the
same retry mechanism as for scheduling without any storage capacity
information handles scheduling failures.

One situation where scheduling can fail permanently is when a Pod uses
multiple volumes: one volume might have been created already in a
topology segment which then does not have enough capacity left for
another volume. Manual intervention is necessary to recover from this,
for example by increasing capacity or deleting the volume that was
already created. [Further
work](https://github.com/kubernetes/enhancements/pull/1703) is needed
to handle this automatically.

## Enabling storage capacity tracking

Storage capacity tracking is an *alpha feature* and is only enabled when
the `CSIStorageCapacity` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) is enabled. A quick check
whether a Kubernetes cluster supports the feature is to list
CSIStorageCapacity objects with:
```shell
kubectl get csistoragecapacities --all-namespaces
```

If your cluster supports CSIStorageCapacity, the response is either a list
of CSIStorageCapacity objects or:
```
No resources found
```

If not supported, this error is printed instead:
```
error: the server doesn't have a resource type "csistoragecapacities"
```
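
How the gate is turned on depends on how the cluster is deployed. As a rough sketch, the flag below would be passed to the API server (which serves the new objects) and the scheduler (which consumes them); the exact mechanism (kubeadm static Pod manifests, systemd units, a managed control plane) varies by installer:

```shell
# Illustrative only: how these flags are set depends on your deployment.
kube-apiserver --feature-gates=CSIStorageCapacity=true ...
kube-scheduler --feature-gates=CSIStorageCapacity=true ...
```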

In addition to enabling the feature in the cluster, a CSI driver also has to
support it. Please refer to the driver's documentation for details.
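
One way to see whether an installed driver has opted in is to inspect its `CSIDriver` object, for example with this illustrative query:

```shell
# Shows each CSIDriver and whether spec.storageCapacity is set to true.
kubectl get csidrivers.storage.k8s.io \
  -o custom-columns=NAME:.metadata.name,STORAGECAPACITY:.spec.storageCapacity
```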

## {{% heading "whatsnext" %}}

- For more information on the design, see the
  [Storage Capacity Constraints for Pod Scheduling KEP](https://github.com/kubernetes/enhancements/blob/master/keps/sig-storage/1472-storage-capacity-tracking/README.md).
- For more information on further development of this feature, see the [enhancement tracking issue #1472](https://github.com/kubernetes/enhancements/issues/1472).
- Learn about [Kubernetes Scheduler](/docs/concepts/scheduling-eviction/kube-scheduler/)
@@ -78,6 +78,7 @@ different Kubernetes components.
| `CSIMigrationOpenStackComplete` | `false` | Alpha | 1.17 | |
| `CSIMigrationvSphere` | `false` | Beta | 1.19 | |
| `CSIMigrationvSphereComplete` | `false` | Beta | 1.19 | |
| `CSIStorageCapacity` | `false` | Alpha | 1.19 | |
| `ConfigurableFSGroupPolicy` | `false` | Alpha | 1.18 | |
| `CustomCPUCFSQuotaPeriod` | `false` | Alpha | 1.12 | |
| `CustomResourceDefaulting` | `false` | Alpha | 1.15 | 1.15 |
@@ -398,6 +399,7 @@ Each feature gate is designed for enabling/disabling a specific feature:
- `CSIPersistentVolume`: Enable discovering and mounting volumes provisioned through a
  [CSI (Container Storage Interface)](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/container-storage-interface.md)
  compatible volume plugin.
- `CSIStorageCapacity`: Enables CSI drivers to publish storage capacity information and the Kubernetes scheduler to use that information when scheduling pods. See [Storage Capacity](/docs/concepts/storage/storage-capacity/).
  Check the [`csi` volume type](/docs/concepts/storage/volumes/#csi) documentation for more details.
- `CustomCPUCFSQuotaPeriod`: Enable nodes to change CPUCFSQuotaPeriod.
- `CustomPodDNS`: Enable customizing the DNS settings for a Pod using its `dnsConfig` property.