Volume Populators Redesign Blog
For https://github.com/kubernetes/enhancements/issues/1495pull/28970/head
parent
575bd04640
commit
c0b5d85371
|
@ -0,0 +1,219 @@
|
||||||
|
---
|
||||||
|
layout: blog
|
||||||
|
title: "Kubernetes 1.22: A New Design for Volume Populators"
|
||||||
|
date: 2021-08-30
|
||||||
|
slug: volume-populators-redesigned
|
||||||
|
---
|
||||||
|
|
||||||
|
**Authors:**
|
||||||
|
Ben Swartzlander (NetApp)
|
||||||
|
|
||||||
|
Kubernetes v1.22, released earlier this month, introduced a redesigned approach for volume
|
||||||
|
populators. Originally implemented
|
||||||
|
in v1.18, the API suffered from backwards compatibility issues. Kubernetes v1.22 includes a new API
|
||||||
|
field called `dataSourceRef` that fixes these problems.
|
||||||
|
|
||||||
|
## Data sources
|
||||||
|
|
||||||
|
Earlier Kubernetes releases already added a `dataSource` field into the
|
||||||
|
[PersistentVolumeClaim](/docs/concepts/storage/persistent-volumes/#persistentvolumeclaims) API,
|
||||||
|
used for cloning volumes and creating volumes from snapshots. You could use the `dataSource` field when
|
||||||
|
creating a new PVC, referencing either an existing PVC or a VolumeSnapshot in the same namespace.
|
||||||
|
That also modified the normal provisioning process so that instead of yielding an empty volume, the
|
||||||
|
new PVC contained the same data as either the cloned PVC or the cloned VolumeSnapshot.
|
||||||
|
|
||||||
|
Volume populators embrace the same design idea, but extend it to any type of object, as long
|
||||||
|
as there exists a [custom resource](/docs/concepts/extend-kubernetes/api-extension/custom-resources/)
|
||||||
|
to define the data source, and a populator controller to implement the logic. Initially,
|
||||||
|
the `dataSource` field was directly extended to allow arbitrary objects, if the `AnyVolumeDataSource`
|
||||||
|
feature gate was enabled on a cluster. That change unfortunately caused backwards compatibility
|
||||||
|
problems, and so the new `dataSourceRef` field was born.
|
||||||
|
|
||||||
|
In v1.22 if the `AnyVolumeDataSource` feature gate is enabled, the `dataSourceRef` field is
|
||||||
|
added, which behaves similarly to the `dataSource` field except that it allows arbitrary
|
||||||
|
objects to be specified. The API server ensures that the two fields always have the same
|
||||||
|
contents, and neither of them are mutable. The differences is that at creation time
|
||||||
|
`dataSource` allows only PVCs or VolumeSnapshots, and ignores all other values, while
|
||||||
|
`dataSourceRef` allows most types of objects, and in the few cases it doesn't allow an
|
||||||
|
object (core objects other than PVCs) a validation error occurs.
|
||||||
|
|
||||||
|
When this API change graduates to stable, we would deprecate using `dataSource` and recommend
|
||||||
|
using `dataSourceRef` field for all use cases.
|
||||||
|
In the v1.22 release, `dataSourceRef` is available (as an alpha feature) specifically for cases
|
||||||
|
where you want to use for custom volume populators.
|
||||||
|
|
||||||
|
## Using populators
|
||||||
|
|
||||||
|
Every volume populator must have one or more CRDs that it supports. Administrators may
|
||||||
|
install the CRD and the populator controller and then PVCs with a `dataSourceRef` specifies
|
||||||
|
a CR of the type that the populator supports will be handled by the populator controller
|
||||||
|
instead of the CSI driver directly.
|
||||||
|
|
||||||
|
Underneath the covers, the CSI driver is still invoked to create an empty volume, which
|
||||||
|
the populator controller fills with the appropriate data. The PVC doesn't bind to the PV
|
||||||
|
until it's fully populated, so it's safe to define a whole application manifest including
|
||||||
|
pod and PVC specs and the pods won't begin running until everything is ready, just as if
|
||||||
|
the PVC was a clone of another PVC or VolumeSnapshot.
|
||||||
|
|
||||||
|
## How it works
|
||||||
|
|
||||||
|
PVCs with data sources are still noticed by the external-provisioner sidecar for the
|
||||||
|
related storage class (assuming a CSI provisioner is used), but because the sidecar
|
||||||
|
doesn't understand the data source kind, it doesn't do anything. The populator controller
|
||||||
|
is also watching for PVCs with data sources of a kind that it understands and when it
|
||||||
|
sees one, it creates a temporary PVC of the same size, volume mode, storage class,
|
||||||
|
and even on the same topology (if topology is used) as the original PVC. The populator
|
||||||
|
controller creates a worker pod that attaches to the volume and writes the necessary
|
||||||
|
data to it, then detaches from the volume and the populator controller rebinds the PV
|
||||||
|
from the temporary PVC to the orignal PVC.
|
||||||
|
|
||||||
|
## Trying it out
|
||||||
|
|
||||||
|
The following things are required to use volume populators:
|
||||||
|
* Enable the `AnyVolumeDataSource` feature gate
|
||||||
|
* Install a CRD for the specific data source / populator
|
||||||
|
* Install the populator controller itself
|
||||||
|
|
||||||
|
Populator controllers may use the [lib-volume-populator](https://github.com/kubernetes-csi/lib-volume-populator)
|
||||||
|
library to do most of the Kubernetes API level work. Individual populators only need to
|
||||||
|
provide logic for actually writing data into the volume based on a particular CR
|
||||||
|
instance. This library provides a sample populator implementation.
|
||||||
|
|
||||||
|
These optional components improve user experience:
|
||||||
|
* Install the VolumePopulator CRD
|
||||||
|
* Create a VolumePopulator custom respource for each specific data source
|
||||||
|
* Install the [volume data source validator](https://github.com/kubernetes-csi/volume-data-source-validator)
|
||||||
|
controller (alpha)
|
||||||
|
|
||||||
|
The purpose of these components is to generate warning events on PVCs with data sources
|
||||||
|
for which there is no populator.
|
||||||
|
|
||||||
|
## Putting it all together
|
||||||
|
|
||||||
|
To see how this works, you can install the sample "hello" populator and try it
|
||||||
|
out.
|
||||||
|
|
||||||
|
First install the volume-data-source-validator controller.
|
||||||
|
|
||||||
|
```terminal
|
||||||
|
kubectl apply -f https://github.com/kubernetes-csi/volume-data-source-validator/blob/master/deploy/kubernetes/rbac-data-source-validator.yaml
|
||||||
|
kubectl apply -f https://github.com/kubernetes-csi/volume-data-source-validator/blob/master/deploy/kubernetes/setup-data-source-validator.yaml
|
||||||
|
```
|
||||||
|
|
||||||
|
Next install the example populator.
|
||||||
|
|
||||||
|
```terminal
|
||||||
|
kubectl apply -f https://github.com/kubernetes-csi/lib-volume-populator/blob/master/example/hello-populator/crd.yaml
|
||||||
|
kubectl apply -f https://github.com/kubernetes-csi/lib-volume-populator/blob/master/example/hello-populator/deploy.yaml
|
||||||
|
```
|
||||||
|
|
||||||
|
Create an instance of the `Hello` CR, with some text.
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
apiVersion: hello.k8s.io/v1alpha1
|
||||||
|
kind: Hello
|
||||||
|
metadata:
|
||||||
|
name: example-hello
|
||||||
|
spec:
|
||||||
|
fileName: example.txt
|
||||||
|
fileContents: Hello, world!
|
||||||
|
```
|
||||||
|
|
||||||
|
Create a PVC that refers to that CR as its data source.
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
apiVersion: v1
|
||||||
|
kind: PersistentVolumeClaim
|
||||||
|
metadata:
|
||||||
|
name: example-pvc
|
||||||
|
spec:
|
||||||
|
accessModes:
|
||||||
|
- ReadWriteOnce
|
||||||
|
resources:
|
||||||
|
requests:
|
||||||
|
storage: 10Mi
|
||||||
|
dataSourceRef:
|
||||||
|
apiGroup: hello.k8s.io
|
||||||
|
kind: Hello
|
||||||
|
name: example-hello
|
||||||
|
volumeMode: Filesystem
|
||||||
|
```
|
||||||
|
|
||||||
|
Next, run a job that reads the file in the PVC.
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
apiVersion: batch/v1
|
||||||
|
kind: Job
|
||||||
|
metadata:
|
||||||
|
name: example-job
|
||||||
|
spec:
|
||||||
|
template:
|
||||||
|
spec:
|
||||||
|
containers:
|
||||||
|
- name: example-container
|
||||||
|
image: busybox:latest
|
||||||
|
command:
|
||||||
|
- cat
|
||||||
|
- /mnt/example.txt
|
||||||
|
volumeMounts:
|
||||||
|
- name: vol
|
||||||
|
mountPath: /mnt
|
||||||
|
restartPolicy: Never
|
||||||
|
volumes:
|
||||||
|
- name: vol
|
||||||
|
persistentVolumeClaim:
|
||||||
|
claimName: example-pvc
|
||||||
|
```
|
||||||
|
|
||||||
|
Wait for the job to complete (including all of its dependencies).
|
||||||
|
|
||||||
|
```terminal
|
||||||
|
kubectl wait --for=condition=Complete job/example-job
|
||||||
|
```
|
||||||
|
|
||||||
|
And last examine the log from the job.
|
||||||
|
|
||||||
|
```terminal
|
||||||
|
kubectl logs job/example-job
|
||||||
|
Hello, world!
|
||||||
|
```
|
||||||
|
|
||||||
|
Note that the volume already contained a text file with the string contents from
|
||||||
|
the CR. This is only the simplest example. Actual populators can set up the volume
|
||||||
|
to contain arbitrary contents.
|
||||||
|
|
||||||
|
## How to write your own volume populator
|
||||||
|
|
||||||
|
Developers interested in writing new poplators are encouraged to use the
|
||||||
|
[lib-volume-populator](https://github.com/kubernetes-csi/lib-volume-populator) library
|
||||||
|
and to only supply a small controller wrapper around the library, and a pod image
|
||||||
|
capable of attaching to volumes and writing the appropriate data to the volume.
|
||||||
|
|
||||||
|
Individual populators can be extremely generic such that they work with every type
|
||||||
|
of PVC, or they can do vendor specific things to rapidly fill a volume with data
|
||||||
|
if the volume was provisioned by a specific CSI driver from the same vendor, for
|
||||||
|
example, by communicating directly with the storage for that volume.
|
||||||
|
|
||||||
|
## The future
|
||||||
|
|
||||||
|
As this feature is still in alpha, we expect to update the out of tree controllers
|
||||||
|
with more tests and documentation. The community plans to eventually re-implement
|
||||||
|
the populator library as a sidecar, for ease of operations.
|
||||||
|
|
||||||
|
We hope to see some official community-supported populators for some widely-shared
|
||||||
|
use cases. Also, we expect that volume populators will be used by backup vendors
|
||||||
|
as a way to "restore" backups to volumes, and possibly a standardized API to do
|
||||||
|
this will evolve.
|
||||||
|
|
||||||
|
## How can I learn more?
|
||||||
|
|
||||||
|
The enhancement proposal,
|
||||||
|
[Volume Populators](https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/1495-volume-populators), includes lots of detail about the history and technical implementation
|
||||||
|
of this feature.
|
||||||
|
|
||||||
|
[Volume populators and data sources] (in the documenation topic about persistent volumes)
|
||||||
|
explains how to use this feature in your cluster.
|
||||||
|
|
||||||
|
Please get involved by joining the Kubernetes storage SIG to help us enhance this
|
||||||
|
feature. There are a lot of good ideas already and we'd be thrilled to have more!
|
||||||
|
|
Loading…
Reference in New Issue