---
layout: blog
title: "Kubernetes 1.22: A New Design for Volume Populators"
date: 2021-08-30
slug: volume-populators-redesigned
---
**Authors:**
Ben Swartzlander (NetApp)
Kubernetes v1.22, released earlier this month, introduced a redesigned approach for volume
populators. Originally implemented
in v1.18, the API suffered from backwards compatibility issues. Kubernetes v1.22 includes a new API
field called `dataSourceRef` that fixes these problems.
## Data sources
Earlier Kubernetes releases already added a `dataSource` field into the
[PersistentVolumeClaim](/docs/concepts/storage/persistent-volumes/#persistentvolumeclaims) API,
used for cloning volumes and creating volumes from snapshots. You could use the `dataSource` field when
creating a new PVC, referencing either an existing PVC or a VolumeSnapshot in the same namespace.
Doing so modified the normal provisioning process so that, instead of yielding an empty volume,
the new PVC contained the same data as the cloned PVC or the source VolumeSnapshot.
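For example, a PVC that restores data from an existing VolumeSnapshot might look like this
(names and sizes here are illustrative):
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: restored-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  dataSource:
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: my-snapshot
```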
Volume populators embrace the same design idea, but extend it to any type of object, as long
as there exists a [custom resource](/docs/concepts/extend-kubernetes/api-extension/custom-resources/)
to define the data source, and a populator controller to implement the logic. Initially,
the `dataSource` field was directly extended to allow arbitrary objects, if the `AnyVolumeDataSource`
feature gate was enabled on a cluster. That change unfortunately caused backwards compatibility
problems, and so the new `dataSourceRef` field was born.
In v1.22, if the `AnyVolumeDataSource` feature gate is enabled, a new `dataSourceRef` field is
added, which behaves similarly to the `dataSource` field except that it allows arbitrary
objects to be specified. The API server ensures that the two fields always have the same
contents, and neither of them is mutable. The difference is that at creation time
`dataSource` allows only PVCs or VolumeSnapshots, and silently ignores all other values, while
`dataSourceRef` allows most types of objects; in the few cases where it doesn't allow an
object (core objects other than PVCs), creation fails with a validation error.
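For example (a sketch, with an illustrative snapshot name): if you create a PVC that sets only
`dataSourceRef` to a VolumeSnapshot, reading the object back shows that the API server has filled
in `dataSource` with the same contents:
```yaml
# Fragment of the PVC's .spec after creation; both fields match.
dataSource:
  apiGroup: snapshot.storage.k8s.io
  kind: VolumeSnapshot
  name: my-snapshot
dataSourceRef:
  apiGroup: snapshot.storage.k8s.io
  kind: VolumeSnapshot
  name: my-snapshot
```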
When this API change graduates to stable, we intend to deprecate the `dataSource` field and
recommend the `dataSourceRef` field for all use cases.
In the v1.22 release, `dataSourceRef` is available (as an alpha feature) specifically for cases
where you want to use custom volume populators.
## Using populators
Every volume populator must have one or more CRDs that it supports. Administrators may
install the CRD and the populator controller; any PVC whose `dataSourceRef` specifies
a CR of a type that the populator supports will then be handled by the populator
controller instead of the CSI driver directly.
Underneath the covers, the CSI driver is still invoked to create an empty volume, which
the populator controller fills with the appropriate data. The PVC doesn't bind to the PV
until it's fully populated, so it's safe to define a whole application manifest including
pod and PVC specs and the pods won't begin running until everything is ready, just as if
the PVC was a clone of another PVC or VolumeSnapshot.
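One way to observe this (an optional, illustrative check) is to watch the claim and its pods;
the PVC reports `Pending` until population completes, and only then do the consuming pods start:
```terminal
kubectl get pvc,pods --watch
```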
## How it works
PVCs with data sources are still noticed by the external-provisioner sidecar for the
related storage class (assuming a CSI provisioner is used), but because the sidecar
doesn't understand the data source kind, it doesn't do anything. The populator controller
is also watching for PVCs with data sources of a kind that it understands, and when it
sees one, it creates a temporary PVC of the same size, volume mode, storage class,
and topology (if topology is used) as the original PVC. The populator controller
creates a worker pod that attaches to the volume and writes the necessary data to it.
When the pod finishes and detaches from the volume, the populator controller rebinds
the PV from the temporary PVC to the original PVC.
## Trying it out
The following things are required to use volume populators:
* Enable the `AnyVolumeDataSource` feature gate (see the sketch after this list)
* Install a CRD for the specific data source / populator
* Install the populator controller itself
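Exactly how you enable the feature gate depends on how you run your control plane. As one
hedged example, if you are experimenting with a [kind](https://kind.sigs.k8s.io/) cluster, a
config like this (the file name and other settings are illustrative) turns the gate on for all
components:
```yaml
# kind-config.yaml (illustrative)
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
featureGates:
  AnyVolumeDataSource: true
```
Create the cluster with `kind create cluster --config kind-config.yaml`; on other clusters,
pass `--feature-gates=AnyVolumeDataSource=true` to the control plane components instead.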
Populator controllers may use the [lib-volume-populator](https://github.com/kubernetes-csi/lib-volume-populator)
library to do most of the Kubernetes API level work. Individual populators only need to
provide logic for actually writing data into the volume based on a particular CR
instance. This library provides a sample populator implementation.
These optional components improve user experience:
* Install the VolumePopulator CRD
* Create a VolumePopulator custom resource for each specific data source (see the example below)
* Install the [volume data source validator](https://github.com/kubernetes-csi/volume-data-source-validator)
controller (alpha)
The purpose of these components is to generate warning events on PVCs with data sources
for which there is no populator.
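For example, registering the hello populator used later in this post would look roughly like
the following sketch; the `sourceKind` must match the group and kind of the data source CRD.
```yaml
apiVersion: populator.storage.k8s.io/v1beta1
kind: VolumePopulator
metadata:
  name: hello
sourceKind:
  group: hello.k8s.io
  kind: Hello
```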
## Putting it all together
To see how this works, you can install the sample "hello" populator and try it
out.
First install the volume-data-source-validator controller.
```terminal
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/volume-data-source-validator/master/deploy/kubernetes/rbac-data-source-validator.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/volume-data-source-validator/master/deploy/kubernetes/setup-data-source-validator.yaml
```
Next install the example populator.
```terminal
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/lib-volume-populator/master/example/hello-populator/crd.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/lib-volume-populator/master/example/hello-populator/deploy.yaml
```
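If you want to confirm that the data source CRD is registered (an optional, illustrative check),
list the API resources in its group:
```terminal
kubectl api-resources --api-group=hello.k8s.io
```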
Create an instance of the `Hello` CR, with some text.
```yaml
apiVersion: hello.k8s.io/v1alpha1
kind: Hello
metadata:
  name: example-hello
spec:
  fileName: example.txt
  fileContents: Hello, world!
```
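Save the manifest to a file and apply it (the file name is just an example; the same pattern
applies to the PVC and Job manifests below):
```terminal
kubectl apply -f hello.yaml
```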
Create a PVC that refers to that CR as its data source.
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Mi
  dataSourceRef:
    apiGroup: hello.k8s.io
    kind: Hello
    name: example-hello
  volumeMode: Filesystem
```
Next, run a job that reads the file in the PVC.
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: example-job
spec:
  template:
    spec:
      containers:
      - name: example-container
        image: busybox:latest
        command:
        - cat
        - /mnt/example.txt
        volumeMounts:
        - name: vol
          mountPath: /mnt
      restartPolicy: Never
      volumes:
      - name: vol
        persistentVolumeClaim:
          claimName: example-pvc
```
Wait for the job to complete (including all of its dependencies).
```terminal
kubectl wait --for=condition=Complete job/example-job
```
Finally, examine the log from the job.
```terminal
kubectl logs job/example-job
Hello, world!
```
Note that the volume already contained a text file with the string contents from
the CR. This is only the simplest example. Actual populators can set up the volume
to contain arbitrary contents.
## How to write your own volume populator
Developers interested in writing new populators are encouraged to use the
[lib-volume-populator](https://github.com/kubernetes-csi/lib-volume-populator) library
and to only supply a small controller wrapper around the library, and a pod image
capable of attaching to volumes and writing the appropriate data to the volume.
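To make that concrete, here is a rough sketch of the kind of worker pod a populator might run
to fill a volume. Everything in it is illustrative rather than the actual hello populator, and
in practice the library creates and manages this pod for you:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: populate-worker          # illustrative name
spec:
  restartPolicy: Never
  containers:
  - name: populate
    image: busybox:latest
    # A real populator would derive what to write from the CR instance
    # referenced by the PVC's dataSourceRef.
    command: ["sh", "-c", "echo 'Hello, world!' > /mnt/example.txt"]
    volumeMounts:
    - name: target
      mountPath: /mnt
  volumes:
  - name: target
    persistentVolumeClaim:
      claimName: populate-target  # illustrative; in practice this is the temporary PVC
```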
Individual populators can be extremely generic such that they work with every type
of PVC, or they can do vendor specific things to rapidly fill a volume with data
if the volume was provisioned by a specific CSI driver from the same vendor, for
example, by communicating directly with the storage for that volume.
## The future
As this feature is still in alpha, we expect to update the out-of-tree controllers
with more tests and documentation. The community plans to eventually re-implement
the populator library as a sidecar, for ease of operations.
We hope to see some official community-supported populators for some widely-shared
use cases. Also, we expect that volume populators will be used by backup vendors
as a way to "restore" backups to volumes, and possibly a standardized API to do
this will evolve.
## How can I learn more?
The enhancement proposal,
[Volume Populators](https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/1495-volume-populators), includes lots of detail about the history and technical implementation
of this feature.
[Volume populators and data sources](/docs/concepts/storage/persistent-volumes/#volume-populators-and-data-sources)
(in the documentation topic about persistent volumes) explains how to use this feature in your cluster.
Please get involved by joining the Kubernetes storage SIG to help us enhance this
feature. There are a lot of good ideas already and we'd be thrilled to have more!