CSI Snapshot Support

The Container Storage Interface (CSI) introduced an alpha snapshot API in Kubernetes v1.12. It will reach beta support in Kubernetes v1.17, scheduled for release in December 2019. This proposal documents an approach for integrating support for this snapshot API within Velero, augmenting its existing capabilities.

Goals

  • Enable Velero to back up and restore CSI-backed volumes using the Kubernetes CSI CustomResourceDefinition API

Non Goals

  • Replacing Velero's existing VolumeSnapshotter API
  • Replacing Velero's Restic support

Background

Velero has had support for performing persistent volume snapshots since its inception. However, support has been limited to a handful of providers. The plugin API introduced in Velero v0.7 enabled the community to expand the number of supported providers. In the meantime, Kubernetes SIG Storage advanced the CSI spec to allow for a generic storage interface, opening up the possibility of moving storage code out of the core Kubernetes code base. The CSI working group has also developed a generic snapshotting API that any CSI driver developer may implement, giving users the ability to snapshot volumes from a standard interface.

By supporting the CSI snapshot API, Velero can extend its support to any CSI driver, without requiring a Velero-specific plugin be written, easing the development burden on providers while also reaching more end users.

High-Level Design

In order to support CSI's snapshot API, Velero must interact with the VolumeSnapshot and VolumeSnapshotContent CRDs. These act as requests to the CSI driver to perform a snapshot on the underlying provider's volume. This can largely be accomplished with Velero BackupItemAction and RestoreItemAction plugins that operate on these CRDs.

Additionally, changes to the Velero server and client code are necessary to track VolumeSnapshots that are associated with a given backup, similar to how Velero tracks its own volume.Snapshot type. Tracking these is important for allowing users to see what is in their backup, and provides parity with the existing volume.Snapshot and PodVolumeBackup types. This is also done to retain the object store as Velero's source of truth, without having to query the Kubernetes API server for associated VolumeSnapshots.

velero backup describe --details will use the stored VolumeSnapshots to show the user which CSI snapshots are included in the backup.

Detailed Design

Resource Plugins

A set of prototype plugins was developed that informed this design.

The plugins will be as follows:

A BackupItemAction for PersistentVolumeClaims, named velero.io/csi-pvc

This plugin will act directly on PVCs, since an implementation of Velero's VolumeSnapshotter does not have enough information about the StorageClass to properly create the VolumeSnapshot objects.

The associated PV will be queried and checked for the presence of PersistentVolume.Spec.PersistentVolumeSource.CSI (see the "Snapshot selection mechanism" section below). If this field is nil, or if the Backup.Spec.SnapshotVolumes value is false, the plugin will return early without taking action.

Additionally, to prevent creating CSI snapshots for volumes backed up by restic, the plugin will query for all pods in the PersistentVolumeClaim's namespace. It will then select the pods that have the PVC mounted, and inspect each pod's backup.velero.io/backup-volumes annotation for the associated volume's name. If the name is found in the list, then the plugin will return early without taking further action.
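The restic check described above can be sketched as follows. This is a minimal illustration, not the plugin's implementation: the Pod and Volume types are hypothetical, trimmed-down stand-ins for the Kubernetes types, usesRestic is an invented helper name, and the annotation value is assumed to be a comma-separated list of volume names.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical, trimmed-down stand-ins for the Kubernetes Pod types.
type Volume struct {
	Name      string
	ClaimName string // empty when the volume is not backed by a PVC
}

type Pod struct {
	Name        string
	Annotations map[string]string
	Volumes     []Volume
}

const resticAnnotation = "backup.velero.io/backup-volumes"

// usesRestic reports whether any pod mounts the named PVC through a volume
// that is listed in that pod's restic annotation.
func usesRestic(pods []Pod, pvcName string) bool {
	for _, pod := range pods {
		// Assumption: the annotation holds comma-separated volume names.
		listed := map[string]bool{}
		for _, name := range strings.Split(pod.Annotations[resticAnnotation], ",") {
			if name = strings.TrimSpace(name); name != "" {
				listed[name] = true
			}
		}
		for _, vol := range pod.Volumes {
			if vol.ClaimName == pvcName && listed[vol.Name] {
				return true
			}
		}
	}
	return false
}

func main() {
	pods := []Pod{{
		Name:        "db-0",
		Annotations: map[string]string{resticAnnotation: "data, logs"},
		Volumes:     []Volume{{Name: "data", ClaimName: "db-data"}},
	}}
	fmt.Println(usesRestic(pods, "db-data"))   // PVC is mounted and listed for restic
	fmt.Println(usesRestic(pods, "other-pvc")) // PVC is not mounted by any pod
}
```

When usesRestic returns true, the plugin would return early and leave the volume to the restic code path.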

The plugin will then create a VolumeSnapshot.snapshot.storage.k8s.io object from the PVC. The VolumeSnapshot object will be labeled with the velero.io/backup-name label for ease of lookup later. An ownerRef will also be set on the VolumeSnapshot so that cascading deletion of the Velero Backup will delete associated VolumeSnapshots.

The CSI controllers will create a VolumeSnapshotContent.snapshot.storage.k8s.io object associated with the VolumeSnapshot.

Associated VolumeSnapshotContent objects will be retrieved and updated with the velero.io/backup-name label for ease of lookup later. velero.io/volume-snapshot-name will be applied as a label to the PVC so that the VolumeSnapshot can be found easily for restore.
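A rough sketch of the labeling and ownership described above is shown here. The struct types are hypothetical, simplified stand-ins for the Kubernetes, CSI snapshot, and Velero API types, and snapshotForPVC and its snapshotName parameter are invented for illustration; the real plugin would work against the generated API clients.

```go
package main

import "fmt"

// Hypothetical, trimmed-down stand-ins for the Kubernetes, CSI snapshot, and
// Velero API types; only the fields discussed in this section are modeled.
type OwnerReference struct {
	Kind, Name, UID string
}

type Backup struct {
	Name, UID string
}

type PersistentVolumeClaim struct {
	Namespace, Name string
	Labels          map[string]string
}

type VolumeSnapshot struct {
	Namespace, Name string
	Labels          map[string]string
	OwnerReferences []OwnerReference
	SourcePVC       string // name of the PVC to snapshot
}

const (
	backupNameLabel         = "velero.io/backup-name"
	volumeSnapshotNameLabel = "velero.io/volume-snapshot-name"
)

// snapshotForPVC builds the VolumeSnapshot request for a PVC and records the
// snapshot's name on the PVC so that the restore plugin can find it later.
func snapshotForPVC(pvc *PersistentVolumeClaim, backup Backup, snapshotName string) VolumeSnapshot {
	vs := VolumeSnapshot{
		Namespace: pvc.Namespace,
		Name:      snapshotName,
		Labels:    map[string]string{backupNameLabel: backup.Name},
		OwnerReferences: []OwnerReference{
			{Kind: "Backup", Name: backup.Name, UID: backup.UID},
		},
		SourcePVC: pvc.Name,
	}
	if pvc.Labels == nil {
		pvc.Labels = map[string]string{}
	}
	pvc.Labels[volumeSnapshotNameLabel] = snapshotName
	return vs
}

func main() {
	pvc := &PersistentVolumeClaim{Namespace: "app", Name: "db-data"}
	backup := Backup{Name: "nightly", UID: "1234"}
	vs := snapshotForPVC(pvc, backup, "velero-db-data")
	fmt.Printf("%+v\n%+v\n", vs, *pvc)
}
```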

VolumeSnapshot, VolumeSnapshotContent, and VolumeSnapshotClass objects will be returned as additional items to be backed up. GitHub issue 1566 represents this work.

The VolumeSnapshotContent.Spec.VolumeSnapshotSource.SnapshotHandle field is the link to the underlying platform's on-disk snapshot, and must be preserved for restoration.

The plugin will not wait for the VolumeSnapshot.Status.readyToUse field to be true before returning. This field indicates that the snapshot is ready to use for restoration, and for some vendors it can also indicate that the snapshot has been made durable. However, the applications can proceed as soon as VolumeSnapshot.Status.CreationTime is set. This also maintains current Velero behavior, which allows applications to quiesce and resume quickly, with minimal interruption.

Monitoring or waiting for durable snapshots, whether Velero-native or CSI snapshots, is not covered by this proposal.

K8s object relationships inside of the backup tarball
+-----------------------+               +-----------------------+
| PersistentVolumeClaim +-------------->+ PersistentVolume      |
+-----------+-----------+               +-----------+-----------+
            ^                                       ^
            |                                       |
            |                                       |
            |                                       |
+-----------+-----------+               +-----------+-----------+
| VolumeSnapshot        +<------------->+ VolumeSnapshotContent |
+-----------------------+               +-----------------------+

A RestoreItemAction for VolumeSnapshotContent objects, named velero.io/csi-vsc

On restore, VolumeSnapshotContent objects are cleaned so that they may be properly associated with IDs assigned by the target cluster.

Only VolumeSnapshotContent objects with the velero.io/backup-name label will be processed, using the plugin's AppliesTo function.

The metadata (excluding labels), PersistentVolumeClaim.UUID, and VolumeSnapshotRef.UUID fields will be cleared. The reference fields are cleared because the associated objects will get new UUIDs in the cluster. This also maps to the "import" case of the snapshot API.

This means the relationship between the VolumeSnapshot and VolumeSnapshotContent is one way until the CSI controllers rebind them.

K8s objects after the velero.io/csi-vsc plugin has run
+-----------------------+               +-----------------------+
| PersistentVolumeClaim +-------------->+ PersistentVolume      |
+-----------------------+               +-----------------------+
                                                     
                                                     
+-----------------------+               +-----------------------+
| VolumeSnapshot        +-------------->+ VolumeSnapshotContent |
+-----------------------+               +-----------------------+
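The cleanup performed by the velero.io/csi-vsc plugin might look roughly like the sketch below. The types are hypothetical stand-ins for the real API types, and resetForImport is an invented name. The sketch also assumes that the object's name is kept alongside its labels so that the VolumeSnapshot can still reference it; the proposal only states that metadata other than labels is cleared.

```go
package main

import "fmt"

// Hypothetical, trimmed-down stand-ins for the Kubernetes and CSI snapshot types.
type ObjectMeta struct {
	Name            string
	UID             string
	ResourceVersion string
	Annotations     map[string]string
	Labels          map[string]string
}

type ObjectReference struct {
	Namespace, Name, UID string
}

type VolumeSnapshotContent struct {
	ObjectMeta
	VolumeSnapshotRef        ObjectReference
	PersistentVolumeClaimRef ObjectReference
	SnapshotHandle           string // link to the provider's on-disk snapshot
}

// resetForImport strips origin-cluster identity from a VolumeSnapshotContent.
// Assumption: the name is preserved along with the labels.
func resetForImport(vsc VolumeSnapshotContent) VolumeSnapshotContent {
	vsc.ObjectMeta = ObjectMeta{Name: vsc.Name, Labels: vsc.Labels}
	// The referenced objects will receive new UIDs in the target cluster.
	vsc.VolumeSnapshotRef.UID = ""
	vsc.PersistentVolumeClaimRef.UID = ""
	return vsc
}

func main() {
	vsc := VolumeSnapshotContent{
		ObjectMeta: ObjectMeta{
			Name:            "snapcontent-1",
			UID:             "old-uid",
			ResourceVersion: "42",
			Labels:          map[string]string{"velero.io/backup-name": "nightly"},
		},
		VolumeSnapshotRef:        ObjectReference{Namespace: "app", Name: "velero-db-data", UID: "vs-uid"},
		PersistentVolumeClaimRef: ObjectReference{Namespace: "app", Name: "db-data", UID: "pvc-uid"},
		SnapshotHandle:           "provider-snap-123",
	}
	fmt.Printf("%+v\n", resetForImport(vsc))
}
```

The SnapshotHandle is left untouched, since it is the only link back to the provider-level snapshot.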

A RestoreItemAction for VolumeSnapshot objects, named velero.io/csi-vs

VolumeSnapshot objects must be prepared for importing into the target cluster by removing IDs and metadata associated with their origin cluster.

Only VolumeSnapshot objects with the velero.io/backup-name label will be processed, using the plugin's AppliesTo function.

Metadata (excluding labels) and Source (that is, the pointer to the PersistentVolumeClaim) fields on the object will be cleared. The VolumeSnapshot.Spec.SnapshotContentName is the link back to the VolumeSnapshotContent object, and thus the actual snapshot. The Source field indicates that a new CSI snapshot operation should be performed, which isn't relevant on restore. This follows the "import" case of the snapshot API.

The Backup associated with the VolumeSnapshot will be queried, and set as an ownerRef on the VolumeSnapshot so that deletion can cascade.

+-----------------------+               +-----------------------+
| PersistentVolumeClaim +-------------->+ PersistentVolume      |
+-----------------------+               +-----------------------+
                                                     
                                                     
+-----------------------+               +-----------------------+
| VolumeSnapshot        +-------------->+ VolumeSnapshotContent |
+-----------------------+               +-----------------------+

A RestoreItemAction for PersistentVolumeClaims, named velero.io/csi-pvc

On restore, PersistentVolumeClaims will need to be created from the snapshot, and thus will require editing before submission.

Only PersistentVolumeClaim objects with the velero.io/volume-snapshot-name label will be processed, using the plugin's AppliesTo function. Metadata (excluding labels) will be cleared, and the velero.io/volume-snapshot-name label will be used to find the relevant VolumeSnapshot. A reference to the VolumeSnapshot will be added to the PersistentVolumeClaim.DataSource field.

+-----------------------+                                        
| PersistentVolumeClaim |                                        
+-----------------------+                                        
                                                                 
+-----------------------+               +-----------------------+
| VolumeSnapshot        +-------------->+ VolumeSnapshotContent |
+-----------------------+               +-----------------------+
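As an illustration of the PersistentVolumeClaim restore step, the sketch below adds a data source that points at the VolumeSnapshot named by the label. The types are hypothetical stand-ins (in the Kubernetes API the data source lives at PersistentVolumeClaim.Spec.DataSource and its APIGroup is a string pointer), and restoreFromSnapshot is an invented helper name.

```go
package main

import "fmt"

// Hypothetical stand-ins for the Kubernetes types. The real APIGroup field is
// a *string and DataSource sits under PersistentVolumeClaim.Spec.
type TypedLocalObjectReference struct {
	APIGroup string
	Kind     string
	Name     string
}

type PersistentVolumeClaim struct {
	Name       string
	Labels     map[string]string
	DataSource *TypedLocalObjectReference
}

const volumeSnapshotNameLabel = "velero.io/volume-snapshot-name"

// restoreFromSnapshot points the PVC at the VolumeSnapshot named in its label.
// PVCs without the label are returned unchanged, mirroring the AppliesTo filter.
func restoreFromSnapshot(pvc PersistentVolumeClaim) PersistentVolumeClaim {
	snapshotName, ok := pvc.Labels[volumeSnapshotNameLabel]
	if !ok {
		return pvc
	}
	pvc.DataSource = &TypedLocalObjectReference{
		APIGroup: "snapshot.storage.k8s.io",
		Kind:     "VolumeSnapshot",
		Name:     snapshotName,
	}
	return pvc
}

func main() {
	pvc := PersistentVolumeClaim{
		Name:   "db-data",
		Labels: map[string]string{volumeSnapshotNameLabel: "velero-db-data"},
	}
	restored := restoreFromSnapshot(pvc)
	fmt.Printf("%s restores from %+v\n", restored.Name, *restored.DataSource)
}
```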

VolumeSnapshotClasses

No special logic is required to restore VolumeSnapshotClass objects.

These plugins should be provided with Velero, as there will also be some changes to core Velero code to enable association of a Backup to the included VolumeSnapshots.

Velero server changes

Any non-plugin code changes must be behind an EnableCSI feature flag, and the behavior will be opt-in until it has exited beta status. This will allow development to continue on the feature while it's in a pre-production state, while also reducing the need for long-lived feature branches.

persistBackup will be extended to query for all VolumeSnapshots associated with the backup, and persist the list to JSON.

BackupStore.PutBackup will receive an additional argument, volumeSnapshots io.Reader, that contains the JSON representation of VolumeSnapshots. This will be written to a file named csi-snapshots.json.gz.
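A minimal sketch of producing and reading the gzipped JSON payload follows. It uses only the Go standard library; the VolumeSnapshot struct is a hypothetical two-field stand-in for the real CSI type, and the helper names encodeVolumeSnapshots and decodeVolumeSnapshots are assumptions made for illustration.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/json"
	"fmt"
	"io"
)

// VolumeSnapshot is a hypothetical stand-in for the CSI VolumeSnapshot type.
type VolumeSnapshot struct {
	Namespace string `json:"namespace"`
	Name      string `json:"name"`
}

// encodeVolumeSnapshots returns a reader over the gzipped JSON list, suitable
// for passing along as an io.Reader argument.
func encodeVolumeSnapshots(snapshots []VolumeSnapshot) (io.Reader, error) {
	buf := new(bytes.Buffer)
	gz := gzip.NewWriter(buf)
	if err := json.NewEncoder(gz).Encode(snapshots); err != nil {
		return nil, err
	}
	// Close flushes the remaining compressed data and writes the gzip footer.
	if err := gz.Close(); err != nil {
		return nil, err
	}
	return buf, nil
}

// decodeVolumeSnapshots reverses encodeVolumeSnapshots, as a client would
// after downloading the file.
func decodeVolumeSnapshots(r io.Reader) ([]VolumeSnapshot, error) {
	gz, err := gzip.NewReader(r)
	if err != nil {
		return nil, err
	}
	defer gz.Close()
	var snapshots []VolumeSnapshot
	if err := json.NewDecoder(gz).Decode(&snapshots); err != nil {
		return nil, err
	}
	return snapshots, nil
}

func main() {
	r, err := encodeVolumeSnapshots([]VolumeSnapshot{{Namespace: "app", Name: "velero-db-data"}})
	if err != nil {
		panic(err)
	}
	decoded, err := decodeVolumeSnapshots(r)
	if err != nil {
		panic(err)
	}
	fmt.Println(decoded)
}
```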

defaultRestorePriorities should be rewritten to the following to accommodate proper association between the CSI objects and PVCs. CustomResourceDefinitions are moved up because they're necessary for creating the CSI CRDs. The CSI CRDs are created before PersistentVolumes and PersistentVolumeClaims so that they may be used as data sources. GitHub issue 1565 represents this work.

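// defaultRestorePriorities lists the resources that are restored first, in order.
var defaultRestorePriorities = []string{
    "namespaces",
    "storageclasses",
    // CRDs are moved up so that the CSI snapshot CRDs exist before their objects.
    "customresourcedefinitions",
    // CSI snapshot objects precede PVs and PVCs so they can serve as data sources.
    "volumesnapshotclass.snapshot.storage.k8s.io",
    "volumesnapshotcontents.snapshot.storage.k8s.io",
    "volumesnapshots.snapshot.storage.k8s.io",
    "persistentvolumes",
    "persistentvolumeclaims",
    "secrets",
    "configmaps",
    "serviceaccounts",
    "limitranges",
    "pods",
    "replicaset",
}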
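To show why the position of entries in this list matters, here is a small sketch of how a priority list could be applied to the resources found in a backup. The prioritize helper is hypothetical, and the rule that unlisted resources follow in alphabetical order is an assumption of this sketch rather than something this proposal specifies.

```go
package main

import (
	"fmt"
	"sort"
)

// prioritize returns the available resources with the prioritized ones first,
// in priority order. Assumption: everything else follows alphabetically.
func prioritize(available, priorities []string) []string {
	present := make(map[string]bool, len(available))
	for _, r := range available {
		present[r] = true
	}

	ordered := make([]string, 0, len(available))
	for _, r := range priorities {
		if present[r] {
			ordered = append(ordered, r)
			delete(present, r)
		}
	}

	rest := make([]string, 0, len(present))
	for r := range present {
		rest = append(rest, r)
	}
	sort.Strings(rest)
	return append(ordered, rest...)
}

func main() {
	priorities := []string{
		"customresourcedefinitions",
		"volumesnapshots.snapshot.storage.k8s.io",
		"persistentvolumeclaims",
		"pods",
	}
	available := []string{
		"pods",
		"deployments.apps",
		"persistentvolumeclaims",
		"volumesnapshots.snapshot.storage.k8s.io",
		"customresourcedefinitions",
	}
	for _, r := range prioritize(available, priorities) {
		fmt.Println(r)
	}
}
```

With this ordering, the VolumeSnapshot objects exist before the PersistentVolumeClaims that name them as a data source.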

Restic and CSI interaction

Volumes found in a Pod's backup.velero.io/backup-volumes list will use Velero's current Restic code path. This also means Velero will continue to offer Restic as an option for CSI volumes.

The velero.io/csi-pvc BackupItemAction plugin will inspect pods in the namespace to ensure that it does not act on PVCs already being backed up by restic.

This is preferred to modifying the PVC because Velero's current backup process backs up PVCs and PVs mounted to pods at the same time as the pod.

A drawback to this approach is that we're querying all pods in the namespace per PVC, which could be a large number. In the future, the plugin interface could be improved to have some sort of context argument, so that additional data such as our existing resticSnapshotTracker could be passed to plugins and reduce work.

Garbage collection and deletion

To ensure that all created resources are deleted when a backup expires or is deleted, VolumeSnapshots will have an ownerRef defined pointing to the Velero backup that created them.

In order to fully delete these objects, each VolumeSnapshotContent object will need to be edited to ensure the associated provider snapshot is deleted. This will be done by editing the object and setting VolumeSnapshotContent.Spec.DeletionPolicy to Delete, regardless of whether or not the default policy for the class is Retain. See the Deletion Policies section below. The edit will happen before making Kubernetes API deletion calls to ensure that the cascade works as expected.
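The ordering constraint above (edit the policy first, delete second) can be sketched with a small fake client. Everything here is illustrative: the cluster interface, the recorder fake, and deleteBackupAndSnapshots are hypothetical names, and the struct models only the DeletionPolicy field.

```go
package main

import "fmt"

// VolumeSnapshotContent is a hypothetical stand-in modeling only the policy.
type VolumeSnapshotContent struct {
	Name           string
	DeletionPolicy string // "Delete" or "Retain"
}

// cluster is a hypothetical seam over the Kubernetes API calls involved.
type cluster interface {
	UpdateContent(vsc VolumeSnapshotContent) error
	DeleteBackup(name string) error
}

// deleteBackupAndSnapshots forces every content object to the Delete policy
// before issuing the deletion, so the cascade also removes provider snapshots.
func deleteBackupAndSnapshots(c cluster, backupName string, contents []VolumeSnapshotContent) error {
	for _, vsc := range contents {
		if vsc.DeletionPolicy == "Delete" {
			continue // already has the required policy
		}
		vsc.DeletionPolicy = "Delete"
		if err := c.UpdateContent(vsc); err != nil {
			return fmt.Errorf("updating %s: %w", vsc.Name, err)
		}
	}
	return c.DeleteBackup(backupName)
}

// recorder is a fake cluster that logs calls in the order they arrive.
type recorder struct{ calls []string }

func (r *recorder) UpdateContent(vsc VolumeSnapshotContent) error {
	r.calls = append(r.calls, "update "+vsc.Name+" -> "+vsc.DeletionPolicy)
	return nil
}

func (r *recorder) DeleteBackup(name string) error {
	r.calls = append(r.calls, "delete backup "+name)
	return nil
}

func main() {
	rec := &recorder{}
	contents := []VolumeSnapshotContent{
		{Name: "vsc-1", DeletionPolicy: "Retain"},
		{Name: "vsc-2", DeletionPolicy: "Delete"},
	}
	if err := deleteBackupAndSnapshots(rec, "nightly", contents); err != nil {
		panic(err)
	}
	for _, call := range rec.calls {
		fmt.Println(call)
	}
}
```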

Deleting a Velero Backup or any associated CSI object via kubectl is unsupported; data will be lost or orphaned if this is done.

Other snapshots included in the backup

Because VolumeSnapshot and VolumeSnapshotContent objects are contained within a Velero backup tarball, it is possible for these objects and their on-disk provider snapshots to have been deleted from the cluster while copies of the objects still exist in other Velero backup tarballs. When such a backup is restored, the VolumeSnapshot and VolumeSnapshotContent objects are recreated in the cluster and the CSI controllers will attempt to reconcile their state. If the on-disk snapshot has been deleted, there are two possible outcomes:

1) If the driver does not support the ListSnapshots gRPC method, then the CSI controllers have no way of knowing how to find the snapshot, and set the VolumeSnapshot.Status.readyToUse field to true.
2) If the driver does support the ListSnapshots gRPC method, then the CSI controllers will query the state of the on-disk snapshot, see that it is missing, and set the VolumeSnapshot.Status.readyToUse and VolumeSnapshotContent.Status.readyToUse fields to false.

Velero client changes

To use CSI features, the Velero client must use the EnableCSI feature flag.

DescribeBackupStatus will be extended to download the csi-snapshots.json.gz file for processing. GitHub issue 1568 captures this work.

A new describeCSIVolumeSnapshots function should be added to the output package that knows how to render the included VolumeSnapshot names referenced in the csi-snapshots.json.gz file.
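One possible shape for such a rendering function is sketched below. The signature, the output layout, and the two-field VolumeSnapshot struct are all assumptions made for illustration; the actual function would presumably write through the output package's existing describer helpers rather than return a string.

```go
package main

import (
	"fmt"
	"strings"
)

// VolumeSnapshot is a hypothetical stand-in for an entry decoded from
// csi-snapshots.json.gz.
type VolumeSnapshot struct {
	Namespace, Name string
}

// describeCSIVolumeSnapshots renders the snapshot names as an indented list.
// The layout here is invented for this sketch.
func describeCSIVolumeSnapshots(snapshots []VolumeSnapshot) string {
	var b strings.Builder
	if len(snapshots) == 0 {
		b.WriteString("CSI Volume Snapshots: <none included>\n")
		return b.String()
	}
	b.WriteString("CSI Volume Snapshots:\n")
	for _, vs := range snapshots {
		fmt.Fprintf(&b, "  %s/%s\n", vs.Namespace, vs.Name)
	}
	return b.String()
}

func main() {
	fmt.Print(describeCSIVolumeSnapshots([]VolumeSnapshot{
		{Namespace: "app", Name: "velero-db-data"},
		{Namespace: "app", Name: "velero-db-logs"},
	}))
}
```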

Snapshot selection mechanism

The most accurate, reliable way to detect if a PersistentVolume is a CSI volume is to check for a non-nil PersistentVolume.Spec.PersistentVolumeSource.CSI field. Using the volume.beta.kubernetes.io/storage-provisioner annotation is not viable, since it is applied to any PVC that should be dynamically provisioned, and is not limited to CSI implementations. It was introduced with dynamic provisioning support in 2016, predating CSI.

In the BackupItemAction for PVCs, the associated PV will be queried and checked for the presence of PersistentVolume.Spec.PersistentVolumeSource.CSI. Volumes with any other PersistentVolumeSource set will use Velero's current VolumeSnapshotter plugin code path.
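The selection rules in this proposal can be summarized in a short sketch. The types mirror the shape of the Kubernetes PersistentVolume fields named above but are hypothetical stand-ins, and selectMechanism together with its mechanism constants is invented for illustration. The listedForRestic argument represents the result of the pod annotation check described earlier.

```go
package main

import "fmt"

// Hypothetical stand-ins that mirror the field path
// PersistentVolume.Spec.PersistentVolumeSource.CSI.
type CSIPersistentVolumeSource struct {
	Driver, VolumeHandle string
}

type PersistentVolumeSource struct {
	CSI *CSIPersistentVolumeSource
}

type PersistentVolumeSpec struct {
	PersistentVolumeSource
}

type PersistentVolume struct {
	Spec PersistentVolumeSpec
}

type mechanism string

const (
	viaRestic            mechanism = "restic"
	viaCSI               mechanism = "csi"
	viaVolumeSnapshotter mechanism = "volumesnapshotter"
)

// selectMechanism picks the backup path for a volume: restic when the volume
// is listed in a pod annotation, CSI when the PV has a CSI source, and the
// existing VolumeSnapshotter plugins otherwise.
func selectMechanism(pv PersistentVolume, listedForRestic bool) mechanism {
	switch {
	case listedForRestic:
		return viaRestic
	case pv.Spec.PersistentVolumeSource.CSI != nil:
		return viaCSI
	default:
		return viaVolumeSnapshotter
	}
}

func main() {
	csiPV := PersistentVolume{}
	csiPV.Spec.CSI = &CSIPersistentVolumeSource{Driver: "example.csi.driver", VolumeHandle: "vol-1"}

	fmt.Println(selectMechanism(csiPV, false))
	fmt.Println(selectMechanism(csiPV, true))
	fmt.Println(selectMechanism(PersistentVolume{}, false))
}
```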

VolumeSnapshotLocations and VolumeSnapshotClasses

Velero uses its own VolumeSnapshotLocation CRDs to specify configuration options for a given storage system. In Velero, this often includes topology information such as regions or availability zones, as well as credential information.

CSI volume snapshotting has a VolumeSnapshotClass CRD which also contains configuration options for a given storage system, but these options are not the same as those that Velero would use. Since CSI volume snapshotting is operating within the same storage system that manages the volumes already, it does not need the same topology or credential information that Velero does.

As such, when used with CSI volumes, Velero's VolumeSnapshotLocation CRDs are not relevant, and could be omitted.

This will create a separate path in our documentation for the time being, and should be called out explicitly.

Alternatives Considered

  • Implementing similar logic in a Velero VolumeSnapshotter plugin was considered. However, this is inappropriate given CSI's data model, which requires a PVC/PV's StorageClass. Given the arguments to the VolumeSnapshotter interface, the plugin would have to instantiate its own client and do queries against the Kubernetes API server to get the necessary information.

This is unnecessary given the fact that the BackupItemAction and RestoreItemAction APIs can act directly on the appropriate objects.

Additionally, the VolumeSnapshotter plugins and CSI volume snapshot drivers overlap: both produce a snapshot on backup and a PersistentVolume on restore. Thus, there's no logical place for VolumeSnapshot creation in the VolumeSnapshotter interface.

  • Implementing CSI logic directly in Velero core code. The plugins could be packaged separately, but that doesn't necessarily make sense with server and client changes being made to accommodate CSI snapshot lookup.

  • Implementing the CSI logic entirely in external plugins. As mentioned above, the necessary plugins for PersistentVolumeClaim, VolumeSnapshot, and VolumeSnapshotContent could be hosted out-of-tree from Velero. In fact, much of the logic for creating the CSI objects will be driven entirely inside of the plugin implementation.

However, Velero currently has no way for plugins to communicate that some arbitrary data should be stored in or retrieved from object storage, such as a list of all VolumeSnapshot objects associated with a given Backup. This is important because, to display snapshots included in a backup, whether as native snapshots or Restic backups, separate JSON-encoded lists are stored within the backup on object storage. Snapshots are not listed directly on the Backup, in order to fit within the etcd size limitations. Additionally, there are no client-side Velero plugin mechanisms, which means that the velero backup describe --details command would have no way of displaying the objects to the user, even if they were stored.

Deletion Policies

In order for underlying, provider-level snapshots to be retained similarly to Velero's current functionality, the VolumeSnapshotContent.Spec.DeletionPolicy field must be set to Retain.

This is most easily accomplished by setting the VolumeSnapshotClass.DeletionPolicy field to Retain, which will be inherited by all VolumeSnapshotContent objects associated with the VolumeSnapshotClass.

The current default for dynamically provisioned VolumeSnapshotContent objects is Delete, which will delete the provider-level snapshot when the VolumeSnapshotContent object representing it is deleted. Additionally, the Delete policy will cascade a deletion of a VolumeSnapshot, removing the associated VolumeSnapshotContent object.

It is not currently possible to define a deletion policy on a VolumeSnapshot that gets passed to a VolumeSnapshotContent object on an individual basis.

Security Considerations

This proposal does not significantly change Velero's security implications within a cluster.

If a deployment is using solely CSI volumes, Velero will no longer need privileges to interact with volumes or snapshots, as these will be handled by the CSI driver. This reduces the provider permissions footprint of Velero.

Velero must still be able to access cluster-scoped resources in order to back up VolumeSnapshotContent objects. Without these objects, the provider-level snapshots cannot be located in order to re-associate them with volumes in the event of a restore.