11 KiB
Add support for ExistingResourcePolicy
to restore API
Abstract
Velero currently does not support any restore policy on kubernetes resources that are already present in-cluster. Velero skips over the restore of the resource if it already exists in the namespace/cluster irrespective of whether the resource present in the restore is the same or different from the one present on the cluster. It is desired that Velero gives the option to the user to decide whether or not the resource in backup should overwrite the one present in the cluster.
Background
As of Today, Velero will skip over the restoration of resources that already exist in the cluster. The current workflow followed by Velero is (Using a service
that is backed up for example):
- Velero tries to attempt restore of the
service
- Fetches the
service
from the cluster - If the
service
exists then:- Checks whether the
service
instance in the cluster is equal to theservice
instance present in backup- If not equal then skips the restore of the
service
and adds a restore warning (except for ServiceAccount objects) - If equal then skips the restore of the
service
and mentions that the restore of resourceservice
is skipped in logs
- If not equal then skips the restore of the
- Checks whether the
It is desired to add the functionality to specify whether or not to overwrite the instance of resource service
in cluster with the one present in backup during the restore process.
Related issue: https://github.com/vmware-tanzu/velero/issues/4066
Goals
- Add support for
ExistingResourcePolicy
to restore API for Kubernetes resources.
Non Goals
- Change existing restore workflow for
ServiceAccount
objects - Add support for
ExistingResourcePolicy
asrecreate
for Kubernetes resources. (Future scope feature)
Unrelated Proposals (Completely different functionalities than the one proposed in the design)
- Add support for
ExistingResourcePolicy
to restore API for Non-Kubernetes resources. - Add support for
ExistingResourcePolicy
to restore API forPersistentVolume
data.
Use-cases/Scenarios
A. Production Cluster - Backup Cluster:
Let's say you have a Backup Cluster which is identical to the Production Cluster. After some operations/usage/time the Production Cluster had changed itself, there might be new deployments, some secrets might have been updated. Now, this means that the Backup cluster will no longer be identical to the Production Cluster. In order to keep the Backup Cluster up to date/identical to the Production Cluster with respect to Kubernetes resources except PV data we would like to use Velero for scheduling new backups which would in turn help us update the Backup Cluster via Velero restore.
Reference: https://github.com/vmware-tanzu/velero/issues/4066#issuecomment-954320686
B. Help identify resource delta:
Here delta resources mean the resources restored by a previous backup, but they are no longer in the latest backup. Let's follow a sequence of steps to understand this scenario:
- Consider there are 2 clusters, Cluster A, which has 3 resources - P1, P2 and P3.
- Create a Backup1 from Cluster A which has P1, P2 and P3.
- Perform restore on a new Cluster B using Backup1.
- Now, Lets say in Cluster A resource P1 gets deleted and resource P2 gets updated.
- Create a new Backup2 with the new state of Cluster A, keep in mind Backup1 has P1, P2 and P3 while Backup2 has P2' and P3.
- So the Delta here is (|Cluster B - Backup2|), Delete P1 and Update P2.
- During Restore time we would want the Restore to help us identify this resource delta.
Reference: https://github.com/vmware-tanzu/velero/pull/4613#issuecomment-1027260446
High-Level Design
Approach 1: Add a new spec field existingResourcePolicy
to the Restore API
In this approach we do not change existing velero behavior. If the resource to restore in cluster is equal to the one backed up then do nothing following current Velero behavior. For resources that already exist in the cluster that are not equal to the resource in the backup (other than Service Accounts). We add a new optional spec field existingResourcePolicy
which can have the following values:
none
: This is the existing behavior, if Velero encounters a resource that already exists in the cluster, we simply skip restoration.update
: This option would provide the following behavior.- Unchanged resources: Velero would update the backup/restore labels on the unchanged resources, if labels patch fails Velero adds a restore error.
- Changed resources: Velero will first try to patch the changed resource, Now if the patch:
- succeeds: Then the in-cluster resource gets updated with the labels as well as the resource diff
- fails: Velero adds a restore warning and tries to just update the backup/restore labels on the resource, if the labels patch also fails then we add restore error.
recreate
: If resource already exists, then Velero will delete it and recreate the resource.
Note: The recreate
option is a non-goal for this enhancement proposal, but it is considered as a future scope.
Another thing to highlight is that Velero will not be deleting any resources in any of the policy options proposed in
this design but Velero will patch the resources in update
policy option.
Example:
A. The following Restore will execute the existingResourcePolicy
restore type none
for the services
and deployments
present in the velero-protection
namespace.
Kind: Restore
…
includeNamespaces: velero-protection
includeResources:
- services
- deployments
existingResourcePolicy: none
B. The following Restore will execute the existingResourcePolicy
restore type update
for the secrets
and daemonsets
present in the gdpr-application
namespace.
Kind: Restore
…
includeNamespaces: gdpr-application
includeResources:
- secrets
- daemonsets
existingResourcePolicy: update
Approach 2: Add a new spec field existingResourcePolicyConfig
to the Restore API
In this approach we give user the ability to specify which resources are to be included for a particular kind of force update behaviour, essentially a more granular approach where in the user is able to specify a resource:behaviour mapping. It would look like:
existingResourcePolicyConfig
:
patch:
includedResources:
string
recreate:
includedResources:
string
Note:
- There is no
none
behaviour in this approach as that would conform to the current/default Velero restore behaviour. - The
recreate
option is a non-goal for this enhancement proposal, but it is considered as a future scope.
Example:
A. The following Restore will execute the restore type patch
and apply the existingResourcePolicyConfig
for secrets
and daemonsets
present in the inventory-app
namespace.
Kind: Restore
…
includeNamespaces: inventory-app
existingResourcePolicyConfig:
patch:
includedResources
- secrets
- daemonsets
Approach 3: Combination of Approach 1 and Approach 2
Now, this approach is somewhat a combination of the aforementioned approaches. Here we propose addition of two spec fields to the Restore API - existingResourceDefaultPolicy
and existingResourcePolicyOverrides
. As the names suggest ,the idea being that existingResourceDefaultPolicy
would describe the default velero behaviour for this restore and existingResourcePolicyOverrides
would override the default policy explicitly for some resources.
Example:
A. The following Restore will execute the restore type patch
as the existingResourceDefaultPolicy
but will override the default policy for secrets
using the existingResourcePolicyOverrides
spec as none
.
Kind: Restore
…
includeNamespaces: inventory-app
existingResourceDefaultPolicy: patch
existingResourcePolicyOverrides:
none:
includedResources
- secrets
Detailed Design
Approach 1: Add a new spec field existingResourcePolicy
to the Restore API
The existingResourcePolicy
spec field will be an PolicyType
type field.
Restore API:
type RestoreSpec struct {
.
.
.
// ExistingResourcePolicy specifies the restore behaviour for the kubernetes resource to be restored
// +optional
ExistingResourcePolicy PolicyType
}
PolicyType:
type PolicyType string
const PolicyTypeNone PolicyType = "none"
const PolicyTypePatch PolicyType = "update"
Approach 2: Add a new spec field existingResourcePolicyConfig
to the Restore API
The existingResourcePolicyConfig
will be a spec of type PolicyConfiguration
which gets added to the Restore API.
Restore API:
type RestoreSpec struct {
.
.
.
// ExistingResourcePolicyConfig specifies the restore behaviour for a particular/list of kubernetes resource(s) to be restored
// +optional
ExistingResourcePolicyConfig []PolicyConfiguration
}
PolicyConfiguration:
type PolicyConfiguration struct {
PolicyTypeMapping map[PolicyType]ResourceList
}
PolicyType:
type PolicyType string
const PolicyTypePatch PolicyType = "patch"
const PolicyTypeRecreate PolicyType = "recreate"
ResourceList:
type ResourceList struct {
IncludedResources []string
}
Approach 3: Combination of Approach 1 and Approach 2
Restore API:
type RestoreSpec struct {
.
.
.
// ExistingResourceDefaultPolicy specifies the default restore behaviour for the kubernetes resource to be restored
// +optional
existingResourceDefaultPolicy PolicyType
// ExistingResourcePolicyOverrides specifies the restore behaviour for a particular/list of kubernetes resource(s) to be restored
// +optional
existingResourcePolicyOverrides []PolicyConfiguration
}
PolicyType:
type PolicyType string
const PolicyTypeNone PolicyType = "none"
const PolicyTypePatch PolicyType = "patch"
const PolicyTypeRecreate PolicyType = "recreate"
PolicyConfiguration:
type PolicyConfiguration struct {
PolicyTypeMapping map[PolicyType]ResourceList
}
ResourceList:
type ResourceList struct {
IncludedResources []string
}
The restore workflow changes will be done here
CLI changes for Approach 1
We would introduce a new CLI flag called existing-resource-policy
of string type. This flag would be used to accept the
policy from the user. The velero restore command would look somewhat like this:
velero create restore <restore_name> --existing-resource-policy=update
Help message Restore Policy to be used during the restore workflow, can be - none, update
The CLI changes will go at pkg/cmd/cli/restore/create.go
We would also add a validation which checks for invalid policy values provided to this flag.
Restore describer will also be updated to reflect the policy pkg/cmd/util/output/restore_describer.go
Implementation Decision
We have decided to go ahead with the implementation of Approach 1 as:
- It is easier to implement
- It is also easier to scale and leaves room for improvement and the door open to expanding to approach 3
- It also provides an option to preserve the existing velero restore workflow