* Add builders for CRD schemas
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Add test case for #2319
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Add failing test case
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Remove unnecessary print and temporary variable
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Add some options for fixing the test case
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Switch to a JSON middle step to "fix" conversions
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Add comment and changelog
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Account for possible missing schemas on v1 CRDs
If a v1beta1 CRD without a Schema was submitted to a Kubernets v1.16
cluster, then Kubernetes will server it back as a v1 CRD without a
schema.
However, when Velero tries to restore this document, the request will be
rejected as a v1 CRD must have a schema.
This commit has some defensive coding on the restore side, as well as
potential fixes on the backup side for getting around this.
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Back up nonstructural CRDs as v1beta1
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Add tests for remapping plugin
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Add builders for v1 CRDs
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Address review feedback
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Remove extraneous log message
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Add changelog
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Wait for CRDs to be available and ready
When restoring CRDs, we should wait for the definition to be ready and
available before moving on to restoring specific CRs.
While the CRDs are often ready by the time we get to restoring a CR,
there is a race condition where the CRD isn't ready.
This change waits on each CRD at restore time.
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Pruning unknown fields
In CRD apiversion v1beta1, default preserveUnknownFields=true.
In CRD apiversion v1, the preserveUnknownFields can only be false.
Otherwise, the k8s validation bumps out error message for the
invalid preserveUnknownFields value.
Deploy Velero on k8s 1.16+ with CRD apiversion v1beta1, the
k8s cluster converts apiversion from v1beta1 to v1 automatically.
Fully backup and restore the cluster, restore bumps out error message
due to the preserveUnknownFields=true is not allowed on k8s 1.16+.
Since the CRD structural schema had been defined, enable the preserveUnknownFields
to false to solves the restore bumps out error message on k8s 1.16+.
Signed-off-by: JenTing Hsiao <jenting.hsiao@suse.com>
* Add changelog
Signed-off-by: JenTing Hsiao <jenting.hsiao@suse.com>
* Check for nil LastMaintenanceTime in dueForMaintenance
ResticRepository.dueForMaintenance causes a panic in the velero pod
("invalid memory address or nil pointer dereference") if
repository.Status.LastMaintenanceTime is nil. This fix returns 'true'
if it's nil, so the repository is due for maintenance if LastMaintenanceTime
is nil *or* the time elapsed since the last maintenance is greater than
repository.Spec.MaintenanceFrequency.Duration
Signed-off-by: Scott Seago <sseago@redhat.com>
* changelog for PR#2200
Signed-off-by: Scott Seago <sseago@redhat.com>
* remove fsfreeze-pause image, replace with ubuntu in nginx example
Signed-off-by: Steve Kriss <krisss@vmware.com>
* changelog
Signed-off-by: Steve Kriss <krisss@vmware.com>
* switch to sleep infinity for clarity
Signed-off-by: Steve Kriss <krisss@vmware.com>
* restic: don't try to restore PVBs with no snapshotID
Signed-off-by: Steve Kriss <krisss@vmware.com>
* changelog
Signed-off-by: Steve Kriss <krisss@vmware.com>
* add --allow-partially-failed flag to velero restore create
Signed-off-by: Steve Kriss <krisss@vmware.com>
* remove extraneous client creation
Signed-off-by: Steve Kriss <krisss@vmware.com>
* add godoc to helper func
Signed-off-by: Steve Kriss <krisss@vmware.com>
* todo
Signed-off-by: Steve Kriss <krisss@vmware.com>
Related issue: https://github.com/heptio/velero/issues/1830
This accomplishes everything
that's needed, although there might be room for improvement in avoiding
a GET call for matching CRDs for each resource backed up. An alternative
could be a single call to get all CRDs prior to iterating over resources
and passing this into the backupResource function.
Signed-off-by: Scott Seago <sseago@redhat.com>
* feat: add azure china support
Signed-off-by: andyzhangx <xiazhang@microsoft.com>
* remove AZURE_CLOUD_NAME from required env var fetching
Signed-off-by: Steve Kriss <krisss@vmware.com>
* minor simplification of parseAzureEnvironment
Signed-off-by: Steve Kriss <krisss@vmware.com>
* changelog
Signed-off-by: Steve Kriss <krisss@vmware.com>
* remove cloudNameEnvVar from getRequiredValues call
Signed-off-by: Steve Kriss <krisss@vmware.com>
* just check for err != nil
Signed-off-by: Steve Kriss <krisss@vmware.com>
* backup sync controller: replace revision file with full diff each interval
Signed-off-by: Steve Kriss <krisss@vmware.com>
* remove getting/setting of metadata/revision file
Signed-off-by: Steve Kriss <krisss@vmware.com>
* changelog
Signed-off-by: Steve Kriss <krisss@vmware.com>
* tweak logging
Signed-off-by: Steve Kriss <krisss@vmware.com>
* don't keep podVolumeBackup log field around after syncing PVBs
Signed-off-by: Steve Kriss <krisss@vmware.com>
* update generated CRDs
Signed-off-by: Steve Kriss <krisss@vmware.com>
* velero install: wait for restic daemonset to be ready
Signed-off-by: Steve Kriss <krisss@vmware.com>
* changelog
Signed-off-by: Steve Kriss <krisss@vmware.com>
* rename PV during restore when cloning a namespace
Signed-off-by: Steve Kriss <krisss@vmware.com>
* rename func and vars, switch to if..else
Signed-off-by: Steve Kriss <krisss@vmware.com>
* make pv renamer func configurable for testing purposes
Signed-off-by: Steve Kriss <krisss@vmware.com>
* add unit test cases
Signed-off-by: Steve Kriss <krisss@vmware.com>
* changelog
Signed-off-by: Steve Kriss <krisss@vmware.com>
* address review feedback
Signed-off-by: Steve Kriss <krisss@vmware.com>
* address review feedback
Signed-off-by: Steve Kriss <krisss@vmware.com>
* when backing up PVCs with restic, explicitly specify --parent
Signed-off-by: Steve Kriss <krisss@vmware.com>
* changelog
Signed-off-by: Steve Kriss <krisss@vmware.com>
* address review feedback
Signed-off-by: Steve Kriss <krisss@vmware.com>
The Velero deployment did not have a way of exposing the namespace it
was installed in to the API client. This is a problem for plugins that
need to query for resources in that namespaces, such as the restic
restore process that needs to find PodVolume(Backup|Restore)s.
While the Velero client is consulted for a configured namespace, this
cannot be set in the server pod since there is no valid home directory
in which to place it.
This change provides the namespace to the deployment via the downward
API, and updates the API client factory to use the VELERO_NAMESPACE
before looking at the config file, so that any plugins using the client
will look at the appropriate namespace.
Fixes#1743
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* use Backup CR labels as tags for snapshots
This allows users to define custom tags to be added to snapshots, by
specifying custom labels on the Backup CR with the `velero backup create
--labels` flag.
Signed-off-by: Adnan Abdulhussein <aadnan@vmware.com>
* periodically check for stale restic repo locks
Signed-off-by: Steve Kriss <krisss@vmware.com>
* changelog
Signed-off-by: Steve Kriss <krisss@vmware.com>
* only try to init a restic repo if it doesn't already exist
Signed-off-by: Steve Kriss <krisss@vmware.com>
* reword comment
Signed-off-by: Steve Kriss <krisss@vmware.com>
Flags specifying the kubeconfig or kubecontext to use weren't actually
being used by the install command.
Fixes#1651
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Remove references to apps/v1beta1 API group
In Kubernetes v1.16, the apps/v1 API group will be the default served
for relevant resources.
Update any references to apps/v1beta1 for fowards compatibility.
Fixes#1672
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Update API group on plugin commands
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* add restore item action to change PVC/PV storage class name
Signed-off-by: Steve Kriss <krisss@vmware.com>
* code review
Signed-off-by: Steve Kriss <krisss@vmware.com>
* change existing plugin names to lowercase/hyphenated
Signed-off-by: Steve Kriss <krisss@vmware.com>
* add validation for new storage class name
Signed-off-by: Steve Kriss <krisss@vmware.com>
* add test cases
Signed-off-by: Steve Kriss <krisss@vmware.com>
* changelog
Signed-off-by: Steve Kriss <krisss@vmware.com>
* fix imports
Signed-off-by: Steve Kriss <krisss@vmware.com>
* update plugin names to be more consistent
Signed-off-by: Steve Kriss <krisss@vmware.com>
* update unit tests to use pkg/test object constructors
Signed-off-by: Steve Kriss <krisss@vmware.com>
CSI volumes are mounted one level deeper than "native" kubernetes
volumes, and this needs to be appended for proper restic support.
Fixes#1313.
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* allow users to specify additional Velero/restic pod annotations on the command line with the pod-annotations flag
Signed-off-by: Traci Kamp <traci.kamp@gmail.com>
* record PodVolumeBackup start and completion timestamps
adds startTimestamp and completionTimestamp fields to the
PodVolumeBackup status spec
Signed-off-by: Adnan Abdulhussein <aadnan@vmware.com>
* record PodVolumeRestore start and completion timestamps
Signed-off-by: Adnan Abdulhussein <aadnan@vmware.com>
* allow exclusion of resources using standard label
excludes any resources with the velero.io/exclude-from-backup=true label
Signed-off-by: Adnan Abdulhussein <aadnan@vmware.com>
* ensure backup item action modifications reflected in tarball filepath
This patch ensures the updated backup item's name and namespace are used
when constructing the filepath for the tarball.
Signed-off-by: Adnan Abdulhussein <aadnan@vmware.com>
* changelog
Signed-off-by: Adnan Abdulhussein <aadnan@vmware.com>
This fix initialises an empty map if the request object's Labels map
is nil, allowing the controller to later add and modify labels on the
object.
Signed-off-by: Adnan Abdulhussein <aadnan@vmware.com>
When using `velero install`, the image used should be a reasonable
default, even if buildinfo.Version is missing (such as when using `go
build` directly).
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
Velero should handle cases when the label length exceeds 63 characters.
- if the length of the backup/restore name is <= 63 characters, use it as the value of the label
- if it's > 63 characters, take the SHA256 hash of the name. the value of
the label will be the first 57 characters of the backup/restore name
plus the first six characters of the SHA256 hash.
Fixes heptio#1021
Signed-off-by: Anshul Chandra <anshulc@vmware.com>
Prior to 1.0, user-provided values such as cloud provider names didn't
need to be namespaced. This change allows those values to continue
working without the user editing the fields manually.
Fixes#1363
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Adds support for allowing a RestoreItemAction to skip item restore
This allows a RestoreItemAction plugin to signal to velero that
the returned item should be skipped rather than restored to the
cluster.
To support this, a boolean SkipRestore attribute is added to
RestoreItemActionExecuteOutput. If restore.restoreResource finds
this set to true, any remaining actions on this item are skipped,
and restore on this item is skipped. Execution continues with
the next item of this resource type.
To signal this for a particular item, the RestoreItemAction's
Execute method should call WithoutRestore() on the
RestoreItemActionExecuteOutput before returning it.
Signed-off-by: Scott Seago <sseago@redhat.com>
* Autogenerated code to support SkipRestore
Signed-off-by: Scott Seago <sseago@redhat.com>
* Added changelog for #1336
Signed-off-by: Scott Seago <sseago@redhat.com>
Original error was:
```
ark -n <redacted> restore logs <redacted>
time="2019-03-06T18:31:06Z" level=info msg="Not including resource" groupResource=nodes logSource="pkg/restore/restore.go:124"
time="2019-03-06T18:31:06Z" level=info msg="Not including resource" groupResource=events logSource="pkg/restore/restore.go:124"
time="2019-03-06T18:31:06Z" level=info msg="Not including resource" groupResource=events.events.k8s.io logSource="pkg/restore/restore.go:124"
time="2019-03-06T18:31:06Z" level=info msg="Not including resource" groupResource=backups.ark.heptio.com logSource="pkg/restore/restore.go:124"
time="2019-03-06T18:31:06Z" level=info msg="Not including resource" groupResource=restores.ark.heptio.com logSource="pkg/restore/restore.go:124"
time="2019-03-06T18:31:06Z" level=info msg="Starting restore of backup backup/<redacted>" logSource="pkg/restore/restore.go:342"
time="2019-03-06T18:31:06Z" level=info msg="error unzipping and extracting: open /tmp/604421455/resources/rolebindings.rbac.authorization.k8s.io/namespaces/<redacted>/<redacted>: too many open files" logSource="pkg/restore/restore.go:346"
```
Downloading the directory from s3 and untarring it I found 1036 files. The ulimit -n output says 1024. This is our team's best guess at a root cause. But the code fixed in the PR definitely is holding all the files open until the method closes: https://blog.learngoprogramming.com/gotchas-of-defer-in-go-1-8d070894cb01
Please note my go code abilities are not great and I did not test this. I just edited the file in github. All I did was remove the defer and put fileClose after the copy is done. Theoretically this should only hold one file open at a time now. Let me know if you want me to do any further steps.
Thank you,
-Asaf
Signed-off-by: Asaf Erlich <aerlich@groupon.com>
* Move Phase to right under Metadata(name/namespace/label/annotations)
* Move Validation errors: section right after Phase: section and only
show it if the item has a phase of FailedValidation
* For restores move Warnings and Errors under Validation errors. Do not
show Warnings or Errors if there are none.
Signed-off-by: DheerajSShetty <dheerajs@vmware.com>
Fixes#987
To create a regional disk, the URLs for the zones in which the disk is
replicated must be provided to the GCP API.
Fixes#1183
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
If a PV already exists, wait for it, it's associated PVC, and associated
namespace to be deleted before attempting to restore it.
If a namespace already exists, wait for it to be deleted before
attempting to restore it.
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>