* Improve readbility and formatting of pkg/restore/restore.go
Signed-off-by: F. Gold <fgold@vmware.com>
* Update paths to include API group versions
Signed-off-by: F. Gold <fgold@vmware.com>
* Use full word, 'resource' instead of 'resrc'
Signed-off-by: F. Gold <fgold@vmware.com>
* Use pod namespace from backup when matching PVBs
In #3051, we introduced an additional check to ensure that a PVB matched
a particular pod by checking both the name and the namespace of the pod.
This caused an issue when using a namespace mapping on restore. In the
case where a namespace mapping is being used, the check for whether a
PVB matches a particular pod will fail as the PVB was created for the
original pod namespace and is not aware of the new namespace mapping
being used. This resulted in PVRs not being created for pods that were
being restored into new namespaces. The restic init containers were
being created to wait on the volume restore, however this would cause
the restored pods to block indefinitely as they would be waiting for a
volume restore that was not scheduled.
To fix this, we use the original namespace of the pod from the backup to
match the PVB to the pod being restored, not the new namespace where
the pod is being restored into.
Fixes#3467.
Signed-off-by: Bridget McErlean <bmcerlean@vmware.com>
* Explain why the namespace mapping can't be used
Signed-off-by: Bridget McErlean <bmcerlean@vmware.com>
* Restore API group version by priority
Signed-off-by: F. Gold <fgold@vmware.com>
* Add changelog
Signed-off-by: F. Gold <fgold@vmware.com>
* Correct spelling
Signed-off-by: F. Gold <fgold@vmware.com>
* Refactor userResourceGroupVersionPriorities(...) to accept config map, adjust unit test
Signed-off-by: F. Gold <fgold@vmware.com>
* Move some unit tests into e2e
Signed-off-by: F. Gold <fgold@vmware.com>
* Add three e2e tests using Testify Suites
Summary of changes
Makefile - add testify e2e test target
go.sum - changed with go mod tidy
pkg/install/install.go - increased polling timeout
test/e2e/restore_priority_group_test.go - deleted
test/e2e/restore_test.go - deleted
test/e2e/velero_utils.go - made restic optional in velero install
test/e2e_testify/Makefile - makefile for testify e2e tests
test/e2e_testify/README.md - example command for running tests
test/e2e_testify/common_test.go - helper functions
test/e2e_testify/e2e_suite_test.go - prepare for tests and run
test/e2e_testify/restore_priority_apigv_test.go - test cases
Signed-off-by: F. Gold <fgold@vmware.com>
* Make changes per @nrb code review
Signed-off-by: F. Gold <fgold@vmware.com>
* Wait for pods in e2e tests
Signed-off-by: F. Gold <fgold@vmware.com>
* Remove testify suites e2e scaffolding moved to PR #3354
Signed-off-by: F. Gold <fgold@vmware.com>
* Make changes per @brito-rafa and Velero maintainers code reviews
- Made changes suggested by @brito-rafa in GitHub.
- We had a code review meeting with @carlisia, @dsu-igeek, @zubron, and @nrb
- and changes were made based on their suggetions:
- pull in logic from 'meetsAPIGVResotreReqs()' to restore.go.
- add TODO to remove APIGroupVersionFeatureFlag check
- have feature flag and backup version format checks in separate `if` statements.
- rename variables to be sourceGVs, targetGVs, and userGVs.
Signed-off-by: F. Gold <fgold@vmware.com>
* Convert Testify Suites e2e tests to existing Ginkgo framework
Signed-off-by: F. Gold <fgold@vmware.com>
* Made changes per @zubron PR review
Signed-off-by: F. Gold <fgold@vmware.com>
* Run go mod tidy after resolving go.sum merge conflict
Signed-off-by: F. Gold <fgold@vmware.com>
* Add feature documentation to velero.io site
Signed-off-by: F. Gold <fgold@vmware.com>
* Add config map e2e test; rename e2e test file and name
Signed-off-by: F. Gold <fgold@vmware.com>
* Update go.{mod,sum} files
Signed-off-by: F. Gold <fgold@vmware.com>
* Move CRDs and CRs to testdata folder
Signed-off-by: F. Gold <fgold@vmware.com>
* Fix typos in cert-manager to pass codespell CICD check
Signed-off-by: F. Gold <fgold@vmware.com>
* Make changes per @nrb code review round 2
- make checkAndReadDir function private
- add info level messages when priorties 1-3 API group versions can not be used
Signed-off-by: F. Gold <fgold@vmware.com>
* Make user config map rules less strict
Signed-off-by: F. Gold <fgold@vmware.com>
* Update e2e test image version in example
Signed-off-by: F. Gold <fgold@vmware.com>
* Update case A music-system controller code
Signed-off-by: F. Gold <fgold@vmware.com>
* Documentation updates
Signed-off-by: F. Gold <fgold@vmware.com>
* Update migration case documentation
Signed-off-by: F. Gold <fgold@vmware.com>
* -> Preserve nodePort support when restoring via "--preserve-nodeports" flag
Signed-off-by: Yusuf Güngör <yusuf.gungor@hepsiburada.com>
* -> Added changelog.
Signed-off-by: Yusuf Güngör <yusuf.gungor@hepsiburada.com>
* -> Unit test added.
-> Using boolptr.IsSetToTrue for bool ptr check.
Signed-off-by: Yusuf Güngör <yusuf.gungor@hepsiburada.com>
* -> Unit test added.
-> Using boolptr.IsSetToTrue for bool ptr check.
Signed-off-by: Yusuf Güngör <yusuf.gungor@hepsiburada.com>
* -> Other restore errors log level changed from info to error.
-> Documentation updated about Velero nodePort restore logic and preservation of them.
Signed-off-by: Yusuf Güngör <yusuf.gungor@hepsiburada.com>
Co-authored-by: Yusuf Güngör <yusuf.gungor@hepsiburada.com>
By running the following command:
codespell -S .git,*.png,*.jpg,*.woff,*.ttf,*.gif,*.ico -L \
iam,aks,ist,bridget,ue
Signed-off-by: Mateusz Gozdek <mgozdekof@gmail.com>
* fixing label for 'velero.io/change-pvc-node-selector' plugin in site document
Signed-off-by: mayank <mayank.patel@mayadata.io>
* Fixing "velero.io/change-pvc-node-selector" to fetch config using plugin name
Signed-off-by: mayank <mayank.patel@mayadata.io>
* adding changelog
Signed-off-by: mayank <mayank.patel@mayadata.io>
* Only remove the UID from a PV's claimRef
The UID is the only part of a claimRef that might prevent it from being
rebound correctly on a restore. The namespace and name within the
claimRef should be preserved in order to ensure that the PV is claimed
by the correct PVC on restore.
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Remap PVs claimRef.namespace on relevant restores
When remapping namespaces, any included PVs need to have their claimRef
updated to point remapped namespaces to the new namespace name in order
to be bound to the correct PVC.
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Update tests and ensure claimRef namespace remaps
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Remove lowercased uid field from unstructured PV
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Fix issues that prevented PVs from being restored
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Add changelog
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Dynamically reprovision volumes without snapshots
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Update test for lower case uid field
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Remove stray debugging print statement
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Fix typo, remove extra code, add tests.
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Exec hooks in restored pods
Signed-off-by: Andrew Reed <andrew@replicated.com>
* WaitExecHookHandler implements ItemHookHandler
This required adding a context.Context argument to the ItemHookHandler
interface which is unused by the DefaultItemHookHandler implementation.
It also means passing nil for the []ResourceHook argument since that
holds BackupResourceHook.
Signed-off-by: Andrew Reed <andrew@replicated.com>
* WaitExecHookHandler unit tests
Signed-off-by: Andrew Reed <andrew@replicated.com>
* Changelog and go fmt
Signed-off-by: Andrew Reed <andrew@replicated.com>
* Fix double import
Signed-off-by: Andrew Reed <andrew@replicated.com>
* Default to first contaienr in pod
Signed-off-by: Andrew Reed <andrew@replicated.com>
* Use constants for hook error modes in tests
Signed-off-by: Andrew Reed <andrew@replicated.com>
* Revert to separate WaitExecHookHandler interface
Signed-off-by: Andrew Reed <andrew@replicated.com>
* Negative tests for invalid timeout annotations
Signed-off-by: Andrew Reed <andrew@replicated.com>
* Rename NamedExecRestoreHook PodExecRestoreHook
Also make field names more descriptive.
Signed-off-by: Andrew Reed <andrew@replicated.com>
* Cleanup test names
Signed-off-by: Andrew Reed <andrew@replicated.com>
* Separate maxHookWait and add unit tests
Signed-off-by: Andrew Reed <andrew@replicated.com>
* Comment on maxWait <= 0
Also info log container is not running for hooks to execute in.
Also add context error to hooks not executed errors.
Signed-off-by: Andrew Reed <andrew@replicated.com>
* Remove log about default for invalid timeout
There is no default wait or exec timeout.
Signed-off-by: Andrew Reed <andrew@replicated.com>
* Linting
Signed-off-by: Andrew Reed <andrew@replicated.com>
* Fix log message and rename controller to podWatcher
Signed-off-by: Andrew Reed <andrew@replicated.com>
* Comment on exactly-once semantics for handler
Signed-off-by: Andrew Reed <andrew@replicated.com>
* Fix logging and comments
Use filed logger for pod in handler.
Add comment about pod changes in unit tests.
Use kube util NamespaceAndName in messages.
Signed-off-by: Andrew Reed <andrew@replicated.com>
* Fix maxHookWait
Signed-off-by: Andrew Reed <andrew@replicated.com>
* fix: rename the PV if VolumeSnapshotter has modified the PV name
When VolumeSnapshotter sets the PV name via SetVolumeID and PV is
not there in the cluster, velero does not rename the PV. Which causes
the pvc to be in the lost state as pvc points to the old PV but pv object
has been renamed by VolumeSnapshotter.
Signed-off-by: Pawan <pawan@mayadata.io>
* adding a test case for pv rename
Signed-off-by: Pawan <pawan@mayadata.io>
* k8s 1.18 import wip
backup, cmd, controller, generated, restic, restore, serverstatusrequest, test and util
Signed-off-by: Andrew Lavery <laverya@umich.edu>
* go mod tidy
Signed-off-by: Andrew Lavery <laverya@umich.edu>
* add changelog file
Signed-off-by: Andrew Lavery <laverya@umich.edu>
* go fmt
Signed-off-by: Andrew Lavery <laverya@umich.edu>
* update code-generator and controller-gen in CI
Signed-off-by: Andrew Lavery <laverya@umich.edu>
* checkout proper code-generator version, regen
Signed-off-by: Andrew Lavery <laverya@umich.edu>
* fix remaining calls
Signed-off-by: Andrew Lavery <laverya@umich.edu>
* regenerate CRDs with ./hack/update-generated-crd-code.sh
Signed-off-by: Andrew Lavery <laverya@umich.edu>
* use existing context in restic and server
Signed-off-by: Andrew Lavery <laverya@umich.edu>
* fix test cases by resetting resource version
also use main library go context, not golang.org/x/net/context, in pkg/restore/restore.go
Signed-off-by: Andrew Lavery <laverya@umich.edu>
* clarify changelog message
Signed-off-by: Andrew Lavery <laverya@umich.edu>
* use github.com/kubernetes-csi/external-snapshotter/v2@v2.2.0-rc1
Signed-off-by: Andrew Lavery <laverya@umich.edu>
* run 'go mod tidy' to remove old external-snapshotter version
Signed-off-by: Andrew Lavery <laverya@umich.edu>
* Use a helper function when querying w/ backup label
Setting or querying for a backup label name should always pass the value
through the GetValidName function. This change passes query uses of the
backup label value through the GetValidName function by introducing 2
new helpers, one for making a Selector, one for making a ListOptions.
It also removes functions returning the same data, but under
unecessarily specific names.
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Document using the label.GetValidName function
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Update copyright year
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Clarify labels.GetValidName and annotations
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Move functions to pkg/label
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Fix function comments
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Account for possible missing schemas on v1 CRDs
If a v1beta1 CRD without a Schema was submitted to a Kubernets v1.16
cluster, then Kubernetes will server it back as a v1 CRD without a
schema.
However, when Velero tries to restore this document, the request will be
rejected as a v1 CRD must have a schema.
This commit has some defensive coding on the restore side, as well as
potential fixes on the backup side for getting around this.
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Back up nonstructural CRDs as v1beta1
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Add tests for remapping plugin
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Add builders for v1 CRDs
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Address review feedback
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Remove extraneous log message
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Add changelog
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
* Wait for CRDs to be available and ready
When restoring CRDs, we should wait for the definition to be ready and
available before moving on to restoring specific CRs.
While the CRDs are often ready by the time we get to restoring a CR,
there is a race condition where the CRD isn't ready.
This change waits on each CRD at restore time.
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
Install restic with CPU/Memory limits is optional.
If velero cannot parse resource requirements, use default value instead.
After that, the administrator won't get confused that something recovered failed.
Signed-off-by: JenTing Hsiao <jenting.hsiao@suse.com>
Migrate logic from NewUUID function into the pvRenamer function.
PR #2133 switched to a new NewUUID function that returns an error, but
the invocation of that function needs to happen within the pvRenamer
closure. Because the new function returns an error, the pvRenamer should
return the error, the signature needs to be changed and the return
checked.
Signed-off-by: John Naulty <johnnaulty@bitgo.com>
satori/go.uuid has a known issue with random uuid generation.
gofrs/uuid is still maintained and has fixed the random uuid generation
issue present in satori/go.uuid
Signed-off-by: John Naulty <johnnaulty@bitgo.com>
* restic: don't try to restore PVBs with no snapshotID
Signed-off-by: Steve Kriss <krisss@vmware.com>
* changelog
Signed-off-by: Steve Kriss <krisss@vmware.com>
* update import paths to github.com/vmware-tanzu/...
Signed-off-by: Steve Kriss <krisss@vmware.com>
* update other GH org refs to vmware-tanzu
Signed-off-by: Steve Kriss <krisss@vmware.com>
* site and docs: update GH org to vmware-tanzu
Signed-off-by: Steve Kriss <krisss@vmware.com>
* update travis badge links on docs readmes
Signed-off-by: Steve Kriss <krisss@vmware.com>
* rename PV during restore when cloning a namespace
Signed-off-by: Steve Kriss <krisss@vmware.com>
* rename func and vars, switch to if..else
Signed-off-by: Steve Kriss <krisss@vmware.com>
* make pv renamer func configurable for testing purposes
Signed-off-by: Steve Kriss <krisss@vmware.com>
* add unit test cases
Signed-off-by: Steve Kriss <krisss@vmware.com>
* changelog
Signed-off-by: Steve Kriss <krisss@vmware.com>
* address review feedback
Signed-off-by: Steve Kriss <krisss@vmware.com>
* address review feedback
Signed-off-by: Steve Kriss <krisss@vmware.com>
* add restore item action to change PVC/PV storage class name
Signed-off-by: Steve Kriss <krisss@vmware.com>
* code review
Signed-off-by: Steve Kriss <krisss@vmware.com>
* change existing plugin names to lowercase/hyphenated
Signed-off-by: Steve Kriss <krisss@vmware.com>
* add validation for new storage class name
Signed-off-by: Steve Kriss <krisss@vmware.com>
* add test cases
Signed-off-by: Steve Kriss <krisss@vmware.com>
* changelog
Signed-off-by: Steve Kriss <krisss@vmware.com>
* fix imports
Signed-off-by: Steve Kriss <krisss@vmware.com>
* update plugin names to be more consistent
Signed-off-by: Steve Kriss <krisss@vmware.com>
* update unit tests to use pkg/test object constructors
Signed-off-by: Steve Kriss <krisss@vmware.com>
CSI volumes are mounted one level deeper than "native" kubernetes
volumes, and this needs to be appended for proper restic support.
Fixes#1313.
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
Velero should handle cases when the label length exceeds 63 characters.
- if the length of the backup/restore name is <= 63 characters, use it as the value of the label
- if it's > 63 characters, take the SHA256 hash of the name. the value of
the label will be the first 57 characters of the backup/restore name
plus the first six characters of the SHA256 hash.
Fixes heptio#1021
Signed-off-by: Anshul Chandra <anshulc@vmware.com>
* Adds support for allowing a RestoreItemAction to skip item restore
This allows a RestoreItemAction plugin to signal to velero that
the returned item should be skipped rather than restored to the
cluster.
To support this, a boolean SkipRestore attribute is added to
RestoreItemActionExecuteOutput. If restore.restoreResource finds
this set to true, any remaining actions on this item are skipped,
and restore on this item is skipped. Execution continues with
the next item of this resource type.
To signal this for a particular item, the RestoreItemAction's
Execute method should call WithoutRestore() on the
RestoreItemActionExecuteOutput before returning it.
Signed-off-by: Scott Seago <sseago@redhat.com>
* Autogenerated code to support SkipRestore
Signed-off-by: Scott Seago <sseago@redhat.com>
* Added changelog for #1336
Signed-off-by: Scott Seago <sseago@redhat.com>
Original error was:
```
ark -n <redacted> restore logs <redacted>
time="2019-03-06T18:31:06Z" level=info msg="Not including resource" groupResource=nodes logSource="pkg/restore/restore.go:124"
time="2019-03-06T18:31:06Z" level=info msg="Not including resource" groupResource=events logSource="pkg/restore/restore.go:124"
time="2019-03-06T18:31:06Z" level=info msg="Not including resource" groupResource=events.events.k8s.io logSource="pkg/restore/restore.go:124"
time="2019-03-06T18:31:06Z" level=info msg="Not including resource" groupResource=backups.ark.heptio.com logSource="pkg/restore/restore.go:124"
time="2019-03-06T18:31:06Z" level=info msg="Not including resource" groupResource=restores.ark.heptio.com logSource="pkg/restore/restore.go:124"
time="2019-03-06T18:31:06Z" level=info msg="Starting restore of backup backup/<redacted>" logSource="pkg/restore/restore.go:342"
time="2019-03-06T18:31:06Z" level=info msg="error unzipping and extracting: open /tmp/604421455/resources/rolebindings.rbac.authorization.k8s.io/namespaces/<redacted>/<redacted>: too many open files" logSource="pkg/restore/restore.go:346"
```
Downloading the directory from s3 and untarring it I found 1036 files. The ulimit -n output says 1024. This is our team's best guess at a root cause. But the code fixed in the PR definitely is holding all the files open until the method closes: https://blog.learngoprogramming.com/gotchas-of-defer-in-go-1-8d070894cb01
Please note my go code abilities are not great and I did not test this. I just edited the file in github. All I did was remove the defer and put fileClose after the copy is done. Theoretically this should only hold one file open at a time now. Let me know if you want me to do any further steps.
Thank you,
-Asaf
Signed-off-by: Asaf Erlich <aerlich@groupon.com>
If a PV already exists, wait for it, it's associated PVC, and associated
namespace to be deleted before attempting to restore it.
If a namespace already exists, wait for it to be deleted before
attempting to restore it.
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
Instead of only removing the default token from a service account when
it already exists in the cluster, always remove it. If the service
account already exists, continue to do the merging logic.
Signed-off-by: Andy Goldstein <andy.goldstein@gmail.com>
We were checking for nil, but were getting back an empty
*unstructured.Unstructured{} instead, along with a NotFound error.
Change the logic to check for the NotFound error instead of a nil
object.
Signed-off-by: Andy Goldstein <andy.goldstein@gmail.com>
This is to better reflect the intent of the user when node ports are
specified explicitly (as opposed to being assigned by Kubernetes). The
`last-applied-configuration` annotation added by `kubectl apply` is one
such indicator we are now leveraging.
We still default to omitting the node ports when the annotation is
missing.
Signed-off-by: Timo Reimann <ttr314@googlemail.com>
Refactor plugin management:
- support multiple plugins per executable
- support restarting a plugin process in the event it terminates
- simplify plugin lifecycle management by using separate managers for
each scope (server vs backup vs restore)
Signed-off-by: Andy Goldstein <andy.goldstein@gmail.com>
Mirror pods are pods created from static manifest files on a node.
They're mirrored to the apiserver so they're visible when querying the
apiserver for a list of pods, but it's not possible to send a pod
containing the mirror pod annotation to the apiserver and have it be
created successfully. Instead of trying to do this, log a message that
we're skipping restoring the pod because it's a mirror pod.
Signed-off-by: Andy Goldstein <andy.goldstein@gmail.com>
If a PV has a reclaim policy of Delete and we didn't create a snapshot
of it, don't restore the PV, as doing so would create a PV whose
underlying volume is incorrect.
Also "reset" any PVCs bound to the PV so they'll be dynamically
provisioned when restored.
Signed-off-by: Andy Goldstein <andy.goldstein@gmail.com>
All properties from a backup will be merged into the ServiceAccount
except for the default token secret.
Signed-off-by: Nolan Brubaker <nolan@heptio.com>