satori/go.uuid has a known issue with random uuid generation.
gofrs/uuid is still maintained and has fixed the random uuid generation
issue present in satori/go.uuid
Signed-off-by: John Naulty <johnnaulty@bitgo.com>
* restic: don't try to restore PVBs with no snapshotID
Signed-off-by: Steve Kriss <krisss@vmware.com>
* changelog
Signed-off-by: Steve Kriss <krisss@vmware.com>
* update import paths to github.com/vmware-tanzu/...
Signed-off-by: Steve Kriss <krisss@vmware.com>
* update other GH org refs to vmware-tanzu
Signed-off-by: Steve Kriss <krisss@vmware.com>
* site and docs: update GH org to vmware-tanzu
Signed-off-by: Steve Kriss <krisss@vmware.com>
* update travis badge links on docs readmes
Signed-off-by: Steve Kriss <krisss@vmware.com>
* rename PV during restore when cloning a namespace
Signed-off-by: Steve Kriss <krisss@vmware.com>
* rename func and vars, switch to if..else
Signed-off-by: Steve Kriss <krisss@vmware.com>
* make pv renamer func configurable for testing purposes
Signed-off-by: Steve Kriss <krisss@vmware.com>
* add unit test cases
Signed-off-by: Steve Kriss <krisss@vmware.com>
* changelog
Signed-off-by: Steve Kriss <krisss@vmware.com>
* address review feedback
Signed-off-by: Steve Kriss <krisss@vmware.com>
* address review feedback
Signed-off-by: Steve Kriss <krisss@vmware.com>
* add restore item action to change PVC/PV storage class name
Signed-off-by: Steve Kriss <krisss@vmware.com>
* code review
Signed-off-by: Steve Kriss <krisss@vmware.com>
* change existing plugin names to lowercase/hyphenated
Signed-off-by: Steve Kriss <krisss@vmware.com>
* add validation for new storage class name
Signed-off-by: Steve Kriss <krisss@vmware.com>
* add test cases
Signed-off-by: Steve Kriss <krisss@vmware.com>
* changelog
Signed-off-by: Steve Kriss <krisss@vmware.com>
* fix imports
Signed-off-by: Steve Kriss <krisss@vmware.com>
* update plugin names to be more consistent
Signed-off-by: Steve Kriss <krisss@vmware.com>
* update unit tests to use pkg/test object constructors
Signed-off-by: Steve Kriss <krisss@vmware.com>
CSI volumes are mounted one level deeper than "native" kubernetes
volumes, and this needs to be appended for proper restic support.
Fixes#1313.
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
Velero should handle cases when the label length exceeds 63 characters.
- if the length of the backup/restore name is <= 63 characters, use it as the value of the label
- if it's > 63 characters, take the SHA256 hash of the name. the value of
the label will be the first 57 characters of the backup/restore name
plus the first six characters of the SHA256 hash.
Fixes heptio#1021
Signed-off-by: Anshul Chandra <anshulc@vmware.com>
* Adds support for allowing a RestoreItemAction to skip item restore
This allows a RestoreItemAction plugin to signal to velero that
the returned item should be skipped rather than restored to the
cluster.
To support this, a boolean SkipRestore attribute is added to
RestoreItemActionExecuteOutput. If restore.restoreResource finds
this set to true, any remaining actions on this item are skipped,
and restore on this item is skipped. Execution continues with
the next item of this resource type.
To signal this for a particular item, the RestoreItemAction's
Execute method should call WithoutRestore() on the
RestoreItemActionExecuteOutput before returning it.
Signed-off-by: Scott Seago <sseago@redhat.com>
* Autogenerated code to support SkipRestore
Signed-off-by: Scott Seago <sseago@redhat.com>
* Added changelog for #1336
Signed-off-by: Scott Seago <sseago@redhat.com>
Original error was:
```
ark -n <redacted> restore logs <redacted>
time="2019-03-06T18:31:06Z" level=info msg="Not including resource" groupResource=nodes logSource="pkg/restore/restore.go:124"
time="2019-03-06T18:31:06Z" level=info msg="Not including resource" groupResource=events logSource="pkg/restore/restore.go:124"
time="2019-03-06T18:31:06Z" level=info msg="Not including resource" groupResource=events.events.k8s.io logSource="pkg/restore/restore.go:124"
time="2019-03-06T18:31:06Z" level=info msg="Not including resource" groupResource=backups.ark.heptio.com logSource="pkg/restore/restore.go:124"
time="2019-03-06T18:31:06Z" level=info msg="Not including resource" groupResource=restores.ark.heptio.com logSource="pkg/restore/restore.go:124"
time="2019-03-06T18:31:06Z" level=info msg="Starting restore of backup backup/<redacted>" logSource="pkg/restore/restore.go:342"
time="2019-03-06T18:31:06Z" level=info msg="error unzipping and extracting: open /tmp/604421455/resources/rolebindings.rbac.authorization.k8s.io/namespaces/<redacted>/<redacted>: too many open files" logSource="pkg/restore/restore.go:346"
```
Downloading the directory from s3 and untarring it I found 1036 files. The ulimit -n output says 1024. This is our team's best guess at a root cause. But the code fixed in the PR definitely is holding all the files open until the method closes: https://blog.learngoprogramming.com/gotchas-of-defer-in-go-1-8d070894cb01
Please note my go code abilities are not great and I did not test this. I just edited the file in github. All I did was remove the defer and put fileClose after the copy is done. Theoretically this should only hold one file open at a time now. Let me know if you want me to do any further steps.
Thank you,
-Asaf
Signed-off-by: Asaf Erlich <aerlich@groupon.com>
If a PV already exists, wait for it, it's associated PVC, and associated
namespace to be deleted before attempting to restore it.
If a namespace already exists, wait for it to be deleted before
attempting to restore it.
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
Instead of only removing the default token from a service account when
it already exists in the cluster, always remove it. If the service
account already exists, continue to do the merging logic.
Signed-off-by: Andy Goldstein <andy.goldstein@gmail.com>
We were checking for nil, but were getting back an empty
*unstructured.Unstructured{} instead, along with a NotFound error.
Change the logic to check for the NotFound error instead of a nil
object.
Signed-off-by: Andy Goldstein <andy.goldstein@gmail.com>
This is to better reflect the intent of the user when node ports are
specified explicitly (as opposed to being assigned by Kubernetes). The
`last-applied-configuration` annotation added by `kubectl apply` is one
such indicator we are now leveraging.
We still default to omitting the node ports when the annotation is
missing.
Signed-off-by: Timo Reimann <ttr314@googlemail.com>
Refactor plugin management:
- support multiple plugins per executable
- support restarting a plugin process in the event it terminates
- simplify plugin lifecycle management by using separate managers for
each scope (server vs backup vs restore)
Signed-off-by: Andy Goldstein <andy.goldstein@gmail.com>
Mirror pods are pods created from static manifest files on a node.
They're mirrored to the apiserver so they're visible when querying the
apiserver for a list of pods, but it's not possible to send a pod
containing the mirror pod annotation to the apiserver and have it be
created successfully. Instead of trying to do this, log a message that
we're skipping restoring the pod because it's a mirror pod.
Signed-off-by: Andy Goldstein <andy.goldstein@gmail.com>
If a PV has a reclaim policy of Delete and we didn't create a snapshot
of it, don't restore the PV, as doing so would create a PV whose
underlying volume is incorrect.
Also "reset" any PVCs bound to the PV so they'll be dynamically
provisioned when restored.
Signed-off-by: Andy Goldstein <andy.goldstein@gmail.com>
All properties from a backup will be merged into the ServiceAccount
except for the default token secret.
Signed-off-by: Nolan Brubaker <nolan@heptio.com>
Completed jobs and pods may be useful in the backup for auditing
purposes, but don't recreate them when restoring.
Signed-off-by: Nolan Brubaker <nolan@heptio.com>
When restoring resources that raise an already exists error, check their
equality before logging a message on the restore. If they're the same
except for some metadata, don't generate a message.
The restore process was modified so that if an object had an empty
namespace string, no namespace key is created on the object. This was to
avoid manipulating the copy of the current cluster's object by adding
the target namespace.
There are some cases right now that are known to not be equal via this
method:
- The `default` ServiceAccount in a namespace will not match, primarily
because of differing default tokens. These will be handled in their own
patch
- IP addresses for Services are recorded in the backup object, but are
either not present on the cluster object, or different. An issue for
this already exists at https://github.com/heptio/ark/issues/354
- Endpoints have differing values for `renewTime`. This may be
insubstantial, but isn't currently handled by the resetMetadataAndStatus
function.
- PersistentVolume objects do not match on spec fields, such as
claimRef and cloud provider persistent disk info
Signed-off-by: Nolan Brubaker <nolan@heptio.com>
The main Ark code was hard-coding specific support for AWS, GCE, and
Azure volume snapshots and restores, and anything else was considered
unsupported.
Add GetVolumeID and SetVolumeID to the BlockStore interface, to allow
block store plugins to handle volume snapshots and restores.
Signed-off-by: Andy Goldstein <andy.goldstein@gmail.com>
Deleting the clusterIP field when the service should be headless will
cause it to be assigned a new IP on restore; instead it should retain
the headless state after restoration.
Fixes#168
Signed-off-by: Nolan Brubaker <nolan@heptio.com>
Previously the directory structure separated resources depending on
whether or not they were cluster or namespace scoped. All cluster
resources were restored first, then all namespace resources. Priority
did not apply across both and you could not order any namespace
resources before any cluster resources.
This restructure sorts firstly on resource type.
resources/serviceaccounts/namespaces/ns1.json
resources/nodes/cluster/node1.json
This will break old backups as the format is no longer consistent as
announced on the Google group.
Signed-off-by: Devan Goodwin <dgoodwin@redhat.com>
- Read PV's AZ info from fault-domain label of the PV object for snapshotting.
- Store PV's AZ info in the VolumeInfo.
- Add tests for reading the label from the PV object.
- Remove availability zone validation in AWS and GCP BlockStorageAdaptor.
- Add volumeAZ as a parameter to methods in the BlockStorageAdapter interface.
- Get AZ from VolumeInfo when restoring PV snapshot.
- Remove references to PV availability zone in docs.
Signed-off-by: Ashish Amarnath <ashish.amarnath@gmail.com>
- Introduces similar Include/Exclude declaration on the Restore
resource and cli flags
- Kept support for legacy Namespaces attribute until it could be
deprecating. Defining both IncludeNamespaces and Namespaces results
in a validation error and the Restore will not be processed (shouldn't
be able to occur)
Signed-off-by: Justin Nauman <justin.r.nauman@gmail.com>