Writing to GCP's object store is an async operation, so errors need to
be checked on both the write and close calls, since errors like
permission violations aren't reported until close.
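A minimal sketch of the pattern with the GCS Go client (illustrative, not Ark's actual object store code):

```go
import (
	"context"

	"cloud.google.com/go/storage"
)

// upload writes data to a GCS object, checking errors on both Write and
// Close; permission violations often surface only at Close time.
func upload(ctx context.Context, client *storage.Client, bucket, key string, data []byte) error {
	w := client.Bucket(bucket).Object(key).NewWriter(ctx)
	if _, err := w.Write(data); err != nil {
		w.Close() // best effort; the Write error takes precedence
		return err
	}
	// The upload isn't complete (or checked server-side) until Close
	// returns nil.
	return w.Close()
}
```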
Signed-off-by: Nolan Brubaker <nolan@heptio.com>
Completed jobs and pods may be useful in the backup for auditing
purposes, but don't recreate them when restoring.
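A hedged sketch of the kind of check involved (illustrative helpers, not Ark's exact restore code):

```go
import (
	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
)

// podIsComplete reports whether a backed-up pod has already run to
// completion and therefore shouldn't be recreated on restore.
func podIsComplete(pod *corev1.Pod) bool {
	return pod.Status.Phase == corev1.PodSucceeded || pod.Status.Phase == corev1.PodFailed
}

// jobIsComplete does the same for jobs.
func jobIsComplete(job *batchv1.Job) bool {
	return job.Status.CompletionTime != nil
}
```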
Signed-off-by: Nolan Brubaker <nolan@heptio.com>
This commit adds support for auto completion for bash and zsh
shells. A new root level command called "completion" has been
introduced, and the user can get the auto completion code by
running `ark completion bash/zsh`.
For bash completion, the built-in GenBashCompletion() from cobra
has been used, but for zsh, the built-in GenZshCompletion() is
known to cause issues. The workaround has been copied from the zsh
completion code of kubectl.
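A sketch of such a command (GenBashCompletion and GenZshCompletion are real cobra methods; the command wiring here is illustrative):

```go
import (
	"os"

	"github.com/spf13/cobra"
)

// NewCompletionCommand returns a `completion SHELL` command that writes
// completion code for the root command to stdout.
func NewCompletionCommand(root *cobra.Command) *cobra.Command {
	return &cobra.Command{
		Use:       "completion SHELL",
		Short:     "Output shell completion code for bash or zsh",
		ValidArgs: []string{"bash", "zsh"},
		Args:      cobra.ExactArgs(1),
		RunE: func(cmd *cobra.Command, args []string) error {
			if args[0] == "bash" {
				return root.GenBashCompletion(os.Stdout)
			}
			// zsh: in practice this is where a kubectl-style
			// workaround goes instead of cobra's generator.
			return root.GenZshCompletion(os.Stdout)
		},
	}
}
```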
Signed-off-by: Shubham <shubham@linux.com>
This commit introduces validation logic to the `ark restore logs`
command, mirroring what already exists in other commands like `ark
restore create`.
Before the logs for a restore are fetched from the server, the
server is contacted to check whether the specified restore exists. If
it does not, the command errors out.
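A hedged sketch of the pre-check (clientset method names follow the standard generated-client pattern; treat the details as illustrative):

```go
import (
	"github.com/pkg/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

	clientset "github.com/heptio/ark/pkg/generated/clientset/versioned"
)

// validateRestoreExists contacts the server and fails fast if the named
// restore doesn't exist, before any logs are fetched.
func validateRestoreExists(client clientset.Interface, namespace, name string) error {
	if _, err := client.ArkV1().Restores(namespace).Get(name, metav1.GetOptions{}); err != nil {
		return errors.Wrapf(err, "error getting restore %q in namespace %q", name, namespace)
	}
	return nil
}
```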
Fixes #389
Signed-off-by: Shubham <shubham@linux.com>
When restoring resources that raise an already-exists error, check their
equality with the in-cluster object before logging a message on the
restore. If they're the same except for some metadata, don't generate a
message (a sketch of the comparison follows the list of known cases
below).
The restore process was modified so that if an object had an empty
namespace string, no namespace key is created on the object. This was to
avoid manipulating the copy of the current cluster's object by adding
the target namespace.
There are currently some known cases that do not compare as equal via
this method:
- The `default` ServiceAccount in a namespace will not match, primarily
because of differing default tokens. These will be handled in their own
patch
- IP addresses for Services are recorded in the backup object, but are
either not present on the cluster object, or different. An issue for
this already exists at https://github.com/heptio/ark/issues/354
- Endpoints have differing values for `renewTime`. This difference may
be inconsequential, but it isn't currently handled by the
resetMetadataAndStatus function.
- PersistentVolume objects do not match on spec fields, such as
claimRef and cloud provider persistent disk info
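A hedged sketch of the comparison (illustrative; the real resetMetadataAndStatus strips a different, more careful set of fields):

```go
import (
	"reflect"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
)

// objectsAreEquivalent strips fields that legitimately differ between a
// backed-up object and its in-cluster counterpart, then deep-compares
// the remainder.
func objectsAreEquivalent(fromBackup, fromCluster *unstructured.Unstructured) bool {
	a := stripVolatileFields(fromBackup.DeepCopy())
	b := stripVolatileFields(fromCluster.DeepCopy())
	return reflect.DeepEqual(a.Object, b.Object)
}

func stripVolatileFields(obj *unstructured.Unstructured) *unstructured.Unstructured {
	unstructured.RemoveNestedField(obj.Object, "status")
	for _, f := range []string{"uid", "resourceVersion", "creationTimestamp", "selfLink"} {
		unstructured.RemoveNestedField(obj.Object, "metadata", f)
	}
	return obj
}
```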
Signed-off-by: Nolan Brubaker <nolan@heptio.com>
When the BackupDeletionController processes a request, set the request's
backup-name and backup-uid labels if they aren't currently set.
Signed-off-by: Andy Goldstein <andy.goldstein@gmail.com>
Make sure a DeleteBackupRequest has its Spec.BackupName filled in. If
not, record an error in the status and mark the request as processed.
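A hedged sketch of the validation (the phase constant and status fields are illustrative of Ark's API types):

```go
import arkv1 "github.com/heptio/ark/pkg/apis/ark/v1"

// validateRequest records an error and marks the request processed when
// no backup name was provided, since such a request can never succeed.
func validateRequest(req *arkv1.DeleteBackupRequest) {
	if req.Spec.BackupName == "" {
		req.Status.Errors = append(req.Status.Errors, "spec.backupName is required")
		req.Status.Phase = arkv1.DeleteBackupRequestPhaseProcessed
	}
}
```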
Signed-off-by: Andy Goldstein <andy.goldstein@gmail.com>
Always fetch DeleteBackupRequests for a given backup so we can show
failed deletion attempts, for example if you try to delete a backup that
has PV snapshots when Ark doesn't have a persistentVolumeProvider
configured.
When creating a DeleteBackupRequest, include a label for the UID so we
can match based on name and UID when associating DeleteBackupRequests
with a given backup.
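A hedged sketch of the matching (label keys here are illustrative):

```go
import "k8s.io/apimachinery/pkg/labels"

// requestSelector selects the DeleteBackupRequests for one specific
// backup by both name and UID, so requests left over from a deleted,
// same-named backup don't match.
func requestSelector(backupName, backupUID string) labels.Selector {
	return labels.SelectorFromSet(labels.Set{
		"ark.heptio.com/backup-name": backupName,
		"ark.heptio.com/backup-uid":  backupUID,
	})
}
```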
Signed-off-by: Andy Goldstein <andy.goldstein@gmail.com>
We ran into a lot of problems using a finalizer on the backup to allow
the Ark server to clean up all associated backup data when deleting a
backup.
Users also found it less than desirable that deleting the heptio-ark
namespace resulted in all the backup data being deleted.
This removes the finalizer and replaces it with an explicit
DeleteBackupRequest that is created as a means of requesting the
deletion of a backup and all its associated data. This is what `ark
backup delete` does.
If you use kubectl to delete a backup or to delete the heptio-ark
namespace, this no longer deletes the associated backup data.
Additionally, as
long as the heptio-ark namespace still exists, the Ark server's
BackupSyncController will continually sync backups into the heptio-ark
namespace from object storage.
Signed-off-by: Andy Goldstein <andy.goldstein@gmail.com>
This commit adds limitranges to defaultResourcePriorities as
suggested in #385.
This is done so that pods are not restored before LimitRange objects;
otherwise pods would not honor the requests and limits those objects
set.
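For context, a sketch of what the priority list looks like (the exact default ordering in Ark may differ; the point is that limitranges precedes pods):

```go
// Resources are restored in this order; anything unlisted is restored
// afterwards. LimitRange objects must come before pods so that newly
// created pods honor their requests and limits.
var defaultResourcePriorities = []string{
	"namespaces",
	"persistentvolumes",
	"persistentvolumeclaims",
	"secrets",
	"configmaps",
	"serviceaccounts",
	"limitranges",
	"pods",
}
```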
Fixes #385
Signed-off-by: Shubham <shubham@linux.com>
Instead of requiring the Ark admin to specify a "location" in the azure
persistentVolumeProvider config (meaning only a single location is
supported), get info about the disk (for its location) when creating a
snapshot, and get info about the snapshot (for its location) when
creating a disk from a snapshot.
Signed-off-by: Andy Goldstein <andy.goldstein@gmail.com>
If you want to test changes to the ark server without having to rebuild
and redeploy the ark container, this change allows you to do something
like this (assuming you've created your cloud credentials file):
AWS_SHARED_CREDENTIALS_FILE=credentials-minio ark server -n heptio-ark
Signed-off-by: Andy Goldstein <andy.goldstein@gmail.com>
Add --force and --confirm to `ark backup delete` to support forced
backup deletion. This forcibly removes the Ark GC finalizer (if it's
present) from a backup and will orphan any resources associated with the
backup, such as backup tarballs in object storage, persistent volume
snapshots, and restores for the backup.
If a backup has a deletion timestamp, display `Deleting` in `ark backup
describe` and `ark backup get`.
Signed-off-by: Andy Goldstein <andy.goldstein@gmail.com>
Always try to create the config directory when saving the client config
in case it doesn't exist.
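A minimal sketch of the pattern (paths and config shape are illustrative):

```go
import (
	"encoding/json"
	"io/ioutil"
	"os"
	"path/filepath"
)

// saveClientConfig writes cfg as JSON to path, creating the parent
// directory first in case it doesn't exist yet.
func saveClientConfig(path string, cfg map[string]string) error {
	if err := os.MkdirAll(filepath.Dir(path), 0700); err != nil {
		return err
	}
	data, err := json.Marshal(cfg)
	if err != nil {
		return err
	}
	return ioutil.WriteFile(path, data, 0600)
}
```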
Signed-off-by: Andy Goldstein <andy.goldstein@gmail.com>
When running `ark <resource> create`, a request is sent to the server,
but the status is not immediately known. Inform the user that a request
was sent and provide a way to get more information on it.
Signed-off-by: Nolan Brubaker <nolan@heptio.com>
When creating a restore based on a backup that doesn't exist, the
restore should be marked as invalid and the error clearly communicated
so the user understands why the restore wasn't made.
Previously, the restore was left in progress with an error attached.
Since restores are CRDs and must be updated via a controller, there's
currently no way to give the client immediate errors.
Signed-off-by: Nolan Brubaker <nolan@heptio.com>
Add the ability for the Ark server to run in any namespace.
Add `ark client config get/set` for manipulating the new client
configuration file in $HOME/.config/ark/config.json. This holds client
defaults, such as the Ark server's namespace (to avoid having to specify
the --namespace flag all the time).
Add a --namespace flag to all client commands.
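A hedged sketch of what the config file might hold (the schema here is illustrative; only the namespace default is described above):

```go
// clientConfig models the JSON stored at $HOME/.config/ark/config.json.
type clientConfig struct {
	// Namespace is the Ark server namespace used by client commands
	// when --namespace isn't given.
	Namespace string `json:"namespace,omitempty"`
}
```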
Signed-off-by: Andy Goldstein <andy.goldstein@gmail.com>
If a backup item action errors, log the error as soon as it occurs, so
it's clear when the error happened. Also include information about the
groupResource, namespace, and name of the item in the error.
Signed-off-by: Andy Goldstein <andy.goldstein@gmail.com>
When running a backup, try to do as much as possible, collecting errors
along the way, and return an aggregate at the end. This way, if a backup
fails for most reasons, we'll be able to upload the backup log file to
object storage, which wasn't happening before.
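A minimal sketch of the collect-then-aggregate pattern using the apimachinery helper (backupItem and the items list are hypothetical stand-ins):

```go
import kuberrs "k8s.io/apimachinery/pkg/util/errors"

// backupAll keeps going after individual failures and returns a single
// aggregate error at the end, so later steps (like uploading the backup
// log file) still get a chance to run.
func backupAll(items []string) error {
	var errs []error
	for _, item := range items {
		if err := backupItem(item); err != nil {
			errs = append(errs, err) // record and continue
		}
	}
	return kuberrs.NewAggregate(errs) // nil when errs is empty
}

// backupItem is a stand-in for the real per-item backup logic.
func backupItem(item string) error { return nil }
```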
Signed-off-by: Andy Goldstein <andy.goldstein@gmail.com>
The main Ark code was hard-coding specific support for AWS, GCE, and
Azure volume snapshots and restores, and anything else was considered
unsupported.
Add GetVolumeID and SetVolumeID to the BlockStore interface, to allow
block store plugins to handle volume snapshots and restores.
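A hedged sketch of the two additions (signatures are illustrative):

```go
import "k8s.io/apimachinery/pkg/runtime"

// BlockStore (excerpt): the two methods that let plugins own the
// provider-specific mapping between PVs and volume IDs.
type BlockStore interface {
	// GetVolumeID returns the cloud provider volume ID for the given
	// PersistentVolume, however the plugin's provider encodes it.
	GetVolumeID(pv runtime.Unstructured) (string, error)

	// SetVolumeID records the restored volume's ID on the PV and
	// returns the updated object.
	SetVolumeID(pv runtime.Unstructured, volumeID string) (runtime.Unstructured, error)
}
```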
Signed-off-by: Andy Goldstein <andy.goldstein@gmail.com>
The log location hook was matching github.com/heptio/ark and stripping
off that prefix plus one more character. This meant that
github.com/heptio/ark-plugin-example/foo.go was being listed as
plugin-example/foo.go instead of
github.com/heptio/ark-plugin-example/foo.go.
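The underlying bug is a prefix match without a path boundary; a hedged sketch of the corrected logic:

```go
import "strings"

// shortenPath strips the repo prefix only when it's followed by a path
// separator, so sibling repos like ark-plugin-example keep their full
// paths. Illustrative, not the hook's exact code.
func shortenPath(file string) string {
	const prefix = "github.com/heptio/ark/"
	if strings.HasPrefix(file, prefix) {
		return strings.TrimPrefix(file, prefix)
	}
	return file
}
```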
Signed-off-by: Andy Goldstein <andy.goldstein@gmail.com>
If backing up specific namespaces with "auto" cluster resources, the
namespace objects themselves were not being included in the backup.
Restore would create them, but any labels or metadata would be lost.
Instead, handle namespaces as a special case: a cluster-level resource
we may still need even when most cluster-level resources are excluded.
Signed-off-by: Devan Goodwin <dgoodwin@redhat.com>
If you have a large number of warnings and/or errors, the restore
object's size can exceed the maximum allowed by etcd. Move them to
object storage, and add a new describe command to fetch and display them
on the fly.
Signed-off-by: Andy Goldstein <andy.goldstein@gmail.com>
Deleting the clusterIP field when the service should be headless will
cause it to be assigned a new IP on restore; instead it should retain
the headless state after restoration.
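A hedged sketch of the distinction (not the exact restore code):

```go
import "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"

// resetClusterIP drops a dynamically assigned clusterIP so the restored
// Service gets a fresh one, but preserves the literal "None" that marks
// a headless Service.
func resetClusterIP(svc *unstructured.Unstructured) {
	ip, found, _ := unstructured.NestedString(svc.Object, "spec", "clusterIP")
	if found && ip != "None" {
		unstructured.RemoveNestedField(svc.Object, "spec", "clusterIP")
	}
}
```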
Fixes #168
Signed-off-by: Nolan Brubaker <nolan@heptio.com>
We only need them if we've got unstructured/unknown data and we want to
convert it to typed objects.
Signed-off-by: Andy Goldstein <andy.goldstein@gmail.com>
Previously the directory structure separated resources depending on
whether they were cluster or namespace scoped. All cluster resources
were restored first, then all namespace resources. Priority did not
apply across both, and you could not order any namespace resources
before any cluster resources.
This restructure sorts first by resource type, for example:
resources/serviceaccounts/namespaces/ns1.json
resources/nodes/cluster/node1.json
This will break old backups, as the format is no longer compatible; the
change was announced on the Google group.
Signed-off-by: Devan Goodwin <dgoodwin@redhat.com>
- Read the PV's AZ info from the fault-domain label of the PV object for snapshotting (see the sketch after this list).
- Store the PV's AZ info in the VolumeInfo.
- Add tests for reading the label from the PV object.
- Remove availability zone validation in the AWS and GCP BlockStorageAdapters.
- Add volumeAZ as a parameter to methods in the BlockStorageAdapter interface.
- Get the AZ from VolumeInfo when restoring a PV snapshot.
- Remove references to PV availability zone in the docs.
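A sketch of the label read (the fault-domain label key is the standard Kubernetes one; the helper is illustrative):

```go
import corev1 "k8s.io/api/core/v1"

// zoneLabel is the well-known fault-domain label that carries a PV's
// availability zone.
const zoneLabel = "failure-domain.beta.kubernetes.io/zone"

// pvZone returns the availability zone recorded on the PV object, or ""
// if the label isn't set.
func pvZone(pv *corev1.PersistentVolume) string {
	return pv.Labels[zoneLabel]
}
```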
Signed-off-by: Ashish Amarnath <ashish.amarnath@gmail.com>
- Introduced a blacklist of resources that are non-restorable. The
goal is that the backup can still include these resources for
logging/auditing purposes, but they are explicitly added to
ExcludedResources in the RestoreController's "defaulting" logic
to ensure that if someone were to explicitly ask for nodes,
they would be expressly denied.
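A hedged sketch of that defaulting logic (entries beyond nodes are illustrative):

```go
// nonRestorableResources may still appear in backups for auditing, but
// are always appended to a restore's ExcludedResources.
var nonRestorableResources = []string{"nodes", "events"}

// defaultExclusions force-adds the blacklist to whatever the user
// excluded, so explicitly asking for nodes is still denied.
func defaultExclusions(excluded []string) []string {
	seen := make(map[string]bool, len(excluded))
	for _, r := range excluded {
		seen[r] = true
	}
	for _, r := range nonRestorableResources {
		if !seen[r] {
			excluded = append(excluded, r)
		}
	}
	return excluded
}
```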
Signed-off-by: Justin Nauman <justin.r.nauman@gmail.com>