diff --git a/design/graph-manifest.md b/design/graph-manifest.md
new file mode 100644
index 000000000..e469b2a61
--- /dev/null
+++ b/design/graph-manifest.md
@@ -0,0 +1,219 @@
+# Object Graph Manifest for Velero
+
+## Abstract
+
+One to two sentences that describes the goal of this proposal and the problem being solved by the proposed change.
+The reader should be able to tell by the title, and the opening paragraph, if this document is relevant to them.
+
+Currently, Velero does not have a complete manifest of everything in the backup, aside from the backup tarball itself.
+This change introduces a new data structure, stored with a backup in object storage, that allows more efficient reporting of what a backup contains.
+Additionally, this manifest should enable advancements in Velero's features and architecture, enabling dry-run support, concurrent backup and restore operations, and reliable restoration of complex applications.
+
+## Background
+
+Right now, Velero backs up items one at a time, sorted by API Group and namespace.
+It also restores items one at a time, using the restoreResourcePriorities flag to indicate the order in which API Groups should have their objects restored.
+While this does work currently, it presents challenges for more complex applications whose dependencies form a graph rather than a strictly linear order.
+
+For example, Cluster API clusters are a set of complex Kubernetes objects that require that the "root" objects are restored first, before their "leaf" objects.
+If a Cluster that a ClusterResourceSetBinding refers to does not exist, then a restore of the CAPI cluster will fail.
+
+Additionally, Velero does not have a reliable way to communicate what objects will be affected in a backup or restore operation without actually performing the operation.
+This complicates dry-run tasks, because a user must simply perform the action without knowing what will be touched.
+It also complicates allowing backups and restores to run in parallel, because there is currently no way to know if a single Kubernetes object is included in multiple backups or restores, which could lead to unreliability, deadlocking, and race conditions were Velero made to be more concurrent today.
+
+## Goals
+
+- Introduce a manifest data structure that defines the contents of a backup.
+- Store the manifest data in object storage alongside existing backup data.
+
+## Non Goals
+
+This proposal seeks to enable, but not define, the following.
+
+- Implementing concurrency beyond what already exists in Velero.
+- Implementing a dry-run feature.
+- Implementing a new restore ordering procedure.
+
+While the data structure should take these scenarios into account, they will not be implemented alongside it.
+
+## High-Level Design
+
+To uniquely identify a Kubernetes object within a cluster or backup, the following fields are sufficient:
+
+- API Group and Version (example: backup.velero.io/v1)
+- Namespace
+- Name
+- Labels
+
+These criteria cover the majority of Velero's inclusion or exclusion logic.
+However, some additional fields enable further use cases.
+
+- Owners, which are other Kubernetes objects that have some relationship to this object. They may be strict or soft dependencies.
+- Annotations, which provide extra metadata about the object that might be useful for other programs to consume.
+- UUID generated by Kubernetes. This is useful in defining Owner relationships, providing a single, immutable key to find an object. This is _not_ considered at restore time, only internally for defining links.
+
+All of this information already exists within a Velero backup's tarball of resources, but extracting such data is inefficient.
+The entire tarball must be downloaded and extracted, and then the JSON within must be parsed to read labels, owners, annotations, and a UUID.
+The rest of the information is encoded in the file system structure within the Velero backup tarball.
+While doable, this is heavyweight in terms of time and potentially memory.
+
+Instead, this proposal suggests adding a new manifest structure that is kept alongside the backup tarball.
+This structure would contain the above fields only, and could be used to perform inclusion/exclusion logic on a backup, select a resource from within a backup, and do set operations over backup or restore contents to identify overlapping resources.
+
+Here are some use cases that this data structure should enable, that have been difficult to implement prior to its existence:
+
+- A dry-run operation on backup, informing the user what would be selected if they were to perform the operation.
+  A manifest could be created and saved, allowing for a user to do a dry-run, then accept it to perform the backup.
+  Restore operations can be treated similarly.
+- Efficient, non-overlapping parallelization of backup and restore operations.
+  By building or reading a manifest before performing a backup or restore, Velero can determine if there are overlapping resources.
+  If there are no overlaps, the operations can proceed in parallel.
+  If there are overlaps, the operations can proceed serially.
+- Graph-based restores for non-linear dependencies.
+  Not all resources in a Kubernetes cluster can be defined in a strict, linear way.
+  They may have multiple owners, and writing BackupItemActions or RestoreItemActions to simply return a chain of owners is not an efficient way to support the many Kubernetes operators/controllers being written.
+  Instead, by having a manifest with enough information, Velero can build a discrete list that ensures dependencies are restored before their dependents, with less input from plugin authors.
+
+## Detailed Design
+
+The Manifest data structure would look like this, in Go type structure:
+
+```golang
+// NamespacedItems maps a given namespace to all of its contained items.
+type NamespacedItems map[string]*Item
+
+// KindNamespaces maps an API group/version to a map of namespaces and their items.
+type KindNamespaces map[string]NamespacedItems
+
+type Manifest struct {
+    // Kinds holds the top level map of all resources in a manifest.
+    Kinds KindNamespaces
+
+    // Index is used to look up an individual item quickly based on UUID.
+    // This enables fetching owners out of the maps more efficiently at the cost of memory space.
+    Index map[string]*Item
+}
+
+
+// Item represents a Kubernetes resource within a backup based on its selectable criteria.
+// It is not the whole Kubernetes resource as retrieved from the API server, but rather a collection of important fields needed for filtering.
+type Item struct {
+    // Kubernetes API group which this Item belongs to.
+    // Could be a core resource, or a CustomResourceDefinition.
+    APIGroup string
+
+    // Version of the APIGroup that the Item belongs to.
+    APIVersion string
+
+    // Kubernetes namespace which contains this item.
+    // Empty string for cluster-level resource.
+    Namespace string
+
+    // Item's given name.
+    Name string
+
+    // Map of labels that the Item had at backup time.
+    Labels map[string]string
+
+    // Map of annotations that the Item had at backup time.
+    // Useful for plugins that may decide to process only Items with specific annotations.
+    Annotations map[string]string
+
+    // Owners is a list of UUIDs to other items that own or refer to this item.
+    Owners []string
+
+    // Manifest is a pointer to the Manifest in which this object is contained.
+    // Useful for getting access to things like the Manifest.Index map.
+    Manifest *Manifest
+}
+```
+
+In addition to the new types, the following Go interfaces would be provided for convenience.
+
+```golang
+type Itemer interface {
+    // Returns the Item as a string, following the current Velero backup version 1.1.0 tarball structure format.
+    // ///.json
+    String() string
+
+    // Owners returns a slice of realized Items that own or refer to the current Item.
+    // Useful for building out a full graph of Items to restore.
+    // Will use the UUIDs in Item.Owners to look up the owner Items in the Manifest.
+    Owners() []*Item
+
+    // Kind returns the Kind of an object, which is a combination of the APIGroup and APIVersion.
+    // Useful for verifying the needed CustomResourceDefinition exists before actually restoring this Item.
+    Kind() *Item
+
+    // Children returns a slice of all Items that refer to this item as an Owner.
+    Children() []*Item
+}
+
+// This error type is being created in order to make reliable sentinel errors.
+// See https://dave.cheney.net/2019/06/10/constant-time for more details.
+type ManifestError string
+
+func (e ManifestError) Error() string {
+    return string(e)
+}
+
+const ItemAlreadyExists = ManifestError("item already exists in manifest")
+
+type Manifester interface {
+    // Set returns the entire list of resources as a set of strings (using Itemer.String).
+    // This is useful for comparing two manifests and determining if they have any overlapping resources.
+    // In the future, when implementing concurrent operations, this can be used as a sanity check to ensure resources aren't being backed up or restored by two operations at once.
+    Set() sets.String
+
+    // Add adds an item to the appropriate APIGroup and Namespace within a Manifest.
+    // Returns (true, nil) if the Item is successfully added to the Manifest.
+    // Returns (false, ItemAlreadyExists) if the Item is already in the Manifest.
+    Add(*Item) (bool, error)
+}
+```
+
+### Serialization
+
+The entire `Manifest` should be serialized into the `manifest.json` file within the object storage for a single backup.
+It is possible that this file could also be compressed for space efficiency.
+
+### Memory Concerns
+
+Because the `Manifest` is holding a minimal amount of data, memory usage should not be a concern for most clusters.
+TODO: Document known character limits on API group names, resource names, and kind names.
+
+## Security Considerations
+
+Introducing this manifest does not increase the attack surface of Velero, as this data is already present in the existing backups.
+Storing the manifest.json file next to the existing backup data in the object storage does not change access patterns.
+
+## Compatibility
+
+The introduction of this file should trigger Velero backup version 1.2.0, but it will not interfere with Velero versions that do not support the `Manifest` as the file will be additive.
+In time, this file will replace the `-resource-list.json.gz` file, but for compatibility the two will appear side by side.
+
+When first implemented, Velero should simply build the `Manifest` as it backs up items, and serialize it at the end.
+Any logic changes that rely on the `Manifest` file must be introduced with their own design document, with their own compatibility concerns.
+
+## Implementation
+
+The `Manifest` object will _not_ be implemented as a Kubernetes CustomResourceDefinition, but rather as one of Velero's own internal constructs.
+
+Implementation for the data structure alone should be minimal - the types will need to be defined in a `manifest` package.
+Then, the backup process should create a `Manifest`, passing it to the various `*Backuppers` in the `backup` package.
+These methods will insert individual `Items` into the `Manifest`.
+Finally, logic should be added to the `persistence` package to ensure that the new `manifest.json` file is uploadable and allowed.
+
+## Alternatives Considered
+
+None so far.
+
+## Open Issues
+
+- When should compatibility with the `-resource-list.json.gz` file be dropped?
+- What are some good test case Kubernetes resources and controllers to try this out with?
+Cluster API seems like an obvious choice, but are there others?
+- Since it is not implemented as a CustomResourceDefinition, how can a `Manifest` be retained so that users could issue a dry-run command, then perform their actual desired operation?
+Could it be stored in Velero's temp directories?
+Note that this is making Velero itself more stateful.
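
Note on the overlap check described under High-Level Design: the proposal says two operations may run in parallel only when their manifests share no resources, and that `Manifester.Set` is meant to support that comparison. The following is a minimal sketch of that idea, not the proposed implementation. The names `item`, `key`, `keySet`, and `overlap` are hypothetical, a plain Go map stands in for `sets.String`, and the key format is illustrative only; it is not the tarball path format that `Itemer.String` is specified to return.

```go
package main

import (
	"fmt"
	"sort"
)

// item is a hypothetical, trimmed-down stand-in for the proposal's Item type.
type item struct {
	APIGroup, APIVersion, Namespace, Name string
}

// key builds an illustrative identity string for an item.
// Assumption: the real Itemer.String format differs and is not reproduced here.
func (i item) key() string {
	return i.APIGroup + "/" + i.APIVersion + "/" + i.Namespace + "/" + i.Name
}

// keySet collects the identity strings of a group of items into a set.
func keySet(items []item) map[string]struct{} {
	s := make(map[string]struct{}, len(items))
	for _, it := range items {
		s[it.key()] = struct{}{}
	}
	return s
}

// overlap returns the sorted keys that are present in both sets.
func overlap(a, b map[string]struct{}) []string {
	var shared []string
	for k := range a {
		if _, ok := b[k]; ok {
			shared = append(shared, k)
		}
	}
	sort.Strings(shared)
	return shared
}

func main() {
	backupA := keySet([]item{
		{"apps", "v1", "web", "frontend"},
		{"", "v1", "web", "frontend-config"},
	})
	backupB := keySet([]item{
		{"apps", "v1", "web", "frontend"},
		{"apps", "v1", "db", "postgres"},
	})

	shared := overlap(backupA, backupB)
	if len(shared) == 0 {
		fmt.Println("no overlap: operations may run in parallel")
		return
	}
	fmt.Println("overlap found, run serially:", shared)
}
```

Comparing string keys keeps the check independent of how items are nested inside a manifest, which is presumably why the proposal exposes a flat `Set` rather than asking callers to walk the `Kinds` map.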