Implement the Repo maintenance Job configuration design.

Remove the resource parameters from the velero server CLI.

Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>
pull/8145/head
Xun Jiang 2024-08-21 15:14:53 +08:00
parent b92143dad1
commit 26cc41f26d
25 changed files with 1275 additions and 577 deletions

View File

@ -0,0 +1 @@
Implement the Repo maintenance Job configuration.

View File

@ -1,7 +1,7 @@
# Repository maintenance job configuration design
## Abstract
Add this design to enable the repository maintenance job to read configuration from a dedicated ConfigMap and to make the Job's necessary parts configurable, e.g. `PodSpec.Affinity` and `PodSpec.resources`.
Add this design to enable the repository maintenance job to read configuration from a dedicated ConfigMap and to make the Job's necessary parts configurable, e.g. `PodSpec.Affinity` and `PodSpec.Resources`.
## Background
Repository maintenance was split from the Velero server into a k8s Job in v1.14 by the [repository maintenance job](Implemented/repository-maintenance.md) design.
@ -18,7 +18,6 @@ This design reuses the data structure introduced by design [node-agent affinity
## Goals
- Unify the repository maintenance Job configuration in one place.
- Let users choose which nodes the repository maintenance Jobs run on.
- Replace the existing `velero server` parameters `--maintenance-job-cpu-request`, `--maintenance-job-mem-request`, `--maintenance-job-cpu-limit` and `--maintenance-job-mem-limit` by the proposal ConfigMap.
## Non Goals
- There was an [issue](https://github.com/vmware-tanzu/velero/issues/7911) requesting that the whole Job's PodSpec be configurable. That's out of the scope of this design.
@ -27,8 +26,24 @@ This design reuses the data structure introduced by design [node-agent affinity
## Compatibility
v1.14 uses the `velero server` CLI's parameter to pass the repository maintenance job configuration.
In v1.15, those parameters are removed, including `--maintenance-job-cpu-request`, `--maintenance-job-mem-request`, `--maintenance-job-cpu-limit` and `--maintenance-job-mem-limit`.
Instead, the parameters are read from the ConfigMap specified by `velero server` CLI parameter `--repo-maintenance-job-config` introduced by this design.
In v1.15, those parameters are still kept, including `--maintenance-job-cpu-request`, `--maintenance-job-mem-request`, `--maintenance-job-cpu-limit`, `--maintenance-job-mem-limit`, and `--keep-latest-maintenance-jobs`.
But the parameters read from the ConfigMap specified by `velero server` CLI parameter `--repo-maintenance-job-config` introduced by this design have a higher priority.
If `--repo-maintenance-job-config` is not specified, then the `velero server` parameters are used if provided.
If the `velero server` parameters are not specified either, then the following default values are used (a sketch of this precedence follows the list).
* `--keep-latest-maintenance-jobs` default value is 3.
* `--maintenance-job-cpu-request` default value is 0.
* `--maintenance-job-mem-request` default value is 0.
* `--maintenance-job-cpu-limit` default value is 0.
* `--maintenance-job-mem-limit` default value is 0.
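The following Go sketch illustrates the intended resolution order. It is a minimal illustration, not code from this change: the helper name `resolvePodResources` and the zero-value comparison are assumptions made for the example.
```go
// resolvePodResources is a hypothetical helper showing the precedence:
// ConfigMap value > velero server CLI parameter > built-in default.
func resolvePodResources(fromConfigMap *kube.PodResources, fromFlags kube.PodResources) kube.PodResources {
	if fromConfigMap != nil {
		// The ConfigMap named by --repo-maintenance-job-config wins.
		return *fromConfigMap
	}
	if fromFlags != (kube.PodResources{}) {
		// Fall back to the velero server CLI parameters.
		return fromFlags
	}
	// Fall back to the defaults; "0" means no request/limit is set.
	return kube.PodResources{
		CPURequest:    "0",
		MemoryRequest: "0",
		CPULimit:      "0",
		MemoryLimit:   "0",
	}
}
```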
## Deprecation
This design proposes deprecating the `velero server` parameters `--maintenance-job-cpu-request`, `--maintenance-job-mem-request`, `--maintenance-job-cpu-limit`, `--maintenance-job-mem-limit`, and `--keep-latest-maintenance-jobs` in release-1.15.
That means those parameters will be deleted in release-1.17.
After deletion, the resource-related parameters are replaced by the ConfigMap specified by the `velero server` CLI's parameter `--repo-maintenance-job-config`.
`--keep-latest-maintenance-jobs` is deleted from the `velero server` CLI. It becomes a non-configurable internal parameter whose value is 3.
Please check [issue 7923](https://github.com/vmware-tanzu/velero/issues/7923) for more information on why this parameter is deleted.
## Design
This design introduces a new ConfigMap specified by `velero server` CLI parameter `--repo-maintenance-job-config` as the source of the repository maintenance job configuration. The specified ConfigMap is read from the namespace where Velero is installed.
@ -49,14 +64,12 @@ velero server \
### Structure
The data structure for ```repo-maintenance-job-config``` is as below:
```go
type MaintenanceConfigMap map[string]Configs
type Configs struct {
// LoadAffinity is the config for data path load affinity.
LoadAffinity []*LoadAffinity `json:"loadAffinity,omitempty"`
// Resources is the config for the CPU and memory resources setting.
Resource Resources `json:"resources,omitempty"`
// PodResources is the config for the CPU and memory resources setting.
PodResources *kube.PodResources `json:"podResources,omitempty"`
}
type LoadAffinity struct {
@ -64,31 +77,28 @@ type LoadAffinity struct {
NodeSelector metav1.LabelSelector `json:"nodeSelector"`
}
type Resources struct {
// The repository maintenance job CPU request setting
CPURequest string `json:"cpuRequest,omitempty"`
// The repository maintenance job memory request setting
MemRequest string `json:"memRequest,omitempty"`
// The repository maintenance job CPU limit setting
CPULimit string `json:"cpuLimit,omitempty"`
// The repository maintenance job memory limit setting
MemLimit string `json:"memLimit,omitempty"`
type PodResources struct {
CPURequest string `json:"cpuRequest,omitempty"`
MemoryRequest string `json:"memoryRequest,omitempty"`
CPULimit string `json:"cpuLimit,omitempty"`
MemoryLimit string `json:"memoryLimit,omitempty"`
}
```
The ConfigMap content is a map.
If there is a key value as `global` in the map, the key's value is applied to all BackupRepositories maintenance jobs that don't their own specific configuration in the ConfigMap.
If there is a key value as `global` in the map, the key's value is applied to all BackupRepositories maintenance jobs that cannot find their own specific configuration in the ConfigMap.
The other keys in the map are the combination of three elements of a BackupRepository:
* The namespace in which BackupRepository backs up volume data
* The BackupRepository referenced BackupStorageLocation's name
* The BackupRepository's type. Possible values are `kopia` and `restic`
* The namespace in which BackupRepository backs up volume data.
* The BackupRepository referenced BackupStorageLocation's name.
* The BackupRepository's type. Possible values are `kopia` and `restic`.
Those three keys can identify a [unique BackupRepository](https://github.com/vmware-tanzu/velero/blob/2fc6300f2239f250b40b0488c35feae59520f2d3/pkg/repository/backup_repo_op.go#L32-L37).
If a key matches a BackupRepository, the key's value is applied to that BackupRepository's maintenance jobs.
In this way, users can add configuration before the BackupRepository is created.
This is especially convenient for administrators configuring it during the Velero installation.
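A minimal sketch of how the key is derived from a BackupRepository (this mirrors the lookup implemented later in this change; the standalone function name is illustrative):
```go
// repoJobConfigKey builds the ConfigMap key for a BackupRepository as
// <volume namespace>-<BSL name>-<repository type>.
func repoJobConfigKey(repo *velerov1api.BackupRepository) string {
	return repo.Spec.VolumeNamespace + "-" +
		repo.Spec.BackupStorageLocation + "-" +
		repo.Spec.RepositoryType
}
```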
For example, the following BackupRepository's key should be `test-default-kopia`
For example, the following BackupRepository's key should be `test-default-kopia`.
``` yaml
- apiVersion: velero.io/v1
kind: BackupRepository
@ -119,11 +129,11 @@ A sample of the ```repo-maintenance-job-config``` ConfigMap is as below:
cat <<EOF > repo-maintenance-job-config.json
{
"global": {
resources: {
podResources: {
"cpuRequest": "100m",
"cpuLimit": "200m",
"memRequest": "100Mi",
"memLimit": "200Mi"
"memoryRequest": "100Mi",
"memoryLimit": "200Mi"
},
"loadAffinity": [
{
@ -177,18 +187,18 @@ config := Configs {
LoadAffinity: nil,
// Resources is the config for the CPU and memory resources setting.
Resources: Resources{
PodResources: &kube.PodResources{
// The repository maintenance job CPU request setting
CPURequest: "0m",
// The repository maintenance job memory request setting
MemRequest: "0Mi",
MemoryRequest: "0Mi",
// The repository maintenance job CPU limit setting
CPULimit: "0m",
// The repository maintenance job memory limit setting
MemLimit: "0Mi",
MemoryLimit: "0Mi",
},
}
```
@ -204,17 +214,32 @@ For example, the ConfigMap content has two elements.
``` json
{
"global": {
"resources": {
"loadAffinity": [
{
"nodeSelector": {
"matchExpressions": [
{
"key": "cloud.google.com/machine-family",
"operator": "In",
"values": [
"e2"
]
}
]
}
},
],
"podResources": {
"cpuRequest": "100m",
"cpuLimit": "200m",
"memRequest": "100Mi",
"memLimit": "200Mi"
"memoryRequest": "100Mi",
"memoryLimit": "200Mi"
}
},
"ns1-default-kopia": {
"resources": {
"memRequest": "400Mi",
"memLimit": "800Mi"
"podResources": {
"memoryRequest": "400Mi",
"memoryLimit": "800Mi"
}
}
}
@ -223,19 +248,29 @@ The config value used for BackupRepository backing up volume data in namespace `
``` go
config := Configs {
// LoadAffinity is the config for data path load affinity.
LoadAffinity: nil,
// The repository maintenance job CPU request setting
CPURequest: "100m",
// The repository maintenance job memory request setting
MemRequest: "400Mi",
// The repository maintenance job CPU limit setting
CPULimit: "200m",
// The repository maintenance job memory limit setting
MemLimit: "800Mi",
LoadAffinity: []*kube.LoadAffinity{
{
NodeSelector: metav1.LabelSelector{
MatchExpressions: []metav1.LabelSelectorRequirement{
{
Key: "cloud.google.com/machine-family",
Operator: metav1.LabelSelectorOpIn,
Values: []string{"e2"},
},
},
},
},
},
PodResources: &kube.PodResources{
// The repository maintenance job CPU request setting
CPURequest: "",
// The repository maintenance job memory request setting
MemoryRequest: "400Mi",
// The repository maintenance job CPU limit setting
CPULimit: "",
// The repository maintenance job memory limit setting
MemoryLimit: "800Mi",
}
}
```
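Note that the CPU fields in this result are empty: the `ns1-default-kopia` entry provides its own `podResources`, so the global `podResources` section is not merged field by field; the global section is used only when the repository-specific entry sets no `podResources` at all. The same whole-section rule applies to `loadAffinity`, which is why the global affinity appears in the result above.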

View File

@ -23,7 +23,6 @@ import (
"strings"
"time"
"github.com/vmware-tanzu/velero/pkg/repository"
"github.com/vmware-tanzu/velero/pkg/uploader"
"github.com/pkg/errors"
@ -85,7 +84,8 @@ type Options struct {
DefaultSnapshotMoveData bool
DisableInformerCache bool
ScheduleSkipImmediately bool
MaintenanceCfg repository.MaintenanceConfig
PodResources kubeutil.PodResources
KeepLatestMaintenanceJobs int
}
// BindFlags adds command line values to the options struct.
@ -130,11 +130,37 @@ func (o *Options) BindFlags(flags *pflag.FlagSet) {
flags.BoolVar(&o.DefaultSnapshotMoveData, "default-snapshot-move-data", o.DefaultSnapshotMoveData, "Bool flag to configure Velero server to move data by default for all snapshots supporting data movement. Optional.")
flags.BoolVar(&o.DisableInformerCache, "disable-informer-cache", o.DisableInformerCache, "Disable informer cache for Get calls on restore. With this enabled, it will speed up restore in cases where there are backup resources which already exist in the cluster, but for very large clusters this will increase velero memory usage. Default is false (don't disable). Optional.")
flags.BoolVar(&o.ScheduleSkipImmediately, "schedule-skip-immediately", o.ScheduleSkipImmediately, "Skip the first scheduled backup immediately after creating a schedule. Default is false (don't skip).")
flags.IntVar(&o.MaintenanceCfg.KeepLatestMaitenanceJobs, "keep-latest-maintenance-jobs", o.MaintenanceCfg.KeepLatestMaitenanceJobs, "Number of latest maintenance jobs to keep each repository. Optional.")
flags.StringVar(&o.MaintenanceCfg.CPURequest, "maintenance-job-cpu-request", o.MaintenanceCfg.CPURequest, "CPU request for maintenance jobs. Default is no limit.")
flags.StringVar(&o.MaintenanceCfg.MemRequest, "maintenance-job-mem-request", o.MaintenanceCfg.MemRequest, "Memory request for maintenance jobs. Default is no limit.")
flags.StringVar(&o.MaintenanceCfg.CPULimit, "maintenance-job-cpu-limit", o.MaintenanceCfg.CPULimit, "CPU limit for maintenance jobs. Default is no limit.")
flags.StringVar(&o.MaintenanceCfg.MemLimit, "maintenance-job-mem-limit", o.MaintenanceCfg.MemLimit, "Memory limit for maintenance jobs. Default is no limit.")
flags.IntVar(
&o.KeepLatestMaintenanceJobs,
"keep-latest-maintenance-jobs",
o.KeepLatestMaintenanceJobs,
"Number of latest maintenance jobs to keep each repository. Optional.",
)
flags.StringVar(
&o.PodResources.CPURequest,
"maintenance-job-cpu-request",
o.PodResources.CPURequest,
"CPU request for maintenance jobs. Default is no limit.",
)
flags.StringVar(
&o.PodResources.MemoryRequest,
"maintenance-job-mem-request",
o.PodResources.MemoryRequest,
"Memory request for maintenance jobs. Default is no limit.",
)
flags.StringVar(
&o.PodResources.CPULimit,
"maintenance-job-cpu-limit",
o.PodResources.CPULimit,
"CPU limit for maintenance jobs. Default is no limit.",
)
flags.StringVar(
&o.PodResources.MemoryLimit,
"maintenance-job-mem-limit",
o.PodResources.MemoryLimit,
"Memory limit for maintenance jobs. Default is no limit.",
)
}
// NewInstallOptions instantiates a new, default InstallOptions struct.
@ -231,7 +257,8 @@ func (o *Options) AsVeleroOptions() (*install.VeleroOptions, error) {
DefaultSnapshotMoveData: o.DefaultSnapshotMoveData,
DisableInformerCache: o.DisableInformerCache,
ScheduleSkipImmediately: o.ScheduleSkipImmediately,
MaintenanceCfg: o.MaintenanceCfg,
PodResources: o.PodResources,
KeepLatestMaintenanceJobs: o.KeepLatestMaintenanceJobs,
}, nil
}

View File

@ -294,7 +294,7 @@ func (s *nodeAgentServer) run() {
s.logger.WithError(err).Fatal("Unable to create the pod volume restore controller")
}
var loadAffinity *nodeagent.LoadAffinity
var loadAffinity *kube.LoadAffinity
if s.dataPathConfigs != nil && len(s.dataPathConfigs.LoadAffinity) > 0 {
loadAffinity = s.dataPathConfigs.LoadAffinity[0]
s.logger.Infof("Using customized loadAffinity %v", loadAffinity)
@ -316,7 +316,21 @@ func (s *nodeAgentServer) run() {
}
}
dataUploadReconciler := controller.NewDataUploadReconciler(s.mgr.GetClient(), s.mgr, s.kubeClient, s.csiSnapshotClient.SnapshotV1(), s.dataPathMgr, loadAffinity, backupPVCConfig, podResources, clock.RealClock{}, s.nodeName, s.config.dataMoverPrepareTimeout, s.logger, s.metrics)
dataUploadReconciler := controller.NewDataUploadReconciler(
s.mgr.GetClient(),
s.mgr,
s.kubeClient,
s.csiSnapshotClient.SnapshotV1(),
s.dataPathMgr,
loadAffinity,
backupPVCConfig,
podResources,
clock.RealClock{},
s.nodeName,
s.config.dataMoverPrepareTimeout,
s.logger,
s.metrics,
)
if err = dataUploadReconciler.SetupWithManager(s.mgr); err != nil {
s.logger.WithError(err).Fatal("Unable to create the data upload controller")
}

View File

@ -11,9 +11,9 @@ import (
"github.com/vmware-tanzu/velero/pkg/cmd/util/flag"
"github.com/vmware-tanzu/velero/pkg/constant"
"github.com/vmware-tanzu/velero/pkg/podvolume"
"github.com/vmware-tanzu/velero/pkg/repository"
"github.com/vmware-tanzu/velero/pkg/types"
"github.com/vmware-tanzu/velero/pkg/uploader"
"github.com/vmware-tanzu/velero/pkg/util/kube"
"github.com/vmware-tanzu/velero/pkg/util/logging"
)
@ -47,6 +47,12 @@ const (
// defaultCredentialsDirectory is the path on disk where credential
// files will be written to
defaultCredentialsDirectory = "/tmp/credentials"
DefaultKeepLatestMaintenanceJobs = 3
DefaultMaintenanceJobCPURequest = "0"
DefaultMaintenanceJobCPULimit = "0"
DefaultMaintenanceJobMemRequest = "0"
DefaultMaintenanceJobMemLimit = "0"
)
var (
@ -164,9 +170,11 @@ type Config struct {
DefaultSnapshotMoveData bool
DisableInformerCache bool
ScheduleSkipImmediately bool
MaintenanceCfg repository.MaintenanceConfig
BackukpRepoConfig string
CredentialsDirectory string
BackupRepoConfig string
RepoMaintenanceJobConfig string
PodResources kube.PodResources
KeepLatestMaintenanceJobs int
}
func GetDefaultConfig() *Config {
@ -197,13 +205,13 @@ func GetDefaultConfig() *Config {
DisableInformerCache: defaultDisableInformerCache,
ScheduleSkipImmediately: false,
CredentialsDirectory: defaultCredentialsDirectory,
}
config.MaintenanceCfg = repository.MaintenanceConfig{
KeepLatestMaitenanceJobs: repository.DefaultKeepLatestMaitenanceJobs,
// maintenance job log setting inherited from velero server
FormatFlag: config.LogFormat,
LogLevelFlag: config.LogLevel,
PodResources: kube.PodResources{
CPURequest: DefaultMaintenanceJobCPURequest,
CPULimit: DefaultMaintenanceJobCPULimit,
MemoryRequest: DefaultMaintenanceJobMemRequest,
MemoryLimit: DefaultMaintenanceJobMemLimit,
},
KeepLatestMaintenanceJobs: DefaultKeepLatestMaintenanceJobs,
}
return config
@ -238,11 +246,48 @@ func (c *Config) BindFlags(flags *pflag.FlagSet) {
flags.BoolVar(&c.DefaultSnapshotMoveData, "default-snapshot-move-data", c.DefaultSnapshotMoveData, "Move data by default for all snapshots supporting data movement.")
flags.BoolVar(&c.DisableInformerCache, "disable-informer-cache", c.DisableInformerCache, "Disable informer cache for Get calls on restore. With this enabled, it will speed up restore in cases where there are backup resources which already exist in the cluster, but for very large clusters this will increase velero memory usage. Default is false (don't disable).")
flags.BoolVar(&c.ScheduleSkipImmediately, "schedule-skip-immediately", c.ScheduleSkipImmediately, "Skip the first scheduled backup immediately after creating a schedule. Default is false (don't skip).")
flags.IntVar(&c.MaintenanceCfg.KeepLatestMaitenanceJobs, "keep-latest-maintenance-jobs", c.MaintenanceCfg.KeepLatestMaitenanceJobs, "Number of latest maintenance jobs to keep each repository. Optional.")
flags.StringVar(&c.MaintenanceCfg.CPURequest, "maintenance-job-cpu-request", c.MaintenanceCfg.CPURequest, "CPU request for maintenance job. Default is no limit.")
flags.StringVar(&c.MaintenanceCfg.MemRequest, "maintenance-job-mem-request", c.MaintenanceCfg.MemRequest, "Memory request for maintenance job. Default is no limit.")
flags.StringVar(&c.MaintenanceCfg.CPULimit, "maintenance-job-cpu-limit", c.MaintenanceCfg.CPULimit, "CPU limit for maintenance job. Default is no limit.")
flags.StringVar(&c.MaintenanceCfg.MemLimit, "maintenance-job-mem-limit", c.MaintenanceCfg.MemLimit, "Memory limit for maintenance job. Default is no limit.")
flags.StringVar(&c.BackukpRepoConfig, "backup-repository-config", c.BackukpRepoConfig, "The name of configMap containing backup repository configurations.")
flags.Var(&c.DefaultVolumeSnapshotLocations, "default-volume-snapshot-locations", "List of unique volume providers and default volume snapshot location (provider1:location-01,provider2:location-02,...)")
flags.IntVar(
&c.KeepLatestMaintenanceJobs,
"keep-latest-maintenance-jobs",
c.KeepLatestMaintenanceJobs,
"Number of latest maintenance jobs to keep each repository. Optional.",
)
flags.StringVar(
&c.PodResources.CPURequest,
"maintenance-job-cpu-request",
c.PodResources.CPURequest,
"CPU request for maintenance job. Default is no limit.",
)
flags.StringVar(
&c.PodResources.MemoryRequest,
"maintenance-job-mem-request",
c.PodResources.MemoryRequest,
"Memory request for maintenance job. Default is no limit.",
)
flags.StringVar(
&c.PodResources.CPULimit,
"maintenance-job-cpu-limit",
c.PodResources.CPULimit,
"CPU limit for maintenance job. Default is no limit.",
)
flags.StringVar(
&c.PodResources.MemoryLimit,
"maintenance-job-mem-limit",
c.PodResources.MemoryLimit,
"Memory limit for maintenance job. Default is no limit.",
)
flags.StringVar(
&c.BackupRepoConfig,
"backup-repository-config",
c.BackupRepoConfig,
"The name of configMap containing backup repository configurations.",
)
flags.StringVar(
&c.RepoMaintenanceJobConfig,
"repo-maintenance-job-config",
c.RepoMaintenanceJobConfig,
"The name of ConfigMap containing repository maintenance Job configurations.",
)
}

View File

@ -9,11 +9,11 @@ import (
func TestGetDefaultConfig(t *testing.T) {
config := GetDefaultConfig()
assert.Equal(t, "info", config.MaintenanceCfg.LogLevelFlag.String())
assert.Equal(t, "0", config.PodResources.CPULimit)
}
func TestBindFlags(t *testing.T) {
config := GetDefaultConfig()
config.BindFlags(pflag.CommandLine)
assert.Equal(t, "info", config.MaintenanceCfg.LogLevelFlag.String())
assert.Equal(t, "0", config.PodResources.CPULimit)
}

View File

@ -469,7 +469,20 @@ func (s *server) initRepoManager() error {
s.repoLocker = repository.NewRepoLocker()
s.repoEnsurer = repository.NewEnsurer(s.mgr.GetClient(), s.logger, s.config.ResourceTimeout)
s.repoManager = repository.NewManager(s.namespace, s.mgr.GetClient(), s.repoLocker, s.repoEnsurer, s.credentialFileStore, s.credentialSecretStore, s.config.MaintenanceCfg, s.logger)
s.repoManager = repository.NewManager(
s.namespace,
s.mgr.GetClient(),
s.repoLocker,
s.repoEnsurer,
s.credentialFileStore,
s.credentialSecretStore,
s.config.RepoMaintenanceJobConfig,
s.config.PodResources,
s.config.KeepLatestMaintenanceJobs,
s.logger,
s.logLevel,
s.config.LogFormat,
)
return nil
}
@ -683,7 +696,14 @@ func (s *server) runControllers(defaultVolumeSnapshotLocations map[string]string
}
if _, ok := enabledRuntimeControllers[constant.ControllerBackupRepo]; ok {
if err := controller.NewBackupRepoReconciler(s.namespace, s.logger, s.mgr.GetClient(), s.config.RepoMaintenanceFrequency, s.config.BackukpRepoConfig, s.repoManager).SetupWithManager(s.mgr); err != nil {
if err := controller.NewBackupRepoReconciler(
s.namespace,
s.logger,
s.mgr.GetClient(),
s.config.RepoMaintenanceFrequency,
s.config.BackupRepoConfig,
s.repoManager,
).SetupWithManager(s.mgr); err != nil {
s.logger.Fatal(err, "unable to create controller", "controller", constant.ControllerBackupRepo)
}
}

View File

@ -55,19 +55,19 @@ type BackupRepoReconciler struct {
logger logrus.FieldLogger
clock clocks.WithTickerAndDelayedExecution
maintenanceFrequency time.Duration
backukpRepoConfig string
backupRepoConfig string
repositoryManager repository.Manager
}
func NewBackupRepoReconciler(namespace string, logger logrus.FieldLogger, client client.Client,
maintenanceFrequency time.Duration, backukpRepoConfig string, repositoryManager repository.Manager) *BackupRepoReconciler {
maintenanceFrequency time.Duration, backupRepoConfig string, repositoryManager repository.Manager) *BackupRepoReconciler {
c := &BackupRepoReconciler{
client,
namespace,
logger,
clocks.RealClock{},
maintenanceFrequency,
backukpRepoConfig,
backupRepoConfig,
repositoryManager,
}
@ -229,7 +229,7 @@ func (r *BackupRepoReconciler) getIdentiferByBSL(ctx context.Context, req *veler
}
func (r *BackupRepoReconciler) initializeRepo(ctx context.Context, req *velerov1api.BackupRepository, log logrus.FieldLogger) error {
log.WithField("repoConfig", r.backukpRepoConfig).Info("Initializing backup repository")
log.WithField("repoConfig", r.backupRepoConfig).Info("Initializing backup repository")
// confirm the repo's BackupStorageLocation is valid
repoIdentifier, err := r.getIdentiferByBSL(ctx, req)
@ -244,7 +244,7 @@ func (r *BackupRepoReconciler) initializeRepo(ctx context.Context, req *velerov1
})
}
config, err := getBackupRepositoryConfig(ctx, r, r.backukpRepoConfig, r.namespace, req.Name, req.Spec.RepositoryType, log)
config, err := getBackupRepositoryConfig(ctx, r, r.backupRepoConfig, r.namespace, req.Name, req.Spec.RepositoryType, log)
if err != nil {
log.WithError(err).Warn("Failed to get repo config, repo config is ignored")
} else if config != nil {

View File

@ -21,6 +21,7 @@ import (
"fmt"
"time"
snapshotter "github.com/kubernetes-csi/external-snapshotter/client/v7/clientset/versioned/typed/volumesnapshot/v1"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
corev1 "k8s.io/api/core/v1"
@ -39,8 +40,6 @@ import (
"sigs.k8s.io/controller-runtime/pkg/predicate"
"sigs.k8s.io/controller-runtime/pkg/reconcile"
snapshotter "github.com/kubernetes-csi/external-snapshotter/client/v7/clientset/versioned/typed/volumesnapshot/v1"
"github.com/vmware-tanzu/velero/pkg/apis/velero/shared"
velerov1api "github.com/vmware-tanzu/velero/pkg/apis/velero/v1"
velerov2alpha1api "github.com/vmware-tanzu/velero/pkg/apis/velero/v2alpha1"
@ -71,31 +70,49 @@ type DataUploadReconciler struct {
logger logrus.FieldLogger
snapshotExposerList map[velerov2alpha1api.SnapshotType]exposer.SnapshotExposer
dataPathMgr *datapath.Manager
loadAffinity *nodeagent.LoadAffinity
loadAffinity *kube.LoadAffinity
backupPVCConfig map[string]nodeagent.BackupPVC
podResources corev1.ResourceRequirements
preparingTimeout time.Duration
metrics *metrics.ServerMetrics
}
func NewDataUploadReconciler(client client.Client, mgr manager.Manager, kubeClient kubernetes.Interface, csiSnapshotClient snapshotter.SnapshotV1Interface,
dataPathMgr *datapath.Manager, loadAffinity *nodeagent.LoadAffinity, backupPVCConfig map[string]nodeagent.BackupPVC, podResources corev1.ResourceRequirements,
clock clocks.WithTickerAndDelayedExecution, nodeName string, preparingTimeout time.Duration, log logrus.FieldLogger, metrics *metrics.ServerMetrics) *DataUploadReconciler {
func NewDataUploadReconciler(
client client.Client,
mgr manager.Manager,
kubeClient kubernetes.Interface,
csiSnapshotClient snapshotter.SnapshotV1Interface,
dataPathMgr *datapath.Manager,
loadAffinity *kube.LoadAffinity,
backupPVCConfig map[string]nodeagent.BackupPVC,
podResources corev1.ResourceRequirements,
clock clocks.WithTickerAndDelayedExecution,
nodeName string,
preparingTimeout time.Duration,
log logrus.FieldLogger,
metrics *metrics.ServerMetrics,
) *DataUploadReconciler {
return &DataUploadReconciler{
client: client,
mgr: mgr,
kubeClient: kubeClient,
csiSnapshotClient: csiSnapshotClient,
Clock: clock,
nodeName: nodeName,
logger: log,
snapshotExposerList: map[velerov2alpha1api.SnapshotType]exposer.SnapshotExposer{velerov2alpha1api.SnapshotTypeCSI: exposer.NewCSISnapshotExposer(kubeClient, csiSnapshotClient, log)},
dataPathMgr: dataPathMgr,
loadAffinity: loadAffinity,
backupPVCConfig: backupPVCConfig,
podResources: podResources,
preparingTimeout: preparingTimeout,
metrics: metrics,
client: client,
mgr: mgr,
kubeClient: kubeClient,
csiSnapshotClient: csiSnapshotClient,
Clock: clock,
nodeName: nodeName,
logger: log,
snapshotExposerList: map[velerov2alpha1api.SnapshotType]exposer.SnapshotExposer{
velerov2alpha1api.SnapshotTypeCSI: exposer.NewCSISnapshotExposer(
kubeClient,
csiSnapshotClient,
log,
),
},
dataPathMgr: dataPathMgr,
loadAffinity: loadAffinity,
backupPVCConfig: backupPVCConfig,
podResources: podResources,
preparingTimeout: preparingTimeout,
metrics: metrics,
}
}

View File

@ -231,8 +231,21 @@ func initDataUploaderReconcilerWithError(needError ...error) (*DataUploadReconci
fakeSnapshotClient := snapshotFake.NewSimpleClientset(vsObject, vscObj)
fakeKubeClient := clientgofake.NewSimpleClientset(daemonSet)
return NewDataUploadReconciler(fakeClient, nil, fakeKubeClient, fakeSnapshotClient.SnapshotV1(), dataPathMgr, nil, map[string]nodeagent.BackupPVC{},
corev1.ResourceRequirements{}, testclocks.NewFakeClock(now), "test-node", time.Minute*5, velerotest.NewLogger(), metrics.NewServerMetrics()), nil
return NewDataUploadReconciler(
fakeClient,
nil,
fakeKubeClient,
fakeSnapshotClient.SnapshotV1(),
dataPathMgr,
nil,
map[string]nodeagent.BackupPVC{},
corev1.ResourceRequirements{},
testclocks.NewFakeClock(now),
"test-node",
time.Minute*5,
velerotest.NewLogger(),
metrics.NewServerMetrics(),
), nil
}
func dataUploadBuilder() *builder.DataUploadBuilder {

View File

@ -66,7 +66,7 @@ type CSISnapshotExposeParam struct {
VolumeSize resource.Quantity
// Affinity specifies the node affinity of the backup pod
Affinity *nodeagent.LoadAffinity
Affinity *kube.LoadAffinity
// BackupPVCConfig is the config for backupPVC (intermediate PVC) of snapshot data movement
BackupPVCConfig map[string]nodeagent.BackupPVC
@ -194,7 +194,15 @@ func (e *csiSnapshotExposer) Expose(ctx context.Context, ownerObject corev1.Obje
}
}()
backupPod, err := e.createBackupPod(ctx, ownerObject, backupPVC, csiExposeParam.OperationTimeout, csiExposeParam.HostingPodLabels, csiExposeParam.Affinity, csiExposeParam.Resources)
backupPod, err := e.createBackupPod(
ctx,
ownerObject,
backupPVC,
csiExposeParam.OperationTimeout,
csiExposeParam.HostingPodLabels,
csiExposeParam.Affinity,
csiExposeParam.Resources,
)
if err != nil {
return errors.Wrap(err, "error to create backup pod")
}
@ -425,8 +433,15 @@ func (e *csiSnapshotExposer) createBackupPVC(ctx context.Context, ownerObject co
return created, err
}
func (e *csiSnapshotExposer) createBackupPod(ctx context.Context, ownerObject corev1.ObjectReference, backupPVC *corev1.PersistentVolumeClaim, operationTimeout time.Duration,
label map[string]string, affinity *nodeagent.LoadAffinity, resources corev1.ResourceRequirements) (*corev1.Pod, error) {
func (e *csiSnapshotExposer) createBackupPod(
ctx context.Context,
ownerObject corev1.ObjectReference,
backupPVC *corev1.PersistentVolumeClaim,
operationTimeout time.Duration,
label map[string]string,
affinity *kube.LoadAffinity,
resources corev1.ResourceRequirements,
) (*corev1.Pod, error) {
podName := ownerObject.Name
containerName := string(ownerObject.UID)
@ -474,6 +489,11 @@ func (e *csiSnapshotExposer) createBackupPod(ctx context.Context, ownerObject co
userID := int64(0)
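// kube.ToSystemAffinity accepts a list of affinities, so wrap the
// single backup pod affinity, if any, before converting it below.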
affinityList := make([]*kube.LoadAffinity, 0)
if affinity != nil {
affinityList = append(affinityList, affinity)
}
pod := &corev1.Pod{
ObjectMeta: metav1.ObjectMeta{
Name: podName,
@ -502,7 +522,7 @@ func (e *csiSnapshotExposer) createBackupPod(ctx context.Context, ownerObject co
},
},
},
Affinity: toSystemAffinity(affinity),
Affinity: kube.ToSystemAffinity(affinityList),
Containers: []corev1.Container{
{
Name: containerName,
@ -532,42 +552,3 @@ func (e *csiSnapshotExposer) createBackupPod(ctx context.Context, ownerObject co
return e.kubeClient.CoreV1().Pods(ownerObject.Namespace).Create(ctx, pod, metav1.CreateOptions{})
}
func toSystemAffinity(loadAffinity *nodeagent.LoadAffinity) *corev1.Affinity {
if loadAffinity == nil {
return nil
}
requirements := []corev1.NodeSelectorRequirement{}
for k, v := range loadAffinity.NodeSelector.MatchLabels {
requirements = append(requirements, corev1.NodeSelectorRequirement{
Key: k,
Values: []string{v},
Operator: corev1.NodeSelectorOpIn,
})
}
for _, exp := range loadAffinity.NodeSelector.MatchExpressions {
requirements = append(requirements, corev1.NodeSelectorRequirement{
Key: exp.Key,
Values: exp.Values,
Operator: corev1.NodeSelectorOperator(exp.Operator),
})
}
if len(requirements) == 0 {
return nil
}
return &corev1.Affinity{
NodeAffinity: &corev1.NodeAffinity{
RequiredDuringSchedulingIgnoredDuringExecution: &corev1.NodeSelector{
NodeSelectorTerms: []corev1.NodeSelectorTerm{
{
MatchExpressions: requirements,
},
},
},
},
}
}

View File

@ -19,31 +19,27 @@ package exposer
import (
"context"
"fmt"
"reflect"
"testing"
"time"
"k8s.io/utils/pointer"
snapshotv1api "github.com/kubernetes-csi/external-snapshotter/client/v7/apis/volumesnapshot/v1"
snapshotFake "github.com/kubernetes-csi/external-snapshotter/client/v7/clientset/versioned/fake"
"github.com/pkg/errors"
"github.com/stretchr/testify/assert"
appsv1 "k8s.io/api/apps/v1"
corev1 "k8s.io/api/core/v1"
"k8s.io/apimachinery/pkg/api/resource"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/runtime"
"k8s.io/client-go/kubernetes/fake"
clientTesting "k8s.io/client-go/testing"
appsv1 "k8s.io/api/apps/v1"
corev1 "k8s.io/api/core/v1"
"k8s.io/utils/pointer"
clientFake "sigs.k8s.io/controller-runtime/pkg/client/fake"
velerov1 "github.com/vmware-tanzu/velero/pkg/apis/velero/v1"
"github.com/vmware-tanzu/velero/pkg/nodeagent"
velerotest "github.com/vmware-tanzu/velero/pkg/test"
"github.com/vmware-tanzu/velero/pkg/util/boolptr"
clientFake "sigs.k8s.io/controller-runtime/pkg/client/fake"
)
type reactor struct {
@ -820,105 +816,6 @@ func TestPeekExpose(t *testing.T) {
}
}
func TestToSystemAffinity(t *testing.T) {
tests := []struct {
name string
loadAffinity *nodeagent.LoadAffinity
expected *corev1.Affinity
}{
{
name: "loadAffinity is nil",
},
{
name: "loadAffinity is empty",
loadAffinity: &nodeagent.LoadAffinity{},
},
{
name: "with match label",
loadAffinity: &nodeagent.LoadAffinity{
NodeSelector: metav1.LabelSelector{
MatchLabels: map[string]string{
"key-1": "value-1",
},
},
},
expected: &corev1.Affinity{
NodeAffinity: &corev1.NodeAffinity{
RequiredDuringSchedulingIgnoredDuringExecution: &corev1.NodeSelector{
NodeSelectorTerms: []corev1.NodeSelectorTerm{
{
MatchExpressions: []corev1.NodeSelectorRequirement{
{
Key: "key-1",
Values: []string{"value-1"},
Operator: corev1.NodeSelectorOpIn,
},
},
},
},
},
},
},
},
{
name: "with match expression",
loadAffinity: &nodeagent.LoadAffinity{
NodeSelector: metav1.LabelSelector{
MatchLabels: map[string]string{
"key-2": "value-2",
},
MatchExpressions: []metav1.LabelSelectorRequirement{
{
Key: "key-3",
Values: []string{"value-3-1", "value-3-2"},
Operator: metav1.LabelSelectorOpNotIn,
},
{
Key: "key-4",
Values: []string{"value-4-1", "value-4-2", "value-4-3"},
Operator: metav1.LabelSelectorOpDoesNotExist,
},
},
},
},
expected: &corev1.Affinity{
NodeAffinity: &corev1.NodeAffinity{
RequiredDuringSchedulingIgnoredDuringExecution: &corev1.NodeSelector{
NodeSelectorTerms: []corev1.NodeSelectorTerm{
{
MatchExpressions: []corev1.NodeSelectorRequirement{
{
Key: "key-2",
Values: []string{"value-2"},
Operator: corev1.NodeSelectorOpIn,
},
{
Key: "key-3",
Values: []string{"value-3-1", "value-3-2"},
Operator: corev1.NodeSelectorOpNotIn,
},
{
Key: "key-4",
Values: []string{"value-4-1", "value-4-2", "value-4-3"},
Operator: corev1.NodeSelectorOpDoesNotExist,
},
},
},
},
},
},
},
},
}
for _, test := range tests {
t.Run(test.name, func(t *testing.T) {
affinity := toSystemAffinity(test.loadAffinity)
assert.True(t, reflect.DeepEqual(affinity, test.expected))
})
}
}
func Test_csiSnapshotExposer_createBackupPVC(t *testing.T) {
backup := &velerov1.Backup{
TypeMeta: metav1.TypeMeta{

View File

@ -27,7 +27,7 @@ import (
"github.com/vmware-tanzu/velero/internal/velero"
"github.com/vmware-tanzu/velero/pkg/builder"
"github.com/vmware-tanzu/velero/pkg/repository"
"github.com/vmware-tanzu/velero/pkg/util/kube"
)
type podTemplateOption func(*podTemplateConfig)
@ -52,7 +52,8 @@ type podTemplateConfig struct {
privilegedNodeAgent bool
disableInformerCache bool
scheduleSkipImmediately bool
maintenanceConfig repository.MaintenanceConfig
podResources kube.PodResources
keepLatestMaintenanceJobs int
}
func WithImage(image string) podTemplateOption {
@ -179,9 +180,15 @@ func WithScheduleSkipImmediately(b bool) podTemplateOption {
}
}
func WithMaintenanceConfig(config repository.MaintenanceConfig) podTemplateOption {
func WithPodResources(podResources kube.PodResources) podTemplateOption {
return func(c *podTemplateConfig) {
c.maintenanceConfig = config
c.podResources = podResources
}
}
func WithKeepLatestMaintenanceJobs(keepLatestMaintenanceJobs int) podTemplateOption {
return func(c *podTemplateConfig) {
c.keepLatestMaintenanceJobs = keepLatestMaintenanceJobs
}
}
@ -242,24 +249,24 @@ func Deployment(namespace string, opts ...podTemplateOption) *appsv1.Deployment
args = append(args, fmt.Sprintf("--fs-backup-timeout=%v", c.podVolumeOperationTimeout))
}
if c.maintenanceConfig.KeepLatestMaitenanceJobs > 0 {
args = append(args, fmt.Sprintf("--keep-latest-maintenance-jobs=%d", c.maintenanceConfig.KeepLatestMaitenanceJobs))
if c.keepLatestMaintenanceJobs > 0 {
args = append(args, fmt.Sprintf("--keep-latest-maintenance-jobs=%d", c.keepLatestMaintenanceJobs))
}
if c.maintenanceConfig.CPULimit != "" {
args = append(args, fmt.Sprintf("--maintenance-job-cpu-limit=%s", c.maintenanceConfig.CPULimit))
if len(c.podResources.CPULimit) > 0 {
args = append(args, fmt.Sprintf("--maintenance-job-cpu-limit=%s", c.podResources.CPULimit))
}
if c.maintenanceConfig.CPURequest != "" {
args = append(args, fmt.Sprintf("--maintenance-job-cpu-request=%s", c.maintenanceConfig.CPURequest))
if len(c.podResources.CPURequest) > 0 {
args = append(args, fmt.Sprintf("--maintenance-job-cpu-request=%s", c.podResources.CPURequest))
}
if c.maintenanceConfig.MemLimit != "" {
args = append(args, fmt.Sprintf("--maintenance-job-mem-limit=%s", c.maintenanceConfig.MemLimit))
if len(c.podResources.MemoryLimit) > 0 {
args = append(args, fmt.Sprintf("--maintenance-job-mem-limit=%s", c.podResources.MemoryLimit))
}
if c.maintenanceConfig.MemRequest != "" {
args = append(args, fmt.Sprintf("--maintenance-job-mem-request=%s", c.maintenanceConfig.MemRequest))
if len(c.podResources.MemoryRequest) > 0 {
args = append(args, fmt.Sprintf("--maintenance-job-mem-request=%s", c.podResources.MemoryRequest))
}
deployment := &appsv1.Deployment{

View File

@ -23,7 +23,7 @@ import (
"github.com/stretchr/testify/assert"
corev1 "k8s.io/api/core/v1"
"github.com/vmware-tanzu/velero/pkg/repository"
"github.com/vmware-tanzu/velero/pkg/util/kube"
)
func TestDeployment(t *testing.T) {
@ -71,17 +71,24 @@ func TestDeployment(t *testing.T) {
assert.Len(t, deploy.Spec.Template.Spec.Containers[0].Args, 2)
assert.Equal(t, "--disable-informer-cache=true", deploy.Spec.Template.Spec.Containers[0].Args[1])
deploy = Deployment("velero", WithMaintenanceConfig(repository.MaintenanceConfig{
KeepLatestMaitenanceJobs: 3,
CPURequest: "100m",
MemRequest: "256Mi",
CPULimit: "200m",
MemLimit: "512Mi",
}))
assert.Len(t, deploy.Spec.Template.Spec.Containers[0].Args, 6)
deploy = Deployment("velero", WithKeepLatestMaintenanceJobs(3))
assert.Len(t, deploy.Spec.Template.Spec.Containers[0].Args, 2)
assert.Equal(t, "--keep-latest-maintenance-jobs=3", deploy.Spec.Template.Spec.Containers[0].Args[1])
assert.Equal(t, "--maintenance-job-cpu-limit=200m", deploy.Spec.Template.Spec.Containers[0].Args[2])
assert.Equal(t, "--maintenance-job-cpu-request=100m", deploy.Spec.Template.Spec.Containers[0].Args[3])
assert.Equal(t, "--maintenance-job-mem-limit=512Mi", deploy.Spec.Template.Spec.Containers[0].Args[4])
assert.Equal(t, "--maintenance-job-mem-request=256Mi", deploy.Spec.Template.Spec.Containers[0].Args[5])
deploy = Deployment(
"velero",
WithPodResources(
kube.PodResources{
CPURequest: "100m",
MemoryRequest: "256Mi",
CPULimit: "200m",
MemoryLimit: "512Mi",
},
),
)
assert.Len(t, deploy.Spec.Template.Spec.Containers[0].Args, 5)
assert.Equal(t, "--maintenance-job-cpu-limit=200m", deploy.Spec.Template.Spec.Containers[0].Args[1])
assert.Equal(t, "--maintenance-job-cpu-request=100m", deploy.Spec.Template.Spec.Containers[0].Args[2])
assert.Equal(t, "--maintenance-job-mem-limit=512Mi", deploy.Spec.Template.Spec.Containers[0].Args[3])
assert.Equal(t, "--maintenance-job-mem-request=256Mi", deploy.Spec.Template.Spec.Containers[0].Args[4])
}

View File

@ -30,8 +30,7 @@ import (
v1crds "github.com/vmware-tanzu/velero/config/crd/v1/crds"
v2alpha1crds "github.com/vmware-tanzu/velero/config/crd/v2alpha1/crds"
velerov1api "github.com/vmware-tanzu/velero/pkg/apis/velero/v1"
"github.com/vmware-tanzu/velero/pkg/repository"
"github.com/vmware-tanzu/velero/pkg/util/logging"
"github.com/vmware-tanzu/velero/pkg/util/kube"
)
const (
@ -263,9 +262,8 @@ type VeleroOptions struct {
DefaultSnapshotMoveData bool
DisableInformerCache bool
ScheduleSkipImmediately bool
FormatFlag *logging.FormatFlag
LogLevelFlag *logging.LevelFlag
MaintenanceCfg repository.MaintenanceConfig
PodResources kube.PodResources
KeepLatestMaintenanceJobs int
}
func AllCRDs() *unstructured.UnstructuredList {
@ -350,7 +348,8 @@ func AllResources(o *VeleroOptions) *unstructured.UnstructuredList {
WithPodVolumeOperationTimeout(o.PodVolumeOperationTimeout),
WithUploaderType(o.UploaderType),
WithScheduleSkipImmediately(o.ScheduleSkipImmediately),
WithMaintenanceConfig(o.MaintenanceCfg),
WithPodResources(o.PodResources),
WithKeepLatestMaintenanceJobs(o.KeepLatestMaintenanceJobs),
}
if len(o.Features) > 0 {

View File

@ -70,25 +70,18 @@ type BackupPVC struct {
ReadOnly bool `json:"readOnly,omitempty"`
}
type PodResources struct {
CPURequest string `json:"cpuRequest,omitempty"`
MemoryRequest string `json:"memoryRequest,omitempty"`
CPULimit string `json:"cpuLimit,omitempty"`
MemoryLimit string `json:"memoryLimit,omitempty"`
}
type Configs struct {
// LoadConcurrency is the config for data path load concurrency per node.
LoadConcurrency *LoadConcurrency `json:"loadConcurrency,omitempty"`
// LoadAffinity is the config for data path load affinity.
LoadAffinity []*LoadAffinity `json:"loadAffinity,omitempty"`
LoadAffinity []*kube.LoadAffinity `json:"loadAffinity,omitempty"`
// BackupPVCConfig is the config for backupPVC (intermediate PVC) of snapshot data movement
BackupPVCConfig map[string]BackupPVC `json:"backupPVC,omitempty"`
// PodResources is the resource config for various types of pods launched by node-agent, i.e., data mover pods.
PodResources *PodResources `json:"podResources,omitempty"`
PodResources *kube.PodResources `json:"podResources,omitempty"`
}
// IsRunning checks if the node agent daemonset is running properly. If not, return the error found

View File

@ -18,11 +18,13 @@ package repository
import (
"context"
"encoding/json"
"fmt"
"sort"
"time"
appsv1 "k8s.io/api/apps/v1"
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
batchv1 "k8s.io/api/batch/v1"
v1 "k8s.io/api/core/v1"
apierrors "k8s.io/apimachinery/pkg/api/errors"
@ -31,30 +33,21 @@ import (
"k8s.io/apimachinery/pkg/util/wait"
"sigs.k8s.io/controller-runtime/pkg/client"
"github.com/pkg/errors"
"github.com/vmware-tanzu/velero/pkg/repository/provider"
velerov1api "github.com/vmware-tanzu/velero/pkg/apis/velero/v1"
"github.com/vmware-tanzu/velero/pkg/util/kube"
"github.com/vmware-tanzu/velero/pkg/util/logging"
veleroutil "github.com/vmware-tanzu/velero/pkg/util/velero"
)
const RepositoryNameLabel = "velero.io/repo-name"
const DefaultKeepLatestMaitenanceJobs = 3
const DefaultMaintenanceJobCPURequest = "0"
const DefaultMaintenanceJobCPULimit = "0"
const DefaultMaintenanceJobMemRequest = "0"
const DefaultMaintenanceJobMemLimit = "0"
const (
RepositoryNameLabel = "velero.io/repo-name"
GlobalKeyForRepoMaintenanceJobCM = "global"
)
// MaintenanceConfig is the configuration for the repo maintenance job
type MaintenanceConfig struct {
KeepLatestMaitenanceJobs int
CPURequest string
MemRequest string
CPULimit string
MemLimit string
LogLevelFlag *logging.LevelFlag
FormatFlag *logging.FormatFlag
type JobConfigs struct {
// LoadAffinities is the config for repository maintenance job load affinity.
LoadAffinities []*kube.LoadAffinity `json:"loadAffinity,omitempty"`
// PodResources is the config for the CPU and memory resources setting.
PodResources *kube.PodResources `json:"podResources,omitempty"`
}
func generateJobName(repo string) string {
@ -68,117 +61,6 @@ func generateJobName(repo string) string {
return jobName
}
func buildMaintenanceJob(m MaintenanceConfig, param provider.RepoParam, cli client.Client, namespace string) (*batchv1.Job, error) {
// Get the Velero server deployment
deployment := &appsv1.Deployment{}
err := cli.Get(context.TODO(), types.NamespacedName{Name: "velero", Namespace: namespace}, deployment)
if err != nil {
return nil, err
}
// Get the environment variables from the Velero server deployment
envVars := veleroutil.GetEnvVarsFromVeleroServer(deployment)
// Get the volume mounts from the Velero server deployment
volumeMounts := veleroutil.GetVolumeMountsFromVeleroServer(deployment)
// Get the volumes from the Velero server deployment
volumes := veleroutil.GetVolumesFromVeleroServer(deployment)
// Get the service account from the Velero server deployment
serviceAccount := veleroutil.GetServiceAccountFromVeleroServer(deployment)
// Get image
image := veleroutil.GetVeleroServerImage(deployment)
// Set resource limits and requests
if m.CPURequest == "" {
m.CPURequest = DefaultMaintenanceJobCPURequest
}
if m.MemRequest == "" {
m.MemRequest = DefaultMaintenanceJobMemRequest
}
if m.CPULimit == "" {
m.CPULimit = DefaultMaintenanceJobCPULimit
}
if m.MemLimit == "" {
m.MemLimit = DefaultMaintenanceJobMemLimit
}
resources, err := kube.ParseResourceRequirements(m.CPURequest, m.MemRequest, m.CPULimit, m.MemLimit)
if err != nil {
return nil, errors.Wrap(err, "failed to parse resource requirements for maintenance job")
}
// Set arguments
args := []string{"repo-maintenance"}
args = append(args, fmt.Sprintf("--repo-name=%s", param.BackupRepo.Spec.VolumeNamespace))
args = append(args, fmt.Sprintf("--repo-type=%s", param.BackupRepo.Spec.RepositoryType))
args = append(args, fmt.Sprintf("--backup-storage-location=%s", param.BackupLocation.Name))
args = append(args, fmt.Sprintf("--log-level=%s", m.LogLevelFlag.String()))
args = append(args, fmt.Sprintf("--log-format=%s", m.FormatFlag.String()))
// build the maintenance job
job := &batchv1.Job{
ObjectMeta: metav1.ObjectMeta{
Name: generateJobName(param.BackupRepo.Name),
Namespace: param.BackupRepo.Namespace,
Labels: map[string]string{
RepositoryNameLabel: param.BackupRepo.Name,
},
},
Spec: batchv1.JobSpec{
BackoffLimit: new(int32), // Never retry
Template: v1.PodTemplateSpec{
ObjectMeta: metav1.ObjectMeta{
Name: "velero-repo-maintenance-pod",
},
Spec: v1.PodSpec{
Containers: []v1.Container{
{
Name: "velero-repo-maintenance-container",
Image: image,
Command: []string{
"/velero",
},
Args: args,
ImagePullPolicy: v1.PullIfNotPresent,
Env: envVars,
VolumeMounts: volumeMounts,
Resources: resources,
},
},
RestartPolicy: v1.RestartPolicyNever,
Volumes: volumes,
ServiceAccountName: serviceAccount,
},
},
},
}
if affinity := veleroutil.GetAffinityFromVeleroServer(deployment); affinity != nil {
job.Spec.Template.Spec.Affinity = affinity
}
if tolerations := veleroutil.GetTolerationsFromVeleroServer(deployment); tolerations != nil {
job.Spec.Template.Spec.Tolerations = tolerations
}
if nodeSelector := veleroutil.GetNodeSelectorFromVeleroServer(deployment); nodeSelector != nil {
job.Spec.Template.Spec.NodeSelector = nodeSelector
}
if labels := veleroutil.GetVeleroServerLables(deployment); len(labels) > 0 {
job.Spec.Template.Labels = labels
}
if annotations := veleroutil.GetVeleroServerAnnotations(deployment); len(annotations) > 0 {
job.Spec.Template.Annotations = annotations
}
return job, nil
}
// deleteOldMaintenanceJobs deletes old maintenance jobs and keeps the latest N jobs
func deleteOldMaintenanceJobs(cli client.Client, repo string, keep int) error {
// Get the maintenance job list by label
@ -262,3 +144,96 @@ func getLatestMaintenanceJob(cli client.Client, ns string) (*batchv1.Job, error)
return &jobList.Items[0], nil
}
// getMaintenanceJobConfig is called to get the Maintenance Job Config for the
// BackupRepository specified by the repo parameter.
//
// Params:
//
// ctx: the Go context used for controller-runtime client.
// client: the controller-runtime client.
// logger: the logger.
// veleroNamespace: the Velero-installed namespace. It's used to retrieve the BackupRepository.
// repoMaintenanceJobConfig: the repository maintenance job ConfigMap name.
// repo: the BackupRepository needs to run the maintenance Job.
func getMaintenanceJobConfig(
ctx context.Context,
client client.Client,
logger logrus.FieldLogger,
veleroNamespace string,
repoMaintenanceJobConfig string,
repo *velerov1api.BackupRepository,
) (*JobConfigs, error) {
var cm v1.ConfigMap
if err := client.Get(
ctx,
types.NamespacedName{
Namespace: veleroNamespace,
Name: repoMaintenanceJobConfig,
},
&cm,
); err != nil {
if apierrors.IsNotFound(err) {
return nil, nil
} else {
return nil, errors.Wrapf(
err,
"fail to get repo maintenance job configs %s", repoMaintenanceJobConfig)
}
}
if cm.Data == nil {
return nil, errors.Errorf("data is not available in config map %s", repoMaintenanceJobConfig)
}
// Generate the BackupRepository key.
// Using the BackupRepository name as the key would be more intuitive,
// but BackupRepository generation is dynamic. We cannot assume
// the BackupRepositories are ready when installing Velero.
// Instead we use the volume source namespace, BSL name, and the uploader
// type to represent the BackupRepository. The combination of those three
// keys can identify a unique BackupRepository.
repoJobConfigKey := repo.Spec.VolumeNamespace + "-" +
repo.Spec.BackupStorageLocation + "-" + repo.Spec.RepositoryType
var result *JobConfigs
if _, ok := cm.Data[repoJobConfigKey]; ok {
logger.Debugf("Find the repo maintenance config %s for repo %s", repoJobConfigKey, repo.Name)
result = new(JobConfigs)
if err := json.Unmarshal([]byte(cm.Data[repoJobConfigKey]), result); err != nil {
return nil, errors.Wrapf(
err,
"fail to unmarshal configs from %s's key %s",
repoMaintenanceJobConfig,
repoJobConfigKey)
}
}
if _, ok := cm.Data[GlobalKeyForRepoMaintenanceJobCM]; ok {
logger.Debugf("Find the global repo maintenance config for repo %s", repo.Name)
if result == nil {
result = new(JobConfigs)
}
globalResult := new(JobConfigs)
if err := json.Unmarshal([]byte(cm.Data[GlobalKeyForRepoMaintenanceJobCM]), globalResult); err != nil {
return nil, errors.Wrapf(
err,
"fail to unmarshal configs from %s's key %s",
repoMaintenanceJobConfig,
GlobalKeyForRepoMaintenanceJobCM)
}
if result.PodResources == nil && globalResult.PodResources != nil {
result.PodResources = globalResult.PodResources
}
if len(result.LoadAffinities) == 0 {
result.LoadAffinities = globalResult.LoadAffinities
}
}
return result, nil
}

View File

@ -25,18 +25,17 @@ import (
"github.com/sirupsen/logrus"
"github.com/stretchr/testify/assert"
appsv1 "k8s.io/api/apps/v1"
"github.com/stretchr/testify/require"
batchv1 "k8s.io/api/batch/v1"
v1 "k8s.io/api/core/v1"
"k8s.io/apimachinery/pkg/api/resource"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/runtime"
"sigs.k8s.io/controller-runtime/pkg/client"
"sigs.k8s.io/controller-runtime/pkg/client/fake"
velerov1api "github.com/vmware-tanzu/velero/pkg/apis/velero/v1"
"github.com/vmware-tanzu/velero/pkg/repository/provider"
"github.com/vmware-tanzu/velero/pkg/util/logging"
velerotest "github.com/vmware-tanzu/velero/pkg/test"
"github.com/vmware-tanzu/velero/pkg/util/kube"
)
func TestGenerateJobName1(t *testing.T) {
@ -263,146 +262,178 @@ func TestGetLatestMaintenanceJob(t *testing.T) {
// We expect the returned job to be the newer job
assert.Equal(t, newerJob.Name, job.Name)
}
func TestBuildMaintenanceJob(t *testing.T) {
func TestGetMaintenanceJobConfig(t *testing.T) {
ctx := context.Background()
logger := logrus.New()
veleroNamespace := "velero"
repoMaintenanceJobConfig := "repo-maintenance-job-config"
repo := &velerov1api.BackupRepository{
ObjectMeta: metav1.ObjectMeta{
Namespace: veleroNamespace,
Name: repoMaintenanceJobConfig,
},
Spec: velerov1api.BackupRepositorySpec{
BackupStorageLocation: "default",
RepositoryType: "kopia",
VolumeNamespace: "test",
},
}
testCases := []struct {
name string
m MaintenanceConfig
deploy *appsv1.Deployment
expectedJobName string
expectedError bool
name string
repoJobConfig *v1.ConfigMap
expectedConfig *JobConfigs
expectedError error
}{
{
name: "Valid maintenance job",
m: MaintenanceConfig{
CPURequest: "100m",
MemRequest: "128Mi",
CPULimit: "200m",
MemLimit: "256Mi",
LogLevelFlag: logging.LogLevelFlag(logrus.InfoLevel),
FormatFlag: logging.NewFormatFlag(),
},
deploy: &appsv1.Deployment{
name: "Config not exist",
expectedConfig: nil,
expectedError: nil,
},
{
name: "Invalid JSON",
repoJobConfig: &v1.ConfigMap{
ObjectMeta: metav1.ObjectMeta{
Name: "velero",
Namespace: "velero",
Namespace: veleroNamespace,
Name: repoMaintenanceJobConfig,
},
Spec: appsv1.DeploymentSpec{
Template: v1.PodTemplateSpec{
Spec: v1.PodSpec{
Containers: []v1.Container{
Data: map[string]string{
"test-default-kopia": "{\"cpuRequest:\"100m\"}",
},
},
expectedConfig: nil,
expectedError: fmt.Errorf("fail to unmarshal configs from %s", repoMaintenanceJobConfig),
},
{
name: "Find config specific for BackupRepository",
repoJobConfig: &v1.ConfigMap{
ObjectMeta: metav1.ObjectMeta{
Namespace: veleroNamespace,
Name: repoMaintenanceJobConfig,
},
Data: map[string]string{
"test-default-kopia": "{\"podResources\":{\"cpuRequest\":\"100m\",\"cpuLimit\":\"200m\",\"memoryRequest\":\"100Mi\",\"memoryLimit\":\"200Mi\"},\"loadAffinity\":[{\"nodeSelector\":{\"matchExpressions\":[{\"key\":\"cloud.google.com/machine-family\",\"operator\":\"In\",\"values\":[\"e2\"]}]}}]}",
},
},
expectedConfig: &JobConfigs{
PodResources: &kube.PodResources{
CPURequest: "100m",
CPULimit: "200m",
MemoryRequest: "100Mi",
MemoryLimit: "200Mi",
},
LoadAffinities: []*kube.LoadAffinity{
{
NodeSelector: metav1.LabelSelector{
MatchExpressions: []metav1.LabelSelectorRequirement{
{
Name: "velero-repo-maintenance-container",
Image: "velero-image",
Key: "cloud.google.com/machine-family",
Operator: metav1.LabelSelectorOpIn,
Values: []string{"e2"},
},
},
},
},
},
},
expectedJobName: "test-123-maintain-job",
expectedError: false,
expectedError: nil,
},
{
name: "Error getting Velero server deployment",
m: MaintenanceConfig{
CPURequest: "100m",
MemRequest: "128Mi",
CPULimit: "200m",
MemLimit: "256Mi",
LogLevelFlag: logging.LogLevelFlag(logrus.InfoLevel),
FormatFlag: logging.NewFormatFlag(),
name: "Find config specific for global",
repoJobConfig: &v1.ConfigMap{
ObjectMeta: metav1.ObjectMeta{
Namespace: veleroNamespace,
Name: repoMaintenanceJobConfig,
},
Data: map[string]string{
GlobalKeyForRepoMaintenanceJobCM: "{\"podResources\":{\"cpuRequest\":\"50m\",\"cpuLimit\":\"100m\",\"memoryRequest\":\"50Mi\",\"memoryLimit\":\"100Mi\"},\"loadAffinity\":[{\"nodeSelector\":{\"matchExpressions\":[{\"key\":\"cloud.google.com/machine-family\",\"operator\":\"In\",\"values\":[\"n2\"]}]}}]}",
},
},
expectedJobName: "",
expectedError: true,
expectedConfig: &JobConfigs{
PodResources: &kube.PodResources{
CPURequest: "50m",
CPULimit: "100m",
MemoryRequest: "50Mi",
MemoryLimit: "100Mi",
},
LoadAffinities: []*kube.LoadAffinity{
{
NodeSelector: metav1.LabelSelector{
MatchExpressions: []metav1.LabelSelectorRequirement{
{
Key: "cloud.google.com/machine-family",
Operator: metav1.LabelSelectorOpIn,
Values: []string{"n2"},
},
},
},
},
},
},
expectedError: nil,
},
}
param := provider.RepoParam{
BackupRepo: &velerov1api.BackupRepository{
{
name: "Specific config supersede global config",
repoJobConfig: &v1.ConfigMap{
ObjectMeta: metav1.ObjectMeta{
Namespace: veleroNamespace,
Name: repoMaintenanceJobConfig,
},
Data: map[string]string{
GlobalKeyForRepoMaintenanceJobCM: "{\"podResources\":{\"cpuRequest\":\"50m\",\"cpuLimit\":\"100m\",\"memoryRequest\":\"50Mi\",\"memoryLimit\":\"100Mi\"},\"loadAffinity\":[{\"nodeSelector\":{\"matchExpressions\":[{\"key\":\"cloud.google.com/machine-family\",\"operator\":\"In\",\"values\":[\"n2\"]}]}}]}",
"test-default-kopia": "{\"podResources\":{\"cpuRequest\":\"100m\",\"cpuLimit\":\"200m\",\"memoryRequest\":\"100Mi\",\"memoryLimit\":\"200Mi\"},\"loadAffinity\":[{\"nodeSelector\":{\"matchExpressions\":[{\"key\":\"cloud.google.com/machine-family\",\"operator\":\"In\",\"values\":[\"e2\"]}]}}]}",
},
},
expectedConfig: &JobConfigs{
PodResources: &kube.PodResources{
CPURequest: "100m",
CPULimit: "200m",
MemoryRequest: "100Mi",
MemoryLimit: "200Mi",
},
LoadAffinities: []*kube.LoadAffinity{
{
NodeSelector: metav1.LabelSelector{
MatchExpressions: []metav1.LabelSelectorRequirement{
{
Key: "cloud.google.com/machine-family",
Operator: metav1.LabelSelectorOpIn,
Values: []string{"e2"},
},
},
},
},
},
},
expectedError: nil,
},
}
for _, tc := range testCases {
t.Run(tc.name, func(t *testing.T) {
var fakeClient client.Client
if tc.repoJobConfig != nil {
fakeClient = velerotest.NewFakeControllerRuntimeClient(t, tc.repoJobConfig)
} else {
fakeClient = velerotest.NewFakeControllerRuntimeClient(t)
}
jobConfig, err := getMaintenanceJobConfig(
ctx,
fakeClient,
logger,
veleroNamespace,
repoMaintenanceJobConfig,
repo,
)
if tc.expectedError != nil {
require.Contains(t, err.Error(), tc.expectedError.Error())
} else {
require.NoError(t, err)
}
require.Equal(t, tc.expectedConfig, jobConfig)
})
}
}
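For reference, each value in the ConfigMap is a plain JSON document that unmarshals into the `JobConfigs` shape asserted above. Below is a minimal, self-contained sketch of that decoding, using local stand-in types (the `json.RawMessage` placeholder for the affinity list is an assumption for illustration, not the package's actual declaration):

``` go
package main

import (
	"encoding/json"
	"fmt"
)

// Stand-in mirroring kube.PodResources; the JSON tags follow the keys
// seen in the test ConfigMap data above.
type PodResources struct {
	CPURequest    string `json:"cpuRequest,omitempty"`
	MemoryRequest string `json:"memoryRequest,omitempty"`
	CPULimit      string `json:"cpuLimit,omitempty"`
	MemoryLimit   string `json:"memoryLimit,omitempty"`
}

// Stand-in mirroring JobConfigs; the affinity entries are kept raw here
// to stay self-contained.
type JobConfigs struct {
	PodResources   *PodResources     `json:"podResources,omitempty"`
	LoadAffinities []json.RawMessage `json:"loadAffinity,omitempty"`
}

func main() {
	// The value stored under the "test-default-kopia" key in the test above.
	raw := `{"podResources":{"cpuRequest":"100m","cpuLimit":"200m","memoryRequest":"100Mi","memoryLimit":"200Mi"}}`

	var cfg JobConfigs
	if err := json.Unmarshal([]byte(raw), &cfg); err != nil {
		panic(err)
	}
	fmt.Printf("cpuRequest=%s memoryLimit=%s\n", cfg.PodResources.CPURequest, cfg.PodResources.MemoryLimit)
}
```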

View File

@ -23,12 +23,20 @@ import (
"github.com/pkg/errors"
"github.com/sirupsen/logrus"
appsv1 "k8s.io/api/apps/v1"
batchv1 "k8s.io/api/batch/v1"
v1 "k8s.io/api/core/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/types"
"sigs.k8s.io/controller-runtime/pkg/client"
"github.com/vmware-tanzu/velero/internal/credentials"
velerov1api "github.com/vmware-tanzu/velero/pkg/apis/velero/v1"
"github.com/vmware-tanzu/velero/pkg/repository/provider"
"github.com/vmware-tanzu/velero/pkg/util/filesystem"
"github.com/vmware-tanzu/velero/pkg/util/kube"
"github.com/vmware-tanzu/velero/pkg/util/logging"
veleroutil "github.com/vmware-tanzu/velero/pkg/util/velero"
)
// SnapshotIdentifier uniquely identifies a snapshot
@ -92,14 +100,20 @@ type Manager interface {
}
type manager struct {
namespace string
providers map[string]provider.Provider
// client is the Velero controller manager's client.
// It's limited to resources in the Velero namespace.
client client.Client
repoLocker *RepoLocker
repoEnsurer *Ensurer
fileSystem filesystem.Interface
repoMaintenanceJobConfig string
podResources kube.PodResources
keepLatestMaintenanceJobs int
log logrus.FieldLogger
logLevel logrus.Level
logFormat *logging.FormatFlag
}
// NewManager create a new repository manager.
@ -110,18 +124,26 @@ func NewManager(
repoEnsurer *Ensurer,
credentialFileStore credentials.FileStore,
credentialSecretStore credentials.SecretStore,
repoMaintenanceJobConfig string,
podResources kube.PodResources,
keepLatestMaintenanceJobs int,
log logrus.FieldLogger,
logLevel logrus.Level,
logFormat *logging.FormatFlag,
) Manager {
mgr := &manager{
namespace: namespace,
client: client,
providers: map[string]provider.Provider{},
repoLocker: repoLocker,
repoEnsurer: repoEnsurer,
fileSystem: filesystem.NewFileSystem(),
repoMaintenanceJobConfig: repoMaintenanceJobConfig,
podResources: podResources,
keepLatestMaintenanceJobs: keepLatestMaintenanceJobs,
log: log,
logLevel: logLevel,
logFormat: logFormat,
}
mgr.providers[velerov1api.BackupRepositoryTypeRestic] = provider.NewResticRepositoryProvider(credentialFileStore, mgr.fileSystem, mgr.log)
@ -204,9 +226,24 @@ func (m *manager) PruneRepo(repo *velerov1api.BackupRepository) error {
return nil
}
jobConfig, err := getMaintenanceJobConfig(
context.Background(),
m.client,
m.log,
m.namespace,
m.repoMaintenanceJobConfig,
repo,
)
if err != nil {
log.Infof("Cannot find the repo-maintenance-job-config ConfigMap: %s. Using default values.", err.Error())
}
log.Info("Starting repo maintenance")
maintenanceJob, err := m.buildMaintenanceJob(
jobConfig,
param,
)
if err != nil {
return errors.Wrap(err, "error building maintenance job")
}
@ -219,8 +256,11 @@ func (m *manager) PruneRepo(repo *velerov1api.BackupRepository) error {
log.Debug("Creating maintenance job")
defer func() {
if err := deleteOldMaintenanceJobs(
m.client,
param.BackupRepo.Name,
m.keepLatestMaintenanceJobs,
); err != nil {
log.WithError(err).Error("Failed to delete maintenance job")
}
}()
@ -338,3 +378,115 @@ func (m *manager) assembleRepoParam(repo *velerov1api.BackupRepository) (provide
BackupRepo: repo,
}, nil
}
func (m *manager) buildMaintenanceJob(
config *JobConfigs,
param provider.RepoParam,
) (*batchv1.Job, error) {
// Get the Velero server deployment
deployment := &appsv1.Deployment{}
err := m.client.Get(context.TODO(), types.NamespacedName{Name: "velero", Namespace: m.namespace}, deployment)
if err != nil {
return nil, err
}
// Get the environment variables from the Velero server deployment
envVars := veleroutil.GetEnvVarsFromVeleroServer(deployment)
// Get the volume mounts from the Velero server deployment
volumeMounts := veleroutil.GetVolumeMountsFromVeleroServer(deployment)
// Get the volumes from the Velero server deployment
volumes := veleroutil.GetVolumesFromVeleroServer(deployment)
// Get the service account from the Velero server deployment
serviceAccount := veleroutil.GetServiceAccountFromVeleroServer(deployment)
// Get image
image := veleroutil.GetVeleroServerImage(deployment)
// Set resource limits and requests
cpuRequest := m.podResources.CPURequest
memRequest := m.podResources.MemoryRequest
cpuLimit := m.podResources.CPULimit
memLimit := m.podResources.MemoryLimit
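// Values from the repo-maintenance-job-config ConfigMap, when present,
// take precedence over the server-level defaults captured above.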
if config != nil && config.PodResources != nil {
cpuRequest = config.PodResources.CPURequest
memRequest = config.PodResources.MemoryRequest
cpuLimit = config.PodResources.CPULimit
memLimit = config.PodResources.MemoryLimit
}
resources, err := kube.ParseResourceRequirements(cpuRequest, memRequest, cpuLimit, memLimit)
if err != nil {
return nil, errors.Wrap(err, "failed to parse resource requirements for maintenance job")
}
// Set arguments
args := []string{"repo-maintenance"}
args = append(args, fmt.Sprintf("--repo-name=%s", param.BackupRepo.Spec.VolumeNamespace))
args = append(args, fmt.Sprintf("--repo-type=%s", param.BackupRepo.Spec.RepositoryType))
args = append(args, fmt.Sprintf("--backup-storage-location=%s", param.BackupLocation.Name))
args = append(args, fmt.Sprintf("--log-level=%s", m.logLevel.String()))
args = append(args, fmt.Sprintf("--log-format=%s", m.logFormat.String()))
// build the maintenance job
job := &batchv1.Job{
ObjectMeta: metav1.ObjectMeta{
Name: generateJobName(param.BackupRepo.Name),
Namespace: param.BackupRepo.Namespace,
Labels: map[string]string{
RepositoryNameLabel: param.BackupRepo.Name,
},
},
Spec: batchv1.JobSpec{
BackoffLimit: new(int32), // Never retry
Template: v1.PodTemplateSpec{
ObjectMeta: metav1.ObjectMeta{
Name: "velero-repo-maintenance-pod",
},
Spec: v1.PodSpec{
Containers: []v1.Container{
{
Name: "velero-repo-maintenance-container",
Image: image,
Command: []string{
"/velero",
},
Args: args,
ImagePullPolicy: v1.PullIfNotPresent,
Env: envVars,
VolumeMounts: volumeMounts,
Resources: resources,
},
},
RestartPolicy: v1.RestartPolicyNever,
Volumes: volumes,
ServiceAccountName: serviceAccount,
},
},
},
}
if config != nil && len(config.LoadAffinities) > 0 {
affinity := kube.ToSystemAffinity(config.LoadAffinities)
job.Spec.Template.Spec.Affinity = affinity
}
if tolerations := veleroutil.GetTolerationsFromVeleroServer(deployment); tolerations != nil {
job.Spec.Template.Spec.Tolerations = tolerations
}
if nodeSelector := veleroutil.GetNodeSelectorFromVeleroServer(deployment); nodeSelector != nil {
job.Spec.Template.Spec.NodeSelector = nodeSelector
}
if labels := veleroutil.GetVeleroServerLables(deployment); len(labels) > 0 {
job.Spec.Template.Labels = labels
}
if annotations := veleroutil.GetVeleroServerAnnotations(deployment); len(annotations) > 0 {
job.Spec.Template.Annotations = annotations
}
return job, nil
}

View File

@ -17,18 +17,30 @@ limitations under the License.
package repository
import (
"fmt"
"testing"
"github.com/sirupsen/logrus"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
appsv1 "k8s.io/api/apps/v1"
v1 "k8s.io/api/core/v1"
"k8s.io/apimachinery/pkg/api/resource"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/runtime"
kbclient "sigs.k8s.io/controller-runtime/pkg/client"
"sigs.k8s.io/controller-runtime/pkg/client/fake"
velerov1 "github.com/vmware-tanzu/velero/pkg/apis/velero/v1"
velerov1api "github.com/vmware-tanzu/velero/pkg/apis/velero/v1"
"github.com/vmware-tanzu/velero/pkg/repository/provider"
"github.com/vmware-tanzu/velero/pkg/util/kube"
"github.com/vmware-tanzu/velero/pkg/util/logging"
)
func TestGetRepositoryProvider(t *testing.T) {
var fakeClient kbclient.Client
mgr := NewManager("", fakeClient, nil, nil, nil, nil, MaintenanceConfig{}, nil).(*manager)
mgr := NewManager("", fakeClient, nil, nil, nil, nil, "", kube.PodResources{}, 3, nil, logrus.InfoLevel, nil).(*manager)
repo := &velerov1.BackupRepository{}
// empty repository type
@ -47,3 +59,168 @@ func TestGetRepositoryProvider(t *testing.T) {
_, err = mgr.getRepositoryProvider(repo)
require.Error(t, err)
}
func TestBuildMaintenanceJob(t *testing.T) {
testCases := []struct {
name string
m *JobConfigs
deploy *appsv1.Deployment
logLevel logrus.Level
logFormat *logging.FormatFlag
expectedJobName string
expectedError bool
}{
{
name: "Valid maintenance job",
m: &JobConfigs{
PodResources: &kube.PodResources{
CPURequest: "100m",
MemoryRequest: "128Mi",
CPULimit: "200m",
MemoryLimit: "256Mi",
},
},
deploy: &appsv1.Deployment{
ObjectMeta: metav1.ObjectMeta{
Name: "velero",
Namespace: "velero",
},
Spec: appsv1.DeploymentSpec{
Template: v1.PodTemplateSpec{
Spec: v1.PodSpec{
Containers: []v1.Container{
{
Name: "velero-repo-maintenance-container",
Image: "velero-image",
},
},
},
},
},
},
logLevel: logrus.InfoLevel,
logFormat: logging.NewFormatFlag(),
expectedJobName: "test-123-maintain-job",
expectedError: false,
},
{
name: "Error getting Velero server deployment",
m: &JobConfigs{
PodResources: &kube.PodResources{
CPURequest: "100m",
MemoryRequest: "128Mi",
CPULimit: "200m",
MemoryLimit: "256Mi",
},
},
logLevel: logrus.InfoLevel,
logFormat: logging.NewFormatFlag(),
expectedJobName: "",
expectedError: true,
},
}
param := provider.RepoParam{
BackupRepo: &velerov1api.BackupRepository{
ObjectMeta: metav1.ObjectMeta{
Namespace: "velero",
Name: "test-123",
},
Spec: velerov1api.BackupRepositorySpec{
VolumeNamespace: "test-123",
RepositoryType: "kopia",
},
},
BackupLocation: &velerov1api.BackupStorageLocation{
ObjectMeta: metav1.ObjectMeta{
Namespace: "velero",
Name: "test-location",
},
},
}
for _, tc := range testCases {
t.Run(tc.name, func(t *testing.T) {
// Create a fake clientset with resources
objs := []runtime.Object{param.BackupLocation, param.BackupRepo}
if tc.deploy != nil {
objs = append(objs, tc.deploy)
}
scheme := runtime.NewScheme()
_ = appsv1.AddToScheme(scheme)
_ = velerov1api.AddToScheme(scheme)
cli := fake.NewClientBuilder().WithScheme(scheme).WithRuntimeObjects(objs...).Build()
mgr := NewManager(
"velero",
cli,
nil,
nil,
nil,
nil,
"",
kube.PodResources{},
3,
nil,
logrus.InfoLevel,
logging.NewFormatFlag(),
).(*manager)
// Call the function to test
job, err := mgr.buildMaintenanceJob(tc.m, param)
// Check the error
if tc.expectedError {
assert.Error(t, err)
assert.Nil(t, job)
} else {
assert.NoError(t, err)
assert.NotNil(t, job)
assert.Contains(t, job.Name, tc.expectedJobName)
assert.Equal(t, param.BackupRepo.Namespace, job.Namespace)
assert.Equal(t, param.BackupRepo.Name, job.Labels[RepositoryNameLabel])
// Check container
assert.Len(t, job.Spec.Template.Spec.Containers, 1)
container := job.Spec.Template.Spec.Containers[0]
assert.Equal(t, "velero-repo-maintenance-container", container.Name)
assert.Equal(t, "velero-image", container.Image)
assert.Equal(t, v1.PullIfNotPresent, container.ImagePullPolicy)
// Check resources
expectedResources := v1.ResourceRequirements{
Requests: v1.ResourceList{
v1.ResourceCPU: resource.MustParse(tc.m.PodResources.CPURequest),
v1.ResourceMemory: resource.MustParse(tc.m.PodResources.MemoryRequest),
},
Limits: v1.ResourceList{
v1.ResourceCPU: resource.MustParse(tc.m.PodResources.CPULimit),
v1.ResourceMemory: resource.MustParse(tc.m.PodResources.MemoryLimit),
},
}
assert.Equal(t, expectedResources, container.Resources)
// Check args
expectedArgs := []string{
"repo-maintenance",
fmt.Sprintf("--repo-name=%s", param.BackupRepo.Spec.VolumeNamespace),
fmt.Sprintf("--repo-type=%s", param.BackupRepo.Spec.RepositoryType),
fmt.Sprintf("--backup-storage-location=%s", param.BackupLocation.Name),
fmt.Sprintf("--log-level=%s", tc.logLevel.String()),
fmt.Sprintf("--log-format=%s", tc.logFormat.String()),
}
assert.Equal(t, expectedArgs, container.Args)
// Check affinity
assert.Nil(t, job.Spec.Template.Spec.Affinity)
// Check tolerations
assert.Nil(t, job.Spec.Template.Spec.Tolerations)
// Check node selector
assert.Nil(t, job.Spec.Template.Spec.NodeSelector)
}
})
}
}

View File

@ -30,6 +30,18 @@ import (
corev1client "k8s.io/client-go/kubernetes/typed/core/v1"
)
type LoadAffinity struct {
// NodeSelector specifies the label selector to match nodes
NodeSelector metav1.LabelSelector `json:"nodeSelector"`
}
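// PodResources specifies the CPU and memory requests and limits applied to
// a pod, expressed as Kubernetes quantity strings (e.g. "100m", "128Mi").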
type PodResources struct {
CPURequest string `json:"cpuRequest,omitempty"`
MemoryRequest string `json:"memoryRequest,omitempty"`
CPULimit string `json:"cpuLimit,omitempty"`
MemoryLimit string `json:"memoryLimit,omitempty"`
}
// IsPodRunning does a well-rounded check to make sure the specified pod is running stably.
// If not, return the error found
func IsPodRunning(pod *corev1api.Pod) error {
@ -201,3 +213,47 @@ func CollectPodLogs(ctx context.Context, podGetter corev1client.CoreV1Interface,
return nil
}
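// ToSystemAffinity converts a list of LoadAffinity entries into a standard
// Kubernetes Affinity with required node-affinity scheduling terms.
// Each entry becomes one NodeSelectorTerm; Kubernetes ORs the terms, while
// the requirements within a single term are ANDed. It returns nil when no
// affinity is configured.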
func ToSystemAffinity(loadAffinities []*LoadAffinity) *corev1api.Affinity {
if len(loadAffinities) == 0 {
return nil
}
nodeSelectorTermList := make([]corev1api.NodeSelectorTerm, 0)
for _, loadAffinity := range loadAffinities {
requirements := []corev1api.NodeSelectorRequirement{}
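// MatchLabels entries are normalized into equivalent "In" match
// expressions so both selector styles land in a single requirement list.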
for k, v := range loadAffinity.NodeSelector.MatchLabels {
requirements = append(requirements, corev1api.NodeSelectorRequirement{
Key: k,
Values: []string{v},
Operator: corev1api.NodeSelectorOpIn,
})
}
for _, exp := range loadAffinity.NodeSelector.MatchExpressions {
requirements = append(requirements, corev1api.NodeSelectorRequirement{
Key: exp.Key,
Values: exp.Values,
Operator: corev1api.NodeSelectorOperator(exp.Operator),
})
}
nodeSelectorTermList = append(
nodeSelectorTermList,
corev1api.NodeSelectorTerm{
MatchExpressions: requirements,
},
)
}
if len(nodeSelectorTermList) > 0 {
result := new(corev1api.Affinity)
result.NodeAffinity = new(corev1api.NodeAffinity)
result.NodeAffinity.RequiredDuringSchedulingIgnoredDuringExecution = new(corev1api.NodeSelector)
result.NodeAffinity.RequiredDuringSchedulingIgnoredDuringExecution.NodeSelectorTerms = nodeSelectorTermList
return result
}
return nil
}

View File

@ -19,6 +19,7 @@ package kube
import (
"context"
"io"
"reflect"
"strings"
"testing"
"time"
@ -697,3 +698,151 @@ func TestCollectPodLogs(t *testing.T) {
})
}
}
func TestToSystemAffinity(t *testing.T) {
tests := []struct {
name string
loadAffinities []*LoadAffinity
expected *corev1api.Affinity
}{
{
name: "loadAffinity is nil",
},
{
name: "loadAffinity is empty",
loadAffinities: []*LoadAffinity{},
},
{
name: "with match label",
loadAffinities: []*LoadAffinity{
{
NodeSelector: metav1.LabelSelector{
MatchLabels: map[string]string{
"key-1": "value-1",
},
},
},
},
expected: &corev1api.Affinity{
NodeAffinity: &corev1api.NodeAffinity{
RequiredDuringSchedulingIgnoredDuringExecution: &corev1api.NodeSelector{
NodeSelectorTerms: []corev1api.NodeSelectorTerm{
{
MatchExpressions: []corev1api.NodeSelectorRequirement{
{
Key: "key-1",
Values: []string{"value-1"},
Operator: corev1api.NodeSelectorOpIn,
},
},
},
},
},
},
},
},
{
name: "with match expression",
loadAffinities: []*LoadAffinity{
{
NodeSelector: metav1.LabelSelector{
MatchLabels: map[string]string{
"key-2": "value-2",
},
MatchExpressions: []metav1.LabelSelectorRequirement{
{
Key: "key-3",
Values: []string{"value-3-1", "value-3-2"},
Operator: metav1.LabelSelectorOpNotIn,
},
{
Key: "key-4",
Values: []string{"value-4-1", "value-4-2", "value-4-3"},
Operator: metav1.LabelSelectorOpDoesNotExist,
},
},
},
},
},
expected: &corev1api.Affinity{
NodeAffinity: &corev1api.NodeAffinity{
RequiredDuringSchedulingIgnoredDuringExecution: &corev1api.NodeSelector{
NodeSelectorTerms: []corev1api.NodeSelectorTerm{
{
MatchExpressions: []corev1api.NodeSelectorRequirement{
{
Key: "key-2",
Values: []string{"value-2"},
Operator: corev1api.NodeSelectorOpIn,
},
{
Key: "key-3",
Values: []string{"value-3-1", "value-3-2"},
Operator: corev1api.NodeSelectorOpNotIn,
},
{
Key: "key-4",
Values: []string{"value-4-1", "value-4-2", "value-4-3"},
Operator: corev1api.NodeSelectorOpDoesNotExist,
},
},
},
},
},
},
},
},
{
name: "multiple load affinities",
loadAffinities: []*LoadAffinity{
{
NodeSelector: metav1.LabelSelector{
MatchLabels: map[string]string{
"key-1": "value-1",
},
},
},
{
NodeSelector: metav1.LabelSelector{
MatchLabels: map[string]string{
"key-2": "value-2",
},
},
},
},
expected: &corev1api.Affinity{
NodeAffinity: &corev1api.NodeAffinity{
RequiredDuringSchedulingIgnoredDuringExecution: &corev1api.NodeSelector{
NodeSelectorTerms: []corev1api.NodeSelectorTerm{
{
MatchExpressions: []corev1api.NodeSelectorRequirement{
{
Key: "key-1",
Values: []string{"value-1"},
Operator: corev1api.NodeSelectorOpIn,
},
},
},
{
MatchExpressions: []corev1api.NodeSelectorRequirement{
{
Key: "key-2",
Values: []string{"value-2"},
Operator: corev1api.NodeSelectorOpIn,
},
},
},
},
},
},
},
},
}
for _, test := range tests {
t.Run(test.name, func(t *testing.T) {
affinity := ToSystemAffinity(test.loadAffinities)
assert.True(t, reflect.DeepEqual(affinity, test.expected))
})
}
}

View File

@ -9,9 +9,106 @@ Before v1.14.0, Velero performs periodic maintenance on the repository within Ve
For repository maintenance jobs, there's no limit on resources by default. You could configure the job resource limitation based on target data to be backed up.
From v1.15 on, Velero introduces a new ConfigMap, specified by the `velero server --repo-maintenance-job-config` parameter, to set the repository maintenance Job configuration, including node affinity and pod resources. The old `velero server` parameters (`--maintenance-job-cpu-request`, `--maintenance-job-mem-request`, `--maintenance-job-cpu-limit`, `--maintenance-job-mem-limit`, and `--keep-latest-maintenance-jobs`) introduced in v1.14 are deprecated and will be deleted in v1.17.
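For example, assuming the ConfigMap is named `repo-maintenance-job-config` (the name used in the examples below), the server would be started with a flag along these lines:

``` bash
velero server --repo-maintenance-job-config=repo-maintenance-job-config
```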
## Settings
### Resource Limitation and Node Affinity
Both are specified in the ConfigMap referenced by the `velero server --repo-maintenance-job-config` parameter.

The ConfigMap's content is a map.
If the map contains a key named `global`, its value is applied to the maintenance jobs of every BackupRepository that has no specific configuration of its own in the ConfigMap.
Every other key is the combination of three elements of a BackupRepository, which together identify a unique BackupRepository:
* The namespace whose volume data the BackupRepository backs up.
* The name of the BackupStorageLocation the BackupRepository references.
* The BackupRepository's type. Possible values are `kopia` and `restic`.

If a key matches a BackupRepository, its value is applied to that BackupRepository's maintenance jobs.
Because the key is derived from these identifying elements, configuration can be added before the BackupRepository is created, which is especially convenient for administrators setting up Velero during installation.

For example, the key for the following BackupRepository is `test-default-kopia`.
``` yaml
- apiVersion: velero.io/v1
kind: BackupRepository
metadata:
generateName: test-default-kopia-
labels:
velero.io/repository-type: kopia
velero.io/storage-location: default
velero.io/volume-namespace: test
name: test-default-kopia-kgt6n
namespace: velero
spec:
backupStorageLocation: default
maintenanceFrequency: 1h0m0s
repositoryType: kopia
resticIdentifier: gs:jxun:/restic/test
volumeNamespace: test
```
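For instance, a configuration that applies only to this repository would use that key at the top level of the JSON document; the resource values below are illustrative:

``` json
{
    "test-default-kopia": {
        "podResources": {
            "cpuRequest": "100m",
            "cpuLimit": "200m",
            "memoryRequest": "100Mi",
            "memoryLimit": "200Mi"
        }
    }
}
```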
You can still customize the maintenance job resource requests and limits when using the [velero install][1] CLI command.

The `LoadAffinity` structure is reused from the [node-agent affinity configuration][2] design.
### Affinity Example
Users may want the maintenance job to run on nodes that match either of two conditions.
For example, a user may want the job to run on nodes of a specific machine type, or on nodes located in the us-central1-x zones.
This can be done by adding multiple entries to the `loadAffinity` array.
A sample `repo-maintenance-job-config` ConfigMap for this scenario is shown below:
``` bash
cat <<EOF > repo-maintenance-job-config.json
{
"global": {
"podResources": {
"cpuRequest": "100m",
"cpuLimit": "200m",
"memoryRequest": "100Mi",
"memoryLimit": "200Mi"
},
"loadAffinity": [
{
"nodeSelector": {
"matchExpressions": [
{
"key": "cloud.google.com/machine-family",
"operator": "In",
"values": [
"e2"
]
}
]
}
},
{
"nodeSelector": {
"matchExpressions": [
{
"key": "topology.kubernetes.io/zone",
"operator": "In",
"values": [
"us-central1-a",
"us-central1-b",
"us-central1-c"
]
}
]
}
}
]
}
}
EOF
```
This sample showcases two affinity configurations:
- matchExpressions: the maintenance job runs on nodes whose `cloud.google.com/machine-family` label has the value `e2`.
- matchExpressions: the maintenance job runs on nodes located in the `us-central1-a`, `us-central1-b`, or `us-central1-c` zone.

Nodes matching either of the two conditions are selected.
To create the ConfigMap, save the sample above to a JSON file, then run the following command:
```
kubectl create cm repo-maintenance-job-config -n velero --from-file=repo-maintenance-job-config.json
```
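You can then inspect the created ConfigMap to confirm its content:

```
kubectl get configmap repo-maintenance-job-config -n velero -o yaml
```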
### Log
The maintenance job inherits the log level and log format settings from the Velero server, so if the Velero server has debug logging enabled, the maintenance job will also log at the debug level.
@ -31,6 +128,7 @@ velero install --default-repo-maintain-frequency <DURATION>
For Kopia the default maintenance frequency is 1 hour; for Restic it is 7 * 24 hours.
### Others
Maintenance jobs will inherit the labels, annotations, tolerations, nodeSelector, service account, image, environment variables, cloud-credentials etc. from the Velero deployment.
[1]: velero-install.md#usage
[2]: node-agent-concurrency.md

View File

@ -56,13 +56,15 @@ toc:
- page: Backup PVC Configuration
url: /data-movement-backup-pvc-configuration
- page: Backup Repository Configuration
url: /backup-repository-configuration
- page: Verifying Self-signed Certificates
url: /self-signed-certificates
- page: Changing RBAC permissions
url: /rbac
- page: Behind proxy
url: /proxy
- page: Repository Maintenance
url: /repository-maintenance
- title: Plugins
subfolderitems:
- page: Overview

View File

@ -59,6 +59,8 @@ toc:
url: /rbac
- page: Behind proxy
url: /proxy
- page: Repository Maintenance
url: /repository-maintenance
- title: Plugins
subfolderitems:
- page: Overview