---
layout: blog
title: 'Kubernetes 1.23: Pod Security Graduates to Beta'
date: 2021-12-09
slug: pod-security-admission-beta
---

**Authors:** Jim Angel (Google), Lachlan Evenson (Microsoft)

With the release of Kubernetes v1.23, [Pod Security admission](/docs/concepts/security/pod-security-admission/) has now entered beta. Pod Security is a [built-in](/docs/reference/access-authn-authz/admission-controllers/) admission controller that evaluates pod specifications against a predefined set of [Pod Security Standards](/docs/concepts/security/pod-security-standards/) and determines whether to `admit` or `deny` the pod from running.

Pod Security is the successor to [PodSecurityPolicy](/docs/concepts/security/pod-security-policy/) which was deprecated in the v1.21 release, and will be removed in Kubernetes v1.25. In this article, we cover the key concepts of Pod Security along with how to use it. We hope that cluster administrators and developers alike will use this new mechanism to enforce secure defaults for their workloads.

## Why Pod Security

The overall aim of Pod Security is to let you isolate workloads. You can run a cluster that runs different workloads and, without adding extra third-party tooling, implement controls that require Pods for a workload to restrict their own privileges to a defined bounding set.

Pod Security overcomes key shortcomings of Kubernetes' existing, but deprecated, PodSecurityPolicy (PSP) mechanism:

* Policy authorization model: challenging to deploy with controllers.
* Risks around switching: a lack of dry-run/audit capabilities made it hard to enable PodSecurityPolicy.
* Inconsistent and Unbounded API: the large configuration surface and evolving constraints led to a complex and confusing API.

The shortcomings of PSP made it very difficult to use which led the community to reevaluate whether or not a better implementation could achieve the same goals.
One of those goals was to provide an out-of-the-box solution to apply security best practices. Pod Security ships with predefined Pod Security levels that a cluster administrator can configure to meet the desired security posture.

It's important to note that Pod Security doesn't have complete feature parity with the deprecated PodSecurityPolicy. Specifically, it doesn't have the ability to mutate or change Kubernetes resources to auto-remediate a policy violation on behalf of the user. Additionally, it doesn't provide fine-grained control over each allowed field and value within a pod specification or any other Kubernetes resource that you may wish to evaluate. If you need more fine-grained policy control, take a look at these [other](/docs/concepts/security/pod-security-standards/#faq) projects which support such use cases.

Pod Security also adheres to Kubernetes best practices of declarative object management by denying resources that violate the policy. This requires resources to be updated in source repositories, and tooling to be updated prior to being deployed to Kubernetes.

## How Does Pod Security Work?

Pod Security is a built-in [admission controller](/docs/reference/access-authn-authz/admission-controllers/) starting with Kubernetes v1.22, but can also be run as a standalone [webhook](/docs/concepts/security/pod-security-admission/#webhook). Admission controllers function by intercepting requests in the Kubernetes API server prior to persistence to storage. They can either `admit` or `deny` a request.

In the case of Pod Security, pod specifications will be evaluated against a configured policy in the form of a Pod Security Standard. This means that security sensitive fields in a pod specification will only be allowed to have [specific](/docs/concepts/security/pod-security-standards/#profile-details) values.
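
To make "specific values" more concrete, here is a minimal sketch of a Pod manifest whose security-sensitive fields are set the way the strictest standard (`restricted`, introduced in the next section) expects. The pod name `restricted-demo`, the file name, and the `runAsUser` value are illustrative assumptions and not part of the walkthrough in this article.

```shell
# Write a hypothetical Pod manifest to disk; nothing is sent to a cluster here.
cat > restricted-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: restricted-demo          # hypothetical name
spec:
  securityContext:
    runAsNonRoot: true           # the pod must not run as UID 0
    runAsUser: 65534             # assumption: busybox defaults to root, so pick a non-root UID
    seccompProfile:
      type: RuntimeDefault       # "Localhost" is the other accepted value
  containers:
  - name: busybox
    image: busybox
    args: ["sleep", "1000000"]
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop: ["ALL"]
EOF
```

Running `kubectl apply --dry-run=server -f restricted-pod.yaml` in a namespace that enforces `restricted` should admit this pod, whereas setting `privileged: true` or removing any of the fields above should produce a rejection.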
## Configuring Pod Security

### Pod Security Standards

In order to use Pod Security we first need to understand [Pod Security Standards](/docs/concepts/security/pod-security-standards/). These standards define three different policy levels that range from permissive to restrictive. These levels are as follows:

* `privileged`: open and unrestricted
* `baseline`: Covers known privilege escalations while minimizing restrictions
* `restricted`: Highly restricted, hardening against known and unknown privilege escalations. May cause compatibility issues

Each of these policy levels defines which fields are restricted within a pod specification and the allowed values. Some of the fields restricted by these policies include:

* `spec.securityContext.sysctls`
* `spec.hostNetwork`
* `spec.volumes[*].hostPath`
* `spec.containers[*].securityContext.privileged`

Policy levels are applied via labels on Namespace resources, which allows for granular per-namespace policy selection. The AdmissionConfiguration in the API server can also be configured to set cluster-wide default levels and exemptions.

### Policy modes

Policies are applied in a specific mode. Multiple modes (with different policy levels) can be set on the same namespace. Here is a list of modes:

* `enforce`: Any Pods that violate the policy will be rejected
* `audit`: Violations will be recorded as an annotation in the audit logs, but don't affect whether the pod is allowed.
* `warn`: Violations will send a warning message back to the user, but don't affect whether the pod is allowed.

In addition to modes you can also pin the policy to a specific version (for example v1.22). Pinning to a specific version allows the behavior to remain consistent if the policy definition changes in future Kubernetes releases.
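
As a rough illustration of how levels, modes, and version pinning combine, the sketch below writes a Namespace manifest that enforces `baseline` pinned to v1.23 while auditing and warning against `restricted`. The namespace name `demo-team` and the file name are hypothetical, chosen only for this example.

```shell
# Write a hypothetical Namespace manifest to disk; nothing is sent to a cluster here.
cat > demo-namespace.yaml <<'EOF'
apiVersion: v1
kind: Namespace
metadata:
  name: demo-team                # hypothetical namespace name
  labels:
    # Reject pods that violate baseline as it was defined in v1.23.
    pod-security.kubernetes.io/enforce: baseline
    pod-security.kubernetes.io/enforce-version: v1.23
    # Record and surface violations of the stricter level without blocking pods.
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
EOF
```

Because `audit` and `warn` carry no version label here, they follow the latest definition of `restricted`, which is one way to preview a stricter policy before enforcing it.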
## Hands on demo

### Prerequisites

- [KinD](https://kind.sigs.k8s.io/docs/user/quick-start/#installation)
- [kubectl](/docs/tasks/tools/)
- [Docker](https://docs.docker.com/get-docker/) or [Podman](https://podman.io/getting-started/installation) container runtime & CLI

### Deploy a kind cluster

```shell
kind create cluster --image kindest/node:v1.23.0
```

It might take a while to start and once it's started it might take a minute or so before the node becomes ready.

```shell
kubectl cluster-info --context kind-kind
```

Wait for the node STATUS to become ready.

```shell
kubectl get nodes
```

The output is similar to this:

```
NAME                 STATUS   ROLES                  AGE   VERSION
kind-control-plane   Ready    control-plane,master   54m   v1.23.0
```

### Confirm Pod Security is enabled

The best way to [confirm the API's default enabled plugins](/docs/reference/access-authn-authz/admission-controllers/#which-plugins-are-enabled-by-default) is to check the Kubernetes API container's help arguments.

```shell
kubectl -n kube-system exec kube-apiserver-kind-control-plane -it -- kube-apiserver -h | grep "default enabled ones"
```

The output is similar to this:

```
...
--enable-admission-plugins strings admission plugins that should be enabled in addition to default enabled ones (NamespaceLifecycle, LimitRanger, ServiceAccount, TaintNodesByCondition, PodSecurity, Priority, DefaultTolerationSeconds, DefaultStorageClass, StorageObjectInUseProtection, PersistentVolumeClaimResize, RuntimeClass, CertificateApproval, CertificateSigning, CertificateSubjectRestriction, DefaultIngressClass, MutatingAdmissionWebhook, ValidatingAdmissionWebhook, ResourceQuota).
...
```

`PodSecurity` is listed in the group of default enabled admission plugins.
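
If you would rather script this check than read the list by eye, one possible approach is a small helper that scans the help text for the plugin name as a whole word. The function name `has_pod_security` is made up for this sketch; it only wraps standard `grep` options.

```shell
# Hypothetical helper: reads kube-apiserver help text on stdin and succeeds
# only if PodSecurity appears in the list of default enabled plugins.
# -w matches whole words, so "PodSecurityPolicy" alone does not count.
has_pod_security() {
  grep "default enabled ones" | grep -qw "PodSecurity"
}
```

Piping the output of the `kube-apiserver -h` command above into `has_pod_security` gives an exit status of 0 when the plugin is enabled by default, which is convenient in setup scripts.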
If using a cloud provider, or if you don't have access to the API server, the best way to check would be to run a quick end-to-end test:

```shell
kubectl create namespace verify-pod-security
kubectl label namespace verify-pod-security pod-security.kubernetes.io/enforce=restricted
# The following command does NOT create a workload (--dry-run=server)
kubectl -n verify-pod-security run test --dry-run=server --image=busybox --privileged
kubectl delete namespace verify-pod-security
```

The output is similar to this:

```
Error from server (Forbidden): pods "test" is forbidden: violates PodSecurity "restricted:latest": privileged (container "test" must not set securityContext.privileged=true), allowPrivilegeEscalation != false (container "test" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "test" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "test" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "test" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
```

### Configure Pod Security

Policies are applied to a namespace via labels. These labels are as follows:

* `pod-security.kubernetes.io/<MODE>: <LEVEL>` (required to enable pod security)
* `pod-security.kubernetes.io/<MODE>-version: <VERSION>` (*optional*, defaults to latest)

A specific version can be supplied for each enforcement mode. The version pins the policy to the version that was shipped as part of the Kubernetes release. Pinning to a specific Kubernetes version allows for deterministic policy behavior while allowing flexibility for future updates to Pod Security Standards. The possible `<MODE>` values are `enforce`, `audit` and `warn`.

### When to use `warn`?

The typical uses for `warn` are to get ready for a future change where you want to enforce a different policy. The two most common cases would be:

* `warn` at the same level but a different version (e.g.
pin `enforce` to *restricted+v1.23* and `warn` at *restricted+latest*)
* `warn` at a stricter level (e.g. `enforce` baseline, `warn` restricted)

It's not recommended to use `warn` for the exact same level+version of the policy as `enforce`. In the admission sequence, if `enforce` fails, the entire sequence fails before evaluating the `warn`.

First, create a namespace called `verify-pod-security` if not created earlier. For the demo, `--overwrite` is used when labeling to allow repurposing a single namespace for multiple examples.

```shell
kubectl create namespace verify-pod-security
```

### Deploy demo workloads

Each workload represents a higher level of security that would not pass the profile that comes after it.

For the following examples, the `busybox` container runs a `sleep` command for 1 million seconds (≅11 days) or until deleted. Pod Security is not interested in which container image you chose, but rather the Pod level settings and their implications for security.

### Privileged level and workload

For the privileged pod, use the [privileged policy](/docs/concepts/security/pod-security-standards/#privileged). This allows the process inside a container to gain new privileges (also known as "privilege escalation") and can be dangerous if untrusted.

First, let's apply a restricted Pod Security level for a test.

```shell
# enforces a "restricted" security policy and audits on restricted
kubectl label --overwrite ns verify-pod-security \
  pod-security.kubernetes.io/enforce=restricted \
  pod-security.kubernetes.io/audit=restricted
```

Next, try to deploy a privileged workload in the namespace.

```shell
cat <
```

### Applying a cluster-wide policy

In addition to applying labels to namespaces to configure policy you can also configure cluster-wide policies and exemptions using the AdmissionConfiguration resource.

Using this resource, policy definitions are applied cluster-wide by default and any policy that is applied via namespace labels will take precedence.
There is no runtime configurable API for the `AdmissionConfiguration` configuration file, so a cluster administrator would need to specify a path to the file below via the `--admission-control-config-file` flag on the API server.

In the following resource we are enforcing and auditing the baseline policy and warning on the restricted policy. We are also making the kube-system namespace exempt from this policy.

It's not recommended to alter control plane / clusters after install, so let's build a new cluster with a default policy on all namespaces.

First, delete the current cluster.

```shell
kind delete cluster
```

Create a Pod Security configuration that sets `enforce` and `audit` to the baseline policy while using the restricted profile to `warn` the end user.

```shell
cat <<EOF > pod-security.yaml
apiVersion: apiserver.config.k8s.io/v1
kind: AdmissionConfiguration
plugins:
- name: PodSecurity
  configuration:
    apiVersion: pod-security.admission.config.k8s.io/v1beta1
    kind: PodSecurityConfiguration
    defaults:
      enforce: "baseline"
      enforce-version: "latest"
      audit: "baseline"
      audit-version: "latest"
      warn: "restricted"
      warn-version: "latest"
    exemptions:
      # Array of authenticated usernames to exempt.
      usernames: []
      # Array of runtime class names to exempt.
      runtimeClasses: []
      # Array of namespaces to exempt.
      namespaces: [kube-system]
EOF
```

For additional options, check out the official [_standards admission controller_](/docs/tasks/configure-pod-container/enforce-standards-admission-controller/#configure-the-admission-controller) docs.

We now have a default baseline policy. Next, pass it to the kind configuration to enable the `--admission-control-config-file` API server argument and pass the policy file. To pass a file to a kind cluster, use a configuration file to pass additional setup instructions. Kind uses `kubeadm` to provision the cluster and the configuration file has the ability to pass `kubeadmConfigPatches` for further customization.
In our case, the local file is mounted into the control plane node as `/etc/kubernetes/policies/pod-security.yaml` which is then mounted into the `apiServer` container. We also pass the `--admission-control-config-file` argument pointing to the policy's location.

```shell
cat <<EOF > kind-config.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  kubeadmConfigPatches:
  - |
    kind: ClusterConfiguration
    apiServer:
      # enable admission-control-config flag on the API server
      extraArgs:
        admission-control-config-file: /etc/kubernetes/policies/pod-security.yaml
      # mount new file / directories on the control plane
      extraVolumes:
      - name: policies
        hostPath: /etc/kubernetes/policies
        mountPath: /etc/kubernetes/policies
        readOnly: true
        pathType: "DirectoryOrCreate"
  # mount the local file on the control plane
  extraMounts:
  - hostPath: ./pod-security.yaml
    containerPath: /etc/kubernetes/policies/pod-security.yaml
    readOnly: true
EOF
```

Create a new cluster using the kind configuration file defined above.

```shell
kind create cluster --image kindest/node:v1.23.0 --config kind-config.yaml
```

Let's look at the default namespace.

```shell
kubectl describe namespace default
```

The output is similar to this:

```
Name:         default
Labels:       kubernetes.io/metadata.name=default
Annotations:  <none>
Status:       Active

No resource quota.

No LimitRange resource.
```

Let's create a new namespace and see if the labels apply there.

```shell
kubectl create namespace test-defaults
kubectl describe namespace test-defaults
```

The output is the same apart from the name:

```
Name:         test-defaults
Labels:       kubernetes.io/metadata.name=test-defaults
Annotations:  <none>
Status:       Active

No resource quota.

No LimitRange resource.
```

Can a privileged workload be deployed?

```shell
cat <