---
layout: blog
title: "CRI-O: Applying seccomp profiles from OCI registries"
date: 2024-03-07
slug: cri-o-seccomp-oci-artifacts
author: >
  Sascha Grunert
---

Seccomp stands for secure computing mode and has been a feature of the Linux
kernel since version 2.6.12. It can be used to sandbox the privileges of a
process, restricting the calls it is able to make from userspace into the
kernel. Kubernetes lets you automatically apply seccomp profiles loaded onto a
node to your Pods and containers.

But distributing those seccomp profiles is a major challenge in Kubernetes,
because the JSON files have to be available on all nodes where a workload can
possibly run. Projects like the [Security Profiles
Operator](https://sigs.k8s.io/security-profiles-operator) solve that problem by
running as a daemon within the cluster, which makes me wonder which part of that
distribution could be done by the [container
runtime](/docs/setup/production-environment/container-runtimes).

Runtimes usually apply the profiles from a local path, for example:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod
spec:
  containers:
    - name: container
      image: nginx:1.25.3
      securityContext:
        seccompProfile:
          type: Localhost
          localhostProfile: nginx-1.25.3.json
```

The profile `nginx-1.25.3.json` has to be available in the `seccomp` directory
below the root directory of the kubelet. This means the default location for
the profile on disk is `/var/lib/kubelet/seccomp/nginx-1.25.3.json`. If the
profile is not available, then runtimes will fail on container creation like
this:

```shell
kubectl get pods
```

```console
NAME   READY   STATUS                 RESTARTS   AGE
pod    0/1     CreateContainerError   0          38s
```

```shell
kubectl describe pod/pod | tail
```

```console
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                 From               Message
  ----     ------     ----                ----               -------
  Normal   Scheduled  117s                default-scheduler  Successfully assigned default/pod to 127.0.0.1
  Normal   Pulling    117s                kubelet            Pulling image "nginx:1.25.3"
  Normal   Pulled     111s                kubelet            Successfully pulled image "nginx:1.25.3" in 5.948s (5.948s including waiting)
  Warning  Failed     7s (x10 over 111s)  kubelet            Error: setup seccomp: unable to load local profile "/var/lib/kubelet/seccomp/nginx-1.25.3.json": open /var/lib/kubelet/seccomp/nginx-1.25.3.json: no such file or directory
  Normal   Pulled     7s (x9 over 111s)   kubelet            Container image "nginx:1.25.3" already present on machine
```

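Before scheduling a workload with a `Localhost` profile, it can help to check on
the node whether the file is where the kubelet expects it. The following Python
sketch rebuilds the path described above. The function names are made up for
this post, and `/var/lib/kubelet` is only the default kubelet root directory, so
adjust it if your kubelet uses a different one:

```python
from pathlib import Path
from typing import List

# Default kubelet root directory (an assumption; your node may use another one).
KUBELET_ROOT = Path("/var/lib/kubelet")


def localhost_profile_path(localhost_profile: str, kubelet_root: Path = KUBELET_ROOT) -> Path:
    """Return the on-disk location of a Localhost seccomp profile."""
    return kubelet_root / "seccomp" / localhost_profile


def missing_profiles(profiles: List[str], kubelet_root: Path = KUBELET_ROOT) -> List[str]:
    """Return those profiles that do not exist as files on this node."""
    return [p for p in profiles if not localhost_profile_path(p, kubelet_root).is_file()]


# On a Linux node this prints /var/lib/kubelet/seccomp/nginx-1.25.3.json
print(localhost_profile_path("nginx-1.25.3.json"))
```
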
The major obstacle of having to manually distribute the `Localhost` profiles
will lead many end users to fall back to `RuntimeDefault` or even to run their
workloads as `Unconfined` (with seccomp disabled).

## CRI-O to the rescue

The Kubernetes container runtime [CRI-O](https://github.com/cri-o/cri-o)
provides various features using custom annotations. The v1.30 release
[adds](https://github.com/cri-o/cri-o/pull/7719) support for a new set of
annotations called `seccomp-profile.kubernetes.cri-o.io/POD` and
`seccomp-profile.kubernetes.cri-o.io/<CONTAINER>`. Those annotations allow you
to specify:

- a seccomp profile for a specific container, when used as:
  `seccomp-profile.kubernetes.cri-o.io/<CONTAINER>` (example:
  `seccomp-profile.kubernetes.cri-o.io/webserver:
  'registry.example/example/webserver:v1'`)
- a seccomp profile for every container within a pod, when used without the
  container name suffix but the reserved name `POD`:
  `seccomp-profile.kubernetes.cri-o.io/POD`
- a seccomp profile for a whole container image, if the image itself contains
  the annotation `seccomp-profile.kubernetes.cri-o.io/POD` or
  `seccomp-profile.kubernetes.cri-o.io/<CONTAINER>`.

CRI-O only respects the annotation if the runtime is configured to allow it
and the workload runs as `Unconfined`. For all other workloads, the value from
the `securityContext` takes priority.

The annotations alone will not help much with the distribution of the profiles,
but the way they can be referenced will! For example, you can now specify
seccomp profiles like regular container images by using OCI artifacts:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod
  annotations:
    seccomp-profile.kubernetes.cri-o.io/POD: quay.io/crio/seccomp:v2
spec: …
```

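To make the naming scheme and the precedence rule more tangible, here is a small
Python sketch. It is not CRI-O's implementation. The helper names are
hypothetical and the logic only models what is described in this post: the key
ends in either a container name or the reserved name `POD`, and the annotation
only takes effect if the runtime allows it and the workload is `Unconfined`.

```python
from typing import Optional

ANNOTATION_PREFIX = "seccomp-profile.kubernetes.cri-o.io"


def annotation_key(container: Optional[str] = None) -> str:
    """Build the annotation key for one container, or for the whole pod."""
    # "POD" is the reserved name that targets every container in the pod.
    return f"{ANNOTATION_PREFIX}/{container or 'POD'}"


def annotation_is_honored(runtime_allows_annotation: bool, seccomp_type: str) -> bool:
    """Model of the rule above: allowed by the runtime AND running Unconfined."""
    return runtime_allows_annotation and seccomp_type == "Unconfined"


print(annotation_key("webserver"))  # seccomp-profile.kubernetes.cri-o.io/webserver
print(annotation_key())             # seccomp-profile.kubernetes.cri-o.io/POD
print(annotation_is_honored(True, "RuntimeDefault"))  # False
```
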
The image `quay.io/crio/seccomp:v2` contains a `seccomp.json` file, which
holds the actual profile content. Tools like [ORAS](https://oras.land) or
[Skopeo](https://github.com/containers/skopeo) can be used to inspect the
contents of the image:

```shell
oras pull quay.io/crio/seccomp:v2
```

```console
Downloading 92d8ebfa89aa seccomp.json
Downloaded 92d8ebfa89aa seccomp.json
Pulled [registry] quay.io/crio/seccomp:v2
Digest: sha256:f0205dac8a24394d9ddf4e48c7ac201ca7dcfea4c554f7ca27777a7f8c43ec1b
```

```shell
jq . seccomp.json | head
```

```yaml
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "defaultErrnoRet": 38,
  "defaultErrno": "ENOSYS",
  "archMap": [
    {
      "architecture": "SCMP_ARCH_X86_64",
      "subArchitectures": [
        "SCMP_ARCH_X86",
        "SCMP_ARCH_X32"
```

```shell
# Inspect the plain manifest of the image
skopeo inspect --raw docker://quay.io/crio/seccomp:v2 | jq .
```

```yaml
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "config": {
    "mediaType": "application/vnd.cncf.seccomp-profile.config.v1+json",
    "digest": "sha256:ca3d163bab055381827226140568f3bef7eaac187cebd76878e0b63e9e442356",
    "size": 3
  },
  "layers": [
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar",
      "digest": "sha256:92d8ebfa89aa6dd752c6443c27e412df1b568d62b4af129494d7364802b2d476",
      "size": 18853,
      "annotations": {
        "org.opencontainers.image.title": "seccomp.json"
      }
    }
  ],
  "annotations": {
    "org.opencontainers.image.created": "2024-02-26T09:03:30Z"
  }
}
```

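If you want to script that inspection, a few lines of Python are enough to check
a manifest for the two things that matter here. This is only a sketch under my
own assumptions: the function name is hypothetical, the sample manifest is a
trimmed copy of the output above, and the check itself (config media type
`application/vnd.cncf.seccomp-profile.config.v1+json` plus at least one layer)
is a simplification, not the validation CRI-O performs internally.

```python
import json

# Config media type used by the seccomp profile artifact shown above.
SECCOMP_CONFIG_MEDIA_TYPE = "application/vnd.cncf.seccomp-profile.config.v1+json"


def check_seccomp_artifact(manifest: dict) -> list:
    """Return a list of problems; an empty list means the manifest looks usable."""
    problems = []
    config_type = manifest.get("config", {}).get("mediaType")
    if config_type != SECCOMP_CONFIG_MEDIA_TYPE:
        problems.append(f"unexpected config media type: {config_type!r}")
    if not manifest.get("layers"):
        problems.append("manifest has no layers")
    return problems


# Trimmed copy of the manifest shown above (digests and sizes left out).
MANIFEST = json.loads("""
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "config": {
    "mediaType": "application/vnd.cncf.seccomp-profile.config.v1+json"
  },
  "layers": [
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar",
      "annotations": { "org.opencontainers.image.title": "seccomp.json" }
    }
  ]
}
""")

print(check_seccomp_artifact(MANIFEST))  # prints []
```
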
The image manifest references the config media type that CRI-O requires
(`application/vnd.cncf.seccomp-profile.config.v1+json`) and a single layer
(`application/vnd.oci.image.layer.v1.tar`) pointing to the `seccomp.json` file.
Now, let's give that new feature a try!

### Using the annotation for a specific container or whole pod

CRI-O needs to be configured accordingly before it can use the annotation. To
do this, add the annotation to the `allowed_annotations` array for the runtime.
This can be done by using a drop-in configuration file
`/etc/crio/crio.conf.d/10-crun.conf` like this:

```toml
[crio.runtime]
default_runtime = "crun"

[crio.runtime.runtimes.crun]
allowed_annotations = [
    "seccomp-profile.kubernetes.cri-o.io",
]
```

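Note that the configuration lists the annotation without a `/POD` or
`/<CONTAINER>` suffix. The sketch below illustrates how such an entry can cover
all suffixed variants, assuming the entries are matched as prefixes of the
annotation key. That matching rule is an assumption for illustration, and the
helper is hypothetical rather than CRI-O code:

```python
from typing import List


def is_annotation_allowed(key: str, allowed_annotations: List[str]) -> bool:
    """Return True if the key starts with one of the allowed entries (assumed prefix match)."""
    return any(key.startswith(allowed) for allowed in allowed_annotations)


allowed = ["seccomp-profile.kubernetes.cri-o.io"]
print(is_annotation_allowed("seccomp-profile.kubernetes.cri-o.io/container", allowed))  # True
print(is_annotation_allowed("seccomp-profile.kubernetes.cri-o.io/POD", allowed))  # True
print(is_annotation_allowed("io.kubernetes.cri-o.Devices", allowed))  # False
```
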
Now, let's run CRI-O from the latest `main` commit. This can be done by either
building it from source, using the [static binary bundles](https://github.com/cri-o/packaging?tab=readme-ov-file#using-the-static-binary-bundles-directly)
or [the prerelease packages](https://github.com/cri-o/packaging?tab=readme-ov-file#usage).

To demonstrate this, I ran the `crio` binary from my command line using a
single-node Kubernetes cluster via [`local-up-cluster.sh`](https://github.com/cri-o/cri-o?tab=readme-ov-file#running-kubernetes-with-cri-o).
Now that the cluster is up and running, let's try a pod without the annotation
running as seccomp `Unconfined`:

```shell
cat pod.yaml
```

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod
spec:
  containers:
    - name: container
      image: nginx:1.25.3
      securityContext:
        seccompProfile:
          type: Unconfined
```

```shell
kubectl apply -f pod.yaml
```

The workload is up and running:

```shell
kubectl get pods
```

```console
NAME   READY   STATUS    RESTARTS   AGE
pod    1/1     Running   0          15s
```

Inspecting the container with [`crictl`](https://sigs.k8s.io/cri-tools) shows
that no seccomp profile got applied:

```shell
export CONTAINER_ID=$(sudo crictl ps --name container -q)
sudo crictl inspect "$CONTAINER_ID" | jq .info.runtimeSpec.linux.seccomp
```

```console
null
```

Now, let's modify the pod to apply the profile `quay.io/crio/seccomp:v2` to the
container:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod
  annotations:
    seccomp-profile.kubernetes.cri-o.io/container: quay.io/crio/seccomp:v2
spec:
  containers:
    - name: container
      image: nginx:1.25.3
```

I have to delete and recreate the Pod, because only recreation will apply a new
seccomp profile:

```shell
kubectl delete pod/pod
```

```console
pod "pod" deleted
```

```shell
kubectl apply -f pod.yaml
```

```console
pod/pod created
```

The CRI-O logs will now indicate that the runtime pulled the artifact:

```console
WARN[…] Allowed annotations are specified for workload [seccomp-profile.kubernetes.cri-o.io]
INFO[…] Found container specific seccomp profile annotation: seccomp-profile.kubernetes.cri-o.io/container=quay.io/crio/seccomp:v2 id=26ddcbe6-6efe-414a-88fd-b1ca91979e93 name=/runtime.v1.RuntimeService/CreateContainer
INFO[…] Pulling OCI artifact from ref: quay.io/crio/seccomp:v2 id=26ddcbe6-6efe-414a-88fd-b1ca91979e93 name=/runtime.v1.RuntimeService/CreateContainer
INFO[…] Retrieved OCI artifact seccomp profile of len: 18853 id=26ddcbe6-6efe-414a-88fd-b1ca91979e93 name=/runtime.v1.RuntimeService/CreateContainer
```

And the container is finally using the profile:

```shell
export CONTAINER_ID=$(sudo crictl ps --name container -q)
sudo crictl inspect "$CONTAINER_ID" | jq .info.runtimeSpec.linux.seccomp | head
```

```yaml
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "defaultErrnoRet": 38,
  "architectures": [
    "SCMP_ARCH_X86_64",
    "SCMP_ARCH_X86",
    "SCMP_ARCH_X32"
  ],
  "syscalls": [
    {
```

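The same check can be scripted. The following Python sketch summarizes the
`seccomp` section of the runtime spec, for example the JSON printed by the `jq`
filter above. The function name and the shortened sample profile (with a single
made-up allow rule) are assumptions for illustration; the field names follow the
OCI runtime spec:

```python
import json
from collections import Counter


def summarize_seccomp(seccomp):
    """Summarize the `linux.seccomp` section of an OCI runtime spec."""
    if seccomp is None:
        # `null` in the crictl output means that no profile is applied.
        return {"applied": False}
    # Count how many syscall rules exist per action, e.g. SCMP_ACT_ALLOW.
    actions = Counter(rule["action"] for rule in seccomp.get("syscalls", []))
    return {
        "applied": True,
        "defaultAction": seccomp.get("defaultAction"),
        "architectures": seccomp.get("architectures", []),
        "rulesPerAction": dict(actions),
    }


# Shortened sample; the single syscall rule is hypothetical.
SAMPLE = json.loads("""
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "defaultErrnoRet": 38,
  "architectures": ["SCMP_ARCH_X86_64", "SCMP_ARCH_X86", "SCMP_ARCH_X32"],
  "syscalls": [{"names": ["read", "write"], "action": "SCMP_ACT_ALLOW"}]
}
""")

print(summarize_seccomp(None))    # the Unconfined pod from before
print(summarize_seccomp(SAMPLE))  # a pod with an applied profile
```
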
The same would work for every container in the pod, if users replace the
`/container` suffix with the reserved name `/POD`, for example:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod
  annotations:
    seccomp-profile.kubernetes.cri-o.io/POD: quay.io/crio/seccomp:v2
spec:
  containers:
    - name: container
      image: nginx:1.25.3
```

### Using the annotation for a container image

While specifying seccomp profiles as OCI artifacts on certain workloads is a
cool feature, the majority of end users would like to link seccomp profiles to
published container images. This can be done by using a container image
annotation; instead of being applied to a Kubernetes Pod, the annotation is
metadata attached to the container image itself. For example,
[Podman](https://podman.io) can be used to add the image annotation directly
during image build:

```shell
podman build \
    --annotation seccomp-profile.kubernetes.cri-o.io=quay.io/crio/seccomp:v2 \
    -t quay.io/crio/nginx-seccomp:v2 .
```

The pushed image then contains the annotation:

```shell
skopeo inspect --raw docker://quay.io/crio/nginx-seccomp:v2 |
    jq '.annotations."seccomp-profile.kubernetes.cri-o.io"'
```

```console
"quay.io/crio/seccomp:v2"
```

If I now use that image in a CRI-O test pod definition:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod
  # no Pod annotations set
spec:
  containers:
    - name: container
      image: quay.io/crio/nginx-seccomp:v2
```

Then the CRI-O logs will indicate that the image annotation got evaluated and
the profile got applied:

```shell
kubectl delete pod/pod
```

```console
pod "pod" deleted
```

```shell
kubectl apply -f pod.yaml
```

```console
pod/pod created
```

```console
INFO[…] Found image specific seccomp profile annotation: seccomp-profile.kubernetes.cri-o.io=quay.io/crio/seccomp:v2 id=c1f22c59-e30e-4046-931d-a0c0fdc2c8b7 name=/runtime.v1.RuntimeService/CreateContainer
INFO[…] Pulling OCI artifact from ref: quay.io/crio/seccomp:v2 id=c1f22c59-e30e-4046-931d-a0c0fdc2c8b7 name=/runtime.v1.RuntimeService/CreateContainer
INFO[…] Retrieved OCI artifact seccomp profile of len: 18853 id=c1f22c59-e30e-4046-931d-a0c0fdc2c8b7 name=/runtime.v1.RuntimeService/CreateContainer
INFO[…] Created container 116a316cd9a11fe861dd04c43b94f45046d1ff37e2ed05a4e4194fcaab29ee63: default/pod/container id=c1f22c59-e30e-4046-931d-a0c0fdc2c8b7 name=/runtime.v1.RuntimeService/CreateContainer
```

```shell
export CONTAINER_ID=$(sudo crictl ps --name container -q)
sudo crictl inspect "$CONTAINER_ID" | jq .info.runtimeSpec.linux.seccomp | head
```

```yaml
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "defaultErrnoRet": 38,
  "architectures": [
    "SCMP_ARCH_X86_64",
    "SCMP_ARCH_X86",
    "SCMP_ARCH_X32"
  ],
  "syscalls": [
    {
```

For container images, the annotation `seccomp-profile.kubernetes.cri-o.io` will
be treated in the same way as `seccomp-profile.kubernetes.cri-o.io/POD` and
applies to the whole pod. In addition to that, the feature also works when
using the container-specific annotation on an image, for example if a container
is named `container1`:

```shell
skopeo inspect --raw docker://quay.io/crio/nginx-seccomp:v2-container |
    jq '.annotations."seccomp-profile.kubernetes.cri-o.io/container1"'
```

```console
"quay.io/crio/seccomp:v2"
```

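A runtime that supports both forms has to decide which image annotation applies
to a given container. The following Python sketch shows one possible lookup
order. The order (container-specific key first, then `/POD`, then the bare key)
is an assumption on my side, not documented CRI-O behavior, and the function
name is hypothetical:

```python
from typing import Dict, Optional

ANNOTATION = "seccomp-profile.kubernetes.cri-o.io"


def profile_for_container(image_annotations: Dict[str, str], container: str) -> Optional[str]:
    """Return the profile reference an image requests for a container, if any."""
    # Assumption: a container-specific entry wins over the pod-wide ones.
    for key in (f"{ANNOTATION}/{container}", f"{ANNOTATION}/POD", ANNOTATION):
        if key in image_annotations:
            return image_annotations[key]
    return None


annotations = {f"{ANNOTATION}/container1": "quay.io/crio/seccomp:v2"}
print(profile_for_container(annotations, "container1"))  # quay.io/crio/seccomp:v2
print(profile_for_container(annotations, "container2"))  # None
```
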
The cool thing about this whole feature is that users can now create seccomp
profiles for specific container images and store them side by side in the same
registry. Linking the images to the profiles provides great flexibility to
maintain them over the application's whole life cycle.

### Pushing profiles using ORAS

The actual creation of the OCI object that contains a seccomp profile requires a
bit more work when using ORAS. I hope that tools like Podman will simplify the
overall process in the future. Right now, the container registry
needs to be [OCI compatible](https://oras.land/docs/compatible_oci_registries/#registries-supporting-oci-artifacts),
which is also the case for [Quay.io](https://quay.io). CRI-O expects the seccomp
profile object to have a specific config media type
(`application/vnd.cncf.seccomp-profile.config.v1+json`), while ORAS uses
`application/vnd.oci.empty.v1+json` by default. To achieve all of that, the
following commands can be executed:

```shell
echo "{}" > config.json
oras push \
    --config config.json:application/vnd.cncf.seccomp-profile.config.v1+json \
    quay.io/crio/seccomp:v2 seccomp.json
```

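If you maintain more than one profile, you may want to script the push. The
following Python sketch only assembles the argument list for the `oras push`
call shown above, so it can be reviewed or passed to `subprocess.run`. The
function name and its defaults are hypothetical; the flags are exactly the ones
used in the command above:

```python
import shlex
from typing import List

SECCOMP_CONFIG_MEDIA_TYPE = "application/vnd.cncf.seccomp-profile.config.v1+json"


def oras_push_command(reference: str, profile: str = "seccomp.json",
                      config: str = "config.json") -> List[str]:
    """Build the argument list for pushing a seccomp profile as an OCI artifact."""
    return [
        "oras", "push",
        # The config file carries the media type that CRI-O expects.
        "--config", f"{config}:{SECCOMP_CONFIG_MEDIA_TYPE}",
        reference, profile,
    ]


cmd = oras_push_command("quay.io/crio/seccomp:v2")
# Print the command for review; running it requires ORAS, the config file
# (containing "{}") and push access to the registry.
print(shlex.join(cmd))
```
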
The resulting image contains the `mediaType` that CRI-O expects. ORAS pushes a
single layer `seccomp.json` to the registry. The name of the profile does not
matter much. CRI-O will pick the first layer and check if that can act as a
seccomp profile.

## Future work

CRI-O internally manages the OCI artifacts like regular files. This provides the
benefit of moving them around, removing them if not used any more or having
data other than seccomp profiles available. This enables future enhancements in
CRI-O on top of OCI artifacts, but also allows thinking about stacking seccomp
profiles as part of having multiple layers in an OCI artifact. The limitation
that it only works for `Unconfined` workloads for v1.30.x releases is something
CRI-O would like to address in the future. Simplifying the overall user
experience without compromising security seems to be the key for a
successful future of seccomp in container workloads.

The CRI-O maintainers will be happy to listen to any feedback or suggestions on
the new feature! Thank you for reading this blog post! Feel free to reach out
to the maintainers via the Kubernetes [Slack channel #crio](https://kubernetes.slack.com/messages/CAZH62UR1)
or create an issue in the [GitHub repository](https://github.com/cri-o/cri-o).