---
layout: blog
title: "Enable seccomp for all workloads with a new v1.22 alpha feature"
date: 2021-08-25
slug: seccomp-default
---

**Author:** Sascha Grunert, Red Hat

This blog post is about a new Kubernetes feature introduced in v1.22, which adds
an additional security layer on top of the existing seccomp support. Seccomp is
a security mechanism for Linux processes to filter system calls (syscalls) based
on a set of defined rules. Applying seccomp profiles to containerized workloads
is one of the key tasks when it comes to enhancing the security of the
application deployment. Developers, site reliability engineers and
infrastructure administrators have to work hand in hand to create, distribute
and maintain the profiles over the application's life cycle.

You can use the [`securityContext`][seccontext] field of Pods and their
containers to adjust security-related configurations of the workload.
Kubernetes introduced dedicated [seccomp-related API fields][seccontext] in this
`SecurityContext` with the [graduation of seccomp to General Availability
(GA)][ga] in v1.19.0. This enhancement made it easier to specify whether the
whole pod or a specific container should run as:

[seccontext]: /docs/reference/kubernetes-api/workload-resources/pod-v1/#security-context-1
[ga]: https://kubernetes.io/blog/2020/08/26/kubernetes-release-1.19-accentuate-the-paw-sitive/#graduated-to-stable

- `Unconfined`: seccomp will not be enabled
- `RuntimeDefault`: the container runtime's default profile will be used
- `Localhost`: a node-local profile will be applied, referenced by a path
  relative to the kubelet's seccomp profile root (`<kubelet-root-dir>/seccomp`)

With the graduation of seccomp, nothing has changed from an overall security
perspective, because `Unconfined` is still the default. This is perfectly fine
from the perspective of upgrade paths and backwards compatibility of Kubernetes
releases. But it also means that workloads are more likely to run without
seccomp at all, which should be fixed in the long term.
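To make the three types more concrete, here is a minimal sketch of a pod that uses each of them at the container level. The pod and container names, the `busybox` container and the profile file `profiles/my-profile.json` are illustrative assumptions. For `Localhost`, the profile is referenced through the additional `localhostProfile` field, and the file has to exist below `<kubelet-root-dir>/seccomp` on the node; otherwise the container will most likely fail to be created.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: seccomp-types-demo          # hypothetical name
spec:
  containers:
  - name: demo-unconfined           # seccomp is not enabled for this container
    image: nginx:1.21
    securityContext:
      seccompProfile:
        type: Unconfined
  - name: demo-runtime-default      # the container runtime's default profile
    image: redis:6.2
    securityContext:
      seccompProfile:
        type: RuntimeDefault
  - name: demo-localhost            # node-local profile (hypothetical file)
    image: busybox:1.34
    command: ["sleep", "3600"]
    securityContext:
      seccompProfile:
        type: Localhost
        # path relative to <kubelet-root-dir>/seccomp on the node
        localhostProfile: profiles/my-profile.json
```

The same `seccompProfile` field can also be set in the pod-level `securityContext`, in which case it applies to all containers that do not set their own profile.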
## `SeccompDefault` to the rescue

Kubernetes v1.22.0 introduces a new kubelet [feature gate][gate]
`SeccompDefault`, which, like every other new feature, has been added in
`alpha` state. This means that it is disabled by default and has to be enabled
manually on each Kubernetes node.

[gate]: /docs/reference/command-line-tools-reference/feature-gates

What does the feature do? It changes the default seccomp profile from
`Unconfined` to `RuntimeDefault`. Unless the pod manifest specifies otherwise,
the feature applies a stricter set of security constraints by using the default
profile of the container runtime. These profiles may differ between runtimes
like [CRI-O][crio] or [containerd][ctrd], and they also differ between hardware
architectures. But generally speaking, those default profiles allow a common set
of syscalls while blocking the more dangerous ones, which are unlikely to be
needed or unsafe to use in a containerized application.

[crio]: https://github.com/cri-o/cri-o/blob/fe30d62/vendor/github.com/containers/common/pkg/seccomp/default_linux.go#L45
[ctrd]: https://github.com/containerd/containerd/blob/e1445df/contrib/seccomp/seccomp_default.go#L51
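Conceptually, for a pod that does not specify any seccomp profile, enabling the feature behaves roughly like setting `RuntimeDefault` explicitly at the pod level, as in the following sketch (the pod name is illustrative). One difference: the explicit variant works on any cluster running v1.19 or later and is visible in the API, whereas the feature leaves the field unset.

```yaml
# Illustrative only: an explicit equivalent of what SeccompDefault applies
# implicitly when neither the pod nor its containers set a profile.
apiVersion: v1
kind: Pod
metadata:
  name: explicit-runtime-default  # hypothetical name
spec:
  securityContext:
    seccompProfile:
      type: RuntimeDefault        # used by all containers without their own profile
  containers:
  - name: test-container
    image: nginx:1.21
```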
### Enabling the feature

Two kubelet configuration changes have to be made to enable the feature:

1. **Enable the feature gate** by setting `SeccompDefault=true` via the command
   line (`--feature-gates`) or the [kubelet configuration][kubelet] file.
2. **Turn on the feature** by adding the `--seccomp-default` command line flag
   or by setting `seccompDefault: true` in the [kubelet
   configuration][kubelet] file.

[kubelet]: /docs/tasks/administer-cluster/kubelet-config-file

The kubelet will error on startup if the feature is turned on (step 2) without
the feature gate being enabled (step 1).
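A minimal sketch of both changes in a kubelet configuration file, assuming the `kubelet.config.k8s.io/v1beta1` `KubeletConfiguration` format that the kubelet reads via its `--config` flag. Any other settings already in the file stay as they are, and the kubelet has to be restarted to pick up the change.

```yaml
# Kubelet configuration snippet for Kubernetes v1.22 (SeccompDefault is alpha)
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  SeccompDefault: true  # step 1: enable the feature gate
seccompDefault: true    # step 2: turn on the feature
```

On the command line, the equivalent would be passing `--feature-gates=SeccompDefault=true --seccomp-default` to the kubelet.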
### Trying it out

If the feature is enabled on a node, then you can create a new workload like
this:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  containers:
  - name: test-container
    image: nginx:1.21
```
Now you can inspect the applied seccomp profile with [`crictl`][crictl] by
examining the container's [runtime specification][rspec]:

[crictl]: https://github.com/kubernetes-sigs/cri-tools
[rspec]: https://github.com/opencontainers/runtime-spec/blob/0c021c1/config-linux.md#seccomp

```bash
CONTAINER_ID=$(sudo crictl ps -q --name=test-container)
sudo crictl inspect "$CONTAINER_ID" | jq .info.runtimeSpec.linux.seccomp
```

```yaml
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "architectures": ["SCMP_ARCH_X86_64", "SCMP_ARCH_X86", "SCMP_ARCH_X32"],
  "syscalls": [
    {
      "names": ["_llseek", "_newselect", "accept", …, "write", "writev"],
      "action": "SCMP_ACT_ALLOW"
    },
    …
  ]
}
```
You can see that the lower-level container runtime ([CRI-O][crio-home] and
[runc][runc] in our case) successfully applied the default seccomp profile.
This profile denies all syscalls by default, while allowing commonly used ones
like [`accept`][accept] or [`write`][write].

[crio-home]: https://github.com/cri-o/cri-o
[runc]: https://github.com/opencontainers/runc
[accept]: https://man7.org/linux/man-pages/man2/accept.2.html
[write]: https://man7.org/linux/man-pages/man2/write.2.html

Please note that the feature does not affect the Kubernetes API for now.
Therefore, it is not possible to retrieve the applied seccomp profile via
`kubectl get` or `kubectl describe` if the [`SeccompProfile`][api] field is
unset within the `SecurityContext`.

[api]: https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#security-context-1

The feature also works when using multiple containers within a pod, for example
if you create a pod like this:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  containers:
  - name: test-container-nginx
    image: nginx:1.21
    securityContext:
      seccompProfile:
        type: Unconfined
  - name: test-container-redis
    image: redis:6.2
```
then you should see that the container `test-container-nginx` runs without a
seccomp profile:

```bash
sudo crictl inspect $(sudo crictl ps -q --name=test-container-nginx) |
    jq '.info.runtimeSpec.linux.seccomp == null'
true
```

The container `test-container-redis`, on the other hand, runs with
`RuntimeDefault`:

```bash
sudo crictl inspect $(sudo crictl ps -q --name=test-container-redis) |
    jq '.info.runtimeSpec.linux.seccomp != null'
true
```

The same applies to the pod itself, which also runs with the default profile:

```bash
sudo crictl inspectp $(sudo crictl pods -q --name test-pod) |
    jq '.info.runtimeSpec.linux.seccomp != null'
true
```
### Upgrade strategy

It is recommended to enable the feature in multiple steps, each of which comes
with its own risks and mitigations.

#### Feature gate enabling

Enabling the feature gate at the kubelet level will not turn on the feature, but
will make it possible to do so by using the `seccompDefault` kubelet
configuration field or the `--seccomp-default` CLI flag. This can be done by an
administrator for the whole cluster or only for a set of nodes.
#### Testing the application

If you're trying this within a dedicated test environment, you have to ensure
that the application code does not trigger syscalls blocked by the
`RuntimeDefault` profile before enabling the feature on a node. This can be done
by:

- _Recommended_: Analyzing the code (manually or by running the application with
  [strace][strace]) for any executed syscalls which may be blocked by the
  default profiles. If that's the case, then you can override the default by
  explicitly setting the pod or container to run as `Unconfined`. Alternatively,
  you can create a custom seccomp profile based on the default by adding the
  additional syscalls to the `"action": "SCMP_ACT_ALLOW"` section (see the
  optional steps below).

- _Recommended_: Manually set the profile to the target workload and use a
  rolling upgrade to deploy into production. Roll back the deployment if the
  application does not work as intended.

- _Optional_: Run the application against an end-to-end test suite to trigger
  all relevant code paths with `RuntimeDefault` enabled. If a test fails, use
  the same mitigation as mentioned above.

- _Optional_: Create a custom seccomp profile based on the default and change
  its default action from `SCMP_ACT_ERRNO` to `SCMP_ACT_LOG`. This means that
  syscalls not allowed by the profile are no longer blocked, so the seccomp
  filter has no effect on the application at all, but the system logs will now
  indicate which syscalls would have been blocked. This requires at least Linux
  kernel version 4.14 as well as a recent [runc][runc] release. Monitor the
  application host's audit logs (defaults to `/var/log/audit/audit.log`) or
  syslog entries (defaults to `/var/log/syslog`) for syscalls via
  `type=SECCOMP` (for audit) or `type=1326` (for syslog). Compare the syscall
  ID with those [listed in the Linux kernel sources][syscalls] and add them to
  the custom profile. Be aware that custom audit policies may cause syscalls to
  be missed, depending on the configuration of auditd.

- _Optional_: Use cluster additions like the [Security Profiles Operator][spo]
  for profiling the application via its [log enrichment][logs] capabilities or
  recording a profile by using its [recording feature][rec]. This makes the
  manual log investigation described above unnecessary.

[syscalls]: https://github.com/torvalds/linux/blob/7bb7f2a/arch/x86/entry/syscalls/syscall_64.tbl
[spo]: https://github.com/kubernetes-sigs/security-profiles-operator
[logs]: https://github.com/kubernetes-sigs/security-profiles-operator/blob/c90ef3a/installation-usage.md#using-the-log-enricher
[rec]: https://github.com/kubernetes-sigs/security-profiles-operator/blob/c90ef3a/installation-usage.md#record-profiles-from-workloads-with-profilerecordings
[strace]: https://man7.org/linux/man-pages/man1/strace.1.html
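As a sketch of the optional `SCMP_ACT_LOG` step, such an audit profile could look like the following. It assumes an x86_64 node, and the allow list is deliberately abbreviated and hypothetical: in practice you would copy the full `syscalls` section of your runtime's default profile and only change `defaultAction`. Because syscalls outside the allow list are logged rather than rejected, a shorter list does not break the application; it only produces more log entries. The profile file has to be plain JSON, so it carries no comments.

```yaml
{
  "defaultAction": "SCMP_ACT_LOG",
  "architectures": ["SCMP_ARCH_X86_64", "SCMP_ARCH_X86", "SCMP_ARCH_X32"],
  "syscalls": [
    {
      "names": ["accept", "read", "write", "writev"],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}
```

Stored on the node below `<kubelet-root-dir>/seccomp` (for example as the hypothetical `profiles/audit.json`), it would be referenced from the workload's `securityContext` via `seccompProfile.type: Localhost` and `localhostProfile: profiles/audit.json`.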
#### Deploying the modified application

Based on the outcome of the application tests, it may be required to change the
application deployment by either specifying `Unconfined` or a custom seccomp
profile. No change is needed if the application works as intended with
`RuntimeDefault`.
#### Enable the kubelet configuration

If everything went well, then the feature is ready to be enabled by the kubelet
configuration or its corresponding CLI flag. This should be done on a per-node
basis to reduce the overall risk of a syscall having been missed during the
investigations when running the application tests. If it's possible to monitor
audit logs within the cluster, then it's recommended to do so to catch any
seccomp events that may have been missed. If the application works as intended,
then the feature can be enabled for further nodes within the cluster.
## Conclusion

Thank you for reading this blog post! I hope you enjoyed seeing how the usage of
seccomp profiles has evolved in Kubernetes over the past releases as much as I
do. On your own cluster, change the default seccomp profile to
`RuntimeDefault` (using this new feature) and see the security benefits, and,
of course, feel free to reach out any time for feedback or questions.

---

_Editor's note: If you have any questions or feedback about this blog post, feel
free to reach out via the [Kubernetes slack in #sig-node][slack]._

[slack]: https://kubernetes.slack.com/messages/sig-node