Add documentation for generally available seccomp functionality

Signed-off-by: hasheddan <georgedanielmangum@gmail.com>
pull/21278/head
hasheddan 2020-07-06 22:17:16 -05:00
parent 754c6b1a64
commit 3ad7ea77f1
17 changed files with 637 additions and 10 deletions


@ -599,8 +599,11 @@ documentation](/docs/tutorials/clusters/apparmor/#podsecuritypolicy-annotations)
### Seccomp
The use of seccomp profiles in pods can be controlled via annotations on the
PodSecurityPolicy. Seccomp is an alpha feature in Kubernetes.
As of Kubernetes v1.19, you can use the `seccompProfile` field in the
`securityContext` of Pods or containers to [control use of seccomp
profiles](/docs/tutorials/clusters/seccomp). In prior versions, seccomp was
controlled by adding annotations to a Pod. The same PodSecurityPolicies can be
used with either version to enforce how these fields or annotations are applied.
**seccomp.security.alpha.kubernetes.io/defaultProfileName** - Annotation that
specifies the default seccomp profile to apply to containers. Possible values
@ -609,11 +612,18 @@ are:
- `unconfined` - Seccomp is not applied to the container processes (this is the
default in Kubernetes), if no alternative is provided.
- `runtime/default` - The default container runtime profile is used.
- `docker/default` - The Docker default seccomp profile is used. Deprecated as of
Kubernetes 1.11. Use `runtime/default` instead.
- `docker/default` - The Docker default seccomp profile is used. Deprecated as
of Kubernetes 1.11. Use `runtime/default` instead.
- `localhost/<path>` - Specify a profile as a file on the node located at
`<seccomp_root>/<path>`, where `<seccomp_root>` is defined via the
`--seccomp-profile-root` flag on the Kubelet.
`--seccomp-profile-root` flag on the Kubelet. If the `--seccomp-profile-root`
flag is not defined, the default path will be used, which is
`<root-dir>/seccomp` where `<root-dir>` is specified by the `--root-dir` flag.
{{< note >}}
The `--seccomp-profile-root` flag has been deprecated since Kubernetes
v1.19. Users are encouraged to use the default path.
{{< /note >}}
**seccomp.security.alpha.kubernetes.io/allowedProfileNames** - Annotation that
specifies which values are allowed for the pod seccomp annotations. Specified as


@ -249,13 +249,14 @@ well as lower-trust users.The following listed controls should be enforced/disal
<tr>
<td>Seccomp</td>
<td>
The 'runtime/default' seccomp profile must be required, or allow specific additional profiles.<br>
The RuntimeDefault seccomp profile must be required, or allow specific additional profiles.<br>
<br><b>Restricted Fields:</b><br>
metadata.annotations['seccomp.security.alpha.kubernetes.io/pod']<br>
metadata.annotations['container.seccomp.security.alpha.kubernetes.io/*']<br>
spec.securityContext.seccompProfile.type<br>
spec.containers[*].securityContext.seccompProfile<br>
spec.initContainers[*].securityContext.seccompProfile<br>
<br><b>Allowed Values:</b><br>
'runtime/default'<br>
undefined (container annotation)<br>
undefined / nil<br>
</td>
</tr>
</tbody>


@ -359,6 +359,40 @@ for definitions of the capability constants.
Linux capability constants have the form `CAP_XXX`. But when you list capabilities in your Container manifest, you must omit the `CAP_` portion of the constant. For example, to add `CAP_SYS_TIME`, include `SYS_TIME` in your list of capabilities.
{{< /note >}}
## Set the Seccomp Profile for a Container
To set the Seccomp profile for a Container, include the `seccompProfile` field
in the `securityContext` section of your Pod or Container manifest. The
`seccompProfile` field is a
[SeccompProfile](/docs/reference/generated/kubernetes-api/{{< param "version"
>}}/#seccompprofile-v1-core) object consisting of `type` and `localhostProfile`.
Valid options for `type` include `RuntimeDefault`, `Unconfined`, and
`Localhost`. `localhostProfile` must only be set if `type` is `Localhost`. It
indicates the path of the pre-configured profile on the node, relative to the
kubelet's configured Seccomp profile location (the `seccomp` directory under the
path configured with the `--root-dir` flag).
Here is an example that sets the Seccomp profile to the node's container runtime
default profile:
```yaml
...
securityContext:
  seccompProfile:
    type: RuntimeDefault
```
Here is an example that sets the Seccomp profile to a pre-configured file at
`<kubelet-root-dir>/seccomp/my-profiles/profile-allow.json`:
```yaml
...
securityContext:
  seccompProfile:
    type: Localhost
    localhostProfile: my-profiles/profile-allow.json
```
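To make the path resolution concrete, here is a minimal Python sketch of how a `localhostProfile` value maps to a file on the node. It is an illustration only, not kubelet code: the helper name `resolve_localhost_profile` is hypothetical, `/var/lib/kubelet` is assumed as the kubelet root directory, and the rejection of absolute paths is an assumption about validation rather than something stated on this page.

```python
import posixpath

# Assumed kubelet --root-dir; adjust if your kubelet is configured differently.
DEFAULT_KUBELET_ROOT_DIR = "/var/lib/kubelet"


def resolve_localhost_profile(localhost_profile, root_dir=DEFAULT_KUBELET_ROOT_DIR):
    """Return the on-node path for a localhostProfile value.

    Hypothetical helper for illustration; it mirrors the
    <kubelet-root-dir>/seccomp/<localhostProfile> layout described above.
    """
    # Assumption: the profile path is expected to be relative to the seccomp directory.
    if posixpath.isabs(localhost_profile):
        raise ValueError("localhostProfile is expected to be a relative path")
    return posixpath.join(root_dir, "seccomp", localhost_profile)


print(resolve_localhost_profile("my-profiles/profile-allow.json"))
# prints /var/lib/kubelet/seccomp/my-profiles/profile-allow.json
```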
## Assign SELinux labels to a Container
To assign SELinux labels to a Container, include the `seLinuxOptions` field in


@ -1,8 +1,9 @@
---
reviewers:
- stclair
title: AppArmor
title: Restrict a Container's Access to Resources with AppArmor
content_type: tutorial
weight: 10
---
<!-- overview -->


@ -0,0 +1,368 @@
---
reviewers:
- hasheddan
- pjbgf
- saschagrunert
title: Restrict a Container's Syscalls with Seccomp
content_type: tutorial
weight: 20
---
<!-- overview -->
{{< feature-state for_k8s_version="v1.19" state="stable" >}}
Seccomp stands for secure computing mode and has been a feature of the Linux
kernel since version 2.6.12. It can be used to sandbox the privileges of a
process, restricting the calls it is able to make from userspace into the
kernel. Kubernetes lets you automatically apply seccomp profiles loaded onto a
Node to your Pods and containers.
Identifying the privileges required for your workloads can be difficult. In this
tutorial, you will go through how to load seccomp profiles into a local
Kubernetes cluster, how to apply them to a Pod, and how you can begin to craft
profiles that give only the necessary privileges to your container processes.
## {{% heading "objectives" %}}
* Learn how to load seccomp profiles on a node
* Learn how to apply a seccomp profile to a container
* Observe auditing of syscalls made by a container process
* Observe behavior when a missing profile is specified
* Observe a violation of a seccomp profile
* Learn how to create fine-grained seccomp profiles
* Learn how to apply a container runtime default seccomp profile
## {{% heading "prerequisites" %}}
In order to complete all steps in this tutorial, you must install
[kind](https://kind.sigs.k8s.io/docs/user/quick-start/) and
[kubectl](/docs/tasks/tools/install-kubectl/). This tutorial will show examples
with both alpha (pre-v1.19) and generally available seccomp functionality, so
make sure that your cluster is [configured
correctly](https://kind.sigs.k8s.io/docs/user/quick-start/#setting-kubernetes-version)
for the version you are using.
<!-- steps -->
## Create Seccomp Profiles
The contents of these profiles will be explored later on, but for now go ahead
and download them into a directory named `profiles/` so that they can be loaded
into the cluster.
{{< tabs name="tab_with_code" >}}
{{< tab name="audit.json" >}}
{{< codenew file="pods/security/seccomp/profiles/audit.json" >}}
{{< /tab >}}
{{< tab name="violation.json" >}}
{{< codenew file="pods/security/seccomp/profiles/violation.json" >}}
{{< /tab >}}
{{< tab name="fine-grained.json" >}}
{{< codenew file="pods/security/seccomp/profiles/fine-grained.json" >}}
{{< /tab >}}
{{< /tabs >}}
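If you would rather generate the two minimal profiles than download them, the following Python sketch writes `audit.json` and `violation.json` with the contents shown in the tabs above. The function name `write_profiles` is hypothetical, and `fine-grained.json` is intentionally left out because its syscall list is long; copy that file from the tab instead.

```python
import json
from pathlib import Path

# Contents of the two single-field profiles used in this tutorial.
PROFILES = {
    "audit.json": {"defaultAction": "SCMP_ACT_LOG"},
    "violation.json": {"defaultAction": "SCMP_ACT_ERRNO"},
}


def write_profiles(directory="profiles"):
    """Write the minimal profiles into `directory` and return the created paths."""
    target = Path(directory)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for name, profile in PROFILES.items():
        path = target / name
        path.write_text(json.dumps(profile, indent=2) + "\n")
        written.append(path)
    return written
```

Calling `write_profiles()` from the directory where you will later create `kind.yaml` produces a `profiles/` directory next to it.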
## Create a Local Kubernetes Cluster with Kind
For simplicity, [kind](https://kind.sigs.k8s.io/) can be used to create a single
node cluster with the seccomp profiles loaded. Kind runs Kubernetes in Docker,
so each node of the cluster is actually just a container. This allows for files
to be mounted in the filesystem of each container just as one might load files
onto a node.
{{< codenew file="pods/security/seccomp/kind.yaml" >}}
<br>
Download the example above, and save it to a file named `kind.yaml`. Then create
the cluster with the configuration.
```
kind create cluster --config=kind.yaml
```
Once the cluster is ready, identify the container running as the single node
cluster:
```
docker ps
```
You should see output indicating that a container is running with name
`kind-control-plane`.
```
CONTAINER ID   IMAGE                  COMMAND                  CREATED          STATUS          PORTS                       NAMES
6a96207fed4b   kindest/node:v1.18.2   "/usr/local/bin/entr…"   27 seconds ago   Up 24 seconds   127.0.0.1:42223->6443/tcp   kind-control-plane
```
If you inspect the filesystem of that container, you should see that the
`profiles/` directory has been successfully loaded into the default seccomp path
of the kubelet. Use `docker exec` to run a command in the node container:
```
docker exec -it 6a96207fed4b ls /var/lib/kubelet/seccomp/profiles
```
```
audit.json fine-grained.json violation.json
```
## Create a Pod with a Seccomp profile for syscall auditing
To start off, apply the `audit.json` profile, which will log all syscalls of the
process, to a new Pod.
Download the correct manifest for your Kubernetes version:
{{< tabs name="audit_pods" >}}
{{< tab name="v1.19 or Later (GA)" >}}
{{< codenew file="pods/security/seccomp/ga/audit-pod.yaml" >}}
{{< /tab >}}
{{< tab name="Pre-v1.19 (alpha)" >}}
{{< codenew file="pods/security/seccomp/alpha/audit-pod.yaml" >}}
{{< /tab >}}
{{< /tabs >}}
<br>
Create the Pod in the cluster:
```
kubectl apply -f audit-pod.yaml
```
This profile does not restrict any syscalls, so the Pod should start
successfully.
```
kubectl get pod/audit-pod
```
```
NAME        READY   STATUS    RESTARTS   AGE
audit-pod   1/1     Running   0          30s
```
To interact with the endpoint exposed by this container, create a NodePort
Service that allows access to the endpoint from inside the kind control plane
container.
```
kubectl expose pod/audit-pod --type NodePort --port 5678
```
Check what port the Service has been assigned on the node.
```
kubectl get svc/audit-pod
```
```
NAME        TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
audit-pod   NodePort   10.111.36.142   <none>        5678:32373/TCP   72s
```
Now you can `curl` the endpoint from inside the kind control plane container at
the port exposed by this Service. Use `docker exec` to run a command in the node
container:
```
docker exec -it 6a96207fed4b curl localhost:32373
```
```
just made some syscalls!
```
You can see that the process is running, but what syscalls did it actually make?
Because this Pod is running in a local cluster, you should be able to see those
in `/var/log/syslog`. Open up a new terminal window and `tail` the output for
calls from `http-echo`:
```
tail -f /var/log/syslog | grep 'http-echo'
```
You should already see some logs of syscalls made by `http-echo`, and if you
`curl` the endpoint in the control plane container you will see more written.
```
Jul 6 15:37:40 my-machine kernel: [369128.669452] audit: type=1326 audit(1594067860.484:14536): auid=4294967295 uid=0 gid=0 ses=4294967295 pid=29064 comm="http-echo" exe="/http-echo" sig=0 arch=c000003e syscall=51 compat=0 ip=0x46fe1f code=0x7ffc0000
Jul 6 15:37:40 my-machine kernel: [369128.669453] audit: type=1326 audit(1594067860.484:14537): auid=4294967295 uid=0 gid=0 ses=4294967295 pid=29064 comm="http-echo" exe="/http-echo" sig=0 arch=c000003e syscall=54 compat=0 ip=0x46fdba code=0x7ffc0000
Jul 6 15:37:40 my-machine kernel: [369128.669455] audit: type=1326 audit(1594067860.484:14538): auid=4294967295 uid=0 gid=0 ses=4294967295 pid=29064 comm="http-echo" exe="/http-echo" sig=0 arch=c000003e syscall=202 compat=0 ip=0x455e53 code=0x7ffc0000
Jul 6 15:37:40 my-machine kernel: [369128.669456] audit: type=1326 audit(1594067860.484:14539): auid=4294967295 uid=0 gid=0 ses=4294967295 pid=29064 comm="http-echo" exe="/http-echo" sig=0 arch=c000003e syscall=288 compat=0 ip=0x46fdba code=0x7ffc0000
Jul 6 15:37:40 my-machine kernel: [369128.669517] audit: type=1326 audit(1594067860.484:14540): auid=4294967295 uid=0 gid=0 ses=4294967295 pid=29064 comm="http-echo" exe="/http-echo" sig=0 arch=c000003e syscall=0 compat=0 ip=0x46fd44 code=0x7ffc0000
Jul 6 15:37:40 my-machine kernel: [369128.669519] audit: type=1326 audit(1594067860.484:14541): auid=4294967295 uid=0 gid=0 ses=4294967295 pid=29064 comm="http-echo" exe="/http-echo" sig=0 arch=c000003e syscall=270 compat=0 ip=0x4559b1 code=0x7ffc0000
Jul 6 15:38:40 my-machine kernel: [369188.671648] audit: type=1326 audit(1594067920.488:14559): auid=4294967295 uid=0 gid=0 ses=4294967295 pid=29064 comm="http-echo" exe="/http-echo" sig=0 arch=c000003e syscall=270 compat=0 ip=0x4559b1 code=0x7ffc0000
Jul 6 15:38:40 my-machine kernel: [369188.671726] audit: type=1326 audit(1594067920.488:14560): auid=4294967295 uid=0 gid=0 ses=4294967295 pid=29064 comm="http-echo" exe="/http-echo" sig=0 arch=c000003e syscall=202 compat=0 ip=0x455e53 code=0x7ffc0000
```
You can begin to understand the syscalls required by the `http-echo` process by
looking at the `syscall=` entry on each line. While these entries are unlikely
to cover every syscall the process uses, they can serve as a basis for a seccomp
profile for this container.
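Reading the `syscall=` numbers by eye gets tedious, so here is a small Python sketch that tallies them. It assumes x86_64 (`arch=c000003e` in the log), and the lookup table is deliberately partial: it covers only the six numbers that appear in the sample output above. The function names and the abridged sample lines are illustrative, not part of any Kubernetes tooling.

```python
import re
from collections import Counter

# Partial x86_64 syscall table: only the numbers seen in the sample log above.
X86_64_SYSCALLS = {
    0: "read",
    51: "getsockname",
    54: "setsockopt",
    202: "futex",
    270: "pselect6",
    288: "accept4",
}

# Matches the comm="..." and syscall=N fields of a kernel seccomp audit record.
AUDIT_RE = re.compile(r'comm="(?P<comm>[^"]+)".*?\bsyscall=(?P<nr>\d+)')


def count_syscalls(lines, comm="http-echo"):
    """Count syscall numbers logged for the given command name."""
    counts = Counter()
    for line in lines:
        match = AUDIT_RE.search(line)
        if match and match.group("comm") == comm:
            counts[int(match.group("nr"))] += 1
    return counts


def syscall_names(counts):
    """Translate syscall numbers to names, marking numbers missing from the table."""
    return sorted(X86_64_SYSCALLS.get(nr, f"unknown({nr})") for nr in counts)


# Abridged versions of the eight log lines above (same comm, arch and syscall fields).
SAMPLE_LOG = [
    f'audit: type=1326 comm="http-echo" exe="/http-echo" sig=0 arch=c000003e syscall={nr} compat=0'
    for nr in (51, 54, 202, 288, 0, 270, 270, 202)
]

print(syscall_names(count_syscalls(SAMPLE_LOG)))
```

The resulting names are candidates for the `names` list of an allow rule. For a real profile you would want a complete syscall table for your architecture rather than this six-entry excerpt.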
Clean up that Pod and Service before moving to the next section:
```
kubectl delete pod/audit-pod
kubectl delete svc/audit-pod
```
## Create Pod with Seccomp Profile that Causes Violation
For demonstration, apply a profile to the Pod that does not allow for any
syscalls.
Download the correct manifest for your Kubernetes version:
{{< tabs name="violation_pods" >}}
{{< tab name="v1.19 or Later (GA)" >}}
{{< codenew file="pods/security/seccomp/ga/violation-pod.yaml" >}}
{{< /tab >}}
{{< tab name="Pre-v1.19 (alpha)" >}}
{{< codenew file="pods/security/seccomp/alpha/violation-pod.yaml" >}}
{{< /tab >}}
{{< /tabs >}}
<br>
Create the Pod in the cluster:
```
kubectl apply -f violation-pod.yaml
```
If you check the status of the Pod, you should see that it failed to start.
```
kubectl get pod/violation-pod
```
```
NAME            READY   STATUS             RESTARTS   AGE
violation-pod   0/1     CrashLoopBackOff   1          6s
```
As seen in the previous example, the `http-echo` process requires quite a few
syscalls. Here seccomp has been instructed to error on any syscall by setting
`"defaultAction": "SCMP_ACT_ERRNO"`. This is extremely secure, but removes the
ability to do anything meaningful. What you really want is to give workloads
only the privileges they need.
Clean up that Pod before moving to the next section (no Service was created for
it):
```
kubectl delete pod/violation-pod
```
## Create Pod with Seccomp Profile that Only Allows Necessary Syscalls
If you take a look at `fine-grained.json`, you will notice some of the syscalls
seen in the first example, where the profile set `"defaultAction":
"SCMP_ACT_LOG"`. Now the profile sets `"defaultAction": "SCMP_ACT_ERRNO"`,
but explicitly allows a set of syscalls in the `"action": "SCMP_ACT_ALLOW"`
block. Ideally, the container will run successfully and you will see no messages
sent to `syslog`.
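The relationship between `defaultAction` and the per-syscall rules can be sketched in a few lines of Python. This is a simplified model for intuition only: it ignores architectures and argument filters, the function name `action_for` is hypothetical, and the allow list is a four-name excerpt of `fine-grained.json` rather than the full profile.

```python
def action_for(profile, syscall_name):
    """Return the action a profile assigns to a syscall.

    Simplified model: the first rule naming the syscall wins,
    otherwise the profile's defaultAction applies.
    """
    for rule in profile.get("syscalls", []):
        if syscall_name in rule.get("names", []):
            return rule["action"]
    return profile["defaultAction"]


# The three profile shapes used in this tutorial.
audit = {"defaultAction": "SCMP_ACT_LOG"}
violation = {"defaultAction": "SCMP_ACT_ERRNO"}
fine_grained_excerpt = {
    "defaultAction": "SCMP_ACT_ERRNO",
    "syscalls": [
        {"names": ["accept4", "futex", "read", "write"], "action": "SCMP_ACT_ALLOW"},
    ],
}

for name, profile in [("audit", audit), ("violation", violation), ("fine-grained", fine_grained_excerpt)]:
    print(name, action_for(profile, "read"), action_for(profile, "ptrace"))
```

In this model, `read` is logged under the audit profile, rejected under the violation profile, and allowed under the fine-grained excerpt, while a syscall outside the allow list such as `ptrace` falls through to the default action.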
Download the correct manifest for your Kubernetes version:
{{< tabs name="fine_pods" >}}
{{< tab name="v1.19 or Later (GA)" >}}
{{< codenew file="pods/security/seccomp/ga/fine-pod.yaml" >}}
{{< /tab >}}
{{< tab name="Pre-v1.19 (alpha)" >}}
{{< codenew file="pods/security/seccomp/alpha/fine-pod.yaml" >}}
{{< /tab >}}
{{< /tabs >}}
<br>
Create the Pod in your cluster:
```
kubectl apply -f fine-pod.yaml
```
The Pod should start successfully.
```
kubectl get pod/fine-pod
```
```
NAME       READY   STATUS    RESTARTS   AGE
fine-pod   1/1     Running   0          30s
```
Open up a new terminal window and `tail` the output for calls from `http-echo`:
```
tail -f /var/log/syslog | grep 'http-echo'
```
Expose the Pod with a NodePort Service:
```
kubectl expose pod/fine-pod --type NodePort --port 5678
```
Check what port the Service has been assigned on the node:
```
kubectl get svc/fine-pod
```
```
NAME       TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
fine-pod   NodePort   10.111.36.142   <none>        5678:32373/TCP   72s
```
`curl` the endpoint from inside the kind control plane container:
```
docker exec -it 6a96207fed4b curl localhost:32373
```
```
just made some syscalls!
```
You should see no output in the `syslog` because the profile allowed all
necessary syscalls and specified that an error should occur if one outside of
the list is invoked. This is an ideal situation from a security perspective, but
it required some effort in analyzing the program. It would be nice if there were
a simple way to get closer to this security without requiring as much effort.
Clean up that Pod and Service before moving to the next section:
```
kubectl delete pod/fine-pod
kubectl delete svc/fine-pod
```
## Create Pod that uses the Container Runtime Default Seccomp Profile
Most container runtimes provide a sensible default set of syscalls that are
either allowed or blocked. You can apply these defaults in Kubernetes by using
the `runtime/default` annotation or by setting the seccomp type in the security
context of a Pod or container to `RuntimeDefault`.
Download the correct manifest for your Kubernetes version:
{{< tabs name="default_pods" >}}
{{< tab name="v1.19 or Later (GA)" >}}
{{< codenew file="pods/security/seccomp/ga/default-pod.yaml" >}}
{{< /tab >}}
{{< tab name="Pre-v1.19 (alpha)" >}}
{{< codenew file="pods/security/seccomp/alpha/default-pod.yaml" >}}
{{< /tab >}}
{{< /tabs >}}
<br>
The default seccomp profile should provide adequate access for most workloads.
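Because this tutorial shows both the alpha annotations and the GA `seccompProfile` field, it can help to see how the two forms correspond. The Python sketch below maps an annotation value to the equivalent field values. The function name is hypothetical and this is not the kubelet's own conversion code; treating the deprecated `docker/default` value as `RuntimeDefault` follows the guidance to use `runtime/default` instead.

```python
def annotation_to_seccomp_profile(value):
    """Map an alpha seccomp annotation value to GA seccompProfile fields."""
    # docker/default is the deprecated spelling of runtime/default.
    if value in ("runtime/default", "docker/default"):
        return {"type": "RuntimeDefault"}
    if value == "unconfined":
        return {"type": "Unconfined"}
    prefix = "localhost/"
    if value.startswith(prefix) and len(value) > len(prefix):
        # Everything after "localhost/" is the profile path on the node.
        return {"type": "Localhost", "localhostProfile": value[len(prefix):]}
    raise ValueError(f"unrecognized seccomp annotation value: {value!r}")


print(annotation_to_seccomp_profile("localhost/profiles/audit.json"))
# prints {'type': 'Localhost', 'localhostProfile': 'profiles/audit.json'}
```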
## {{% heading "whatsnext" %}}
Additional resources:
* [A Seccomp Overview](https://lwn.net/Articles/656307/)
* [Seccomp Security Profiles for Docker](https://docs.docker.com/engine/security/seccomp/)


@ -0,0 +1,16 @@
apiVersion: v1
kind: Pod
metadata:
  name: audit-pod
  labels:
    app: audit-pod
  annotations:
    seccomp.security.alpha.kubernetes.io/pod: localhost/profiles/audit.json
spec:
  containers:
  - name: test-container
    image: hashicorp/http-echo:0.2.3
    args:
    - "-text=just made some syscalls!"
    securityContext:
      allowPrivilegeEscalation: false


@ -0,0 +1,16 @@
apiVersion: v1
kind: Pod
metadata:
  name: default-pod
  labels:
    app: default-pod
  annotations:
    seccomp.security.alpha.kubernetes.io/pod: runtime/default
spec:
  containers:
  - name: test-container
    image: hashicorp/http-echo:0.2.3
    args:
    - "-text=just made some syscalls!"
    securityContext:
      allowPrivilegeEscalation: false


@ -0,0 +1,16 @@
apiVersion: v1
kind: Pod
metadata:
  name: fine-pod
  labels:
    app: fine-pod
  annotations:
    seccomp.security.alpha.kubernetes.io/pod: localhost/profiles/fine-grained.json
spec:
  containers:
  - name: test-container
    image: hashicorp/http-echo:0.2.3
    args:
    - "-text=just made some syscalls!"
    securityContext:
      allowPrivilegeEscalation: false


@ -0,0 +1,16 @@
apiVersion: v1
kind: Pod
metadata:
  name: violation-pod
  labels:
    app: violation-pod
  annotations:
    seccomp.security.alpha.kubernetes.io/pod: localhost/profiles/violation.json
spec:
  containers:
  - name: test-container
    image: hashicorp/http-echo:0.2.3
    args:
    - "-text=just made some syscalls!"
    securityContext:
      allowPrivilegeEscalation: false


@ -0,0 +1,18 @@
apiVersion: v1
kind: Pod
metadata:
  name: audit-pod
  labels:
    app: audit-pod
spec:
  securityContext:
    seccompProfile:
      type: Localhost
      localhostProfile: profiles/audit.json
  containers:
  - name: test-container
    image: hashicorp/http-echo:0.2.3
    args:
    - "-text=just made some syscalls!"
    securityContext:
      allowPrivilegeEscalation: false


@ -0,0 +1,17 @@
apiVersion: v1
kind: Pod
metadata:
  name: default-pod
  labels:
    app: default-pod
spec:
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: test-container
    image: hashicorp/http-echo:0.2.3
    args:
    - "-text=just made some syscalls!"
    securityContext:
      allowPrivilegeEscalation: false


@ -0,0 +1,18 @@
apiVersion: v1
kind: Pod
metadata:
  name: fine-pod
  labels:
    app: fine-pod
spec:
  securityContext:
    seccompProfile:
      type: Localhost
      localhostProfile: profiles/fine-grained.json
  containers:
  - name: test-container
    image: hashicorp/http-echo:0.2.3
    args:
    - "-text=just made some syscalls!"
    securityContext:
      allowPrivilegeEscalation: false


@ -0,0 +1,18 @@
apiVersion: v1
kind: Pod
metadata:
  name: violation-pod
  labels:
    app: violation-pod
spec:
  securityContext:
    seccompProfile:
      type: Localhost
      localhostProfile: profiles/violation.json
  containers:
  - name: test-container
    image: hashicorp/http-echo:0.2.3
    args:
    - "-text=just made some syscalls!"
    securityContext:
      allowPrivilegeEscalation: false


@ -0,0 +1,7 @@
apiVersion: kind.x-k8s.io/v1alpha4
kind: Cluster
nodes:
- role: control-plane
  extraMounts:
  - hostPath: "./profiles"
    containerPath: "/var/lib/kubelet/seccomp/profiles"


@ -0,0 +1,3 @@
{
"defaultAction": "SCMP_ACT_LOG"
}


@ -0,0 +1,65 @@
{
"defaultAction": "SCMP_ACT_ERRNO",
"architectures": [
"SCMP_ARCH_X86_64",
"SCMP_ARCH_X86",
"SCMP_ARCH_X32"
],
"syscalls": [
{
"names": [
"accept4",
"epoll_wait",
"pselect6",
"futex",
"madvise",
"epoll_ctl",
"getsockname",
"setsockopt",
"vfork",
"mmap",
"read",
"write",
"close",
"arch_prctl",
"sched_getaffinity",
"munmap",
"brk",
"rt_sigaction",
"rt_sigprocmask",
"sigaltstack",
"gettid",
"clone",
"bind",
"socket",
"openat",
"readlinkat",
"exit_group",
"epoll_create1",
"listen",
"rt_sigreturn",
"sched_yield",
"clock_gettime",
"connect",
"dup2",
"epoll_pwait",
"execve",
"exit",
"fcntl",
"getpid",
"getuid",
"ioctl",
"mprotect",
"nanosleep",
"open",
"poll",
"recvfrom",
"sendto",
"set_tid_address",
"setitimer",
"writev"
],
"action": "SCMP_ACT_ALLOW"
}
]
}


@ -0,0 +1,3 @@
{
"defaultAction": "SCMP_ACT_ERRNO"
}