From af4cec641c9df7b55aa104c4281264d0f095af30 Mon Sep 17 00:00:00 2001
From: Sascha Grunert <sgrunert@redhat.com>
Date: Mon, 3 Apr 2023 12:18:27 +0200
Subject: [PATCH] Add blog post about recording seccomp profiles in edge
 scenarios

Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
---
 .../2023-05-18-seccomp-profiles-edge.md       | 282 ++++++++++++++++++
 1 file changed, 282 insertions(+)
 create mode 100644 content/en/blog/_posts/2023-05-18-seccomp-profiles-edge.md

diff --git a/content/en/blog/_posts/2023-05-18-seccomp-profiles-edge.md b/content/en/blog/_posts/2023-05-18-seccomp-profiles-edge.md
new file mode 100644
index 00000000000..b661c618c24
--- /dev/null
+++ b/content/en/blog/_posts/2023-05-18-seccomp-profiles-edge.md
@@ -0,0 +1,282 @@
+---
+layout: blog
+title: "Having fun with seccomp profiles on the edge"
+date: 2023-05-18
+slug: seccomp-profiles-edge
+---
+
+**Author**: Sascha Grunert
+
+The [Security Profiles Operator (SPO)][spo] is a feature-rich
+[operator][operator] for Kubernetes to make managing seccomp, SELinux and
+AppArmor profiles easier than ever. Recording those profiles from scratch is one
+of the key features of this operator, which usually involves integration into
+large CI/CD systems. Being able to test the recording capabilities of the
+operator in edge cases is one of the recent development efforts of the SPO, and
+it makes it excitingly easy to play around with seccomp profiles.
+
+[spo]: https://github.com/kubernetes-sigs/security-profiles-operator
+[operator]: https://kubernetes.io/docs/concepts/extend-kubernetes/operator
+
+## Recording seccomp profiles with `spoc record`
+
+The [v0.8.0][spo-latest] release of the Security Profiles Operator shipped a new
+command line interface called `spoc`, a little helper tool for recording and
+replaying seccomp profiles, among various other features that are out of scope
+for this blog post.
+
+[spo-latest]: https://github.com/kubernetes-sigs/security-profiles-operator/releases/v0.8.0
+
+Recording a seccomp profile requires a binary to be executed, for example a
+simple Go application that just calls [`uname(2)`][uname]:
+
+```go
+package main
+
+import (
+	"syscall"
+)
+
+func main() {
+	utsname := syscall.Utsname{}
+	if err := syscall.Uname(&utsname); err != nil {
+		panic(err)
+	}
+}
+```
+
+[uname]: https://man7.org/linux/man-pages/man2/uname.2.html
+
+The binary can be built like this (`ldd` confirms that it is statically linked):
+
+```console
+> go build -o main main.go
+> ldd ./main
+        not a dynamic executable
+```
+
+Now it's possible to download the latest binary of [`spoc` from
+GitHub][spoc-latest] and run the application on Linux with it:
+
+[spoc-latest]: https://github.com/kubernetes-sigs/security-profiles-operator/releases/download/v0.8.0/spoc.amd64
+
+```console
+> sudo ./spoc record ./main
+10:08:25.591945 Loading bpf module
+10:08:25.591958 Using system btf file
+libbpf: loading object 'recorder.bpf.o' from buffer
+…
+libbpf: prog 'sys_enter': relo #3: patched insn #22 (ALU/ALU64) imm 16 -> 16
+10:08:25.610767 Getting bpf program sys_enter
+10:08:25.610778 Attaching bpf tracepoint
+10:08:25.611574 Getting syscalls map
+10:08:25.611582 Getting pid_mntns map
+10:08:25.613097 Module successfully loaded
+10:08:25.613311 Processing events
+10:08:25.613693 Running command with PID: 336007
+10:08:25.613835 Received event: pid: 336007, mntns: 4026531841
+10:08:25.613951 No container ID found for PID (pid=336007, mntns=4026531841, err=unable to find container ID in cgroup path)
+10:08:25.614856 Processing recorded data
+10:08:25.614975 Found process mntns 4026531841 in bpf map
+10:08:25.615110 Got syscalls: read, close, mmap, rt_sigaction, rt_sigprocmask, madvise, nanosleep, clone, uname, sigaltstack, arch_prctl, gettid, futex, sched_getaffinity, exit_group, openat
+10:08:25.615195 Adding base syscalls: access, brk, capget, capset, chdir, chmod, chown, close_range, dup2, dup3, epoll_create1, epoll_ctl, epoll_pwait, execve, faccessat2, fchdir, fchmodat, fchown, fchownat, fcntl, fstat, fstatfs, getdents64, getegid, geteuid, getgid, getpid, getppid, getuid, ioctl, keyctl, lseek, mkdirat, mknodat, mount, mprotect, munmap, newfstatat, openat2, pipe2, pivot_root, prctl, pread64, pselect6, readlink, readlinkat, rt_sigreturn, sched_yield, seccomp, set_robust_list, set_tid_address, setgid, setgroups, sethostname, setns, setresgid, setresuid, setsid, setuid, statfs, statx, symlinkat, tgkill, umask, umount2, unlinkat, unshare, write
+10:08:25.616293 Wrote seccomp profile to: /tmp/profile.yaml
+10:08:25.616298 Unloading bpf module
+```
+
+I have to execute `spoc` as root because it internally runs an [eBPF][ebpf]
+program, reusing the same code parts as the Security Profiles Operator itself.
+I can see that the bpf module got loaded successfully and that `spoc` attached
+its program to the required tracepoint. Then it tracks the main application by
+using its [mount namespace][mntns] and processes the recorded syscall data. The
+nature of eBPF programs is that they see the whole context of the kernel, which
+means that `spoc` tracks all syscalls of the system, but does not interfere with
+their execution.
+
+[ebpf]: https://ebpf.io
+[mntns]: https://man7.org/linux/man-pages/man7/mount_namespaces.7.html
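+
+The `mntns` value printed in the log above identifies the mount namespace of the
+traced process, most likely by its namespace inode number. As a minimal sketch
+that is independent of `spoc`'s own eBPF code, the following Go program reads the
+same kind of identifier from userspace via the `/proc/<pid>/ns/mnt` symlink. The
+helpers `parseNamespaceLink` and `mountNamespace` are hypothetical names used only
+for this illustration:
+
+```go
+package main
+
+import (
+	"fmt"
+	"os"
+	"strconv"
+	"strings"
+)
+
+// parseNamespaceLink is a hypothetical helper (not part of spoc) that extracts
+// the inode number from a namespace symlink target such as "mnt:[4026531841]".
+func parseNamespaceLink(target string) (uint64, error) {
+	start := strings.IndexByte(target, '[')
+	end := strings.LastIndexByte(target, ']')
+	if start < 0 || end <= start+1 {
+		return 0, fmt.Errorf("unexpected namespace link format: %q", target)
+	}
+	return strconv.ParseUint(target[start+1:end], 10, 64)
+}
+
+// mountNamespace returns the mount namespace inode of the given PID by reading
+// the /proc/<pid>/ns/mnt symlink (Linux only).
+func mountNamespace(pid int) (uint64, error) {
+	target, err := os.Readlink(fmt.Sprintf("/proc/%d/ns/mnt", pid))
+	if err != nil {
+		return 0, err
+	}
+	return parseNamespaceLink(target)
+}
+
+func main() {
+	mntns, err := mountNamespace(os.Getpid())
+	if err != nil {
+		fmt.Fprintln(os.Stderr, "error:", err)
+		os.Exit(1)
+	}
+	fmt.Printf("pid: %d, mntns: %d\n", os.Getpid(), mntns)
+}
+```
+
+All processes that share a mount namespace report the same number. This makes
+the value a cheap key for filtering events, but it does not uniquely identify a
+single process.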
+
+The logs indicate that `spoc` found the syscalls `read`, `close`, `mmap` and so
+on, including `uname`. All syscalls other than `uname` come from the Go runtime
+and its garbage collection, which already adds overhead to a basic application
+like our demo. I can also see from the log line `Adding base syscalls: …` that
+`spoc` adds a bunch of base syscalls to the resulting profile. Those are used by
+the OCI runtime (like [runc][runc] or [crun][crun]) in order to be able to run a
+container. This means that `spoc` can record seccomp profiles which can then be
+used for containers directly. This behavior can be disabled in `spoc` by using
+the `--no-base-syscalls`/`-n` flag, or customized via the `--base-syscalls`/`-b`
+flag. This can be helpful in cases where OCI runtimes other than crun and runc
+are used, or if I just want to record the seccomp profile for the application
+and stack it with another [base profile][base].
+
+[runc]: https://github.com/opencontainers/runc
+[crun]: https://github.com/containers/crun
+[base]: https://github.com/kubernetes-sigs/security-profiles-operator/blob/35ebdda/installation-usage.md#base-syscalls-for-a-container-runtime
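+
+Conceptually, adding base syscalls amounts to a set union of the recorded
+syscall names and a list of syscalls the container runtime needs. The Go sketch
+below illustrates that idea under this assumption. The function `mergeSyscalls`
+and its `withBase` switch, which mirrors the intent of `--no-base-syscalls`, are
+hypothetical and not `spoc`'s actual implementation:
+
+```go
+package main
+
+import (
+	"fmt"
+	"sort"
+)
+
+// mergeSyscalls is a hypothetical helper that returns the sorted,
+// de-duplicated union of the recorded syscalls and, if withBase is true, the
+// base syscalls an OCI runtime needs to start a container.
+func mergeSyscalls(recorded, base []string, withBase bool) []string {
+	set := make(map[string]struct{}, len(recorded)+len(base))
+	for _, name := range recorded {
+		set[name] = struct{}{}
+	}
+	if withBase {
+		for _, name := range base {
+			set[name] = struct{}{}
+		}
+	}
+	merged := make([]string, 0, len(set))
+	for name := range set {
+		merged = append(merged, name)
+	}
+	sort.Strings(merged)
+	return merged
+}
+
+func main() {
+	// Abbreviated lists taken from the log output above.
+	recorded := []string{"read", "close", "mmap", "uname", "exit_group"}
+	base := []string{"access", "brk", "close_range", "execve", "write"}
+
+	fmt.Println(mergeSyscalls(recorded, base, true))
+	fmt.Println(mergeSyscalls(recorded, base, false))
+}
+```
+
+A map is used as the set so that syscalls appearing in both lists end up only
+once in the profile. Sorting keeps the output stable between runs.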
+
+The resulting profile is now available in `/tmp/profile.yaml`, but the default
+location can be changed using the `--output-file`/`-o` flag:
+
+```console
+> cat /tmp/profile.yaml
+```
+
+```yaml
+apiVersion: security-profiles-operator.x-k8s.io/v1beta1
+kind: SeccompProfile
+metadata:
+  creationTimestamp: null
+  name: main
+spec:
+  architectures:
+    - SCMP_ARCH_X86_64
+  defaultAction: SCMP_ACT_ERRNO
+  syscalls:
+    - action: SCMP_ACT_ALLOW
+      names:
+        - access
+        - arch_prctl
+        - brk
+        - …
+        - uname
+        - …
+status: {}
+```
+
+The resulting `SeccompProfile` custom resource (defined by the operator's Custom
+Resource Definition, CRD) can be used directly with the Security Profiles
+Operator to manage it within Kubernetes. `spoc` is also capable of producing raw
+seccomp profiles (as JSON) by passing `raw-seccomp` to the `--type`/`-t` flag:
+
+```console
+> sudo ./spoc record --type raw-seccomp ./main
+…
+52.628827 Wrote seccomp profile to: /tmp/profile.json
+```
+
+```console
+> jq . /tmp/profile.json
+```
+
+```json
+{
+  "defaultAction": "SCMP_ACT_ERRNO",
+  "architectures": ["SCMP_ARCH_X86_64"],
+  "syscalls": [
+    {
+      "names": ["access", "…", "write"],
+      "action": "SCMP_ACT_ALLOW"
+    }
+  ]
+}
+```
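+
+To consume such a raw profile programmatically, a few plain Go structs with
+`encoding/json` tags are enough. This is a minimal sketch that covers only the
+fields shown above. The type names `RawProfile` and `SyscallRule` and the helper
+`loadRawProfile` are hypothetical, and real-world profiles may contain further
+fields (for example per-syscall argument filters) that this sketch ignores:
+
+```go
+package main
+
+import (
+	"encoding/json"
+	"fmt"
+	"os"
+)
+
+// SyscallRule (hypothetical name) groups syscall names sharing one action.
+type SyscallRule struct {
+	Names  []string `json:"names"`
+	Action string   `json:"action"`
+}
+
+// RawProfile (hypothetical name) models the subset of the raw seccomp JSON
+// format shown above; unknown fields are silently ignored when decoding.
+type RawProfile struct {
+	DefaultAction string        `json:"defaultAction"`
+	Architectures []string      `json:"architectures"`
+	Syscalls      []SyscallRule `json:"syscalls"`
+}
+
+// loadRawProfile reads and decodes a raw seccomp profile from path.
+func loadRawProfile(path string) (*RawProfile, error) {
+	data, err := os.ReadFile(path)
+	if err != nil {
+		return nil, err
+	}
+	var profile RawProfile
+	if err := json.Unmarshal(data, &profile); err != nil {
+		return nil, fmt.Errorf("decode %s: %w", path, err)
+	}
+	return &profile, nil
+}
+
+func main() {
+	profile, err := loadRawProfile("/tmp/profile.json")
+	if err != nil {
+		fmt.Fprintln(os.Stderr, "error:", err)
+		os.Exit(1)
+	}
+	fmt.Println("default action:", profile.DefaultAction)
+	for _, rule := range profile.Syscalls {
+		fmt.Printf("%s: %d syscalls\n", rule.Action, len(rule.Names))
+	}
+}
+```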
+
+The `spoc record` utility allows us to record complex seccomp profiles directly
+from binary invocations on any Linux system that is capable of running the eBPF
+code within the kernel. But it can do more: how about modifying the seccomp
+profile and then testing it by using `spoc run`?
+
+## Running seccomp profiles with `spoc run`
+
+`spoc` is also able to run binaries with applied seccomp profiles, which makes it
+easy to test any modifications to them. To do that, just run:
+
+```console
+> sudo ./spoc run ./main
+10:29:58.153263 Reading file /tmp/profile.yaml
+10:29:58.153311 Assuming YAML profile
+10:29:58.154138 Setting up seccomp
+10:29:58.154178 Load seccomp profile
+10:29:58.154189 Starting audit log enricher
+10:29:58.154224 Enricher reading from file /var/log/audit/audit.log
+10:29:58.155356 Running command with PID: 437880
+>
+```
+
+It looks like the application exited successfully, which is expected because I
+have not modified the previously recorded profile yet. I can also specify a
+custom location for the profile by using the `--profile`/`-p` flag, but this was
+not necessary because I did not change the default output location when
+recording. `spoc` automatically determines whether it's a raw (JSON) or CRD
+(YAML) based seccomp profile and then applies it to the process.
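+
+The log line `Assuming YAML profile` hints at a content-based check. The sketch
+below shows one possible heuristic. It assumes that anything that is valid JSON
+is a raw profile and treats everything else as YAML, which is not necessarily
+how `spoc` decides. `ProfileType` and `detectProfileType` are hypothetical names:
+
+```go
+package main
+
+import (
+	"encoding/json"
+	"fmt"
+	"os"
+)
+
+// ProfileType (hypothetical) distinguishes raw seccomp JSON from the YAML CRD.
+type ProfileType string
+
+const (
+	RawSeccomp ProfileType = "raw-seccomp"
+	SeccompCRD ProfileType = "seccomp-crd"
+)
+
+// detectProfileType treats valid JSON as a raw seccomp profile and otherwise
+// assumes the YAML CRD format (an assumed heuristic, not spoc's code).
+func detectProfileType(data []byte) ProfileType {
+	if json.Valid(data) {
+		return RawSeccomp
+	}
+	return SeccompCRD
+}
+
+func main() {
+	path := "/tmp/profile.yaml"
+	if len(os.Args) > 1 {
+		path = os.Args[1]
+	}
+	data, err := os.ReadFile(path)
+	if err != nil {
+		fmt.Fprintln(os.Stderr, "error:", err)
+		os.Exit(1)
+	}
+	fmt.Printf("%s looks like a %s profile\n", path, detectProfileType(data))
+}
+```
+
+Checking for JSON first matters because YAML is a superset of JSON. A YAML
+parser would therefore also accept every raw profile, so it cannot tell the
+two formats apart on its own.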
+
+The Security Profiles Operator supports a [log enricher feature][enricher],
+which provides additional seccomp-related information by parsing the audit logs.
+`spoc run` uses the enricher in the same way to provide more data to end users
+when it comes to debugging seccomp profiles.
+
+[enricher]: https://github.com/kubernetes-sigs/security-profiles-operator/blob/35ebdda/installation-usage.md#using-the-log-enricher
+
+Now I have to modify the profile to see anything valuable in the output. For
+example, I could remove the allowed `uname` syscall from the raw JSON profile:
+
+```console
+> jq 'del(.syscalls[0].names[] | select(. == "uname"))' /tmp/profile.json > /tmp/no-uname-profile.json
+```
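+
+The same edit can also be done in Go without `jq`. This sketch decodes the
+profile into generic maps so that fields it does not know about are preserved.
+It then removes the syscall from every `names` list, not only from the first
+rule as the `jq` filter does. The helper name `removeSyscall` is hypothetical:
+
+```go
+package main
+
+import (
+	"encoding/json"
+	"fmt"
+	"os"
+)
+
+// removeSyscall is a hypothetical helper that deletes syscallName from every
+// "names" list of a raw seccomp profile. Decoding into generic maps keeps all
+// other fields intact.
+func removeSyscall(data []byte, syscallName string) ([]byte, error) {
+	var profile map[string]any
+	if err := json.Unmarshal(data, &profile); err != nil {
+		return nil, err
+	}
+	rules, _ := profile["syscalls"].([]any)
+	for _, r := range rules {
+		rule, ok := r.(map[string]any)
+		if !ok {
+			continue
+		}
+		names, ok := rule["names"].([]any)
+		if !ok {
+			continue
+		}
+		kept := make([]any, 0, len(names))
+		for _, n := range names {
+			if s, isString := n.(string); !isString || s != syscallName {
+				kept = append(kept, n)
+			}
+		}
+		rule["names"] = kept // maps are references, so this updates profile
+	}
+	return json.MarshalIndent(profile, "", "  ")
+}
+
+func main() {
+	data, err := os.ReadFile("/tmp/profile.json")
+	if err != nil {
+		fmt.Fprintln(os.Stderr, "error:", err)
+		os.Exit(1)
+	}
+	out, err := removeSyscall(data, "uname")
+	if err != nil {
+		fmt.Fprintln(os.Stderr, "error:", err)
+		os.Exit(1)
+	}
+	if err := os.WriteFile("/tmp/no-uname-profile.json", out, 0o644); err != nil {
+		fmt.Fprintln(os.Stderr, "error:", err)
+		os.Exit(1)
+	}
+}
+```
+
+Note that `encoding/json` writes map keys in alphabetical order. The resulting
+file can therefore differ in field order from the `jq` output while being
+semantically identical.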
+
+And then try to run it again with the new profile `/tmp/no-uname-profile.json`:
+
+```console
+> sudo ./spoc run -p /tmp/no-uname-profile.json ./main
+10:39:12.707798 Reading file /tmp/no-uname-profile.json
+10:39:12.707892 Setting up seccomp
+10:39:12.707920 Load seccomp profile
+10:39:12.707982 Starting audit log enricher
+10:39:12.707998 Enricher reading from file /var/log/audit/audit.log
+10:39:12.709164 Running command with PID: 480512
+panic: operation not permitted
+
+goroutine 1 [running]:
+main.main()
+        /path/to/main.go:10 +0x85
+10:39:12.713035 Unable to run: launch runner: wait for command: exit status 2
+```
+
+Alright, that was expected! The applied seccomp profile blocks the `uname`
+syscall, which results in an "operation not permitted" error. This error is
+pretty generic and does not provide any hint on what got blocked by seccomp.
+It is generally extremely difficult to predict how applications behave if single
+syscalls are forbidden by seccomp. The application may terminate as in our
+simple demo, but it may also misbehave in strange ways and never stop at all.
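+
+Applications can make such failures a bit easier to diagnose by checking for
+the specific errno instead of panicking with a generic message. The sketch
+below adapts the demo program this way. It assumes that the profile's
+`SCMP_ACT_ERRNO` action returns `EPERM`, which matches the "operation not
+permitted" message above. The helper `describeUnameError` and its hint text are
+hypothetical:
+
+```go
+package main
+
+import (
+	"errors"
+	"fmt"
+	"os"
+	"syscall"
+)
+
+// describeUnameError is a hypothetical helper that adds a hint for EPERM, the
+// errno the blocked uname(2) call produced in the demo above.
+func describeUnameError(err error) string {
+	if errors.Is(err, syscall.EPERM) {
+		return fmt.Sprintf("uname: %v (possibly blocked by a seccomp profile)", err)
+	}
+	return fmt.Sprintf("uname: %v", err)
+}
+
+func main() {
+	var utsname syscall.Utsname
+	if err := syscall.Uname(&utsname); err != nil {
+		fmt.Fprintln(os.Stderr, describeUnameError(err))
+		os.Exit(1)
+	}
+}
+```
+
+The hint is only a guess: `EPERM` can have other causes, so it narrows down
+where to look rather than proving that seccomp was involved.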
+
+If I now change the default seccomp action of the profile from `SCMP_ACT_ERRNO`
+to `SCMP_ACT_LOG` like this:
+
+```console
+> jq '.defaultAction = "SCMP_ACT_LOG"' /tmp/no-uname-profile.json > /tmp/no-uname-profile-log.json
+```
+
+Then the log enricher will give us a hint that the `uname` syscall is not allowed
+by the profile when using `spoc run`:
+
+```console
+> sudo ./spoc run -p /tmp/no-uname-profile-log.json ./main
+10:48:07.470126 Reading file /tmp/no-uname-profile-log.json
+10:48:07.470234 Setting up seccomp
+10:48:07.470245 Load seccomp profile
+10:48:07.470302 Starting audit log enricher
+10:48:07.470339 Enricher reading from file /var/log/audit/audit.log
+10:48:07.470889 Running command with PID: 522268
+10:48:07.472007 Seccomp: uname (63)
+```
+
+The application will not terminate anymore, but seccomp will log the behavior
+to `/var/log/audit/audit.log` and `spoc` will parse the data to correlate it
+directly to our program. Generating log messages for the audit subsystem comes
+with a large performance overhead and should be handled with care in production
+systems. It also comes with a security risk when running untrusted apps in audit
+mode in production environments, because syscalls that are not explicitly
+allowed still get executed.
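+
+The `Seccomp: uname (63)` line is derived from kernel audit records of type
+`SECCOMP`, which carry, among other fields, the `pid` and the numeric `syscall`.
+The Go sketch below parses such a record and resolves the number using a tiny,
+hypothetical lookup table for x86_64, where syscall 63 is `uname`. The sample
+record is abbreviated and illustrative, not taken from the run above, and the
+helper `parseSeccompRecord` is a hypothetical name. The real enricher in the
+Security Profiles Operator does considerably more:
+
+```go
+package main
+
+import (
+	"fmt"
+	"strconv"
+	"strings"
+)
+
+// syscallNames is a hypothetical, deliberately tiny x86_64 table; a real tool
+// would use a complete mapping for the record's architecture.
+var syscallNames = map[int]string{0: "read", 1: "write", 63: "uname", 231: "exit_group"}
+
+// parseSeccompRecord is a hypothetical helper that extracts the pid and the
+// syscall number from an audit record of type SECCOMP.
+func parseSeccompRecord(line string) (pid, nr int, err error) {
+	if !strings.HasPrefix(line, "type=SECCOMP ") {
+		return 0, 0, fmt.Errorf("not a SECCOMP record")
+	}
+	pid, nr = -1, -1
+	for _, field := range strings.Fields(line) {
+		key, value, ok := strings.Cut(field, "=")
+		if !ok {
+			continue
+		}
+		switch key {
+		case "pid":
+			pid, err = strconv.Atoi(value)
+		case "syscall":
+			nr, err = strconv.Atoi(value)
+		}
+		if err != nil {
+			return 0, 0, fmt.Errorf("parse %s: %w", key, err)
+		}
+	}
+	if pid < 0 || nr < 0 {
+		return 0, 0, fmt.Errorf("missing pid or syscall field")
+	}
+	return pid, nr, nil
+}
+
+func main() {
+	// Abbreviated, illustrative record; real records contain more fields.
+	line := `type=SECCOMP msg=audit(1680000000.000:1): pid=522268 comm="main" arch=c000003e syscall=63 code=0x7ffc0000`
+	pid, nr, err := parseSeccompRecord(line)
+	if err != nil {
+		fmt.Println("error:", err)
+		return
+	}
+	name, ok := syscallNames[nr]
+	if !ok {
+		name = "unknown"
+	}
+	fmt.Printf("pid %d: Seccomp: %s (%d)\n", pid, name, nr)
+}
+```
+
+Correlating records by `pid`, as sketched here, is what lets a tool attribute a
+log entry to the program it just started.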
+
+This demo should give you an impression of how to debug seccomp profile issues
+with applications, for example by using our shiny new helper tool powered by the
+features of the Security Profiles Operator. `spoc` is a flexible and portable
+binary suitable for edge cases where resources are limited and even Kubernetes
+itself may not be available with its full capabilities.
+
+Thank you for reading this blog post! If you're interested in learning more,
+providing feedback or asking for help, then feel free to get in touch with us
+directly via [Slack (#security-profiles-operator)][slack] or the [mailing list][mail].
+
+[slack]: https://kubernetes.slack.com/messages/security-profiles-operator
+[mail]: https://groups.google.com/forum/#!forum/kubernetes-dev