Merge pull request #45821 from saschagrunert/blog-cri-streaming-explained
Add blog post about: CRI streaming explained
commit 9e1fd64f48

---
layout: blog
title: "Container Runtime Interface streaming explained"
date: 2024-05-01
slug: cri-streaming-explained
author: Sascha Grunert
---

The Kubernetes [Container Runtime Interface (CRI)](/docs/concepts/architecture/cri)
acts as the main connection between the [kubelet](/docs/reference/command-line-tools-reference/kubelet)
and the [Container Runtime](/docs/setup/production-environment/container-runtimes).
Those runtimes have to provide a [gRPC](https://grpc.io) server that fulfills a
Kubernetes-defined [Protocol Buffer](https://protobuf.dev) interface.
[This API definition](https://github.com/kubernetes/cri-api/blob/63929b3/pkg/apis/runtime/v1/api.proto)
evolves over time, for example when contributors add new features or deprecate
existing fields.

In this blog post, I'd like to dive into the functionality and history of three
Remote Procedure Calls (RPCs) that stand out because of how they work:
`Exec`, `Attach`, and `PortForward`.

**Exec** can be used to run additional commands within the container and stream
the output to a client like [kubectl](/docs/reference/kubectl) or
[crictl](/docs/tasks/debug/debug-cluster/crictl). It also allows interaction with
that process using standard input (stdin), for example, if users want to run a
new shell instance within an existing workload.

**Attach** streams the output of the currently running process via [standard I/O](https://en.wikipedia.org/wiki/Standard_streams)
from the container to the client and also allows interaction with it. This is
particularly useful if users want to see what is going on in the container and
be able to interact with the process.

**PortForward** can be used to forward a port from the host to the container,
so that users can interact with it using third-party network tools. This allows
them to bypass [Kubernetes services](/docs/concepts/services-networking/service)
for a certain workload and interact with its network interface directly.

## What is so special about them?

All RPCs of the CRI use either [gRPC unary calls](https://grpc.io/docs/what-is-grpc/core-concepts/#unary-rpc)
or the [server-side streaming](https://grpc.io/docs/what-is-grpc/core-concepts/#server-streaming-rpc)
feature (only `GetContainerEvents` right now). This means that almost all RPCs
receive a single client request and return a single server response.
The same applies to `Exec`, `Attach`, and `PortForward`, whose [protocol definition](https://github.com/kubernetes/cri-api/blob/63929b3/pkg/apis/runtime/v1/api.proto#L94-L99)
looks like this:

```protobuf
// Exec prepares a streaming endpoint to execute a command in the container.
rpc Exec(ExecRequest) returns (ExecResponse) {}
```

```protobuf
// Attach prepares a streaming endpoint to attach to a running container.
rpc Attach(AttachRequest) returns (AttachResponse) {}
```

```protobuf
// PortForward prepares a streaming endpoint to forward ports from a PodSandbox.
rpc PortForward(PortForwardRequest) returns (PortForwardResponse) {}
```

The requests carry everything the server needs to do the work, for example,
the `ContainerId` or the command (`Cmd`) to run in the case of `Exec`.
More interestingly, all of their responses only contain a `url`:

```protobuf
message ExecResponse {
    // Fully qualified URL of the exec streaming server.
    string url = 1;
}
```

```protobuf
message AttachResponse {
    // Fully qualified URL of the attach streaming server.
    string url = 1;
}
```

```protobuf
message PortForwardResponse {
    // Fully qualified URL of the port-forward streaming server.
    string url = 1;
}
```

Why is it implemented like that? Well, [the original design document](https://docs.google.com/document/d/1MreuHzNvkBW6q7o_zehm1CBOBof3shbtMTGtUpjpRmY)
for those RPCs even predates [Kubernetes Enhancement Proposals (KEPs)](https://github.com/kubernetes/enhancements)
and was first outlined back in 2016. The kubelet had a native
implementation for `Exec`, `Attach`, and `PortForward` before the
initiative to bring the functionality to the CRI started. At that time,
everything was bound to [Docker](https://www.docker.com) or the later abandoned
container runtime [rkt](https://github.com/rkt/rkt).

The CRI-related design document also discusses the option of using native RPC
streaming for exec, attach, and port forward. The downsides of this approach
outweighed its benefits: the kubelet would still be a network bottleneck, and
future runtimes would not be free to choose their server implementation details.
Another option, having the kubelet implement a portable, runtime-agnostic
solution, was abandoned in favor of the final design as well, because it would
mean another project to maintain that would nevertheless be runtime dependent.

As a result, the basic flow for `Exec`, `Attach`, and `PortForward`
was proposed to look like this:

{{< mermaid >}}
sequenceDiagram
    participant crictl
    participant kubectl
    participant API as API Server
    participant kubelet
    participant runtime as Container Runtime
    participant streaming as Streaming Server
    alt Client alternatives
        Note over kubelet,runtime: Container Runtime Interface (CRI)
        kubectl->>API: exec, attach, port-forward
        API->>kubelet:
        kubelet->>runtime: Exec, Attach, PortForward
    else
        Note over crictl,runtime: Container Runtime Interface (CRI)
        crictl->>runtime: Exec, Attach, PortForward
    end
    runtime->>streaming: New Session
    streaming->>runtime: HTTP endpoint (URL)
    alt Client alternatives
        runtime->>kubelet: Response URL
        kubelet->>API:
        API-->>streaming: Connection upgrade (SPDY or WebSocket)
        streaming-)API: Stream data
        API-)kubectl: Stream data
    else
        runtime->>crictl: Response URL
        crictl-->>streaming: Connection upgrade (SPDY or WebSocket)
        streaming-)crictl: Stream data
    end
{{< /mermaid >}}

Clients like crictl or the kubelet (via kubectl) request a new exec, attach, or
port forward session from the runtime using the gRPC interface. The runtime
implements a streaming server that also manages the active sessions. This
streaming server provides an HTTP endpoint for the client to connect to. The
client upgrades the connection to the [SPDY](https://en.wikipedia.org/wiki/SPDY)
streaming protocol or (in the future) to a [WebSocket](https://en.wikipedia.org/wiki/WebSocket)
connection and starts to stream the data back and forth.

This design gives runtimes the flexibility to implement
`Exec`, `Attach`, and `PortForward` the way they want, and also allows for a
simple test path. Runtimes can change the underlying implementation to support
any kind of feature without needing to modify the CRI at all.

Many smaller enhancements to this overall approach have been merged into
Kubernetes in the past years, but the general pattern has always stayed the
same. The kubelet source code was turned into [a reusable library](https://github.com/kubernetes/kubernetes/blob/db9fcfe/staging/src/k8s.io/kubelet/pkg/cri/streaming),
which container runtimes can nowadays use to implement the basic streaming
capability.

## How does the streaming actually work?

At first glance, it looks like all three RPCs work the same way, but that's
not the case. It's possible to group the functionality of **Exec** and
**Attach**, while **PortForward** follows a distinct internal protocol
definition.

### Exec and Attach

Kubernetes defines **Exec** and **Attach** as _remote commands_, whose
protocol definition exists in [five different versions](https://github.com/kubernetes/kubernetes/blob/9791f0d/staging/src/k8s.io/apimachinery/pkg/util/remotecommand/constants.go#L28-L52):

| # | Version             | Note                                                                                                                   |
| --- | ------------------- | ---------------------------------------------------------------------------------------------------------------------- |
| 1 | `channel.k8s.io`    | Initial (unversioned) SPDY sub protocol ([#13394](https://issues.k8s.io/13394), [#13395](https://issues.k8s.io/13395)) |
| 2 | `v2.channel.k8s.io` | Resolves the issues present in the first version ([#15961](https://github.com/kubernetes/kubernetes/pull/15961))       |
| 3 | `v3.channel.k8s.io` | Adds support for resizing container terminals ([#25273](https://github.com/kubernetes/kubernetes/pull/25273))          |
| 4 | `v4.channel.k8s.io` | Adds support for exit codes using JSON errors ([#26541](https://github.com/kubernetes/kubernetes/pull/26541))          |
| 5 | `v5.channel.k8s.io` | Adds support for a CLOSE signal ([#119157](https://github.com/kubernetes/kubernetes/pull/119157))                      |

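Because several versions coexist, client and server have to agree on one of them when the connection is established. The following Go sketch shows one plausible way to negotiate. It assumes that the client sends its supported versions in order of preference (for example, as a comma-separated header value) and that the server picks the first one it also supports. The constant and function names are hypothetical and not part of the kubelet library.

```go
package main

import "strings"

// Remote command subprotocol names, one per version in the table above.
const (
	protocolV1 = "channel.k8s.io"
	protocolV2 = "v2.channel.k8s.io"
	protocolV3 = "v3.channel.k8s.io"
	protocolV4 = "v4.channel.k8s.io"
	protocolV5 = "v5.channel.k8s.io"
)

// splitProtocols parses a comma-separated header value such as
// "v5.channel.k8s.io, v4.channel.k8s.io" into its non-empty entries.
func splitProtocols(header string) []string {
	var protocols []string
	for _, p := range strings.Split(header, ",") {
		if p = strings.TrimSpace(p); p != "" {
			protocols = append(protocols, p)
		}
	}
	return protocols
}

// negotiateProtocol returns the first protocol from the client's list
// (assumed to be ordered by preference) that the server supports,
// or "" if the two sides have nothing in common.
func negotiateProtocol(clientProtocols, serverProtocols []string) string {
	supported := make(map[string]bool, len(serverProtocols))
	for _, p := range serverProtocols {
		supported[p] = true
	}
	for _, p := range clientProtocols {
		if supported[p] {
			return p
		}
	}
	return ""
}
```

With an empty result, the server would have to reject the upgrade. Negotiating per connection like this is what allows old and new clients to coexist while runtimes add support for newer versions.
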
On top of that, there is an overall effort to replace the SPDY transport
protocol with WebSockets as part of [KEP #4006](https://github.com/kubernetes/enhancements/issues/4006).
Runtimes have to satisfy those protocols over their life cycle to stay up to
date with the Kubernetes implementation.

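The WebSocket upgrade itself is standardized in RFC 6455. The client sends a random `Sec-WebSocket-Key`, and the server proves that it understood the request by hashing that key together with a fixed GUID and returning the result as `Sec-WebSocket-Accept`. The Go sketch below makes this exchange visible. It assumes that the remote command version is carried in the `Sec-WebSocket-Protocol` header, and the helper names are hypothetical. Real clients would normally rely on a WebSocket library instead.

```go
package main

import (
	"crypto/rand"
	"crypto/sha1"
	"encoding/base64"
	"net/http"
)

// websocketGUID is the fixed GUID from RFC 6455, section 1.3.
const websocketGUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

// newUpgradeRequest builds a WebSocket opening handshake request for a
// streaming URL returned by the runtime (hypothetical helper).
func newUpgradeRequest(streamURL, subprotocol string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, streamURL, nil)
	if err != nil {
		return nil, err
	}
	// RFC 6455 requires a random 16-byte nonce, base64-encoded.
	nonce := make([]byte, 16)
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	req.Header.Set("Connection", "Upgrade")
	req.Header.Set("Upgrade", "websocket")
	req.Header.Set("Sec-WebSocket-Version", "13")
	req.Header.Set("Sec-WebSocket-Key", base64.StdEncoding.EncodeToString(nonce))
	req.Header.Set("Sec-WebSocket-Protocol", subprotocol)
	return req, nil
}

// acceptKey computes the Sec-WebSocket-Accept value that a server must send
// back for a given Sec-WebSocket-Key; the client verifies it.
func acceptKey(key string) string {
	sum := sha1.Sum([]byte(key + websocketGUID))
	return base64.StdEncoding.EncodeToString(sum[:])
}
```
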
Let's assume that a client uses the latest (`v5`) version of the protocol and
communicates over WebSockets. In that case, the general flow would be:

1. The client requests a URL endpoint for **Exec** or **Attach** using the CRI.

   - The server (runtime) validates the request, inserts it into a connection
     tracking cache, and provides the HTTP endpoint URL for that request.

1. The client connects to that URL, upgrades the connection to establish
   a WebSocket, and starts to stream data.

   - In the case of **Attach**, the server has to stream the main container process
     data to the client.
   - In the case of **Exec**, the server has to create the subprocess command within
     the container and then stream the output to the client.

   If stdin is required, then the server needs to listen for that as well and
   redirect it to the corresponding process.

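The connection tracking cache from the first step can be as small as a concurrency-safe map from unguessable tokens to validated requests. The Go sketch below shows one possible shape. `execRequest`, `sessionCache`, the `/exec/` path, and the 30 second TTL are all assumptions made for illustration, not the kubelet library's actual types.

```go
package main

import (
	"crypto/rand"
	"encoding/base64"
	"errors"
	"sync"
	"time"
)

// execRequest is a hypothetical, simplified stand-in for the CRI ExecRequest.
type execRequest struct {
	ContainerID string
	Cmd         []string
	TTY         bool
}

// defaultTTL is an arbitrary choice for this sketch.
const defaultTTL = 30 * time.Second

type cacheEntry struct {
	req     execRequest
	expires time.Time
}

// sessionCache tracks validated requests under single-use tokens.
type sessionCache struct {
	mu      sync.Mutex
	ttl     time.Duration
	entries map[string]cacheEntry
}

func newSessionCache(ttl time.Duration) *sessionCache {
	return &sessionCache{ttl: ttl, entries: make(map[string]cacheEntry)}
}

// Insert validates req, stores it under a random token, and returns the
// URL that the client should connect to (step 1 above).
func (c *sessionCache) Insert(baseURL string, req execRequest) (string, error) {
	if req.ContainerID == "" || len(req.Cmd) == 0 {
		return "", errors.New("container ID and command are required")
	}
	raw := make([]byte, 16)
	if _, err := rand.Read(raw); err != nil {
		return "", err
	}
	token := base64.RawURLEncoding.EncodeToString(raw)

	c.mu.Lock()
	defer c.mu.Unlock()
	c.entries[token] = cacheEntry{req: req, expires: time.Now().Add(c.ttl)}
	return baseURL + "/exec/" + token, nil
}

// Consume removes the token and returns its request if it has not expired,
// so each URL serves at most one connection (step 2 above).
func (c *sessionCache) Consume(token string) (execRequest, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	entry, ok := c.entries[token]
	if !ok {
		return execRequest{}, false
	}
	delete(c.entries, token)
	if time.Now().After(entry.expires) {
		return execRequest{}, false
	}
	return entry.req, true
}
```

Making tokens single-use and short-lived keeps a leaked URL from being replayed. A production cache would also evict expired entries that are never consumed and limit the number of pending sessions.
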
Interpreting data for the defined protocol is fairly simple: the first
byte of every input and output packet [defines](https://github.com/kubernetes/kubernetes/blob/9791f0d/staging/src/k8s.io/apimachinery/pkg/util/remotecommand/constants.go#L57-L64)
the actual stream:

| First Byte | Type            | Description                              |
| ---------- | --------------- | ---------------------------------------- |
| `0`        | standard input  | Data streamed from stdin                 |
| `1`        | standard output | Data streamed to stdout                  |
| `2`        | standard error  | Data streamed to stderr                  |
| `3`        | stream error    | A streaming error occurred               |
| `4`        | stream resize   | A terminal resize event                  |
| `255`      | stream close    | Stream should be closed (for WebSockets) |

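With the WebSocket transport assumed above, each message is simply that identifier byte followed by the payload. Here is a minimal Go sketch of encoding and decoding such messages. The constant and function names are illustrative only, and the sketch treats the payload of every stream, including resize and close, as opaque bytes.

```go
package main

import "errors"

// Stream identifiers from the table above (names are illustrative).
const (
	streamStdin  byte = 0
	streamStdout byte = 1
	streamStderr byte = 2
	streamError  byte = 3
	streamResize byte = 4
	streamClose  byte = 255
)

// encodeFrame prefixes payload with the identifier of the stream it belongs to.
func encodeFrame(stream byte, payload []byte) []byte {
	frame := make([]byte, 0, len(payload)+1)
	frame = append(frame, stream)
	return append(frame, payload...)
}

// decodeFrame splits a received message into stream identifier and payload.
func decodeFrame(frame []byte) (byte, []byte, error) {
	if len(frame) == 0 {
		return 0, nil, errors.New("empty message: missing stream identifier")
	}
	return frame[0], frame[1:], nil
}

// streamName maps an identifier to the "Type" column of the table.
func streamName(stream byte) string {
	switch stream {
	case streamStdin:
		return "standard input"
	case streamStdout:
		return "standard output"
	case streamStderr:
		return "standard error"
	case streamError:
		return "stream error"
	case streamResize:
		return "stream resize"
	case streamClose:
		return "stream close"
	default:
		return "unknown"
	}
}
```
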
How should runtimes implement the streaming server methods for **Exec** and
**Attach** using the provided kubelet library? The key is that the streaming
server implementation in the kubelet [outlines an interface](https://github.com/kubernetes/kubernetes/blob/db9fcfe/staging/src/k8s.io/kubelet/pkg/cri/streaming/server.go#L63-L68)
called `Runtime`, which the actual container runtime has to fulfill if it
wants to use that library:

```go
// Runtime is the interface to execute the commands and provide the streams.
type Runtime interface {
	Exec(ctx context.Context, containerID string, cmd []string, in io.Reader, out, err io.WriteCloser, tty bool, resize <-chan remotecommand.TerminalSize) error
	Attach(ctx context.Context, containerID string, in io.Reader, out, err io.WriteCloser, tty bool, resize <-chan remotecommand.TerminalSize) error
	PortForward(ctx context.Context, podSandboxID string, port int32, stream io.ReadWriteCloser) error
}
```

Everything related to protocol interpretation is already in place, and
runtimes only have to implement the actual `Exec` and `Attach` logic.
For example, the container runtime [CRI-O](https://github.com/cri-o/cri-o)
does it [like this](https://github.com/cri-o/cri-o/blob/2a0867/server/container_exec.go#L27-L46)
(pseudo code):

```go
func (s StreamService) Exec(
	ctx context.Context,
	containerID string,
	cmd []string,
	stdin io.Reader, stdout, stderr io.WriteCloser,
	tty bool,
	resizeChan <-chan remotecommand.TerminalSize,
) error {
	// Retrieve the container by the provided containerID
	// …

	// Update the container status and verify that the workload is running
	// …

	// Execute the command and stream the data
	return s.runtimeServer.Runtime().ExecContainer(
		s.ctx, c, cmd, stdin, stdout, stderr, tty, resizeChan,
	)
}
```

### PortForward

Forwarding ports to a container works a bit differently compared to
streaming I/O data from a workload. The server still has to provide a URL
endpoint for the client to connect to, but then the container runtime has to
enter the network namespace of the container, connect to the requested port
inside it, and stream the data back and forth. Unlike for **Exec** or
**Attach**, there is no simple protocol definition available. This means that
the client streams plain SPDY frames (with or without an additional WebSocket
connection), which can be interpreted using libraries like [moby/spdystream](https://github.com/moby/spdystream).

Luckily, the kubelet library already provides the `PortForward` interface method,
which has to be implemented by the runtime. CRI-O does that like this (simplified):

```go
func (s StreamService) PortForward(
	ctx context.Context,
	podSandboxID string,
	port int32,
	stream io.ReadWriteCloser,
) error {
	// Retrieve the pod sandbox by the provided podSandboxID
	sandboxID, err := s.runtimeServer.PodIDIndex().Get(podSandboxID)
	sb := s.runtimeServer.GetSandbox(sandboxID)
	// …

	// Get the network namespace path on disk for that sandbox
	netNsPath := sb.NetNsPath()
	// …

	// Enter the network namespace and stream the data
	return s.runtimeServer.Runtime().PortForwardContainer(
		ctx, sb.InfraContainer(), netNsPath, port, stream,
	)
}
```

## Future work

The flexibility Kubernetes provides for the RPCs `Exec`, `Attach`, and
`PortForward` is truly outstanding compared to the other CRI RPCs. Nevertheless,
container runtimes have to keep up with the latest and greatest implementations
to support those features in a meaningful way. The general effort to support
WebSockets is not just a Kubernetes concern: container runtimes and clients
like `crictl` have to support it as well.

For example, `crictl` v1.30 features a new `--transport` flag for the
subcommands `exec`, `attach`, and `port-forward`
([#1383](https://github.com/kubernetes-sigs/cri-tools/pull/1383),
[#1385](https://github.com/kubernetes-sigs/cri-tools/pull/1385))
to allow choosing between `websocket` and `spdy`.

CRI-O is taking an experimental path by moving the streaming server
implementation into [conmon-rs](https://github.com/containers/conmon-rs)
(a substitute for the container monitor [conmon](https://github.com/containers/conmon)).
conmon-rs is a [Rust](https://www.rust-lang.org) implementation of the original
container monitor and allows streaming WebSockets directly using supported libraries
([#2070](https://github.com/containers/conmon-rs/pull/2070)). The major benefit
of this approach is that conmon-rs can keep active **Exec**, **Attach**, and
**PortForward** sessions open even if CRI-O is not running. The simplified flow
when using crictl directly will then look like this:

{{< mermaid >}}
sequenceDiagram
    autonumber
    participant crictl
    participant runtime as Container Runtime
    participant conmon-rs
    Note over crictl,runtime: Container Runtime Interface (CRI)
    crictl->>runtime: Exec, Attach, PortForward
    Note over runtime,conmon-rs: Cap’n Proto
    runtime->>conmon-rs: Serve Exec, Attach, PortForward
    conmon-rs->>runtime: HTTP endpoint (URL)
    runtime->>crictl: Response URL
    crictl-->>conmon-rs: Connection upgrade to WebSocket
    conmon-rs-)crictl: Stream data
{{< /mermaid >}}

All of these enhancements require iterative design decisions, with the original,
well-conceived implementation acting as their foundation. I really hope
you've enjoyed this compact journey through the history of CRI RPCs. Feel free
to reach out to me anytime for suggestions or feedback using the
[official Kubernetes Slack](https://kubernetes.slack.com/team/U53SUDBD4).