Merge pull request #29658 from kubernetes/dev-1.23

Official 1.23 Release Docs (pull/30798/head, snapshot-initial-v1.23)

commit f0bb277540

config.toml
@@ -139,10 +139,10 @@ time_format_default = "January 02, 2006 at 3:04 PM PST"
description = "Production-Grade Container Orchestration"
showedit = true

latest = "v1.22"
latest = "v1.23"

fullversion = "v1.22.0"
version = "v1.22"
fullversion = "v1.23.0"
version = "v1.23"

githubbranch = "main"
docsbranch = "main"
deprecated = false

@@ -179,40 +179,40 @@ js = [
]

[[params.versions]]
fullversion = "v1.22.0"
version = "v1.22"
githubbranch = "v1.22.0"
fullversion = "v1.23.0"
version = "v1.23"
githubbranch = "v1.23.0"
docsbranch = "main"
url = "https://kubernetes.io"

[[params.versions]]
fullversion = "v1.21.4"
fullversion = "v1.22.4"
version = "v1.22"
githubbranch = "v1.22.4"
docsbranch = "release-1.22"
url = "https://v1-22.docs.kubernetes.io"

[[params.versions]]
fullversion = "v1.21.7"
version = "v1.21"
githubbranch = "v1.21.4"
githubbranch = "v1.21.7"
docsbranch = "release-1.21"
url = "https://v1-21.docs.kubernetes.io"

[[params.versions]]
fullversion = "v1.20.10"
fullversion = "v1.20.13"
version = "v1.20"
githubbranch = "v1.20.10"
githubbranch = "v1.20.13"
docsbranch = "release-1.20"
url = "https://v1-20.docs.kubernetes.io"

[[params.versions]]
fullversion = "v1.19.14"
fullversion = "v1.19.16"
version = "v1.19"
githubbranch = "v1.19.14"
githubbranch = "v1.19.16"
docsbranch = "release-1.19"
url = "https://v1-19.docs.kubernetes.io"

[[params.versions]]
fullversion = "v1.18.20"
version = "v1.18"
githubbranch = "v1.18.20"
docsbranch = "release-1.18"
url = "https://v1-18.docs.kubernetes.io"

# User interface configuration
[params.ui]
# Enable to show the side bar menu in its compact state.
@@ -4,6 +4,8 @@ title: 'Health checking gRPC servers on Kubernetes'
date: 2018-10-01
---

_Built-in gRPC probes were introduced in Kubernetes 1.23. To learn more, see [Configure Liveness, Readiness and Startup Probes](/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-a-grpc-liveness-probe)._
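
As a quick illustration (not part of the original post), a Pod using the built-in gRPC probe might look like the sketch below; the name, image, and port are placeholders, and in v1.23 the feature sits behind the `GRPCContainerProbe` feature gate.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: grpc-liveness-example            # placeholder name
spec:
  containers:
  - name: server
    image: registry.example/grpc-server:1.0   # placeholder image
    ports:
    - containerPort: 2379
    livenessProbe:
      grpc:
        port: 2379                       # the container must serve the gRPC health checking protocol on this port
      initialDelaySeconds: 10
```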

**Author**: [Ahmet Alp Balkan](https://twitter.com/ahmetb) (Google)

[gRPC](https://grpc.io) is on its way to becoming the lingua franca for
@@ -0,0 +1,51 @@
---
title: Container Runtime Interface (CRI)
content_type: concept
weight: 50
---

<!-- overview -->

The CRI is a plugin interface which enables the kubelet to use a wide variety of
container runtimes, without needing to recompile the cluster components.

You need a working
{{<glossary_tooltip text="container runtime" term_id="container-runtime">}} on
each Node in your cluster, so that the
{{< glossary_tooltip text="kubelet" term_id="kubelet" >}} can launch
{{< glossary_tooltip text="Pods" term_id="pod" >}} and their containers.

{{< glossary_definition term_id="container-runtime-interface" length="all" >}}

<!-- body -->

## The API {#api}

{{< feature-state for_k8s_version="v1.23" state="stable" >}}

The kubelet acts as a client when connecting to the container runtime via gRPC.
The runtime and image service endpoints have to be available in the container
runtime, which can be configured separately within the kubelet by using the
`--image-service-endpoint` and `--container-runtime-endpoint` [command line
flags](/docs/reference/command-line-tools-reference/kubelet).

For Kubernetes v{{< skew currentVersion >}}, the kubelet prefers to use CRI `v1`.
If a container runtime does not support `v1` of the CRI, then the kubelet tries to
negotiate any older supported version.
The v{{< skew currentVersion >}} kubelet can also negotiate CRI `v1alpha2`, but
this version is considered deprecated.
If the kubelet cannot negotiate a supported CRI version, the kubelet gives up
and doesn't register as a node.

## Upgrading

When upgrading Kubernetes, the kubelet tries to automatically select the
latest CRI version on restart of the component. If that fails, then the fallback
will take place as mentioned above. If a gRPC re-dial was required because the
container runtime has been upgraded, then the container runtime must also
support the initially selected version or the redial is expected to fail. This
requires a restart of the kubelet.

## {{% heading "whatsnext" %}}

- Learn more about the CRI [protocol definition](https://github.com/kubernetes/cri-api/blob/c75ef5b/pkg/apis/runtime/v1/api.proto)
@@ -424,20 +424,104 @@ for gracefully terminating normal pods, and the last 10 seconds would be
reserved for terminating [critical pods](/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/#marking-pod-as-critical).

{{< note >}}
When pods were evicted during the graceful node shutdown, they are marked as failed.
Running `kubectl get pods` shows the status of the evicted pods as `Shutdown`.
When pods were evicted during the graceful node shutdown, they are marked as shutdown.
Running `kubectl get pods` shows the status of the evicted pods as `Terminated`.
And `kubectl describe pod` indicates that the pod was evicted because of node shutdown:

```
Status:  Failed
Reason:  Shutdown
Message: Node is shutting, evicting pods
Reason:  Terminated
Message: Pod was terminated in response to imminent node shutdown.
```

Failed pod objects will be preserved until explicitly deleted or [cleaned up by the GC](/docs/concepts/workloads/pods/pod-lifecycle/#pod-garbage-collection).
This is a change of behavior compared to abrupt node termination.
{{< /note >}}

### Pod Priority based graceful node shutdown {#pod-priority-graceful-node-shutdown}

{{< feature-state state="alpha" for_k8s_version="v1.23" >}}

To provide more flexibility during graceful node shutdown around the ordering
of pods during shutdown, graceful node shutdown honors the PriorityClass for
Pods, provided that you enabled this feature in your cluster. The feature
allows cluster administrators to explicitly define the ordering of pods
during graceful node shutdown based on [priority
classes](/docs/concepts/scheduling-eviction/pod-priority-preemption/#priorityclass).

The [Graceful Node Shutdown](#graceful-node-shutdown) feature, as described
above, shuts down pods in two phases, non-critical pods, followed by critical
pods. If additional flexibility is needed to explicitly define the ordering of
pods during shutdown in a more granular way, pod priority based graceful
shutdown can be used.

When graceful node shutdown honors pod priorities, this makes it possible to do
graceful node shutdown in multiple phases, each phase shutting down a
particular priority class of pods. The kubelet can be configured with the exact
phases and shutdown time per phase.

Assuming the following custom pod [priority
classes](/docs/concepts/scheduling-eviction/pod-priority-preemption/#priorityclass)
in a cluster,

|Pod priority class name|Pod priority class value|
|-----------------------|------------------------|
|`custom-class-a`       | 100000                 |
|`custom-class-b`       | 10000                  |
|`custom-class-c`       | 1000                   |
|`regular/unset`        | 0                      |
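
For illustration only, the custom classes in the table above could be created as PriorityClass objects along these lines (a sketch; the names and values simply mirror the table):

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: custom-class-a
value: 100000          # matches the table above
description: "Example priority class used to group pods for graceful node shutdown"
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: custom-class-b
value: 10000
```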

Within the [kubelet configuration](/docs/reference/config-api/kubelet-config.v1beta1/#kubelet-config-k8s-io-v1beta1-KubeletConfiguration)
the settings for `shutdownGracePeriodByPodPriority` could look like:

|Pod priority class value|Shutdown period|
|------------------------|---------------|
| 100000                 |10 seconds     |
| 10000                  |180 seconds    |
| 1000                   |120 seconds    |
| 0                      |60 seconds     |

The corresponding kubelet config YAML configuration would be:

```yaml
shutdownGracePeriodByPodPriority:
  - priority: 100000
    shutdownGracePeriodSeconds: 10
  - priority: 10000
    shutdownGracePeriodSeconds: 180
  - priority: 1000
    shutdownGracePeriodSeconds: 120
  - priority: 0
    shutdownGracePeriodSeconds: 60
```

The above table implies that any pod with priority value >= 100000 will get
just 10 seconds to stop, any pod with value >= 10000 and < 100000 will get 180
seconds to stop, any pod with value >= 1000 and < 10000 will get 120 seconds to stop.
Finally, all other pods will get 60 seconds to stop.

One doesn't have to specify values corresponding to all of the classes. For
example, you could instead use these settings:

|Pod priority class value|Shutdown period|
|------------------------|---------------|
| 100000                 |300 seconds    |
| 1000                   |120 seconds    |
| 0                      |60 seconds     |

In the above case, the pods with custom-class-b will go into the same bucket
as custom-class-c for shutdown.

If there are no pods in a particular range, then the kubelet does not wait
for pods in that priority range. Instead, the kubelet immediately skips to the
next priority class value range.

If this feature is enabled and no configuration is provided, then no ordering
action will be taken.

Using this feature requires enabling the
`GracefulNodeShutdownBasedOnPodPriority` feature gate, and setting the kubelet
config's `ShutdownGracePeriodByPodPriority` to the desired configuration
containing the pod priority class values and their respective shutdown periods.
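
Putting those pieces together, a kubelet configuration file enabling the feature could look like the following sketch (the durations are illustrative, not a recommendation):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  GracefulNodeShutdownBasedOnPodPriority: true
shutdownGracePeriodByPodPriority:
  - priority: 100000
    shutdownGracePeriodSeconds: 10
  - priority: 0
    shutdownGracePeriodSeconds: 60
```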

## Swap memory management {#swap-memory}

{{< feature-state state="alpha" for_k8s_version="v1.22" >}}

@@ -64,7 +64,7 @@ This means that containers within a `Pod` can all reach each other's ports on
usage, but this is no different from processes in a VM. This is called the
"IP-per-pod" model.

How this is implemented is a detail of the particular container runtime in use.
How this is implemented is a detail of the particular container runtime in use. Likewise, the networking option you choose may support [dual-stack IPv4/IPv6 networking](/docs/concepts/services-networking/dual-stack/); implementations vary.

It is possible to request ports on the `Node` itself which forward to your `Pod`
(called host ports), but this is a very niche operation. How that forwarding is
@@ -22,14 +22,62 @@ generates log messages for the Kubernetes system components.

For more information about klog configuration, see the [Command line tool reference](/docs/reference/command-line-tools-reference/).

An example of the klog native format:
Kubernetes is in the process of simplifying logging in its components. The
following klog command line flags [are
deprecated](https://github.com/kubernetes/enhancements/tree/master/keps/sig-instrumentation/2845-deprecate-klog-specific-flags-in-k8s-components)
starting with Kubernetes 1.23 and will be removed in a future release:

- `--add-dir-header`
- `--alsologtostderr`
- `--log-backtrace-at`
- `--log-dir`
- `--log-file`
- `--log-file-max-size`
- `--logtostderr`
- `--one-output`
- `--skip-headers`
- `--skip-log-headers`
- `--stderrthreshold`

Output will always be written to stderr, regardless of the output
format. Output redirection is expected to be handled by the component which
invokes a Kubernetes component. This can be a POSIX shell or a tool like
systemd.

In some cases, for example a distroless container or a Windows system service,
those options are not available. Then the
[`kube-log-runner`](https://github.com/kubernetes/kubernetes/blob/d2a8a81639fcff8d1221b900f66d28361a170654/staging/src/k8s.io/component-base/logs/kube-log-runner/README.md)
binary can be used as a wrapper around a Kubernetes component to redirect
output. A prebuilt binary is included in several Kubernetes base images under
its traditional name as `/go-runner` and as `kube-log-runner` in server and
node release archives.

This table shows how `kube-log-runner` invocations correspond to shell redirection:

| Usage | POSIX shell (such as bash) | `kube-log-runner <options> <cmd>` |
| -----------------------------------------|----------------------------|-------------------------------------------------------------|
| Merge stderr and stdout, write to stdout | `2>&1` | `kube-log-runner` (default behavior) |
| Redirect both into log file | `1>>/tmp/log 2>&1` | `kube-log-runner -log-file=/tmp/log` |
| Copy into log file and to stdout | `2>&1 \| tee -a /tmp/log` | `kube-log-runner -log-file=/tmp/log -also-stdout` |
| Redirect only stdout into log file | `>/tmp/log` | `kube-log-runner -log-file=/tmp/log -redirect-stderr=false` |

### Klog output

An example of the traditional klog native format:
```
I1025 00:15:15.525108       1 httplog.go:79] GET /api/v1/namespaces/kube-system/pods/metrics-server-v0.3.1-57c75779f-9p8wg: (1.512ms) 200 [pod_nanny/v0.0.0 (linux/amd64) kubernetes/$Format 10.56.1.19:51756]
```

The message string may contain line breaks:
```
I1025 00:15:15.525108       1 example.go:79] This is a message
which has a line break.
```

### Structured Logging

{{< feature-state for_k8s_version="v1.19" state="alpha" >}}
{{< feature-state for_k8s_version="v1.23" state="beta" >}}

{{< warning >}}
Migration to structured log messages is an ongoing process. Not all log messages are structured in this version. When parsing log files, you must also handle unstructured log messages.

@@ -38,9 +86,11 @@ Log formatting and value serialization are subject to change.
{{< /warning>}}

Structured logging introduces a uniform structure in log messages allowing for programmatic extraction of information. You can store and process structured logs with less effort and cost.
New message format is backward compatible and enabled by default.
The code which generates a log message determines whether it uses the traditional unstructured klog output
or structured logging.

Format of structured logs:
The default formatting of structured log messages is as text, with a format that
is backward compatible with traditional klog:

```ini
<klog header> "<message>" <key1>="<value1>" <key2>="<value2>" ...

@@ -52,6 +102,13 @@ Example:
I1025 00:15:15.525108       1 controller_utils.go:116] "Pod status updated" pod="kube-system/kubedns" status="ready"
```

Strings are quoted. Other values are formatted with
[`%+v`](https://pkg.go.dev/fmt#hdr-Printing), which may cause log messages to
continue on the next line [depending on the data](https://github.com/kubernetes/kubernetes/issues/106428).
```
I1025 00:15:15.525108       1 example.go:116] "Example" data="This is text with a line break\nand \"quotation marks\"." someInt=1 someFloat=0.1 someStruct={StringField: First line,
second line.}
```

### JSON log format

@@ -82,7 +139,7 @@ Example of JSON log format (pretty printed):

Keys with special meaning:
* `ts` - timestamp as Unix time (required, float)
* `v` - verbosity (required, int, default 0)
* `v` - verbosity (only for info and not for error messages, int)
* `err` - error string (optional, string)
* `msg` - message (required, string)

@@ -139,4 +196,5 @@ The `logrotate` tool rotates logs daily, or once the log size is greater than 10

* Read about the [Kubernetes Logging Architecture](/docs/concepts/cluster-administration/logging/)
* Read about [Structured Logging](https://github.com/kubernetes/enhancements/tree/master/keps/sig-instrumentation/1602-structured-logging)
* Read about [deprecation of klog flags](https://github.com/kubernetes/enhancements/tree/master/keps/sig-instrumentation/2845-deprecate-klog-specific-flags-in-k8s-components)
* Read about the [Conventions for logging severity](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-instrumentation/logging.md)
@@ -77,7 +77,7 @@ failure.
In the webhook model, Kubernetes makes a network request to a remote service.
In the *Binary Plugin* model, Kubernetes executes a binary (program).
Binary plugins are used by the kubelet (e.g.
[Flex Volume Plugins](/docs/concepts/storage/volumes/#flexVolume)
[Flex Volume Plugins](/docs/concepts/storage/volumes/#flexvolume)
and [Network Plugins](/docs/concepts/extend-kubernetes/compute-storage-net/network-plugins/))
and by kubectl.

@@ -163,6 +163,8 @@ After a request is authorized, if it is a write operation, it also goes through
) allow users to mount volume types without built-in support by having the
Kubelet call a Binary Plugin to mount the volume.

FlexVolume is deprecated since Kubernetes v1.23. The out-of-tree CSI driver is the recommended way to write volume drivers in Kubernetes. See [Kubernetes Volume Plugin FAQ for Storage Vendors](https://github.com/kubernetes/community/blob/master/sig-storage/volume-plugin-faq.md#kubernetes-volume-plugin-faq-for-storage-vendors) for more information.

### Device Plugins
@@ -197,6 +197,8 @@ service PodResourcesLister {
}
```

### `List` gRPC endpoint {#grpc-endpoint-list}

The `List` endpoint provides information on resources of running pods, with details such as the
id of exclusively allocated CPUs, device id as it was reported by device plugins and id of
the NUMA node where these devices are allocated. Also, for NUMA-based machines, it contains the information about memory and hugepages reserved for a container.

@@ -246,10 +248,35 @@ message ContainerDevices {
    TopologyInfo topology = 3;
}
```
{{< note >}}
cpu_ids in the `ContainerResources` in the `List` endpoint correspond to exclusive CPUs allocated
to a particular container. If the goal is to evaluate CPUs that belong to the shared pool, the `List`
endpoint needs to be used in conjunction with the `GetAllocatableResources` endpoint as explained
below:
1. Call `GetAllocatableResources` to get a list of all the allocatable CPUs
2. Call `GetCpuIds` on all `ContainerResources` in the system
3. Subtract out all of the CPUs from the `GetCpuIds` calls from the `GetAllocatableResources` call
{{< /note >}}

### `GetAllocatableResources` gRPC endpoint {#grpc-endpoint-getallocatableresources}

{{< feature-state state="beta" for_k8s_version="v1.23" >}}

GetAllocatableResources provides information on resources initially available on the worker node.
It provides more information than kubelet exports to the API server.

{{< note >}}
`GetAllocatableResources` should only be used to evaluate [allocatable](/docs/tasks/administer-cluster/reserve-compute-resources/#node-allocatable)
resources on a node. If the goal is to evaluate free/unallocated resources it should be used in
conjunction with the List() endpoint. The result obtained by `GetAllocatableResources` would remain
the same unless the underlying resources exposed to kubelet change. This happens rarely but when
it does (for example: hotplug/hotunplug, device health changes), the client is expected to call
the `GetAllocatableResources` endpoint.
However, calling `GetAllocatableResources` is not sufficient in case of a CPU and/or memory
update, and the kubelet needs to be restarted to reflect the correct resource capacity and allocatable.
{{< /note >}}

```gRPC
// AllocatableResourcesResponses contains informations about all the devices known by the kubelet
message AllocatableResourcesResponse {

@@ -259,6 +286,13 @@ message AllocatableResourcesResponse {
}

```
Starting from Kubernetes v1.23, `GetAllocatableResources` is enabled by default.
You can disable it by turning off the
`KubeletPodResourcesGetAllocatable` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/).

Preceding Kubernetes v1.23, to enable this feature `kubelet` must be started with the following flag:

`--feature-gates=KubeletPodResourcesGetAllocatable=true`

`ContainerDevices` expose the topology information declaring to which NUMA cells the device is affine.
The NUMA cells are identified using an opaque integer ID, whose value is consistent with what device
|
|||
|
||||
Complete API details are documented using [OpenAPI](https://www.openapis.org/).
|
||||
|
||||
The Kubernetes API server serves an OpenAPI spec via the `/openapi/v2` endpoint.
|
||||
You can request the response format using request headers as follows:
|
||||
### OpenAPI V2
|
||||
|
||||
The Kubernetes API server serves an aggregated OpenAPI v2 spec via the
|
||||
`/openapi/v2` endpoint. You can request the response format using
|
||||
request headers as follows:
|
||||
|
||||
<table>
|
||||
<caption style="display:none">Valid request header values for OpenAPI v2 queries</caption>
|
||||
|
@ -77,6 +80,55 @@ about this format, see the [Kubernetes Protobuf serialization](https://github.co
|
|||
Interface Definition Language (IDL) files for each schema located in the Go
|
||||
packages that define the API objects.
|
||||
|
||||
### OpenAPI V3
|
||||
|
||||
{{< feature-state state="alpha" for_k8s_version="v1.23" >}}
|
||||
|
||||
Kubernetes v1.23 offers initial support for publishing its APIs as OpenAPI v3; this is an
|
||||
alpha feature that is disabled by default.
|
||||
You can enable the alpha feature by turning on the
|
||||
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/) named `OpenAPIV3`
|
||||
for the kube-apiserver component.
|
||||
|
||||
With the feature enabled, the Kubernetes API server serves an
|
||||
aggregated OpenAPI v3 spec per Kubernetes group version at the
|
||||
`/openapi/v3/apis/<group>/<version>` endpoint. Please refer to the
|
||||
table below for accepted request headers.
|
||||
|
||||
<table>
|
||||
<caption style="display:none">Valid request header values for OpenAPI v3 queries</caption>
|
||||
<thead>
|
||||
<tr>
|
||||
<th>Header</th>
|
||||
<th style="min-width: 50%;">Possible values</th>
|
||||
<th>Notes</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody>
|
||||
<tr>
|
||||
<td><code>Accept-Encoding</code></td>
|
||||
<td><code>gzip</code></td>
|
||||
<td><em>not supplying this header is also acceptable</em></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td rowspan="3"><code>Accept</code></td>
|
||||
<td><code>application/com.github.proto-openapi.spec.v3@v1.0+protobuf</code></td>
|
||||
<td><em>mainly for intra-cluster use</em></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>application/json</code></td>
|
||||
<td><em>default</em></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>*</code></td>
|
||||
<td><em>serves </em><code>application/json</code></td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
|
||||
A discovery endpoint `/openapi/v3` is provided to see a list of all
|
||||
group/versions available. This endpoint only returns JSON.
|
||||
|
||||
## Persistence
|
||||
|
||||
Kubernetes stores the serialized state of objects by writing them into
|
||||
|
|
|
@@ -13,13 +13,13 @@ min-kubernetes-server-version: v1.22

<!-- overview -->

{{< feature-state for_k8s_version="v1.22" state="alpha" >}}
{{< feature-state for_k8s_version="v1.23" state="beta" >}}

The Kubernetes [Pod Security Standards](/docs/concepts/security/pod-security-standards/) define
different isolation levels for Pods. These standards let you define how you want to restrict the
behavior of pods in a clear, consistent fashion.

As an Alpha feature, Kubernetes offers a built-in _Pod Security_ {{< glossary_tooltip
As a Beta feature, Kubernetes offers a built-in _Pod Security_ {{< glossary_tooltip
text="admission controller" term_id="admission-controller" >}}, the successor
to [PodSecurityPolicies](/docs/concepts/policy/pod-security-policy/). Pod security restrictions
are applied at the {{< glossary_tooltip text="namespace" term_id="namespace" >}} level when pods

@@ -32,15 +32,40 @@ The PodSecurityPolicy API is deprecated and will be

<!-- body -->

## Enabling the Alpha feature
## Enabling the `PodSecurity` admission plugin

Setting pod security controls by namespace is an alpha feature. You must enable the `PodSecurity`
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/) in order to use it.
In v1.23, the `PodSecurity` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
is a Beta feature and is enabled by default.

In v1.22, the `PodSecurity` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
is an Alpha feature and must be enabled in `kube-apiserver` in order to use the built-in admission plugin.

```shell
--feature-gates="...,PodSecurity=true"
```

## Alternative: installing the `PodSecurity` admission webhook {#webhook}

For environments where the built-in `PodSecurity` admission plugin cannot be used,
either because the cluster is older than v1.22, or the `PodSecurity` feature cannot be enabled,
the `PodSecurity` admission logic is also available as a Beta [validating admission webhook](https://git.k8s.io/pod-security-admission/webhook).

A pre-built container image, certificate generation scripts, and example manifests
are available at [https://git.k8s.io/pod-security-admission/webhook](https://git.k8s.io/pod-security-admission/webhook).

To install:
```shell
git clone git@github.com:kubernetes/pod-security-admission.git
cd pod-security-admission/webhook
make certs
kubectl apply -k .
```

{{< note >}}
The generated certificate is valid for 2 years. Before it expires,
regenerate the certificate or remove the webhook in favor of the built-in admission plugin.
{{< /note >}}

## Pod Security levels

Pod Security admission places requirements on a Pod's [Security

@@ -52,7 +77,7 @@ page for an in-depth look at those requirements.

## Pod Security Admission labels for namespaces

Provided that you have enabled this feature, you can configure namespaces to define the admission
Once the feature is enabled or the webhook is installed, you can configure namespaces to define the admission
control mode you want to use for pod security in each namespace. Kubernetes defines a set of
{{< glossary_tooltip term_id="label" text="labels" >}} that you can set to define which of the
predefined Pod Security Standard levels you want to use for a namespace. The label you select
@@ -373,6 +373,24 @@ fail validation.
      </small>
    </td>
  </tr>
  <tr>
    <td style="white-space: nowrap">Running as Non-root user (v1.23+)</td>
    <td>
      <p>Containers must not set <tt>runAsUser</tt> to 0</p>
      <p><strong>Restricted Fields</strong></p>
      <ul>
        <li><code>spec.securityContext.runAsUser</code></li>
        <li><code>spec.containers[*].securityContext.runAsUser</code></li>
        <li><code>spec.initContainers[*].securityContext.runAsUser</code></li>
        <li><code>spec.ephemeralContainers[*].securityContext.runAsUser</code></li>
      </ul>
      <p><strong>Allowed Values</strong></p>
      <ul>
        <li>any non-zero value</li>
        <li><code>undefined/null</code></li>
      </ul>
    </td>
  </tr>
  <tr>
    <td style="white-space: nowrap">Non-root groups <em>(optional)</em></td>
    <td>
@@ -16,7 +16,7 @@ weight: 70

<!-- overview -->

{{< feature-state for_k8s_version="v1.21" state="beta" >}}
{{< feature-state for_k8s_version="v1.23" state="stable" >}}

IPv4/IPv6 dual-stack networking enables the allocation of both IPv4 and IPv6 addresses to {{< glossary_tooltip text="Pods" term_id="pod" >}} and {{< glossary_tooltip text="Services" term_id="service" >}}.

@@ -47,8 +47,6 @@ The following prerequisites are needed in order to utilize IPv4/IPv6 dual-stack

## Configure IPv4/IPv6 dual-stack

To use IPv4/IPv6 dual-stack, ensure the `IPv6DualStack` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) is enabled for the relevant components of your cluster. (Starting in 1.21, IPv4/IPv6 dual-stack defaults to enabled.)

To configure IPv4/IPv6 dual-stack, set dual-stack cluster network assignments:

* kube-apiserver:

@@ -65,9 +63,6 @@ An example of an IPv4 CIDR: `10.244.0.0/16` (though you would supply your own ad

An example of an IPv6 CIDR: `fdXY:IJKL:MNOP:15::/64` (this shows the format but is not a valid address - see [RFC 4193](https://tools.ietf.org/html/rfc4193))

Starting in 1.21, IPv4/IPv6 dual-stack defaults to enabled.
You can disable it when necessary by specifying `--feature-gates="IPv6DualStack=false"`
on the kube-apiserver, kube-controller-manager, kubelet, and kube-proxy command line.
{{< /note >}}

## Services

@@ -81,7 +76,7 @@ set the `.spec.ipFamilyPolicy` field to one of the following values:

* `SingleStack`: Single-stack service. The control plane allocates a cluster IP for the Service, using the first configured service cluster IP range.
* `PreferDualStack`:
  * Allocates IPv4 and IPv6 cluster IPs for the Service. (If the cluster has `--feature-gates="IPv6DualStack=false"`, this setting follows the same behavior as `SingleStack`.)
  * Allocates IPv4 and IPv6 cluster IPs for the Service.
* `RequireDualStack`: Allocates Service `.spec.ClusterIPs` from both IPv4 and IPv6 address ranges.
  * Selects the `.spec.ClusterIP` from the list of `.spec.ClusterIPs` based on the address family of the first element in the `.spec.ipFamilies` array.

@@ -124,7 +119,7 @@ These examples demonstrate the behavior of various dual-stack Service configurat

#### Dual-stack defaults on existing Services

These examples demonstrate the default behavior when dual-stack is newly enabled on a cluster where Services already exist. (Upgrading an existing cluster to 1.21 will enable dual-stack unless `--feature-gates="IPv6DualStack=false"` is set.)
These examples demonstrate the default behavior when dual-stack is newly enabled on a cluster where Services already exist. (Upgrading an existing cluster to 1.21 or beyond will enable dual-stack.)

1. When dual-stack is enabled on a cluster, existing Services (whether `IPv4` or `IPv6`) are configured by the control plane to set `.spec.ipFamilyPolicy` to `SingleStack` and set `.spec.ipFamilies` to the address family of the existing Service. The existing Service cluster IP will be stored in `.spec.ClusterIPs`.
@@ -219,25 +219,98 @@ of the controller that should implement the class.

{{< codenew file="service/networking/external-lb.yaml" >}}

IngressClass resources contain an optional parameters field. This can be used to
reference additional implementation-specific configuration for this class.
The `.spec.parameters` field of an IngressClass lets you reference another
resource that provides configuration related to that IngressClass.

#### Namespace-scoped parameters
The specific type of parameters to use depends on the ingress controller
that you specify in the `.spec.controller` field of the IngressClass.

{{< feature-state for_k8s_version="v1.22" state="beta" >}}
### IngressClass scope

`Parameters` field has a `scope` and `namespace` field that can be used to
reference a namespace-specific resource for configuration of an Ingress class.
`Scope` field defaults to `Cluster`, meaning, the default is cluster-scoped
resource. Setting `Scope` to `Namespace` and setting the `Namespace` field
will reference a parameters resource in a specific namespace:
Depending on your ingress controller, you may be able to use parameters
that you set cluster-wide, or just for one namespace.

Namespace-scoped parameters avoid the need for a cluster-scoped CustomResourceDefinition
for a parameters resource. This further avoids RBAC-related resources
that would otherwise be required to grant permissions to cluster-scoped
resources.
{{< tabs name="tabs_ingressclass_parameter_scope" >}}
{{% tab name="Cluster" %}}
The default scope for IngressClass parameters is cluster-wide.

{{< codenew file="service/networking/namespaced-params.yaml" >}}
If you set the `.spec.parameters` field and don't set
`.spec.parameters.scope`, or if you set `.spec.parameters.scope` to
`Cluster`, then the IngressClass refers to a cluster-scoped resource.
The `kind` (in combination with the `apiGroup`) of the parameters
refers to a cluster-scoped API (possibly a custom resource), and
the `name` of the parameters identifies a specific cluster scoped
resource for that API.

For example:
```yaml
---
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  name: external-lb-1
spec:
  controller: example.com/ingress-controller
  parameters:
    # The parameters for this IngressClass are specified in a
    # ClusterIngressParameter (API group k8s.example.net) named
    # "external-config-1". This definition tells Kubernetes to
    # look for a cluster-scoped parameter resource.
    scope: Cluster
    apiGroup: k8s.example.net
    kind: ClusterIngressParameter
    name: external-config-1
```
{{% /tab %}}
{{% tab name="Namespaced" %}}
{{< feature-state for_k8s_version="v1.23" state="stable" >}}

If you set the `.spec.parameters` field and set
`.spec.parameters.scope` to `Namespace`, then the IngressClass refers
to a namespaced-scoped resource. You must also set the `namespace`
field within `.spec.parameters` to the namespace that contains
the parameters you want to use.

The `kind` (in combination with the `apiGroup`) of the parameters
refers to a namespaced API (for example: ConfigMap), and
the `name` of the parameters identifies a specific resource
in the namespace you specified in `namespace`.

Namespace-scoped parameters help the cluster operator delegate control over the
configuration (for example: load balancer settings, API gateway definition)
that is used for a workload. If you used a cluster-scoped parameter then either:

- the cluster operator team needs to approve a different team's changes every
  time there's a new configuration change being applied.
- the cluster operator must define specific access controls, such as
  [RBAC](/docs/reference/access-authn-authz/rbac/) roles and bindings, that let
  the application team make changes to the cluster-scoped parameters resource.

The IngressClass API itself is always cluster-scoped.

Here is an example of an IngressClass that refers to parameters that are
namespaced:
```yaml
---
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  name: external-lb-2
spec:
  controller: example.com/ingress-controller
  parameters:
    # The parameters for this IngressClass are specified in an
    # IngressParameter (API group k8s.example.com) named "external-config",
    # that's in the "external-configuration" configuration namespace.
    scope: Namespace
    apiGroup: k8s.example.com
    kind: IngressParameter
    namespace: external-configuration
    name: external-config
```

{{% /tab %}}
{{< /tabs >}}

### Deprecated annotation
|
|||
|
||||
## {{% heading "whatsnext" %}}
|
||||
|
||||
* Read about [enabling Topology Aware Hints](/docs/tasks/administer-cluster/enabling-topology-aware-hints)
|
||||
* Read about [Topology Aware Hints](/docs/concepts/services-networking/topology-aware-hints)
|
||||
* Read about [Service External Traffic Policy](/docs/tasks/access-application-cluster/create-external-load-balancer/#preserving-the-client-source-ip)
|
||||
* Read [Connecting Applications with Services](/docs/concepts/services-networking/connect-applications-service/)
|
||||
|
|
|
@@ -9,7 +9,7 @@ weight: 45

<!-- overview -->

{{< feature-state for_k8s_version="v1.21" state="alpha" >}}
{{< feature-state for_k8s_version="v1.23" state="beta" >}}

_Topology Aware Hints_ enable topology aware routing by including suggestions
for how clients should consume endpoints. This approach adds metadata to enable

@@ -35,8 +35,7 @@ can then consume those hints, and use them to influence how traffic to is routed

## Using Topology Aware Hints

If you have [enabled](/docs/tasks/administer-cluster/enabling-topology-aware-hints) the
overall feature, you can activate Topology Aware Hints for a Service by setting the
You can activate Topology Aware Hints for a Service by setting the
`service.kubernetes.io/topology-aware-hints` annotation to `auto`. This tells
the EndpointSlice controller to set topology hints if it is deemed safe.
Importantly, this does not guarantee that hints will always be set.
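
For example, a sketch of a Service opting in via the annotation mentioned above (the name, selector, and port are placeholders):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service                                      # placeholder name
  annotations:
    service.kubernetes.io/topology-aware-hints: auto    # asks the EndpointSlice controller to set hints if safe
spec:
  selector:
    app: example                                        # placeholder selector
  ports:
    - port: 80
```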

@@ -156,5 +155,4 @@ zone.

## {{% heading "whatsnext" %}}

* Read about [enabling Topology Aware Hints](/docs/tasks/administer-cluster/enabling-topology-aware-hints/)
* Read [Connecting Applications with Services](/docs/concepts/services-networking/connect-applications-service/)
@@ -130,10 +130,7 @@ As a cluster administrator, you can use a [PodSecurityPolicy](/docs/concepts/pol

### Generic ephemeral volumes

{{< feature-state for_k8s_version="v1.21" state="beta" >}}

This feature requires the `GenericEphemeralVolume` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) to be
enabled. Because this is a beta feature, it is enabled by default.
{{< feature-state for_k8s_version="v1.23" state="stable" >}}

Generic ephemeral volumes are similar to `emptyDir` volumes in the
sense that they provide a per-pod directory for scratch data that is

@@ -245,7 +242,6 @@ PVCs indirectly if they can create Pods, even if they do not have
permission to create PVCs directly. Cluster administrators must be
aware of this. If this does not fit their security model, they have
two choices:
- Explicitly disable the feature through the feature gate.
- Use a [Pod Security
  Policy](/docs/concepts/policy/pod-security-policy/) where the
  `volumes` list does not contain the `ephemeral` volume type

@@ -274,4 +270,3 @@ See [local ephemeral storage](/docs/concepts/configuration/manage-resources-cont

- For more information on the design, see the
  [Generic ephemeral inline volumes KEP](https://github.com/kubernetes/enhancements/blob/master/keps/sig-storage/1698-generic-ephemeral-volumes/README.md).
- For more information on further development of this feature, see the [enhancement tracking issue #1698](https://github.com/kubernetes/enhancements/issues/1698).
@@ -220,19 +220,19 @@ to `Retain`, including cases where you are reusing an existing PV.

{{< feature-state for_k8s_version="v1.11" state="beta" >}}

Support for expanding PersistentVolumeClaims (PVCs) is now enabled by default. You can expand
Support for expanding PersistentVolumeClaims (PVCs) is enabled by default. You can expand
the following types of volumes:

* gcePersistentDisk
* azureDisk
* azureFile
* awsElasticBlockStore
* Cinder
* cinder (deprecated)
* {{< glossary_tooltip text="csi" term_id="csi" >}}
* flexVolume (deprecated)
* gcePersistentDisk
* glusterfs
* rbd
* Azure File
* Azure Disk
* Portworx
* FlexVolumes
* {{< glossary_tooltip text="CSI" term_id="csi" >}}
* portworxVolume

You can only expand a PVC if its storage class's `allowVolumeExpansion` field is set to true.
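
For instance, a StorageClass that permits expansion might look like this sketch (the name and provisioner are placeholders):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: expandable-example         # placeholder name
provisioner: csi.example.com       # placeholder provisioner
allowVolumeExpansion: true         # required for PVC expansion against this class
```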

@@ -269,8 +269,8 @@ When a volume contains a file system, the file system is only resized when a new
the PersistentVolumeClaim in `ReadWrite` mode. File system expansion is either done when a Pod is starting up
or when a Pod is running and the underlying file system supports online expansion.

FlexVolumes allow resize if the driver is set with the `RequiresFSResize` capability to `true`.
The FlexVolume can be resized on Pod restart.
FlexVolumes (deprecated since Kubernetes v1.23) allow resize if the driver is configured with the
`RequiresFSResize` capability to `true`. The FlexVolume can be resized on Pod restart.

#### Resizing an in-use PersistentVolumeClaim

@@ -298,6 +298,11 @@ Expanding EBS volumes is a time-consuming operation. Also, there is a per-volume

#### Recovering from Failure when Expanding Volumes

If a user specifies a new size that is too big to be satisfied by the underlying storage system, expansion of the PVC will be continuously retried until the user or cluster administrator takes some action. This can be undesirable and hence Kubernetes provides the following methods of recovering from such failures.

{{< tabs name="recovery_methods" >}}
{{% tab name="Manually with Cluster Administrator access" %}}

If expanding underlying storage fails, the cluster administrator can manually recover the Persistent Volume Claim (PVC) state and cancel the resize requests. Otherwise, the resize requests are continuously retried by the controller without administrator intervention.

1. Mark the PersistentVolume(PV) that is bound to the PersistentVolumeClaim(PVC) with `Retain` reclaim policy.

@@ -306,6 +311,30 @@ If expanding underlying storage fails, the cluster administrator can manually re
4. Re-create the PVC with a smaller size than the PV and set the `volumeName` field of the PVC to the name of the PV (a sketch follows this list). This should bind the new PVC to the existing PV.
5. Don't forget to restore the reclaim policy of the PV.
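
A minimal sketch of the PVC from step 4, re-created to bind to the retained PV; the names, storage class, and size below are placeholders:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: recovered-claim          # placeholder name
spec:
  accessModes:
    - ReadWriteOnce
  volumeName: existing-pv        # the PV that was marked with the Retain reclaim policy in step 1
  resources:
    requests:
      storage: 10Gi              # a size no larger than the capacity of the PV
```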

{{% /tab %}}
{{% tab name="By requesting expansion to smaller size" %}}
{{% feature-state for_k8s_version="v1.23" state="alpha" %}}

{{< note >}}
Recovery from failing PVC expansion by users is available as an alpha feature since Kubernetes 1.23. The `RecoverVolumeExpansionFailure` feature must be enabled for this feature to work. Refer to the [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) documentation for more information.
{{< /note >}}

If the feature gates `ExpandPersistentVolumes` and `RecoverVolumeExpansionFailure` are both
enabled in your cluster, and expansion has failed for a PVC, you can retry expansion with a
smaller size than the previously requested value. To request a new expansion attempt with a
smaller proposed size, edit `.spec.resources` for that PVC and choose a value that is less than the
value you previously tried.
This is useful if expansion to a higher value did not succeed because of a capacity constraint.
If that has happened, or you suspect that it might have, you can retry expansion by specifying a
size that is within the capacity limits of the underlying storage provider. You can monitor the status of the resize operation by watching `.status.resizeStatus` and events on the PVC.

Note that,
although you can specify a lower amount of storage than what was requested previously,
the new value must still be higher than `.status.capacity`.
Kubernetes does not support shrinking a PVC to less than its current size.
{{% /tab %}}
{{% /tabs %}}

## Types of Persistent Volumes

@@ -317,7 +346,6 @@ PersistentVolume types are implemented as plugins. Kubernetes currently supports
* [`cephfs`](/docs/concepts/storage/volumes/#cephfs) - CephFS volume
* [`csi`](/docs/concepts/storage/volumes/#csi) - Container Storage Interface (CSI)
* [`fc`](/docs/concepts/storage/volumes/#fc) - Fibre Channel (FC) storage
* [`flexVolume`](/docs/concepts/storage/volumes/#flexVolume) - FlexVolume
* [`gcePersistentDisk`](/docs/concepts/storage/volumes/#gcepersistentdisk) - GCE Persistent Disk
* [`glusterfs`](/docs/concepts/storage/volumes/#glusterfs) - Glusterfs volume
* [`hostPath`](/docs/concepts/storage/volumes/#hostpath) - HostPath volume

@@ -335,6 +363,8 @@ The following types of PersistentVolume are deprecated. This means that support

* [`cinder`](/docs/concepts/storage/volumes/#cinder) - Cinder (OpenStack block storage)
  (**deprecated** in v1.18)
* [`flexVolume`](/docs/concepts/storage/volumes/#flexvolume) - FlexVolume
  (**deprecated** in v1.23)
* [`flocker`](/docs/concepts/storage/volumes/#flocker) - Flocker storage
  (**deprecated** in v1.22)
* [`quobyte`](/docs/concepts/storage/volumes/#quobyte) - Quobyte volume
@@ -830,11 +830,11 @@ GitHub project has [instructions](https://github.com/quobyte/quobyte-csi#quobyte
### rbd

An `rbd` volume allows a
[Rados Block Device](https://docs.ceph.com/en/latest/rbd/) (RBD) volume to mount into your
Pod. Unlike `emptyDir`, which is erased when a pod is removed, the contents of
an `rbd` volume are preserved and the volume is unmounted. This
means that a RBD volume can be pre-populated with data, and that data can
be shared between pods.
[Rados Block Device](https://docs.ceph.com/en/latest/rbd/) (RBD) volume to mount
into your Pod. Unlike `emptyDir`, which is erased when a pod is removed, the
contents of an `rbd` volume are preserved and the volume is unmounted. This
means that a RBD volume can be pre-populated with data, and that data can be
shared between pods.

{{< note >}}
You must have a Ceph installation running before you can use RBD.

@@ -849,6 +849,38 @@ Simultaneous writers are not allowed.
See the [RBD example](https://github.com/kubernetes/examples/tree/master/volumes/rbd)
for more details.

#### RBD CSI migration {#rbd-csi-migration}

{{< feature-state for_k8s_version="v1.23" state="alpha" >}}

The `CSIMigration` feature for `RBD`, when enabled, redirects all plugin
operations from the existing in-tree plugin to the `rbd.csi.ceph.com` {{<
glossary_tooltip text="CSI" term_id="csi" >}} driver. In order to use this
feature, the
[Ceph CSI driver](https://github.com/ceph/ceph-csi)
must be installed on the cluster and the `CSIMigration` and `CSIMigrationRBD`
[feature gates](/docs/reference/command-line-tools-reference/feature-gates/)
must be enabled.

{{< note >}}

As a Kubernetes cluster operator that administers storage, here are the
prerequisites that you must complete before you attempt migration to the
RBD CSI driver:

* You must install the Ceph CSI driver (`rbd.csi.ceph.com`), v3.5.0 or above,
  into your Kubernetes cluster.
* Considering that the `clusterID` field is a required parameter for the CSI driver for
  its operations, while the in-tree StorageClass has the `monitors` field as a required
  parameter, a Kubernetes storage admin has to create a clusterID based on the
  monitors hash (for example: `#echo -n '<monitors_string>' | md5sum`) in the CSI
  config map and keep the monitors under this clusterID configuration.
* Also, if the value of `adminId` in the in-tree StorageClass is different from
  `admin`, the `adminSecretName` mentioned in the in-tree StorageClass has to be
  patched with the base64 value of the `adminId` parameter value; otherwise this
  step can be skipped.
{{< /note >}}

### secret

A `secret` volume is used to pass sensitive information, such as passwords, to

@@ -1018,6 +1050,16 @@ but new volumes created by the vSphere CSI driver will not be honoring these par

To turn off the `vsphereVolume` plugin from being loaded by the controller manager and the kubelet, you need to set `InTreePluginvSphereUnregister` feature flag to `true`. You must install a `csi.vsphere.vmware.com` {{< glossary_tooltip text="CSI" term_id="csi" >}} driver on all worker nodes.

#### Portworx CSI migration
{{< feature-state for_k8s_version="v1.23" state="alpha" >}}

The `CSIMigration` feature for Portworx has been added but is disabled by default in Kubernetes 1.23 since it's in alpha state.
It redirects all plugin operations from the existing in-tree plugin to the
`pxd.portworx.com` Container Storage Interface (CSI) Driver.
[Portworx CSI Driver](https://docs.portworx.com/portworx-install-with-kubernetes/storage-operations/csi/)
must be installed on the cluster.
To enable the feature, set `CSIMigrationPortworx=true` in kube-controller-manager and kubelet.

## Using subPath {#using-subpath}

Sometimes, it is useful to share one volume for multiple uses in a single pod.

@@ -1113,8 +1155,7 @@ To learn about requesting space using a resource specification, see
## Out-of-tree volume plugins

The out-of-tree volume plugins include
{{< glossary_tooltip text="Container Storage Interface" term_id="csi" >}} (CSI)
and FlexVolume. These plugins enable storage vendors to create custom storage plugins
{{< glossary_tooltip text="Container Storage Interface" term_id="csi" >}} (CSI), and also FlexVolume (which is deprecated). These plugins enable storage vendors to create custom storage plugins
without adding their plugin source code to the Kubernetes repository.

Previously, all volume plugins were "in-tree". The "in-tree" plugins were built, linked, compiled,

@@ -1247,13 +1288,21 @@ are listed in [Types of Volumes](#volume-types).

### flexVolume

FlexVolume is an out-of-tree plugin interface that has existed in Kubernetes
since version 1.2 (before CSI). It uses an exec-based model to interface with
drivers. The FlexVolume driver binaries must be installed in a pre-defined volume
plugin path on each node and in some cases the control plane nodes as well.
{{< feature-state for_k8s_version="v1.23" state="deprecated" >}}

Pods interact with FlexVolume drivers through the `flexvolume` in-tree volume plugin.
For more details, see the [FlexVolume](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-storage/flexvolume.md) examples.
FlexVolume is an out-of-tree plugin interface that uses an exec-based model to interface
with storage drivers. The FlexVolume driver binaries must be installed in a pre-defined
volume plugin path on each node and in some cases the control plane nodes as well.

Pods interact with FlexVolume drivers through the `flexVolume` in-tree volume plugin.
For more details, see the FlexVolume [README](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-storage/flexvolume.md#readme) document.

{{< note >}}
FlexVolume is deprecated. Using an out-of-tree CSI driver is the recommended way to integrate external storage with Kubernetes.

Maintainers of FlexVolume drivers should implement a CSI Driver and help to migrate users of FlexVolume drivers to CSI.
Users of FlexVolume should move their workloads to use the equivalent CSI Driver.
{{< /note >}}

## Mount propagation
@@ -151,6 +151,8 @@ and set this flag to `false`. For example:
* For instructions on creating and working with CronJobs, and for an example
  of a CronJob manifest,
  see [Running automated tasks with CronJobs](/docs/tasks/job/automated-tasks-with-cron-jobs/).
* For instructions to clean up failed or completed jobs automatically,
  see [Clean up Jobs automatically](/docs/concepts/workloads/controllers/job/#clean-up-finished-jobs-automatically)
* `CronJob` is part of the Kubernetes REST API.
  Read the {{< api-reference page="workload-resources/cron-job-v1" >}}
  object definition to understand the API for Kubernetes cron jobs.
@ -436,7 +436,10 @@ version of Kubernetes you're using](/docs/home/supported-doc-versions/).
|
|||
When a Job is created, the Job controller will immediately begin creating Pods
|
||||
to satisfy the Job's requirements and will continue to do so until the Job is
|
||||
complete. However, you may want to temporarily suspend a Job's execution and
|
||||
resume it later. To suspend a Job, you can update the `.spec.suspend` field of
|
||||
resume it later, or start Jobs in suspended state and have a custom controller
|
||||
decide later when to start them.
|
||||
|
||||
To suspend a Job, you can update the `.spec.suspend` field of
|
||||
the Job to true; later, when you want to resume it again, update it to false.
|
||||
Creating a Job with `.spec.suspend` set to true will create it in the suspended
|
||||
state.
|
||||
|
@ -522,6 +525,32 @@ directly a result of toggling the `.spec.suspend` field. In the time between
|
|||
these two events, we see that no Pods were created, but Pod creation restarted
|
||||
as soon as the Job was resumed.
|
||||
|
||||
### Mutable Scheduling Directives
|
||||
|
||||
{{< feature-state for_k8s_version="v1.23" state="beta" >}}
|
||||
|
||||
{{< note >}}
|
||||
In order to use this behavior, you must enable the `JobMutableNodeSchedulingDirectives`
|
||||
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
|
||||
on the [API server](/docs/reference/command-line-tools-reference/kube-apiserver/).
|
||||
It is enabled by default.
|
||||
{{< /note >}}
|
||||
|
||||
In most cases a parallel job will want the pods to run with constraints,
|
||||
like all in the same zone, or all either on GPU model x or y but not a mix of both.
|
||||
|
||||
The [suspend](#suspending-a-job) field is the first step towards achieving those semantics. Suspend allows a
|
||||
custom queue controller to decide when a job should start; however, once a job is unsuspended,
|
||||
a custom queue controller has no influence on where the pods of a job will actually land.
|
||||
|
||||
This feature allows updating a Job's scheduling directives before it starts, which gives custom queue
|
||||
controllers the ability to influence pod placement while at the same time offloading actual
|
||||
pod-to-node assignment to kube-scheduler. This is allowed only for suspended Jobs that have never
|
||||
been unsuspended before.
|
||||
|
||||
The fields in a Job's pod template that can be updated are node affinity, node selector,
|
||||
tolerations, labels, and annotations.
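
As a sketch of how a custom queue controller might use this (the zone value is illustrative, not taken from the docs), it could patch the pod template of a still-suspended Job to pin it to a zone before unsuspending it:

```yaml
# Patch fragment applied to a suspended Job; only scheduling directives in the
# pod template (node affinity, node selector, tolerations, labels, annotations)
# may be updated this way.
spec:
  template:
    spec:
      nodeSelector:
        topology.kubernetes.io/zone: us-central1-a   # illustrative zone value
```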
|
||||
|
||||
### Specifying your own Pod selector
|
||||
|
||||
Normally, when you create a Job object, you do not specify `.spec.selector`.
|
||||
|
@ -591,18 +620,19 @@ mismatch.
|
|||
|
||||
### Job tracking with finalizers
|
||||
|
||||
{{< feature-state for_k8s_version="v1.22" state="alpha" >}}
|
||||
{{< feature-state for_k8s_version="v1.23" state="beta" >}}
|
||||
|
||||
{{< note >}}
|
||||
In order to use this behavior, you must enable the `JobTrackingWithFinalizers`
|
||||
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
|
||||
on the [API server](/docs/reference/command-line-tools-reference/kube-apiserver/)
|
||||
and the [controller manager](/docs/reference/command-line-tools-reference/kube-controller-manager/).
|
||||
It is disabled by default.
|
||||
It is enabled by default.
|
||||
|
||||
When enabled, the control plane tracks new Jobs using the behavior described
|
||||
below. Existing Jobs are unaffected. As a user, the only difference you would
|
||||
see is that the control plane tracking of Job completion is more accurate.
|
||||
below. Jobs created before the feature was enabled are unaffected. As a user,
|
||||
the only difference you would see is that the control plane tracking of Job
|
||||
completion is more accurate.
|
||||
{{< /note >}}
|
||||
|
||||
When this feature isn't enabled, the Job {{< glossary_tooltip term_id="controller" >}}
|
||||
|
|
|
@ -77,6 +77,7 @@ spec:
|
|||
app: nginx # has to match .spec.template.metadata.labels
|
||||
serviceName: "nginx"
|
||||
replicas: 3 # by default is 1
|
||||
minReadySeconds: 10 # by default is 0
|
||||
template:
|
||||
metadata:
|
||||
labels:
|
||||
|
@ -112,9 +113,24 @@ In the above example:
|
|||
The name of a StatefulSet object must be a valid
|
||||
[DNS subdomain name](/docs/concepts/overview/working-with-objects/names#dns-subdomain-names).
|
||||
|
||||
## Pod Selector
|
||||
### Pod Selector
|
||||
|
||||
You must set the `.spec.selector` field of a StatefulSet to match the labels of its `.spec.template.metadata.labels`. Prior to Kubernetes 1.8, the `.spec.selector` field was defaulted when omitted. In 1.8 and later versions, failing to specify a matching Pod Selector will result in a validation error during StatefulSet creation.
|
||||
You must set the `.spec.selector` field of a StatefulSet to match the labels of its `.spec.template.metadata.labels`. In 1.8 and later versions, failing to specify a matching Pod Selector will result in a validation error during StatefulSet creation.
|
||||
|
||||
### Volume Claim Templates
|
||||
|
||||
You can set the `.spec.volumeClaimTemplates` which can provide stable storage using [PersistentVolumes](/docs/concepts/storage/persistent-volumes/) provisioned by a PersistentVolume Provisioner.
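
For instance, a `volumeClaimTemplates` entry might look like the following sketch (the storage class name is an assumption about your cluster):

```yaml
spec:
  volumeClaimTemplates:
  - metadata:
      name: www
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: "my-storage-class"   # must exist in your cluster
      resources:
        requests:
          storage: 1Gi
```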
|
||||
|
||||
|
||||
### Minimum ready seconds
|
||||
|
||||
{{< feature-state for_k8s_version="v1.23" state="beta" >}}
|
||||
|
||||
`.spec.minReadySeconds` is an optional field that specifies the minimum number of seconds for which a newly
|
||||
created Pod should be ready without any of its containers crashing, for it to be considered available.
|
||||
This feature is beta and enabled by default; if you don't
|
||||
want it enabled, opt out by disabling the `StatefulSetMinReadySeconds` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/). This field defaults to 0 (the Pod will be considered
|
||||
available as soon as it is ready). To learn more about when a Pod is considered ready, see [Container Probes](/docs/concepts/workloads/pods/pod-lifecycle/#container-probes).
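
For example, the following fragment (the value is illustrative) keeps each new Pod out of the available count until it has been ready for 10 seconds:

```yaml
spec:
  minReadySeconds: 10   # Pod must stay ready for 10s before it counts as available
```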
|
||||
|
||||
## Pod Identity
|
||||
|
||||
|
@ -284,16 +300,84 @@ After reverting the template, you must also delete any Pods that StatefulSet had
|
|||
already attempted to run with the bad configuration.
|
||||
StatefulSet will then begin to recreate the Pods using the reverted template.
|
||||
|
||||
### Minimum ready seconds
|
||||
|
||||
{{< feature-state for_k8s_version="v1.22" state="alpha" >}}
|
||||
## PersistentVolumeClaim retention
|
||||
|
||||
`.spec.minReadySeconds` is an optional field that specifies the minimum number of seconds for which a newly
|
||||
created Pod should be ready without any of its containers crashing, for it to be considered available.
|
||||
This defaults to 0 (the Pod will be considered available as soon as it is ready). To learn more about when
|
||||
a Pod is considered ready, see [Container Probes](/docs/concepts/workloads/pods/pod-lifecycle/#container-probes).
|
||||
{{< feature-state for_k8s_version="v1.23" state="alpha" >}}
|
||||
|
||||
Please note that this field only works if you enable the `StatefulSetMinReadySeconds` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/).
|
||||
The optional `.spec.persistentVolumeClaimRetentionPolicy` field controls whether
|
||||
and how PVCs are deleted during the lifecycle of a StatefulSet. You must enable the
|
||||
`StatefulSetAutoDeletePVC` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
|
||||
to use this field. Once enabled, there are two policies you can configure for each
|
||||
StatefulSet:
|
||||
|
||||
`whenDeleted`
|
||||
: configures the volume retention behavior that applies when the StatefulSet is deleted
|
||||
|
||||
`whenScaled`
|
||||
: configures the volume retention behavior that applies when the replica count of
|
||||
the StatefulSet is reduced; for example, when scaling down the set.
|
||||
|
||||
For each policy that you can configure, you can set the value to either `Delete` or `Retain`.
|
||||
|
||||
`Delete`
|
||||
: The PVCs created from the StatefulSet `volumeClaimTemplate` are deleted for each Pod
|
||||
affected by the policy. With the `whenDeleted` policy all PVCs from the
|
||||
`volumeClaimTemplate` are deleted after their Pods have been deleted. With the
|
||||
`whenScaled` policy, only PVCs corresponding to Pod replicas being scaled down are
|
||||
deleted, after their Pods have been deleted.
|
||||
|
||||
`Retain` (default)
|
||||
: PVCs from the `volumeClaimTemplate` are not affected when their Pod is
|
||||
deleted. This is the behavior before this new feature.
|
||||
|
||||
Bear in mind that these policies **only** apply when Pods are being removed due to the
|
||||
StatefulSet being deleted or scaled down. For example, if a Pod associated with a StatefulSet
|
||||
fails due to node failure, and the control plane creates a replacement Pod, the StatefulSet
|
||||
retains the existing PVC. The existing volume is unaffected, and the cluster will attach it to
|
||||
the node where the new Pod is about to launch.
|
||||
|
||||
The default for policies is `Retain`, matching the StatefulSet behavior before this new feature.
|
||||
|
||||
Here is an example policy.
|
||||
|
||||
```yaml
|
||||
apiVersion: apps/v1
|
||||
kind: StatefulSet
|
||||
...
|
||||
spec:
|
||||
persistentVolumeClaimRetentionPolicy:
|
||||
whenDeleted: Retain
|
||||
whenScaled: Delete
|
||||
...
|
||||
```
|
||||
|
||||
The StatefulSet {{<glossary_tooltip text="controller" term_id="controller">}} adds [owner
|
||||
references](/docs/concepts/overview/working-with-objects/owners-dependents/#owner-references-in-object-specifications)
|
||||
to its PVCs, which are then deleted by the {{<glossary_tooltip text="garbage collector"
|
||||
term_id="garbage-collection">}} after the Pod is terminated. This enables the Pod to
|
||||
cleanly unmount all volumes before the PVCs are deleted (and before the backing PV and
|
||||
volume are deleted, depending on the retain policy). When you set the `whenDeleted`
|
||||
policy to `Delete`, an owner reference to the StatefulSet instance is placed on all PVCs
|
||||
associated with that StatefulSet.
|
||||
|
||||
The `whenScaled` policy must delete PVCs only when a Pod is scaled down, and not when a
|
||||
Pod is deleted for another reason. When reconciling, the StatefulSet controller compares
|
||||
its desired replica count to the actual Pods present on the cluster. Any StatefulSet Pod
|
||||
whose ordinal is greater than or equal to the replica count is condemned and marked for deletion. If the
|
||||
`whenScaled` policy is `Delete`, the condemned Pods are first set as owners to the
|
||||
associated StatefulSet template PVCs, before the Pod is deleted. This causes the PVCs
|
||||
to be garbage collected only after the condemned Pods have terminated.
|
||||
|
||||
This means that if the controller crashes and restarts, no Pod will be deleted before its
|
||||
owner reference has been updated appropriately for the policy. If a condemned Pod is
|
||||
force-deleted while the controller is down, the owner reference may or may not have been
|
||||
set up, depending on when the controller crashed. It may take several reconcile loops to
|
||||
update the owner references, so some condemned Pods may have set up owner references and
|
||||
others may not. For this reason, we recommend waiting for the controller to come back up,
|
||||
which will verify owner references before terminating Pods. If that is not possible, the
|
||||
operator should verify the owner references on PVCs to ensure the expected objects are
|
||||
deleted when Pods are force-deleted.
|
||||
|
||||
### Replicas
|
||||
|
||||
|
|
|
@ -1,75 +1,68 @@
|
|||
---
|
||||
reviewers:
|
||||
- janetkuo
|
||||
title: TTL Controller for Finished Resources
|
||||
title: Automatic Clean-up for Finished Jobs
|
||||
content_type: concept
|
||||
weight: 70
|
||||
---
|
||||
|
||||
<!-- overview -->
|
||||
|
||||
{{< feature-state for_k8s_version="v1.21" state="beta" >}}
|
||||
{{< feature-state for_k8s_version="v1.23" state="stable" >}}
|
||||
|
||||
The TTL controller provides a TTL (time to live) mechanism to limit the lifetime of resource
|
||||
objects that have finished execution. TTL controller only handles
|
||||
{{< glossary_tooltip text="Jobs" term_id="job" >}} for now,
|
||||
and may be expanded to handle other resources that will finish execution,
|
||||
such as Pods and custom resources.
|
||||
|
||||
This feature is currently beta and enabled by default, and can be disabled via
|
||||
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
|
||||
`TTLAfterFinished` in both kube-apiserver and kube-controller-manager.
|
||||
The TTL-after-finished {{<glossary_tooltip text="controller" term_id="controller">}} provides a
|
||||
TTL (time to live) mechanism to limit the lifetime of resource objects that
|
||||
have finished execution. The TTL-after-finished controller only handles
|
||||
{{< glossary_tooltip text="Jobs" term_id="job" >}}.
|
||||
|
||||
<!-- body -->
|
||||
|
||||
## TTL Controller
|
||||
## TTL-after-finished Controller
|
||||
|
||||
The TTL controller only supports Jobs for now. A cluster operator can use this feature to clean
|
||||
The TTL-after-finished controller only supports Jobs. A cluster operator can use this feature to clean
|
||||
up finished Jobs (either `Complete` or `Failed`) automatically by specifying the
|
||||
`.spec.ttlSecondsAfterFinished` field of a Job, as in this
|
||||
[example](/docs/concepts/workloads/controllers/job/#clean-up-finished-jobs-automatically).
|
||||
The TTL controller will assume that a resource is eligible to be cleaned up
|
||||
TTL seconds after the resource has finished, in other words, when the TTL has expired. When the
|
||||
TTL controller cleans up a resource, it will delete it cascadingly, that is to say it will delete
|
||||
its dependent objects together with it. Note that when the resource is deleted,
|
||||
The TTL-after-finished controller will assume that a job is eligible to be cleaned up
|
||||
TTL seconds after the job has finished, in other words, when the TTL has expired. When the
|
||||
TTL-after-finished controller cleans up a job, it will delete it cascadingly, that is to say it will delete
|
||||
its dependent objects together with it. Note that when the job is deleted,
|
||||
its lifecycle guarantees, such as finalizers, will be honored.
|
||||
|
||||
The TTL seconds can be set at any time. Here are some examples for setting the
|
||||
`.spec.ttlSecondsAfterFinished` field of a Job:
|
||||
|
||||
* Specify this field in the resource manifest, so that a Job can be cleaned up
|
||||
* Specify this field in the job manifest, so that a Job can be cleaned up
|
||||
automatically some time after it finishes.
|
||||
* Set this field of existing, already finished resources, to adopt this new
|
||||
* Set this field of existing, already finished jobs, to adopt this new
|
||||
feature.
|
||||
* Use a
|
||||
[mutating admission webhook](/docs/reference/access-authn-authz/extensible-admission-controllers/#admission-webhooks)
|
||||
to set this field dynamically at resource creation time. Cluster administrators can
|
||||
use this to enforce a TTL policy for finished resources.
|
||||
to set this field dynamically at job creation time. Cluster administrators can
|
||||
use this to enforce a TTL policy for finished jobs.
|
||||
* Use a
|
||||
[mutating admission webhook](/docs/reference/access-authn-authz/extensible-admission-controllers/#admission-webhooks)
|
||||
to set this field dynamically after the resource has finished, and choose
|
||||
different TTL values based on resource status, labels, etc.
|
||||
to set this field dynamically after the job has finished, and choose
|
||||
different TTL values based on job status, labels, etc.
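
A minimal sketch of the first option, assuming a trivial placeholder workload, could be:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: cleanup-demo              # hypothetical name
spec:
  ttlSecondsAfterFinished: 100    # delete the Job (and its dependent objects) 100s after it finishes
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: main
        image: busybox            # placeholder image
        command: ["echo", "done"]
```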
|
||||
|
||||
## Caveat
|
||||
|
||||
### Updating TTL Seconds
|
||||
|
||||
Note that the TTL period, e.g. `.spec.ttlSecondsAfterFinished` field of Jobs,
|
||||
can be modified after the resource is created or has finished. However, once the
|
||||
can be modified after the job is created or has finished. However, once the
|
||||
Job becomes eligible to be deleted (when the TTL has expired), the system won't
|
||||
guarantee that the Jobs will be kept, even if an update to extend the TTL
|
||||
returns a successful API response.
|
||||
|
||||
### Time Skew
|
||||
|
||||
Because TTL controller uses timestamps stored in the Kubernetes resources to
|
||||
Because the TTL-after-finished controller uses timestamps stored in the Kubernetes Jobs to
|
||||
determine whether the TTL has expired or not, this feature is sensitive to time
|
||||
skew in the cluster, which may cause TTL controller to clean up resource objects
|
||||
skew in the cluster, which may cause the TTL-after-finished controller to clean up Job objects
|
||||
at the wrong time.
|
||||
|
||||
In Kubernetes, it's required to run NTP on all nodes
|
||||
(see [#6159](https://github.com/kubernetes/kubernetes/issues/6159#issuecomment-93844058))
|
||||
to avoid time skew. Clocks aren't always correct, but the difference should be
|
||||
Clocks aren't always correct, but the difference should be
|
||||
very small. Please be aware of this risk when setting a non-zero TTL.
|
||||
|
||||
|
||||
|
|
|
@ -9,22 +9,13 @@ weight: 80
|
|||
|
||||
<!-- overview -->
|
||||
|
||||
{{< feature-state state="alpha" for_k8s_version="v1.22" >}}
|
||||
{{< feature-state state="beta" for_k8s_version="v1.23" >}}
|
||||
|
||||
This page provides an overview of ephemeral containers: a special type of container
|
||||
that runs temporarily in an existing {{< glossary_tooltip term_id="pod" >}} to
|
||||
accomplish user-initiated actions such as troubleshooting. You use ephemeral
|
||||
containers to inspect services rather than to build applications.
|
||||
|
||||
{{< warning >}}
|
||||
Ephemeral containers are in alpha state and are not suitable for production
|
||||
clusters. In accordance with the [Kubernetes Deprecation Policy](
|
||||
/docs/reference/using-api/deprecation-policy/), this alpha feature could change
|
||||
significantly in the future or be removed entirely.
|
||||
{{< /warning >}}
|
||||
|
||||
|
||||
|
||||
<!-- body -->
|
||||
|
||||
## Understanding ephemeral containers
|
||||
|
|
|
@ -233,57 +233,87 @@ When a Pod's containers are Ready but at least one custom condition is missing o
|
|||
|
||||
## Container probes
|
||||
|
||||
A [Probe](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#probe-v1-core) is a diagnostic
|
||||
A _probe_ is a diagnostic
|
||||
performed periodically by the
|
||||
[kubelet](/docs/reference/command-line-tools-reference/kubelet/)
|
||||
on a Container. To perform a diagnostic,
|
||||
the kubelet calls a
|
||||
[Handler](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#handler-v1-core) implemented by
|
||||
the container. There are three types of handlers:
|
||||
on a container. To perform a diagnostic,
|
||||
the kubelet either executes code within the container, or makes
|
||||
a network request.
|
||||
|
||||
* [ExecAction](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#execaction-v1-core):
|
||||
Executes a specified command inside the container. The diagnostic
|
||||
### Check mechanisms {#probe-check-methods}
|
||||
|
||||
There are four different ways to check a container using a probe.
|
||||
Each probe must define exactly one of these four mechanisms:
|
||||
|
||||
`exec`
|
||||
: Executes a specified command inside the container. The diagnostic
|
||||
is considered successful if the command exits with a status code of 0.
|
||||
|
||||
* [TCPSocketAction](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#tcpsocketaction-v1-core):
|
||||
Performs a TCP check against the Pod's IP address on
|
||||
a specified port. The diagnostic is considered successful if the port is open.
|
||||
`grpc`
|
||||
: Performs a remote procedure call using [gRPC](https://grpc.io/).
|
||||
The target should implement
|
||||
[gRPC health checks](https://grpc.io/grpc/core/md_doc_health-checking.html).
|
||||
The diagnostic is considered successful if the `status`
|
||||
of the response is `SERVING`.
|
||||
gRPC probes are an alpha feature and are only available if you
|
||||
enable the `GRPCContainerProbe`
|
||||
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/).
|
||||
|
||||
* [HTTPGetAction](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#httpgetaction-v1-core):
|
||||
Performs an HTTP `GET` request against the Pod's IP
|
||||
address on a specified port and path. The diagnostic is considered successful
|
||||
if the response has a status code greater than or equal to 200 and less than 400.
|
||||
`httpGet`
|
||||
: Performs an HTTP `GET` request against the Pod's IP
|
||||
address on a specified port and path. The diagnostic is
|
||||
considered successful if the response has a status code
|
||||
greater than or equal to 200 and less than 400.
|
||||
|
||||
`tcpSocket`
|
||||
: Performs a TCP check against the Pod's IP address on
|
||||
a specified port. The diagnostic is considered successful if
|
||||
the port is open. If the remote system (the container) closes
|
||||
the connection immediately after it opens, this counts as healthy.
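
As a hedged illustration of two of these mechanisms (the path and port numbers are placeholders, not anything prescribed by Kubernetes):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: probe-demo          # hypothetical name
spec:
  containers:
  - name: app
    image: nginx            # placeholder image
    livenessProbe:
      httpGet:
        path: /healthz      # illustrative path
        port: 8080          # illustrative port
      periodSeconds: 10
    readinessProbe:
      tcpSocket:
        port: 8080
      initialDelaySeconds: 5
```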
|
||||
|
||||
### Probe outcome
|
||||
|
||||
Each probe has one of three results:
|
||||
|
||||
* `Success`: The container passed the diagnostic.
|
||||
* `Failure`: The container failed the diagnostic.
|
||||
* `Unknown`: The diagnostic failed, so no action should be taken.
|
||||
`Success`
|
||||
: The container passed the diagnostic.
|
||||
|
||||
`Failure`
|
||||
: The container failed the diagnostic.
|
||||
|
||||
`Unknown`
|
||||
: The diagnostic failed (no action should be taken, and the kubelet
|
||||
will make further checks).
|
||||
|
||||
### Types of probe
|
||||
|
||||
The kubelet can optionally perform and react to three kinds of probes on running
|
||||
containers:
|
||||
|
||||
* `livenessProbe`: Indicates whether the container is running. If
|
||||
the liveness probe fails, the kubelet kills the container, and the container
|
||||
is subjected to its [restart policy](#restart-policy). If a Container does not
|
||||
provide a liveness probe, the default state is `Success`.
|
||||
`livenessProbe`
|
||||
: Indicates whether the container is running. If
|
||||
the liveness probe fails, the kubelet kills the container, and the container
|
||||
is subjected to its [restart policy](#restart-policy). If a container does not
|
||||
provide a liveness probe, the default state is `Success`.
|
||||
|
||||
* `readinessProbe`: Indicates whether the container is ready to respond to requests.
|
||||
If the readiness probe fails, the endpoints controller removes the Pod's IP
|
||||
address from the endpoints of all Services that match the Pod. The default
|
||||
state of readiness before the initial delay is `Failure`. If a Container does
|
||||
not provide a readiness probe, the default state is `Success`.
|
||||
`readinessProbe`
|
||||
: Indicates whether the container is ready to respond to requests.
|
||||
If the readiness probe fails, the endpoints controller removes the Pod's IP
|
||||
address from the endpoints of all Services that match the Pod. The default
|
||||
state of readiness before the initial delay is `Failure`. If a container does
|
||||
not provide a readiness probe, the default state is `Success`.
|
||||
|
||||
* `startupProbe`: Indicates whether the application within the container is started.
|
||||
All other probes are disabled if a startup probe is provided, until it succeeds.
|
||||
If the startup probe fails, the kubelet kills the container, and the container
|
||||
is subjected to its [restart policy](#restart-policy). If a Container does not
|
||||
provide a startup probe, the default state is `Success`.
|
||||
`startupProbe`
|
||||
: Indicates whether the application within the container is started.
|
||||
All other probes are disabled if a startup probe is provided, until it succeeds.
|
||||
If the startup probe fails, the kubelet kills the container, and the container
|
||||
is subjected to its [restart policy](#restart-policy). If a container does not
|
||||
provide a startup probe, the default state is `Success`.
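
To make the interplay concrete, here is a sketch of a slow-starting container that uses a startup probe to hold off the liveness probe; the endpoint and timings are assumptions:

```yaml
# While the startup probe has not yet succeeded, the other probes are disabled.
containers:
- name: slow-starter
  image: nginx                # placeholder image
  startupProbe:
    httpGet:
      path: /healthz          # illustrative endpoint
      port: 8080
    failureThreshold: 30      # allow up to 30 * 10s = 300s for startup
    periodSeconds: 10
  livenessProbe:
    httpGet:
      path: /healthz
      port: 8080
    periodSeconds: 10
```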
|
||||
|
||||
For more information about how to set up a liveness, readiness, or startup probe,
|
||||
see [Configure Liveness, Readiness and Startup Probes](/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/).
|
||||
|
||||
### When should you use a liveness probe?
|
||||
#### When should you use a liveness probe?
|
||||
|
||||
{{< feature-state for_k8s_version="v1.0" state="stable" >}}
|
||||
|
||||
|
@ -295,7 +325,7 @@ with the Pod's `restartPolicy`.
|
|||
If you'd like your container to be killed and restarted if a probe fails, then
|
||||
specify a liveness probe, and specify a `restartPolicy` of Always or OnFailure.
|
||||
|
||||
### When should you use a readiness probe?
|
||||
#### When should you use a readiness probe?
|
||||
|
||||
{{< feature-state for_k8s_version="v1.0" state="stable" >}}
|
||||
|
||||
|
@ -329,7 +359,7 @@ The Pod remains in the unready state while it waits for the containers in the Po
|
|||
to stop.
|
||||
{{< /note >}}
|
||||
|
||||
### When should you use a startup probe?
|
||||
#### When should you use a startup probe?
|
||||
|
||||
{{< feature-state for_k8s_version="v1.20" state="stable" >}}
|
||||
|
||||
|
@ -451,13 +481,13 @@ This avoids a resource leak as Pods are created and terminated over time.
|
|||
## {{% heading "whatsnext" %}}
|
||||
|
||||
* Get hands-on experience
|
||||
[attaching handlers to Container lifecycle events](/docs/tasks/configure-pod-container/attach-handler-lifecycle-event/).
|
||||
[attaching handlers to container lifecycle events](/docs/tasks/configure-pod-container/attach-handler-lifecycle-event/).
|
||||
|
||||
* Get hands-on experience
|
||||
[configuring Liveness, Readiness and Startup Probes](/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/).
|
||||
|
||||
* Learn more about [container lifecycle hooks](/docs/concepts/containers/container-lifecycle-hooks/).
|
||||
|
||||
* For detailed information about Pod / Container status in the API, see [PodStatus](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#podstatus-v1-core)
|
||||
and
|
||||
[ContainerStatus](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#containerstatus-v1-core).
|
||||
* For detailed information about Pod and container status in the API, see
|
||||
the API reference documentation covering
|
||||
[`.status`](/docs/reference/kubernetes-api/workload-resources/pod-v1/#PodStatus) for Pod.
|
||||
|
|
|
@ -583,7 +583,8 @@ subresource of the referenced *owner* can change it.
|
|||
This admission controller implements additional validations for checking incoming `PersistentVolumeClaim` resize requests.
|
||||
|
||||
{{< note >}}
|
||||
Support for volume resizing is available as an alpha feature. Admins must set the feature gate `ExpandPersistentVolumes`
|
||||
Support for volume resizing is available as a beta feature. As a cluster administrator,
|
||||
you must ensure that the feature gate `ExpandPersistentVolumes` is set
|
||||
to `true` to enable resizing.
|
||||
{{< /note >}}
|
||||
|
||||
|
@ -698,7 +699,7 @@ admission plugin, which allows preventing pods from running on specifically tain
|
|||
|
||||
### PodSecurity {#podsecurity}
|
||||
|
||||
{{< feature-state for_k8s_version="v1.22" state="alpha" >}}
|
||||
{{< feature-state for_k8s_version="v1.23" state="beta" >}}
|
||||
|
||||
This is the replacement for the deprecated [PodSecurityPolicy](#podsecuritypolicy) admission controller
|
||||
defined in the next section. This admission controller acts on creation and modification of the pod and
|
||||
|
|
|
@ -71,20 +71,23 @@ different Kubernetes components.
|
|||
| `CSIMigration` | `false` | Alpha | 1.14 | 1.16 |
|
||||
| `CSIMigration` | `true` | Beta | 1.17 | |
|
||||
| `CSIMigrationAWS` | `false` | Alpha | 1.14 | |
|
||||
| `CSIMigrationAWS` | `false` | Beta | 1.17 | |
|
||||
| `CSIMigrationAWS` | `false` | Beta | 1.17 | 1.22 |
|
||||
| `CSIMigrationAWS` | `true` | Beta | 1.23 | |
|
||||
| `CSIMigrationAzureDisk` | `false` | Alpha | 1.15 | 1.18 |
|
||||
| `CSIMigrationAzureDisk` | `false` | Beta | 1.19 | |
|
||||
| `CSIMigrationAzureDisk` | `false` | Beta | 1.19 | 1.22 |
|
||||
| `CSIMigrationAzureDisk` | `true` | Beta | 1.23 | |
|
||||
| `CSIMigrationAzureFile` | `false` | Alpha | 1.15 | 1.19 |
|
||||
| `CSIMigrationAzureFile` | `false` | Beta | 1.21 | |
|
||||
| `CSIMigrationGCE` | `false` | Alpha | 1.14 | 1.16 |
|
||||
| `CSIMigrationGCE` | `false` | Beta | 1.17 | |
|
||||
| `CSIMigrationGCE` | `false` | Beta | 1.17 | 1.22 |
|
||||
| `CSIMigrationGCE` | `true` | Beta | 1.23 | |
|
||||
| `CSIMigrationOpenStack` | `false` | Alpha | 1.14 | 1.17 |
|
||||
| `CSIMigrationOpenStack` | `true` | Beta | 1.18 | |
|
||||
| `CSIMigrationvSphere` | `false` | Beta | 1.19 | |
|
||||
| `CSIMigrationPortworx` | `false` | Alpha | 1.23 | |
|
||||
| `CSIMigrationRBD` | `false` | Alpha | 1.23 | |
|
||||
| `CSIStorageCapacity` | `false` | Alpha | 1.19 | 1.20 |
|
||||
| `CSIStorageCapacity` | `true` | Beta | 1.21 | |
|
||||
| `CSIVolumeFSGroupPolicy` | `false` | Alpha | 1.19 | 1.19 |
|
||||
| `CSIVolumeFSGroupPolicy` | `true` | Beta | 1.20 | |
|
||||
| `CSIVolumeHealth` | `false` | Alpha | 1.21 | |
|
||||
| `CSRDuration` | `true` | Beta | 1.22 | |
|
||||
| `ConfigurableFSGroupPolicy` | `false` | Alpha | 1.18 | 1.19 |
|
||||
|
@ -92,6 +95,7 @@ different Kubernetes components.
|
|||
| `ControllerManagerLeaderMigration` | `false` | Alpha | 1.21 | 1.21 |
|
||||
| `ControllerManagerLeaderMigration` | `true` | Beta | 1.22 | |
|
||||
| `CustomCPUCFSQuotaPeriod` | `false` | Alpha | 1.12 | |
|
||||
| `CustomResourceValidationExpressions` | `false` | Alpha | 1.23 | |
|
||||
| `DaemonSetUpdateSurge` | `false` | Alpha | 1.21 | 1.21 |
|
||||
| `DaemonSetUpdateSurge` | `true` | Beta | 1.22 | |
|
||||
| `DefaultPodTopologySpread` | `false` | Alpha | 1.19 | 1.19 |
|
||||
|
@ -108,7 +112,8 @@ different Kubernetes components.
|
|||
| `EfficientWatchResumption` | `true` | Beta | 1.21 | |
|
||||
| `EndpointSliceTerminatingCondition` | `false` | Alpha | 1.20 | 1.21 |
|
||||
| `EndpointSliceTerminatingCondition` | `true` | Beta | 1.22 | |
|
||||
| `EphemeralContainers` | `false` | Alpha | 1.16 | |
|
||||
| `EphemeralContainers` | `false` | Alpha | 1.16 | 1.22 |
|
||||
| `EphemeralContainers` | `true` | Beta | 1.23 | |
|
||||
| `ExpandCSIVolumes` | `false` | Alpha | 1.14 | 1.15 |
|
||||
| `ExpandCSIVolumes` | `true` | Beta | 1.16 | |
|
||||
| `ExpandedDNSConfig` | `false` | Alpha | 1.22 | |
|
||||
|
@ -117,25 +122,24 @@ different Kubernetes components.
|
|||
| `ExpandPersistentVolumes` | `false` | Alpha | 1.8 | 1.10 |
|
||||
| `ExpandPersistentVolumes` | `true` | Beta | 1.11 | |
|
||||
| `ExperimentalHostUserNamespaceDefaulting` | `false` | Beta | 1.5 | |
|
||||
| `GenericEphemeralVolume` | `false` | Alpha | 1.19 | 1.20 |
|
||||
| `GenericEphemeralVolume` | `true` | Beta | 1.21 | |
|
||||
| `GracefulNodeShutdown` | `false` | Alpha | 1.20 | 1.20 |
|
||||
| `GracefulNodeShutdown` | `true` | Beta | 1.21 | |
|
||||
| `GRPCContainerProbe` | `false` | Alpha | 1.23 | |
|
||||
| `HPAContainerMetrics` | `false` | Alpha | 1.20 | |
|
||||
| `HPAScaleToZero` | `false` | Alpha | 1.16 | |
|
||||
| `IdentifyPodOS` | `false` | Alpha | 1.23 | |
|
||||
| `IndexedJob` | `false` | Alpha | 1.21 | 1.21 |
|
||||
| `IndexedJob` | `true` | Beta | 1.22 | |
|
||||
| `IngressClassNamespacedParams` | `false` | Alpha | 1.21 | 1.21 |
|
||||
| `IngressClassNamespacedParams` | `true` | Beta | 1.22 | |
|
||||
| `InTreePluginAWSUnregister` | `false` | Alpha | 1.21 | |
|
||||
| `InTreePluginAzureDiskUnregister` | `false` | Alpha | 1.21 | |
|
||||
| `InTreePluginAzureFileUnregister` | `false` | Alpha | 1.21 | |
|
||||
| `InTreePluginGCEUnregister` | `false` | Alpha | 1.21 | |
|
||||
| `InTreePluginOpenStackUnregister` | `false` | Alpha | 1.21 | |
|
||||
| `InTreePluginvSphereUnregister` | `false` | Alpha | 1.21 | |
|
||||
| `IPv6DualStack` | `false` | Alpha | 1.15 | 1.20 |
|
||||
| `IPv6DualStack` | `true` | Beta | 1.21 | |
|
||||
| `JobTrackingWithFinalizers` | `false` | Alpha | 1.22 | |
|
||||
| `JobMutableNodeSchedulingDirectives` | `true` | Beta | 1.23 | |
|
||||
| `JobReadyPods` | `false` | Alpha | 1.23 | |
|
||||
| `JobTrackingWithFinalizers` | `false` | Alpha | 1.22 | 1.22 |
|
||||
| `JobTrackingWithFinalizers` | `true` | Beta | 1.23 | |
|
||||
| `KubeletCredentialProviders` | `false` | Alpha | 1.20 | |
|
||||
| `KubeletInUserNamespace` | `false` | Alpha | 1.22 | |
|
||||
| `KubeletPodResourcesGetAllocatable` | `false` | Alpha | 1.21 | |
|
||||
|
@ -159,7 +163,8 @@ different Kubernetes components.
|
|||
| `PodAffinityNamespaceSelector` | `true` | Beta | 1.22 | |
|
||||
| `PodOverhead` | `false` | Alpha | 1.16 | 1.17 |
|
||||
| `PodOverhead` | `true` | Beta | 1.18 | |
|
||||
| `PodSecurity` | `false` | Alpha | 1.22 | |
|
||||
| `PodSecurity` | `false` | Alpha | 1.22 | 1.22 |
|
||||
| `PodSecurity` | `true` | Beta | 1.23 | |
|
||||
| `PreferNominatedNode` | `false` | Alpha | 1.21 | 1.21 |
|
||||
| `PreferNominatedNode` | `true` | Beta | 1.22 | |
|
||||
| `ProbeTerminationGracePeriod` | `false` | Alpha | 1.21 | 1.21 |
|
||||
|
@ -168,6 +173,7 @@ different Kubernetes components.
|
|||
| `ProxyTerminatingEndpoints` | `false` | Alpha | 1.22 | |
|
||||
| `QOSReserved` | `false` | Alpha | 1.11 | |
|
||||
| `ReadWriteOncePod` | `false` | Alpha | 1.22 | |
|
||||
| `RecoverVolumeExpansionFailure` | `false` | Alpha | 1.23 | |
|
||||
| `RemainingItemCount` | `false` | Alpha | 1.15 | 1.15 |
|
||||
| `RemainingItemCount` | `true` | Beta | 1.16 | |
|
||||
| `RemoveSelfLink` | `false` | Alpha | 1.16 | 1.19 |
|
||||
|
@ -183,22 +189,22 @@ different Kubernetes components.
|
|||
| `ServiceLoadBalancerClass` | `true` | Beta | 1.22 | |
|
||||
| `SizeMemoryBackedVolumes` | `false` | Alpha | 1.20 | 1.21 |
|
||||
| `SizeMemoryBackedVolumes` | `true` | Beta | 1.22 | |
|
||||
| `StatefulSetMinReadySeconds` | `false` | Alpha | 1.22 | |
|
||||
| `StatefulSetMinReadySeconds` | `false` | Alpha | 1.22 | 1.22 |
|
||||
| `StatefulSetMinReadySeconds` | `true` | Beta | 1.23 | |
|
||||
| `StorageVersionAPI` | `false` | Alpha | 1.20 | |
|
||||
| `StorageVersionHash` | `false` | Alpha | 1.14 | 1.14 |
|
||||
| `StorageVersionHash` | `true` | Beta | 1.15 | |
|
||||
| `SuspendJob` | `false` | Alpha | 1.21 | 1.21 |
|
||||
| `SuspendJob` | `true` | Beta | 1.22 | |
|
||||
| `TTLAfterFinished` | `false` | Alpha | 1.12 | 1.20 |
|
||||
| `TTLAfterFinished` | `true` | Beta | 1.21 | |
|
||||
| `TopologyAwareHints` | `false` | Alpha | 1.21 | |
|
||||
| `TopologyAwareHints` | `false` | Alpha | 1.21 | 1.22 |
|
||||
| `TopologyAwareHints` | `true` | Beta | 1.23 | |
|
||||
| `TopologyManager` | `false` | Alpha | 1.16 | 1.17 |
|
||||
| `TopologyManager` | `true` | Beta | 1.18 | |
|
||||
| `VolumeCapacityPriority` | `false` | Alpha | 1.21 | - |
|
||||
| `WinDSR` | `false` | Alpha | 1.14 | |
|
||||
| `WinOverlay` | `false` | Alpha | 1.14 | 1.19 |
|
||||
| `WinOverlay` | `true` | Beta | 1.20 | |
|
||||
| `WindowsHostProcessContainers` | `false` | Alpha | 1.22 | |
|
||||
| `WindowsHostProcessContainers` | `false` | Beta | 1.23 | |
|
||||
{{< /table >}}
|
||||
|
||||
### Feature gates for graduated or deprecated features
|
||||
|
@ -227,6 +233,7 @@ different Kubernetes components.
|
|||
| `BoundServiceAccountTokenVolume` | `false` | Alpha | 1.13 | 1.20 |
|
||||
| `BoundServiceAccountTokenVolume` | `true` | Beta | 1.21 | 1.21 |
|
||||
| `BoundServiceAccountTokenVolume` | `true` | GA | 1.22 | - |
|
||||
| `ConfigurableFSGroupPolicy` | `true` | GA | 1.23 | |
|
||||
| `CRIContainerLogRotation` | `false` | Alpha | 1.10 | 1.10 |
|
||||
| `CRIContainerLogRotation` | `true` | Beta | 1.11 | 1.20 |
|
||||
| `CRIContainerLogRotation` | `true` | GA | 1.21 | - |
|
||||
|
@ -257,6 +264,9 @@ different Kubernetes components.
|
|||
| `CSIServiceAccountToken` | `false` | Alpha | 1.20 | 1.20 |
|
||||
| `CSIServiceAccountToken` | `true` | Beta | 1.21 | 1.21 |
|
||||
| `CSIServiceAccountToken` | `true` | GA | 1.22 | |
|
||||
| `CSIVolumeFSGroupPolicy` | `false` | Alpha | 1.19 | 1.19 |
|
||||
| `CSIVolumeFSGroupPolicy` | `true` | Beta | 1.20 | 1.22 |
|
||||
| `CSIVolumeFSGroupPolicy` | `true` | GA | 1.23 | |
|
||||
| `CronJobControllerV2` | `false` | Alpha | 1.20 | 1.20 |
|
||||
| `CronJobControllerV2` | `true` | Beta | 1.21 | 1.21 |
|
||||
| `CronJobControllerV2` | `true` | GA | 1.22 | - |
|
||||
|
@ -311,6 +321,9 @@ different Kubernetes components.
|
|||
| `ExternalPolicyForExternalIP` | `true` | GA | 1.18 | - |
|
||||
| `GCERegionalPersistentDisk` | `true` | Beta | 1.10 | 1.12 |
|
||||
| `GCERegionalPersistentDisk` | `true` | GA | 1.13 | - |
|
||||
| `GenericEphemeralVolume` | `false` | Alpha | 1.19 | 1.20 |
|
||||
| `GenericEphemeralVolume` | `true` | Beta | 1.21 | 1.22 |
|
||||
| `GenericEphemeralVolume` | `true` | GA | 1.23 | - |
|
||||
| `HugePageStorageMediumSize` | `false` | Alpha | 1.18 | 1.18 |
|
||||
| `HugePageStorageMediumSize` | `true` | Beta | 1.19 | 1.21 |
|
||||
| `HugePageStorageMediumSize` | `true` | GA | 1.22 | - |
|
||||
|
@ -325,8 +338,14 @@ different Kubernetes components.
|
|||
| `ImmutableEphemeralVolumes` | `false` | Alpha | 1.18 | 1.18 |
|
||||
| `ImmutableEphemeralVolumes` | `true` | Beta | 1.19 | 1.20 |
|
||||
| `ImmutableEphemeralVolumes` | `true` | GA | 1.21 | |
|
||||
| `IngressClassNamespacedParams` | `false` | Alpha | 1.21 | 1.21 |
|
||||
| `IngressClassNamespacedParams` | `true` | Beta | 1.22 | 1.22 |
|
||||
| `IngressClassNamespacedParams` | `true` | GA | 1.23 | - |
|
||||
| `Initializers` | `false` | Alpha | 1.7 | 1.13 |
|
||||
| `Initializers` | - | Deprecated | 1.14 | - |
|
||||
| `IPv6DualStack` | `false` | Alpha | 1.15 | 1.20 |
|
||||
| `IPv6DualStack` | `true` | Beta | 1.21 | 1.22 |
|
||||
| `IPv6DualStack` | `true` | GA | 1.23 | - |
|
||||
| `KubeletConfigFile` | `false` | Alpha | 1.8 | 1.9 |
|
||||
| `KubeletConfigFile` | - | Deprecated | 1.10 | - |
|
||||
| `KubeletPluginsWatcher` | `false` | Alpha | 1.11 | 1.11 |
|
||||
|
@ -356,6 +375,7 @@ different Kubernetes components.
|
|||
| `PersistentLocalVolumes` | `false` | Alpha | 1.7 | 1.9 |
|
||||
| `PersistentLocalVolumes` | `true` | Beta | 1.10 | 1.13 |
|
||||
| `PersistentLocalVolumes` | `true` | GA | 1.14 | - |
|
||||
| `PodAndContainerStatsFromCRI` | `false` | Alpha | 1.23 | |
|
||||
| `PodDisruptionBudget` | `false` | Alpha | 1.3 | 1.4 |
|
||||
| `PodDisruptionBudget` | `true` | Beta | 1.5 | 1.20 |
|
||||
| `PodDisruptionBudget` | `true` | GA | 1.21 | - |
|
||||
|
@ -435,6 +455,9 @@ different Kubernetes components.
|
|||
| `SupportPodPidsLimit` | `true` | GA | 1.20 | - |
|
||||
| `Sysctls` | `true` | Beta | 1.11 | 1.20 |
|
||||
| `Sysctls` | `true` | GA | 1.21 | |
|
||||
| `TTLAfterFinished` | `false` | Alpha | 1.12 | 1.20 |
|
||||
| `TTLAfterFinished` | `true` | Beta | 1.21 | 1.22 |
|
||||
| `TTLAfterFinished` | `true` | GA | 1.23 | - |
|
||||
| `TaintBasedEvictions` | `false` | Alpha | 1.6 | 1.12 |
|
||||
| `TaintBasedEvictions` | `true` | Beta | 1.13 | 1.17 |
|
||||
| `TaintBasedEvictions` | `true` | GA | 1.18 | - |
|
||||
|
@ -556,10 +579,10 @@ Each feature gate is designed for enabling/disabling a specific feature:
|
|||
extended tokens by starting `kube-apiserver` with flag `--service-account-extend-token-expiration=false`.
|
||||
Check [Bound Service Account Tokens](https://github.com/kubernetes/enhancements/blob/master/keps/sig-auth/1205-bound-service-account-tokens/README.md)
|
||||
for more details.
|
||||
- `ControllerManagerLeaderMigration`: Enables Leader Migration for
|
||||
[kube-controller-manager](/docs/tasks/administer-cluster/controller-manager-leader-migration/#initial-leader-migration-configuration) and
|
||||
[cloud-controller-manager](/docs/tasks/administer-cluster/controller-manager-leader-migration/#deploy-cloud-controller-manager) which allows a cluster operator to live migrate
|
||||
controllers from the kube-controller-manager into an external controller-manager
|
||||
- `ControllerManagerLeaderMigration`: Enables Leader Migration for
|
||||
[kube-controller-manager](/docs/tasks/administer-cluster/controller-manager-leader-migration/#initial-leader-migration-configuration) and
|
||||
[cloud-controller-manager](/docs/tasks/administer-cluster/controller-manager-leader-migration/#deploy-cloud-controller-manager) which allows a cluster operator to live migrate
|
||||
controllers from the kube-controller-manager into an external controller-manager
|
||||
(e.g. the cloud-controller-manager) in an HA cluster without downtime.
|
||||
- `CPUManager`: Enable container level CPU affinity support, see
|
||||
[CPU Management Policies](/docs/tasks/administer-cluster/cpu-management-policies/).
|
||||
|
@ -615,6 +638,13 @@ Each feature gate is designed for enabling/disabling a specific feature:
|
|||
operations from the GCE-PD in-tree plugin to PD CSI plugin. Supports falling
|
||||
back to in-tree GCE plugin if a node does not have PD CSI plugin installed and
|
||||
configured. Requires CSIMigration feature flag enabled.
|
||||
- `CSIMigrationRBD`: Enables shims and translation logic to route volume
|
||||
operations from the RBD in-tree plugin to Ceph RBD CSI plugin. Requires
|
||||
CSIMigration and CSIMigrationRBD feature flags enabled and Ceph CSI plugin
|
||||
installed and configured in the cluster. This flag has been deprecated in
|
||||
favor of the
|
||||
`InTreePluginRBDUnregister` feature flag which prevents the registration of
|
||||
the in-tree RBD plugin.
|
||||
- `CSIMigrationGCEComplete`: Stops registering the GCE-PD in-tree plugin in
|
||||
kubelet and volume controllers and enables shims and translation logic to
|
||||
route volume operations from the GCE-PD in-tree plugin to PD CSI plugin.
|
||||
|
@ -641,6 +671,9 @@ Each feature gate is designed for enabling/disabling a specific feature:
|
|||
CSIMigrationvSphere feature flags enabled and vSphere CSI plugin installed and
|
||||
configured on all nodes in the cluster. This flag has been deprecated in favor
|
||||
of the `InTreePluginvSphereUnregister` feature flag, which prevents the registration of the in-tree vSphere plugin.
|
||||
- `CSIMigrationPortworx`: Enables shims and translation logic to route volume operations
|
||||
from the Portworx in-tree plugin to Portworx CSI plugin.
|
||||
Requires the Portworx CSI driver to be installed and configured in the cluster, and the `CSIMigrationPortworx` feature gate to be set to `true` in the kube-controller-manager and kubelet configurations.
|
||||
- `CSINodeInfo`: Enable all logic related to the CSINodeInfo API object in csi.storage.k8s.io.
|
||||
- `CSIPersistentVolume`: Enable discovering and mounting volumes provisioned through a
|
||||
[CSI (Container Storage Interface)](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/container-storage-interface.md)
|
||||
|
@ -669,6 +702,7 @@ Each feature gate is designed for enabling/disabling a specific feature:
|
|||
version 1 of the same controller is selected.
|
||||
- `CustomCPUCFSQuotaPeriod`: Enable nodes to change `cpuCFSQuotaPeriod` in
|
||||
[kubelet config](/docs/tasks/administer-cluster/kubelet-config-file/).
|
||||
- `CustomResourceValidationExpressions`: Enable expression language validation in CRDs, which validates custom resources based on validation rules written in the `x-kubernetes-validations` extension.
|
||||
- `CustomPodDNS`: Enable customizing the DNS settings for a Pod using its `dnsConfig` property.
|
||||
Check [Pod's DNS Config](/docs/concepts/services-networking/dns-pod-service/#pods-dns-config)
|
||||
for more details.
|
||||
|
@ -758,6 +792,7 @@ Each feature gate is designed for enabling/disabling a specific feature:
|
|||
and gracefully terminate pods running on the node. See
|
||||
[Graceful Node Shutdown](/docs/concepts/architecture/nodes/#graceful-node-shutdown)
|
||||
for more details.
|
||||
- `GRPCContainerProbe`: Enables the gRPC probe method for {Liveness,Readiness,Startup}Probe. See [Configure Liveness, Readiness and Startup Probes](/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-a-grpc-liveness-probe).
|
||||
- `HPAContainerMetrics`: Enable the `HorizontalPodAutoscaler` to scale based on
|
||||
metrics from individual containers in target pods.
|
||||
- `HPAScaleToZero`: Enables setting `minReplicas` to 0 for `HorizontalPodAutoscaler`
|
||||
|
@ -769,6 +804,8 @@ Each feature gate is designed for enabling/disabling a specific feature:
|
|||
- `HyperVContainer`: Enable
|
||||
[Hyper-V isolation](https://docs.microsoft.com/en-us/virtualization/windowscontainers/manage-containers/hyperv-container)
|
||||
for Windows containers.
|
||||
- `IdentifyPodOS`: Allows the Pod OS field to be specified. This helps in identifying the OS of the pod
|
||||
authoritatively during the API server admission time. In Kubernetes {{< skew currentVersion >}}, the allowed values for the `pod.spec.os.name` are `windows` and `linux`.
|
||||
- `ImmutableEphemeralVolumes`: Allows for marking individual Secrets and ConfigMaps as
|
||||
immutable for better safety and performance.
|
||||
- `InTreePluginAWSUnregister`: Stops registering the aws-ebs in-tree plugin in kubelet
|
||||
|
@ -792,6 +829,13 @@ Each feature gate is designed for enabling/disabling a specific feature:
|
|||
Initializers admission plugin.
|
||||
- `IPv6DualStack`: Enable [dual stack](/docs/concepts/services-networking/dual-stack/)
|
||||
support for IPv6.
|
||||
- `JobMutableNodeSchedulingDirectives`: Allows updating node scheduling directives in
|
||||
the pod template of [Job](/docs/concepts/workloads/controllers/job).
|
||||
- `JobReadyPods`: Enables tracking the number of Pods that have a `Ready`
|
||||
[condition](/docs/concepts/workloads/pods/pod-lifecycle/#pod-conditions).
|
||||
The count of `Ready` pods is recorded in the
|
||||
[status](/docs/reference/kubernetes-api/workload-resources/job-v1/#JobStatus)
|
||||
of a [Job](/docs/concepts/workloads/controllers/job).
|
||||
- `JobTrackingWithFinalizers`: Enables tracking [Job](/docs/concepts/workloads/controllers/job)
|
||||
completions without relying on Pods remaining in the cluster indefinitely.
|
||||
The Job controller uses Pod finalizers and a field in the Job status to keep
|
||||
|
@ -851,6 +895,8 @@ Each feature gate is designed for enabling/disabling a specific feature:
|
|||
feature which allows users to influence ReplicaSet downscaling order.
|
||||
- `PersistentLocalVolumes`: Enable the usage of `local` volume type in Pods.
|
||||
Pod affinity has to be specified if requesting a `local` volume.
|
||||
- `PodAndContainerStatsFromCRI`: Configure the kubelet to gather container and pod stats from the CRI container runtime
|
||||
rather than gathering them from cAdvisor.
|
||||
- `PodDisruptionBudget`: Enable the [PodDisruptionBudget](/docs/tasks/run-application/configure-pdb/) feature.
|
||||
- `PodAffinityNamespaceSelector`: Enable the [Pod Affinity Namespace Selector](/docs/concepts/scheduling-eviction/assign-pod-node/#namespace-selector)
|
||||
and [CrossNamespacePodAffinity](/docs/concepts/policy/resource-quotas/#cross-namespace-pod-affinity-quota) quota scope features.
|
||||
|
@ -880,6 +926,10 @@ Each feature gate is designed for enabling/disabling a specific feature:
|
|||
(memory only for now).
|
||||
- `ReadWriteOncePod`: Enables the usage of `ReadWriteOncePod` PersistentVolume
|
||||
access mode.
|
||||
- `RecoverVolumeExpansionFailure`: Enables users to edit their PVCs to smaller sizes so that they can recover from previously issued
|
||||
volume expansion failures. See
|
||||
[Recovering from Failure when Expanding Volumes](/docs/concepts/storage/persistent-volumes/#recovering-from-failure-when-expanding-volumes)
|
||||
for more details.
|
||||
- `RemainingItemCount`: Allow the API servers to show a count of remaining
|
||||
items in the response to a
|
||||
[chunking list request](/docs/reference/using-api/api-concepts/#retrieving-large-results-sets-in-chunks).
|
||||
|
|
|
@ -0,0 +1,22 @@
|
|||
---
|
||||
title: Container Runtime Interface
|
||||
id: container-runtime-interface
|
||||
date: 2021-11-24
|
||||
full_link: /docs/concepts/architecture/cri
|
||||
short_description: >
|
||||
The main protocol for the communication between the kubelet and Container Runtime.
|
||||
|
||||
aka:
|
||||
tags:
|
||||
- cri
|
||||
---
|
||||
|
||||
The main protocol for the communication between the kubelet and Container Runtime.
|
||||
|
||||
<!--more-->
|
||||
|
||||
The Kubernetes Container Runtime Interface (CRI) defines the main
|
||||
[gRPC](https://grpc.io) protocol for the communication between the
|
||||
[cluster components](/docs/concepts/overview/components/#node-components)
|
||||
{{< glossary_tooltip text="kubelet" term_id="kubelet" >}} and
|
||||
{{< glossary_tooltip text="container runtime" term_id="container-runtime" >}}.
|
|
@ -4,14 +4,14 @@ id: flexvolume
|
|||
date: 2018-06-25
|
||||
full_link: /docs/concepts/storage/volumes/#flexvolume
|
||||
short_description: >
|
||||
FlexVolume is an interface for creating out-of-tree volume plugins. The {{< glossary_tooltip text="Container Storage Interface" term_id="csi" >}} is a newer interface which addresses several problems with FlexVolumes.
|
||||
FlexVolume is a deprecated interface for creating out-of-tree volume plugins. The {{< glossary_tooltip text="Container Storage Interface" term_id="csi" >}} is a newer interface that addresses several problems with FlexVolume.
|
||||
|
||||
|
||||
aka:
|
||||
tags:
|
||||
- storage
|
||||
---
|
||||
FlexVolume is an interface for creating out-of-tree volume plugins. The {{< glossary_tooltip text="Container Storage Interface" term_id="csi" >}} is a newer interface which addresses several problems with FlexVolumes.
|
||||
FlexVolume is a deprecated interface for creating out-of-tree volume plugins. The {{< glossary_tooltip text="Container Storage Interface" term_id="csi" >}} is a newer interface that addresses several problems with FlexVolume.
|
||||
|
||||
<!--more-->
|
||||
|
||||
|
|
|
@ -434,4 +434,20 @@ policies to apply when validating a submitted Pod. Note that warnings are also d
|
|||
or updating objects that contain Pod templates, such as Deployments, Jobs, StatefulSets, etc.
|
||||
|
||||
See [Enforcing Pod Security at the Namespace Level](/docs/concepts/security/pod-security-admission)
|
||||
for more information.
|
||||
for more information.
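
For instance (the namespace name and policy level are illustrative), enforcement is switched on per namespace with labels such as:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: my-namespace                                  # hypothetical namespace
  labels:
    pod-security.kubernetes.io/enforce: baseline      # enforce the baseline policy level
    pod-security.kubernetes.io/enforce-version: v1.23
```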
|
||||
|
||||
## seccomp.security.alpha.kubernetes.io/pod (deprecated) {#seccomp-security-alpha-kubernetes-io-pod}
|
||||
|
||||
This annotation has been deprecated since Kubernetes v1.19 and will become non-functional in v1.25.
|
||||
To specify security settings for a Pod, include the `securityContext` field in the Pod specification.
|
||||
The [`securityContext`](/docs/reference/kubernetes-api/workload-resources/pod-v1/#security-context) field within a Pod's `.spec` defines pod-level security attributes.
|
||||
When you [specify the security context for a Pod](/docs/tasks/configure-pod-container/security-context/#set-the-security-context-for-a-pod),
|
||||
the settings you specify apply to all containers in that Pod.
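
A brief sketch of pod-level settings that apply to every container in the Pod (the numeric IDs are arbitrary examples):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: security-context-demo    # hypothetical name
spec:
  securityContext:
    runAsUser: 1000              # all containers run as this UID unless overridden
    runAsGroup: 3000
    fsGroup: 2000
  containers:
  - name: app
    image: busybox               # placeholder image
    command: ["sleep", "3600"]
```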
|
||||
|
||||
## container.seccomp.security.alpha.kubernetes.io/[NAME] {#container-seccomp-security-alpha-kubernetes-io}
|
||||
|
||||
This annotation has been deprecated since Kubernetes v1.19 and will become non-functional in v1.25.
|
||||
The tutorial [Restrict a Container's Syscalls with seccomp](/docs/tutorials/clusters/seccomp/) takes
|
||||
you through the steps you follow to apply a seccomp profile to a Pod or to one of
|
||||
its containers. That tutorial covers the supported mechanism for configuring seccomp in Kubernetes,
|
||||
based on setting `securityContext` within the Pod's `.spec`.
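
In place of the deprecated annotation, the field-based approach looks roughly like this (whether you use `RuntimeDefault` or a `Localhost` profile depends on your cluster):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: seccomp-demo             # hypothetical name
spec:
  securityContext:
    seccompProfile:
      type: RuntimeDefault       # use the container runtime's default seccomp profile
  containers:
  - name: app
    image: busybox               # placeholder image
    command: ["sleep", "3600"]
```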
|
|
@ -20,8 +20,7 @@ by implementing one or more of these extension points.
|
|||
|
||||
You can specify scheduling profiles by running `kube-scheduler --config <filename>`,
|
||||
using the
|
||||
KubeSchedulerConfiguration ([v1beta1](/docs/reference/config-api/kube-scheduler-config.v1beta1/)
|
||||
or [v1beta2](/docs/reference/config-api/kube-scheduler-config.v1beta2/))
|
||||
KubeSchedulerConfiguration ([v1beta2](/docs/reference/config-api/kube-scheduler-config.v1beta2/))
|
||||
struct.
|
||||
|
||||
A minimal configuration looks as follows:
|
||||
|
@ -78,6 +77,8 @@ extension points:
|
|||
least one bind plugin is required.
|
||||
1. `postBind`: This is an informational extension point that is called after
|
||||
a Pod has been bound.
|
||||
1. `multiPoint`: This is a config-only field that allows plugins to be enabled
|
||||
or disabled for all of their applicable extension points simultaneously.
|
||||
|
||||
For each extension point, you could disable specific [default plugins](#scheduling-plugins)
|
||||
or enable your own. For example:
|
||||
|
@ -179,30 +180,6 @@ that are not enabled by default:
|
|||
volume limits can be satisfied for the node.
|
||||
Extension points: `filter`.
|
||||
|
||||
The following plugins are deprecated and can only be enabled in a `v1beta1`
|
||||
configuration:
|
||||
|
||||
- `NodeResourcesLeastAllocated`: Favors nodes that have a low allocation of
|
||||
resources.
|
||||
Extension points: `score`.
|
||||
- `NodeResourcesMostAllocated`: Favors nodes that have a high allocation of
|
||||
resources.
|
||||
Extension points: `score`.
|
||||
- `RequestedToCapacityRatio`: Favor nodes according to a configured function of
|
||||
the allocated resources.
|
||||
Extension points: `score`.
|
||||
- `NodeLabel`: Filters and / or scores a node according to configured
|
||||
{{< glossary_tooltip text="label(s)" term_id="label" >}}.
|
||||
Extension points: `filter`, `score`.
|
||||
- `ServiceAffinity`: Checks that Pods that belong to a
|
||||
{{< glossary_tooltip term_id="service" >}} fit in a set of nodes defined by
|
||||
configured labels. This plugin also favors spreading the Pods belonging to a
|
||||
Service across nodes.
|
||||
Extension points: `preFilter`, `filter`, `score`.
|
||||
- `NodePreferAvoidPods`: Prioritizes nodes according to the node annotation
|
||||
`scheduler.alpha.kubernetes.io/preferAvoidPods`.
|
||||
Extension points: `score`.
|
||||
|
||||
### Multiple profiles
|
||||
|
||||
You can configure `kube-scheduler` to run more than one profile.
|
||||
|
@ -251,6 +228,186 @@ the same configuration parameters (if applicable). This is because the scheduler
|
|||
only has one pending pods queue.
|
||||
{{< /note >}}
|
||||
|
||||
### Plugins that apply to multiple extension points {#multipoint}
|
||||
|
||||
Starting from `kubescheduler.config.k8s.io/v1beta3`, there is an additional field in the
|
||||
profile config, `multiPoint`, which allows for easily enabling or disabling a plugin
|
||||
across several extension points. The intent of `multiPoint` config is to simplify the
|
||||
configuration needed for users and administrators when using custom profiles.
|
||||
|
||||
Consider a plugin, `MyPlugin`, which implements the `preScore`, `score`, `preFilter`,
|
||||
and `filter` extension points. To enable `MyPlugin` for all its available extension
|
||||
points, the profile config looks like:
|
||||
|
||||
```yaml
|
||||
apiVersion: kubescheduler.config.k8s.io/v1beta3
|
||||
kind: KubeSchedulerConfiguration
|
||||
profiles:
|
||||
- schedulerName: multipoint-scheduler
|
||||
plugins:
|
||||
multiPoint:
|
||||
enabled:
|
||||
- name: MyPlugin
|
||||
```
|
||||
|
||||
This would equate to manually enabling `MyPlugin` for all of its extension
|
||||
points, like so:
|
||||
|
||||
```yaml
|
||||
apiVersion: kubescheduler.config.k8s.io/v1beta3
|
||||
kind: KubeSchedulerConfiguration
|
||||
profiles:
|
||||
- schedulerName: non-multipoint-scheduler
|
||||
plugins:
|
||||
preScore:
|
||||
enabled:
|
||||
- name: MyPlugin
|
||||
score:
|
||||
enabled:
|
||||
- name: MyPlugin
|
||||
preFilter:
|
||||
enabled:
|
||||
- name: MyPlugin
|
||||
filter:
|
||||
enabled:
|
||||
- name: MyPlugin
|
||||
```
|
||||
|
||||
One benefit of using `multiPoint` here is that if `MyPlugin` implements another
|
||||
extension point in the future, the `multiPoint` config will automatically enable it
|
||||
for the new extension.
|
||||
|
||||
Specific extension points can be excluded from `MultiPoint` expansion using
|
||||
the `disabled` field for that extension point. This works with disabling default
|
||||
plugins, non-default plugins, or with the wildcard (`'*'`) to disable all plugins.
|
||||
An example of this, disabling `Score` and `PreScore`, would be:
|
||||
|
||||
```yaml
|
||||
apiVersion: kubescheduler.config.k8s.io/v1beta3
|
||||
kind: KubeSchedulerConfiguration
|
||||
profiles:
|
||||
- schedulerName: non-multipoint-scheduler
|
||||
plugins:
|
||||
multiPoint:
|
||||
enabled:
|
||||
- name: 'MyPlugin'
|
||||
preScore:
|
||||
disabled:
|
||||
- name: '*'
|
||||
score:
|
||||
disabled:
|
||||
- name: '*'
|
||||
```
|
||||
|
||||
In `v1beta3`, all [default plugins](#scheduling-plugins) are enabled internally through `MultiPoint`.
|
||||
However, individual extension points are still available to allow flexible
|
||||
reconfiguration of the default values (such as ordering and Score weights). For
|
||||
example, consider two Score plugins `DefaultScore1` and `DefaultScore2`, each with
|
||||
a weight of `1`. They can be reordered with different weights like so:
|
||||
|
||||
```yaml
|
||||
apiVersion: kubescheduler.config.k8s.io/v1beta3
|
||||
kind: KubeSchedulerConfiguration
|
||||
profiles:
|
||||
- schedulerName: multipoint-scheduler
|
||||
plugins:
|
||||
score:
|
||||
enabled:
|
||||
- name: 'DefaultScore2'
|
||||
weight: 5
|
||||
```
|
||||
|
||||
In this example, it's unnecessary to specify the plugins in `MultiPoint` explicitly
|
||||
because they are default plugins; the only plugin specified in `Score` is `DefaultScore2`.
|
||||
This is because plugins set through specific extension points will always take precedence
|
||||
over `MultiPoint` plugins. So, this snippet essentially re-orders the two plugins
|
||||
without needing to specify both of them.
|
||||
|
||||
The general hierarchy for precedence when configuring `MultiPoint` plugins is as follows:
|
||||
1. Specific extension points run first, and their settings override whatever is set elsewhere
|
||||
2. Plugins manually configured through `MultiPoint` and their settings
|
||||
3. Default plugins and their default settings
|
||||
|
||||
To demonstrate the above hierarchy, the following example is based on these plugins:
|
||||
|Plugin|Extension Points|
|
||||
|---|---|
|
||||
|`DefaultQueueSort`|`QueueSort`|
|
||||
|`CustomQueueSort`|`QueueSort`|
|
||||
|`DefaultPlugin1`|`Score`, `Filter`|
|
||||
|`DefaultPlugin2`|`Score`|
|
||||
|`CustomPlugin1`|`Score`, `Filter`|
|
||||
|`CustomPlugin2`|`Score`, `Filter`|
|
||||
|
||||
A valid sample configuration for these plugins would be:
|
||||
|
||||
```yaml
|
||||
apiVersion: kubescheduler.config.k8s.io/v1beta3
|
||||
kind: KubeSchedulerConfiguration
|
||||
profiles:
|
||||
- schedulerName: multipoint-scheduler
|
||||
plugins:
|
||||
multiPoint:
|
||||
enabled:
|
||||
- name: 'CustomQueueSort'
|
||||
- name: 'CustomPlugin1'
|
||||
weight: 3
|
||||
- name: 'CustomPlugin2'
|
||||
disabled:
|
||||
- name: 'DefaultQueueSort'
|
||||
filter:
|
||||
disabled:
|
||||
- name: 'DefaultPlugin1'
|
||||
score:
|
||||
enabled:
|
||||
- name: 'DefaultPlugin2'
|
||||
```
|
||||
|
||||
Note that there is no error for re-declaring a `MultiPoint` plugin in a specific
|
||||
extension point. The re-declaration is ignored (and logged), as specific extension points
|
||||
take precedence.
|
||||
|
||||
Besides keeping most of the config in one spot, this sample does a few things:
|
||||
* Enables the custom `queueSort` plugin and disables the default one
|
||||
* Enables `CustomPlugin1` and `CustomPlugin2`, which will run first for all of their extension points
|
||||
* Disables `DefaultPlugin1`, but only for `filter`
|
||||
* Reorders `DefaultPlugin2` to run first in `score` (even before the custom plugins)
|
||||
|
||||
In versions of the config before `v1beta3`, without `multiPoint`, the above snippet would equate to this:
|
||||
```yaml
|
||||
apiVersion: kubescheduler.config.k8s.io/v1beta2
|
||||
kind: KubeSchedulerConfiguration
|
||||
profiles:
|
||||
- schedulerName: multipoint-scheduler
|
||||
plugins:
|
||||
|
||||
# Disable the default QueueSort plugin
|
||||
queueSort:
|
||||
enabled:
|
||||
- name: 'CustomQueueSort'
|
||||
disabled:
|
||||
- name: 'DefaultQueueSort'
|
||||
|
||||
# Enable custom Filter plugins
|
||||
filter:
|
||||
enabled:
|
||||
- name: 'CustomPlugin1'
|
||||
- name: 'CustomPlugin2'
|
||||
- name: 'DefaultPlugin2'
|
||||
disabled:
|
||||
- name: 'DefaultPlugin1'
|
||||
|
||||
# Enable and reorder custom score plugins
|
||||
score:
|
||||
enabled:
|
||||
- name: 'DefaultPlugin2'
|
||||
weight: 1
|
||||
- name: 'DefaultPlugin1'
|
||||
weight: 3
|
||||
```
|
||||
|
||||
While this is a complicated example, it demonstrates the flexibility of `MultiPoint` config
|
||||
as well as its seamless integration with the existing methods for configuring extension points.
|
||||
|
||||
## Scheduler configuration migrations
|
||||
|
||||
{{< tabs name="tab_with_md" >}}
|
||||
|
@ -285,7 +442,13 @@ only has one pending pods queue.
|
|||
* A plugin enabled in a v1beta2 configuration file takes precedence over the default configuration for that plugin.
|
||||
|
||||
* Invalid `host` or `port` configured for scheduler healthz and metrics bind address will cause validation failure.
|
||||
{{% /tab %}}
|
||||
|
||||
{{% tab name="v1beta2 → v1beta3" %}}
|
||||
* Three plugins' weight are increased by default:
|
||||
* `InterPodAffinity` from 1 to 2
|
||||
* `NodeAffinity` from 1 to 2
|
||||
* `TaintToleration` from 1 to 3
|
||||
{{% /tab %}}
|
||||
{{< /tabs >}}
|
||||
|
||||
|
|
|
@ -1,104 +1,16 @@
|
|||
---
|
||||
title: Scheduling Policies
|
||||
content_type: concept
|
||||
weight: 10
|
||||
sitemap:
|
||||
priority: 0.2 # Scheduling priorities are deprecated
|
||||
---
|
||||
|
||||
<!-- overview -->
|
||||
|
||||
A scheduling Policy can be used to specify the *predicates* and *priorities*
|
||||
that the {{< glossary_tooltip text="kube-scheduler" term_id="kube-scheduler" >}}
|
||||
runs to [filter and score nodes](/docs/concepts/scheduling-eviction/kube-scheduler/#kube-scheduler-implementation),
|
||||
respectively.
|
||||
In Kubernetes versions before v1.23, a scheduling policy can be used to specify the *predicates* and *priorities* process. For example, you can set a scheduling policy by
|
||||
running `kube-scheduler --policy-config-file <filename>` or `kube-scheduler --policy-configmap <ConfigMap>`.
|
||||
|
||||
You can set a scheduling policy by running
|
||||
`kube-scheduler --policy-config-file <filename>` or
|
||||
`kube-scheduler --policy-configmap <ConfigMap>`
|
||||
and using the [Policy type](/docs/reference/config-api/kube-scheduler-policy-config.v1/).
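For illustration only (this mechanism is deprecated), a minimal policy file passed via `--policy-config-file` might look like the sketch below; the predicate and priority names are taken from the lists later on this page:

```json
{
  "kind": "Policy",
  "apiVersion": "v1",
  "predicates": [
    {"name": "PodFitsHostPorts"},
    {"name": "PodFitsResources"},
    {"name": "PodToleratesNodeTaints"}
  ],
  "priorities": [
    {"name": "LeastRequestedPriority", "weight": 1},
    {"name": "NodeAffinityPriority", "weight": 1}
  ]
}
```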
|
||||
|
||||
<!-- body -->
|
||||
|
||||
## Predicates
|
||||
|
||||
The following *predicates* implement filtering:
|
||||
|
||||
- `PodFitsHostPorts`: Checks if a Node has free ports (the network protocol kind)
|
||||
for the Pod ports the Pod is requesting.
|
||||
|
||||
- `PodFitsHost`: Checks if a Pod specifies a specific Node by its hostname.
|
||||
|
||||
- `PodFitsResources`: Checks if the Node has free resources (eg, CPU and Memory)
|
||||
to meet the requirement of the Pod.
|
||||
|
||||
- `MatchNodeSelector`: Checks if a Pod's Node {{< glossary_tooltip term_id="selector" >}}
|
||||
matches the Node's {{< glossary_tooltip text="label(s)" term_id="label" >}}.
|
||||
|
||||
- `NoVolumeZoneConflict`: Evaluate if the {{< glossary_tooltip text="Volumes" term_id="volume" >}}
|
||||
that a Pod requests are available on the Node, given the failure zone restrictions for
|
||||
that storage.
|
||||
|
||||
- `NoDiskConflict`: Evaluates if a Pod can fit on a Node due to the volumes it requests,
|
||||
and those that are already mounted.
|
||||
|
||||
- `MaxCSIVolumeCount`: Decides how many {{< glossary_tooltip text="CSI" term_id="csi" >}}
|
||||
volumes should be attached, and whether that's over a configured limit.
|
||||
|
||||
- `PodToleratesNodeTaints`: checks if a Pod's {{< glossary_tooltip text="tolerations" term_id="toleration" >}}
|
||||
can tolerate the Node's {{< glossary_tooltip text="taints" term_id="taint" >}}.
|
||||
|
||||
- `CheckVolumeBinding`: Evaluates if a Pod can fit due to the volumes it requests.
|
||||
This applies for both bound and unbound
|
||||
{{< glossary_tooltip text="PVCs" term_id="persistent-volume-claim" >}}.
|
||||
|
||||
## Priorities
|
||||
|
||||
The following *priorities* implement scoring:
|
||||
|
||||
- `SelectorSpreadPriority`: Spreads Pods across hosts, considering Pods that
|
||||
belong to the same {{< glossary_tooltip text="Service" term_id="service" >}},
|
||||
{{< glossary_tooltip term_id="statefulset" >}} or
|
||||
{{< glossary_tooltip term_id="replica-set" >}}.
|
||||
|
||||
- `InterPodAffinityPriority`: Implements preferred
|
||||
[inter-pod affinity and anti-affinity](/docs/concepts/scheduling-eviction/assign-pod-node/#inter-pod-affinity-and-anti-affinity).
|
||||
|
||||
- `LeastRequestedPriority`: Favors nodes with fewer requested resources. In other
|
||||
words, the more Pods that are placed on a Node, and the more resources those
|
||||
Pods use, the lower the ranking this policy will give.
|
||||
|
||||
- `MostRequestedPriority`: Favors nodes with most requested resources. This policy
|
||||
will fit the scheduled Pods onto the smallest number of Nodes needed to run your
|
||||
overall set of workloads.
|
||||
|
||||
- `RequestedToCapacityRatioPriority`: Creates a requestedToCapacity-based ResourceAllocationPriority using the default resource scoring function shape.
|
||||
|
||||
- `BalancedResourceAllocation`: Favors nodes with balanced resource usage.
|
||||
|
||||
- `NodePreferAvoidPodsPriority`: Prioritizes nodes according to the node annotation
|
||||
`scheduler.alpha.kubernetes.io/preferAvoidPods`. You can use this to hint that
|
||||
two different Pods shouldn't run on the same Node.
|
||||
|
||||
- `NodeAffinityPriority`: Prioritizes nodes according to node affinity scheduling
|
||||
preferences indicated in PreferredDuringSchedulingIgnoredDuringExecution.
|
||||
You can read more about this in [Assigning Pods to Nodes](/docs/concepts/scheduling-eviction/assign-pod-node/).
|
||||
|
||||
- `TaintTolerationPriority`: Prepares the priority list for all the nodes, based on
|
||||
the number of intolerable taints on the node. This policy adjusts a node's rank
|
||||
taking that list into account.
|
||||
|
||||
- `ImageLocalityPriority`: Favors nodes that already have the
|
||||
{{< glossary_tooltip text="container images" term_id="image" >}} for that
|
||||
Pod cached locally.
|
||||
|
||||
- `ServiceSpreadingPriority`: For a given Service, this policy aims to make sure that
|
||||
the Pods for the Service run on different nodes. It favours scheduling onto nodes
|
||||
that don't have Pods for the service already assigned there. The overall outcome is
|
||||
that the Service becomes more resilient to a single Node failure.
|
||||
|
||||
- `EqualPriority`: Gives an equal weight of one to all nodes.
|
||||
|
||||
- `EvenPodsSpreadPriority`: Implements preferred
|
||||
[pod topology spread constraints](/docs/concepts/workloads/pods/pod-topology-spread-constraints/).
|
||||
Scheduling Policies are no longer supported as of Kubernetes v1.23. The associated flags `policy-config-file`, `policy-configmap`, `policy-configmap-namespace` and `use-legacy-policy-config` are also not supported. Instead, use the [Scheduler Configuration](/docs/reference/scheduling/config/) to achieve similar behavior.
|
||||
|
||||
## {{% heading "whatsnext" %}}
|
||||
|
||||
|
@ -106,4 +18,3 @@ The following *priorities* implement scoring:
|
|||
* Learn about [kube-scheduler Configuration](/docs/reference/scheduling/config/)
|
||||
* Read the [kube-scheduler configuration reference (v1beta2)](/docs/reference/config-api/kube-scheduler-config.v1beta2)
|
||||
* Read the [kube-scheduler Policy reference (v1)](/docs/reference/config-api/kube-scheduler-policy-config.v1/)
|
||||
|
||||
|
|
|
@ -7,9 +7,9 @@ min-kubernetes-server-version: 1.21
|
|||
|
||||
<!-- overview -->
|
||||
|
||||
{{< feature-state for_k8s_version="v1.21" state="beta" >}}
|
||||
{{< feature-state for_k8s_version="v1.23" state="stable" >}}
|
||||
|
||||
Your Kubernetes cluster can run in [dual-stack](/docs/concepts/services-networking/dual-stack/) networking mode, which means that cluster networking lets you use either address family. In a dual-stack cluster, the control plane can assign both an IPv4 address and an IPv6 address to a single {{< glossary_tooltip text="Pod" term_id="pod" >}} or a {{< glossary_tooltip text="Service" term_id="service" >}}.
|
||||
Your Kubernetes cluster includes [dual-stack](/docs/concepts/services-networking/dual-stack/) networking, which means that cluster networking lets you use either address family. In a cluster, the control plane can assign both an IPv4 address and an IPv6 address to a single {{< glossary_tooltip text="Pod" term_id="pod" >}} or a {{< glossary_tooltip text="Service" term_id="service" >}}.
|
||||
|
||||
<!-- body -->
|
||||
|
||||
|
@ -28,10 +28,8 @@ The size of the IP address allocations should be suitable for the number of Pods
|
|||
Services that you are planning to run.
|
||||
|
||||
{{< note >}}
|
||||
If you are upgrading an existing cluster then, by default, the `kubeadm upgrade` command
|
||||
changes the [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
|
||||
`IPv6DualStack` to `true` if that is not already enabled.
|
||||
However, `kubeadm` does not support making modifications to the pod IP address range
|
||||
If you are upgrading an existing cluster with the `kubeadm upgrade` command,
|
||||
`kubeadm` does not support making modifications to the pod IP address range
|
||||
(“cluster CIDR”) nor to the cluster's Service address range (“Service CIDR”).
|
||||
{{< /note >}}
|
||||
|
||||
|
@ -51,8 +49,6 @@ To make things clearer, here is an example kubeadm [configuration file](https://
|
|||
---
|
||||
apiVersion: kubeadm.k8s.io/v1beta3
|
||||
kind: ClusterConfiguration
|
||||
featureGates:
|
||||
IPv6DualStack: true
|
||||
networking:
|
||||
podSubnet: 10.244.0.0/16,2001:db8:42:0::/56
|
||||
serviceSubnet: 10.96.0.0/16,2001:db8:42:1::/112
|
||||
|
@ -75,7 +71,7 @@ Run kubeadm to initiate the dual-stack control plane node:
|
|||
kubeadm init --config=kubeadm-config.yaml
|
||||
```
|
||||
|
||||
Currently, the kube-controller-manager flags `--node-cidr-mask-size-ipv4|--node-cidr-mask-size-ipv6` are being left with default values. See [enable IPv4/IPv6 dual stack](/docs/concepts/services-networking/dual-stack#enable-ipv4ipv6-dual-stack).
|
||||
The kube-controller-manager flags `--node-cidr-mask-size-ipv4|--node-cidr-mask-size-ipv6` are set with default values. See [configure IPv4/IPv6 dual stack](/docs/concepts/services-networking/dual-stack#configure-ipv4-ipv6-dual-stack).
|
||||
|
||||
{{< note >}}
|
||||
The `--apiserver-advertise-address` flag does not support dual-stack.
|
||||
|
@ -132,23 +128,15 @@ kubeadm join --config=kubeadm-config.yaml
|
|||
### Create a single-stack cluster
|
||||
|
||||
{{< note >}}
|
||||
Enabling the dual-stack feature doesn't mean that you need to use dual-stack addressing.
|
||||
Dual-stack support doesn't mean that you need to use dual-stack addressing.
|
||||
You can deploy a single-stack cluster that has the dual-stack networking feature enabled.
|
||||
{{< /note >}}
|
||||
|
||||
In 1.21 the `IPv6DualStack` feature is Beta and the feature gate is defaulted to `true`. To disable the feature you must configure the feature gate to `false`. Note that once the feature is GA, the feature gate will be removed.
|
||||
|
||||
```shell
|
||||
kubeadm init --feature-gates IPv6DualStack=false
|
||||
```
|
||||
|
||||
To make things more clear, here is an example kubeadm [configuration file](https://pkg.go.dev/k8s.io/kubernetes/cmd/kubeadm/app/apis/kubeadm/v1beta3) `kubeadm-config.yaml` for the single-stack control plane node.
|
||||
|
||||
```yaml
|
||||
apiVersion: kubeadm.k8s.io/v1beta3
|
||||
kind: ClusterConfiguration
|
||||
featureGates:
|
||||
IPv6DualStack: false
|
||||
networking:
|
||||
podSubnet: 10.244.0.0/16
|
||||
serviceSubnet: 10.96.0.0/16
|
||||
|
|
|
@ -375,6 +375,7 @@ For [flex-volume support](https://github.com/kubernetes/community/blob/ab55d85/c
|
|||
Kubernetes components like the kubelet and kube-controller-manager use the default path of
|
||||
`/usr/libexec/kubernetes/kubelet-plugins/volume/exec/`, yet the flex-volume directory _must be writeable_
|
||||
for the feature to work.
|
||||
(**Note**: FlexVolume was deprecated in the Kubernetes v1.23 release)
|
||||
|
||||
To work around this issue, you can configure the flex-volume directory using the kubeadm
|
||||
[configuration file](/docs/reference/config-api/kubeadm-config.v1beta3/).
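A sketch of that workaround is shown below; the path is an example, and you can choose any writable directory on your hosts. It sets the directory for both the kubelet and the kube-controller-manager:

```yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
nodeRegistration:
  kubeletExtraArgs:
    volume-plugin-dir: "/opt/libexec/kubernetes/kubelet-plugins/volume/exec/"
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
controllerManager:
  extraArgs:
    flex-volume-plugin-dir: "/opt/libexec/kubernetes/kubelet-plugins/volume/exec/"
```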
|
||||
|
|
|
@ -153,6 +153,37 @@ section refers to several key workload enablers and how they map to Windows.
|
|||
* `emptyDir` volumes
|
||||
* Named pipe host mounts
|
||||
* Resource limits
|
||||
* OS field:
|
||||
{{< feature-state for_k8s_version="v1.23" state="alpha" >}}
|
||||
`.spec.os.name` should be set to `windows` to indicate that the current Pod uses Windows containers.
|
||||
The `IdentifyPodOS` feature gate needs to be enabled for this field to be recognized and used by control plane
|
||||
components and kubelet.
|
||||
{{< note >}}
|
||||
If the `IdentifyPodOS` feature gate is enabled and you set the `.spec.os.name` field to `windows`, you must not set the following fields in the `.spec` of that Pod:
|
||||
* `spec.hostPID`
|
||||
* `spec.hostIPC`
|
||||
* `spec.securityContext.seLinuxOptions`
|
||||
* `spec.securityContext.seccompProfile`
|
||||
* `spec.securityContext.fsGroup`
|
||||
* `spec.securityContext.fsGroupChangePolicy`
|
||||
* `spec.securityContext.sysctls`
|
||||
* `spec.shareProcessNamespace`
|
||||
* `spec.securityContext.runAsUser`
|
||||
* `spec.securityContext.runAsGroup`
|
||||
* `spec.securityContext.supplementalGroups`
|
||||
* `spec.containers[*].securityContext.seLinuxOptions`
|
||||
* `spec.containers[*].securityContext.seccompProfile`
|
||||
* `spec.containers[*].securityContext.capabilities`
|
||||
* `spec.containers[*].securityContext.readOnlyRootFilesystem`
|
||||
* `spec.containers[*].securityContext.privileged`
|
||||
* `spec.containers[*].securityContext.allowPrivilegeEscalation`
|
||||
* `spec.containers[*].securityContext.procMount`
|
||||
* `spec.containers[*].securityContext.runAsUser`
|
||||
* `spec.containers[*].securityContext.runAsGroup`
|
||||
|
||||
In this table, wildcards (*) indicate all elements in a list. For example, `spec.containers[*].securityContext` refers to the security context object for all defined containers. If any of these fields is set, the Pod fails API validation, causing admission failures.
|
||||
{{< /note >}}
|
||||
|
||||
* [Workload resources](/docs/concepts/workloads/controllers/) including:
|
||||
* ReplicaSet
|
||||
* Deployments
|
||||
|
@ -340,9 +371,7 @@ Kubernetes on Windows does not support single-stack "IPv6-only" networking. Howe
|
|||
dual-stack IPv4/IPv6 networking for pods and nodes with single-family services
|
||||
is supported.
|
||||
|
||||
You can enable IPv4/IPv6 dual-stack networking for `l2bridge` networks using the
|
||||
`IPv6DualStack` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/).
|
||||
See [enable IPv4/IPv6 dual stack](/docs/concepts/services-networking/dual-stack#enable-ipv4ipv6-dual-stack) for more details.
|
||||
You can use IPv4/IPv6 dual-stack networking with `l2bridge` networks. See [configure IPv4/IPv6 dual stack](/docs/concepts/services-networking/dual-stack#configure-ipv4-ipv6-dual-stack) for more details.
|
||||
|
||||
{{< note >}}
|
||||
Overlay (VXLAN) networks on Windows do not support dual-stack networking.
|
||||
|
|
|
@ -160,7 +160,21 @@ Users today need to use some combination of taints and node selectors in order t
|
|||
keep Linux and Windows workloads on their respective OS-specific nodes.
|
||||
This likely imposes a burden only on Windows users. The recommended approach is outlined below,
|
||||
with one of its main goals being that this approach should not break compatibility for existing Linux workloads.
|
||||
{{< note >}}
|
||||
If the `IdentifyPodOS` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) is
|
||||
enabled, you can (and should) set `.spec.os.name` for a Pod to indicate the operating system
|
||||
that the containers in that Pod are designed for. For Pods that run Linux containers, set
|
||||
`.spec.os.name` to `linux`. For Pods that run Windows containers, set `.spec.os.name`
|
||||
to `windows`.
|
||||
|
||||
The scheduler does not use the value of `.spec.os.name` when assigning Pods to nodes. You should
|
||||
use normal Kubernetes mechanisms for
|
||||
[assigning pods to nodes](/docs/concepts/scheduling-eviction/assign-pod-node/)
|
||||
to ensure that the control plane for your cluster places pods onto nodes that are running the
|
||||
appropriate operating system.
|
||||
The field has no effect on the scheduling of the Windows pods, so taints, tolerations, and node selectors are still required
|
||||
to ensure that the Windows pods land onto appropriate Windows nodes.
|
||||
{{< /note >}}
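For illustration, a minimal Pod manifest that combines the OS field with a node selector might look like the sketch below (the image and command are placeholders; use a Windows image that matches your node's OS version):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: windows-sample
spec:
  os:
    name: windows               # requires the IdentifyPodOS feature gate in v1.23
  nodeSelector:
    kubernetes.io/os: windows   # still needed: the OS field does not influence scheduling
  containers:
  - name: app
    # placeholder image; pick one matching the host OS version
    image: mcr.microsoft.com/windows/servercore:ltsc2022
    command: ["cmd", "/c", "ping -t localhost"]
```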
|
||||
### Ensuring OS-specific workloads land on the appropriate container host
|
||||
|
||||
Users can ensure Windows containers can be scheduled on the appropriate host using Taints and Tolerations.
|
||||
|
|
|
@ -60,6 +60,13 @@ duration as `--node-status-update-frequency`.
|
|||
|
||||
The behavior of the static policy can be fine-tuned using the `--cpu-manager-policy-options` flag.
|
||||
The flag takes a comma-separated list of `key=value` policy options.
|
||||
This feature can be disabled completely using the `CPUManagerPolicyOptions` feature gate.
|
||||
|
||||
The policy options are split into two groups: alpha quality (hidden by default) and beta quality
|
||||
(visible by default). The groups are guarded respectively by the `CPUManagerPolicyAlphaOptions`
|
||||
and `CPUManagerPolicyBetaOptions` feature gates. Diverging from the Kubernetes standard, these
|
||||
feature gates guard groups of options, because it would have been too cumbersome to add a feature
|
||||
gate for each individual option.
|
||||
|
||||
### None policy
|
||||
|
||||
|
@ -218,8 +225,17 @@ equal to one. The `nginx` container is granted 2 exclusive CPUs.
|
|||
|
||||
#### Static policy options
|
||||
|
||||
You can toggle groups of options on and off based upon their maturity level
|
||||
using the following feature gates:
|
||||
* `CPUManagerPolicyBetaOptions` default enabled. Disable to hide beta-level options.
|
||||
* `CPUManagerPolicyAlphaOptions` default disabled. Enable to show alpha-level options.
|
||||
You will still have to enable each option using the `CPUManagerPolicyOptions` kubelet option.
|
||||
|
||||
The following policy options exist for the static `CPUManager` policy:
|
||||
* `full-pcpus-only` (beta, visible by default)
|
||||
* `distribute-cpus-across-numa` (alpha, hidden by default)
|
||||
|
||||
If the `full-pcpus-only` policy option is specified, the static policy will always allocate full physical cores.
|
||||
You can enable this option by adding `full-pcpus-only=true` to the CPUManager policy options.
|
||||
By default, without this option, the static policy allocates CPUs using a topology-aware best-fit allocation.
|
||||
On SMT enabled systems, the policy can allocate individual virtual cores, which correspond to hardware threads.
|
||||
This can lead to different containers sharing the same physical cores; this behaviour in turn contributes
|
||||
|
@ -227,3 +243,24 @@ to the [noisy neighbours problem](https://en.wikipedia.org/wiki/Cloud_computing_
|
|||
With the option enabled, the pod will be admitted by the kubelet only if the CPU request of all its containers
|
||||
can be fulfilled by allocating full physical cores.
|
||||
If the pod does not pass the admission, it will be put in Failed state with the message `SMTAlignmentError`.
|
||||
|
||||
If the `distribute-cpus-across-numa` policy option is specified, the static
|
||||
policy will evenly distribute CPUs across NUMA nodes in cases where more than
|
||||
one NUMA node is required to satisfy the allocation.
|
||||
By default, the `CPUManager` will pack CPUs onto one NUMA node until it is
|
||||
filled, with any remaining CPUs simply spilling over to the next NUMA node.
|
||||
This can cause undesired bottlenecks in parallel code relying on barriers (and
|
||||
similar synchronization primitives), as this type of code tends to run only as
|
||||
fast as its slowest worker (which is slowed down by the fact that fewer CPUs
|
||||
are available on at least one NUMA node).
|
||||
By distributing CPUs evenly across NUMA nodes, application developers can more
|
||||
easily ensure that no single worker suffers from NUMA effects more than any
|
||||
other, improving the overall performance of these types of applications.
|
||||
|
||||
The `full-pcpus-only` option can be enabled by adding `full-pcpus-only=true` to
|
||||
the CPUManager policy options.
|
||||
Likewise, the `distribute-cpus-across-numa` option can be enabled by adding
|
||||
`distribute-cpus-across-numa=true` to the CPUManager policy options.
|
||||
When both are set, they are "additive" in the sense that CPUs will be
|
||||
distributed across NUMA nodes in chunks of full-pcpus rather than individual
|
||||
cores.
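As a sketch (the flag names are the ones described on this page; how you pass kubelet flags depends on how your kubelet is launched), enabling both options together could look like:

```shell
# distribute-cpus-across-numa is alpha, so the alpha options group must be made visible first
kubelet --cpu-manager-policy=static \
  --feature-gates=CPUManagerPolicyAlphaOptions=true \
  --cpu-manager-policy-options="full-pcpus-only=true,distribute-cpus-across-numa=true"
```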
|
||||
|
|
|
@ -1,40 +0,0 @@
|
|||
---
|
||||
reviewers:
|
||||
- robscott
|
||||
title: Enabling Topology Aware Hints
|
||||
content_type: task
|
||||
min-kubernetes-server-version: 1.21
|
||||
---
|
||||
|
||||
<!-- overview -->
|
||||
{{< feature-state for_k8s_version="v1.21" state="alpha" >}}
|
||||
|
||||
_Topology Aware Hints_ enable topology aware routing with topology hints
|
||||
included in {{< glossary_tooltip text="EndpointSlices" term_id="endpoint-slice" >}}.
|
||||
This approach tries to keep traffic close to where it originated from;
|
||||
you might do this to reduce costs, or to improve network performance.
|
||||
|
||||
## {{% heading "prerequisites" %}}
|
||||
|
||||
{{< include "task-tutorial-prereqs.md" >}} {{< version-check >}}
|
||||
|
||||
The following prerequisite is needed in order to enable topology aware hints:
|
||||
|
||||
* Configure the {{< glossary_tooltip text="kube-proxy" term_id="kube-proxy" >}} to run in
|
||||
iptables mode or IPVS mode
|
||||
* Ensure that you have not disabled EndpointSlices
|
||||
|
||||
## Enable Topology Aware Hints
|
||||
|
||||
To enable service topology hints, enable the `TopologyAwareHints` [feature
|
||||
gate](/docs/reference/command-line-tools-reference/feature-gates/) for the
|
||||
kube-apiserver, kube-controller-manager, and kube-proxy:
|
||||
|
||||
```
|
||||
--feature-gates="TopologyAwareHints=true"
|
||||
```
|
||||
|
||||
## {{% heading "whatsnext" %}}
|
||||
|
||||
* Read about [Topology Aware Hints](/docs/concepts/services-networking/topology-aware-hints) for Services
|
||||
* Read [Connecting Applications with Services](/docs/concepts/services-networking/connect-applications-service/)
|
|
@ -13,6 +13,17 @@ This document describes how to configure and use kernel parameters within a
|
|||
Kubernetes cluster using the {{< glossary_tooltip term_id="sysctl" >}}
|
||||
interface.
|
||||
|
||||
{{< note >}}
|
||||
Starting from Kubernetes version 1.23, the kubelet supports the use of either `/` or `.`
|
||||
as separators for sysctl names.
|
||||
For example, you can represent the same sysctl name as `kernel.shm_rmid_forced` using a
|
||||
period as the separator, or as `kernel/shm_rmid_forced` using a slash as a separator.
|
||||
For more details about sysctl parameter name conversion, refer to
|
||||
the page [sysctl.d(5)](https://man7.org/linux/man-pages/man5/sysctl.d.5.html) from
|
||||
the Linux man-pages project.
|
||||
Setting Sysctls for a Pod and PodSecurityPolicy features do not yet support
|
||||
setting sysctls with slashes.
|
||||
{{< /note >}}
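For example, a Pod-level safe sysctl is still set using the dot-separated form; the Pod name and image in this sketch are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sysctl-example
spec:
  securityContext:
    sysctls:
    - name: kernel.shm_rmid_forced   # dot separator; slashes are not accepted in the Pod API yet
      value: "1"
  containers:
  - name: app
    image: k8s.gcr.io/pause:3.6      # placeholder image
```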
|
||||
## {{% heading "prerequisites" %}}
|
||||
|
||||
|
||||
|
|
|
@ -220,11 +220,62 @@ After 15 seconds, view Pod events to verify that liveness probes:
|
|||
kubectl describe pod goproxy
|
||||
```
|
||||
|
||||
## Define a gRPC liveness probe
|
||||
|
||||
{{< feature-state for_k8s_version="v1.23" state="alpha" >}}
|
||||
|
||||
If your application implements [gRPC Health Checking Protocol](https://github.com/grpc/grpc/blob/master/doc/health-checking.md),
|
||||
kubelet can be configured to use it for application liveness checks.
|
||||
You must enable the `GRPCContainerProbe`
|
||||
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
|
||||
in order to configure checks that rely on gRPC.
|
||||
|
||||
Here is an example manifest:
|
||||
|
||||
{{< codenew file="pods/probe/grpc-liveness.yaml">}}
|
||||
|
||||
To use a gRPC probe, `port` must be configured. If the health endpoint is configured
|
||||
on a non-default service, you must also specify the `service`.
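As a minimal sketch of what the referenced manifest contains (the image, command, and port here are illustrative assumptions; any server implementing `grpc.health.v1.Health` works), the probe stanza looks like this:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: etcd-with-grpc
spec:
  containers:
  - name: etcd
    image: k8s.gcr.io/etcd:3.5.1-0
    command: ["/usr/local/bin/etcd", "--data-dir", "/var/lib/etcd",
              "--listen-client-urls", "http://0.0.0.0:2379",
              "--advertise-client-urls", "http://127.0.0.1:2379"]
    ports:
    - containerPort: 2379
    livenessProbe:
      grpc:
        port: 2379                    # numeric port; named ports are not supported
      initialDelaySeconds: 10
```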
|
||||
|
||||
{{< note >}}
|
||||
Unlike HTTP and TCP probes, named ports cannot be used, and a custom host cannot be configured.
|
||||
{{< /note >}}
|
||||
|
||||
Configuration problems (for example: incorrect port and service, unimplemented health checking protocol)
|
||||
are considered a probe failure, similar to HTTP and TCP probes.
|
||||
|
||||
To try the gRPC liveness check, create a Pod using the command below.
|
||||
In the example below, the etcd pod is configured to use the gRPC liveness probe.
|
||||
|
||||
```shell
|
||||
kubectl apply -f https://k8s.io/examples/pods/probe/grpc-liveness.yaml
|
||||
```
|
||||
|
||||
After 15 seconds, view Pod events to verify that the liveness check has not failed:
|
||||
|
||||
```shell
|
||||
kubectl describe pod etcd-with-grpc
|
||||
```
|
||||
|
||||
Before Kubernetes 1.23, gRPC health probes were often implemented using [grpc-health-probe](https://github.com/grpc-ecosystem/grpc-health-probe/),
|
||||
as described in the blog post [Health checking gRPC servers on Kubernetes](/blog/2018/10/01/health-checking-grpc-servers-on-kubernetes/).
|
||||
The behavior of the built-in gRPC probes is similar to the behavior of grpc-health-probe.
|
||||
When migrating from grpc-health-probe to built-in probes, remember the following differences:
|
||||
|
||||
- Built-in probes run against the pod IP address, unlike grpc-health-probe that often runs against `127.0.0.1`.
|
||||
Be sure to configure your gRPC endpoint to listen on the Pod's IP address.
|
||||
- Built-in probes do not support any authentication parameters (like `-tls`).
|
||||
- There are no error codes for built-in probes. All errors are considered as probe failures.
|
||||
- If `ExecProbeTimeout` feature gate is set to `false`, grpc-health-probe does **not** respect the `timeoutSeconds` setting (which defaults to 1s),
|
||||
while built-in probe would fail on timeout.
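For reference, the older pattern being migrated away from typically bundled the grpc-health-probe binary into the image and called it from an exec probe; the binary path and port in this sketch are illustrative:

```yaml
livenessProbe:
  exec:
    command: ["/bin/grpc_health_probe", "-addr=:2379"]
  initialDelaySeconds: 10
```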
|
||||
|
||||
## Use a named port
|
||||
|
||||
You can use a named
|
||||
[ContainerPort](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#containerport-v1-core)
|
||||
for HTTP or TCP liveness checks:
|
||||
[`port`](/docs/reference/kubernetes-api/workload-resources/pod-v1/#ports)
|
||||
for HTTP and TCP probes. (gRPC probes do not support named ports).
|
||||
|
||||
For example:
|
||||
|
||||
```yaml
|
||||
ports:
|
||||
|
@ -349,7 +400,7 @@ This defect was corrected in Kubernetes v1.20. You may have been relying on the
|
|||
even without realizing it, as the default timeout is 1 second.
|
||||
As a cluster administrator, you can disable the [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) `ExecProbeTimeout` (set it to `false`)
|
||||
on each kubelet to restore the behavior from older versions, then remove that override
|
||||
once all the exec probes in the cluster have a `timeoutSeconds` value set.
|
||||
once all the exec probes in the cluster have a `timeoutSeconds` value set.
|
||||
If you have pods that are impacted from the default 1 second timeout,
|
||||
you should update their probe timeout so that you're ready for the
|
||||
eventual removal of that feature gate.
|
||||
|
@ -487,12 +538,11 @@ It will be rejected by the API server.
|
|||
|
||||
## {{% heading "whatsnext" %}}
|
||||
|
||||
|
||||
* Learn more about
|
||||
[Container Probes](/docs/concepts/workloads/pods/pod-lifecycle/#container-probes).
|
||||
|
||||
You can also read the API references for:
|
||||
|
||||
* [Pod](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#pod-v1-core)
|
||||
* [Container](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#container-v1-core)
|
||||
* [Probe](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#probe-v1-core)
|
||||
* [Pod](/docs/reference/kubernetes-api/workload-resources/pod-v1/), and specifically:
|
||||
* [container(s)](/docs/reference/kubernetes-api/workload-resources/pod-v1/#Container)
|
||||
* [probe(s)](/docs/reference/kubernetes-api/workload-resources/pod-v1/#Probe)
|
||||
|
|
|
@ -2,12 +2,12 @@
|
|||
title: Create a Windows HostProcess Pod
|
||||
content_type: task
|
||||
weight: 20
|
||||
min-kubernetes-server-version: 1.22
|
||||
min-kubernetes-server-version: 1.23
|
||||
---
|
||||
|
||||
<!-- overview -->
|
||||
|
||||
{{< feature-state for_k8s_version="v1.22" state="alpha" >}}
|
||||
{{< feature-state for_k8s_version="v1.23" state="beta" >}}
|
||||
|
||||
Windows HostProcess containers enable you to run containerized
|
||||
workloads on a Windows host. These containers operate as
|
||||
|
@ -43,38 +43,48 @@ HostProcess containers have access to the host's network interfaces and IP addre
|
|||
privileges needed by Windows nodes.
|
||||
|
||||
|
||||
## {{% heading "prerequisites" %}}

{{% version-check %}}
|
||||
## {{% heading "prerequisites" %}}
|
||||
|
||||
To enable HostProcess containers while in Alpha you need to
|
||||
pass the following feature gate flag to
|
||||
**kubelet** and **kube-apiserver**.
|
||||
See [Features Gates](/docs/reference/command-line-tools-reference/feature-gates/#overview)
|
||||
documentation for more details.
|
||||
<!-- change this when graduating to stable -->
|
||||
|
||||
```powershell
|
||||
--feature-gates=WindowsHostProcessContainers=true
|
||||
```
|
||||
This task guide is specific to Kubernetes v{{< skew currentVersion >}}.
|
||||
If you are not running Kubernetes v{{< skew currentVersion >}}, check the documentation for
|
||||
that version of Kubernetes.
|
||||
|
||||
The kubelet will communicate with containerd directly by
|
||||
passing the hostprocess flag via CRI. You can use the
|
||||
In Kubernetes {{< skew currentVersion >}}, the HostProcess container feature is enabled by default. The kubelet will
|
||||
communicate with containerd directly by passing the hostprocess flag via CRI. You can use the
|
||||
latest version of containerd (v1.6+) to run HostProcess containers.
|
||||
[How to install containerd.](/docs/setup/production-environment/container-runtimes/#containerd)
|
||||
|
||||
To *disable* HostProcess containers you need to pass the following feature gate flag to the
|
||||
**kubelet** and **kube-apiserver**:
|
||||
|
||||
```powershell
|
||||
--feature-gates=WindowsHostProcessContainers=false
|
||||
```
|
||||
|
||||
See [Features Gates](/docs/reference/command-line-tools-reference/feature-gates/#overview)
|
||||
documentation for more details.
|
||||
|
||||
|
||||
|
||||
## Limitations
|
||||
|
||||
- HostProcess containers require containerd 1.6 or higher for the
|
||||
These limitations are relevant for Kubernetes v{{< skew currentVersion >}}:
|
||||
|
||||
- HostProcess containers require containerd 1.6 or higher
|
||||
{{< glossary_tooltip text="container runtime" term_id="container-runtime" >}}.
|
||||
- As of v1.22 HostProcess pods can only contain HostProcess containers. This is a current limitation
|
||||
- HostProcess pods can only contain HostProcess containers. This is a current limitation
|
||||
of the Windows OS; non-privileged Windows containers cannot share a vNIC with the host IP namespace.
|
||||
- HostProcess containers run as a process on the host and do not have any degree of
|
||||
isolation other than resource constraints imposed on the HostProcess user account. Neither
|
||||
filesystem or Hyper-V isolation are supported for HostProcess containers.
|
||||
- Volume mounts are supported and are mounted under the container volume. See [Volume Mounts](#volume-mounts)
|
||||
- A limited set of host user accounts are available for HostProcess containers by default.
|
||||
See [Choosing a User Account](#choosing-a-user-account).
|
||||
See [Choosing a User Account](#choosing-a-user-account).
|
||||
- Resource limits (disk, memory, cpu count) are supported in the same fashion as processes
|
||||
on the host.
|
||||
- Both Named pipe mounts and Unix domain sockets are **not** currently supported and should instead
|
||||
- Both Named pipe mounts and Unix domain sockets are **not** supported and should instead
|
||||
be accessed via their path on the host (e.g. \\\\.\\pipe\\\*)
|
||||
|
||||
## HostProcess Pod configuration requirements
|
||||
|
@ -88,62 +98,64 @@ When running under the privileged policy, here are
|
|||
the configurations which need to be set to enable the creation of a HostProcess pod:
|
||||
|
||||
<table>
|
||||
<caption style="display:none">Privileged policy specification</caption>
|
||||
<tbody>
|
||||
<tr>
|
||||
<td><strong>Control</strong></td>
|
||||
<td><strong>Policy</strong></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td style="white-space: nowrap"><a href="/docs/concepts/security/pod-security-standards">Windows HostProcess</a></td>
|
||||
<td>
|
||||
<p>Windows pods offer the ability to run <a href="/docs/tasks/configure-pod-container/create-hostprocess-pod">
|
||||
<caption style="display: none">Privileged policy specification</caption>
|
||||
<thead>
|
||||
<tr>
|
||||
<th>Control</th>
|
||||
<th>Policy</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody>
|
||||
<tr>
|
||||
<td style="white-space: nowrap"><a href="/docs/concepts/security/pod-security-standards"><tt>securityContext.windowsOptions.hostProcess</tt></a></td>
|
||||
<td>
|
||||
<p>Windows pods offer the ability to run <a href="/docs/tasks/configure-pod-container/create-hostprocess-pod">
|
||||
HostProcess containers</a> which enables privileged access to the Windows node. </p>
|
||||
<p><strong>Allowed Values</strong></p>
|
||||
<ul>
|
||||
<li><code>true</code></li>
|
||||
</ul>
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td style="white-space: nowrap"><a href="/docs/concepts/security/pod-security-standards">Host Networking</a></td>
|
||||
<td>
|
||||
<p>Will be in host network by default initially. Support
|
||||
to set network to a different compartment may be desirable in
|
||||
the future.</p>
|
||||
<p><strong>Allowed Values</strong></p>
|
||||
<ul>
|
||||
<li><code>true</code></li>
|
||||
</ul>
|
||||
</td>
|
||||
</tr>
|
||||
<p><strong>Allowed Values</strong></p>
|
||||
<ul>
|
||||
<li><code>true</code></li>
|
||||
</ul>
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td style="white-space: nowrap"><a href="/docs/tasks/configure-pod-container/configure-runasusername/">runAsUsername</a></td>
|
||||
<td>
|
||||
<p>Specification of which user the HostProcess container should run as is required for the pod spec.</p>
|
||||
<p><strong>Allowed Values</strong></p>
|
||||
<ul>
|
||||
<li><code>NT AUTHORITY\SYSTEM</code></li>
|
||||
<li><code>NT AUTHORITY\Local service</code></li>
|
||||
<li><code>NT AUTHORITY\NetworkService</code></li>
|
||||
</ul>
|
||||
</td>
|
||||
</tr>
|
||||
<td style="white-space: nowrap"><a href="/docs/concepts/security/pod-security-standards"><tt>hostNetwork</tt></a></td>
|
||||
<td>
|
||||
<p>Will be in host network by default initially. Support
|
||||
to set network to a different compartment may be desirable in
|
||||
the future.</p>
|
||||
<p><strong>Allowed Values</strong></p>
|
||||
<ul>
|
||||
<li><code>true</code></li>
|
||||
</ul>
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td style="white-space: nowrap"><a href="/docs/concepts/security/pod-security-standards">runAsNonRoot</a></td>
|
||||
<td>
|
||||
<p>Because HostProcess containers have privileged access to the host, the <tt>runAsNonRoot</tt> field cannot be set to true.</p>
|
||||
<p><strong>Allowed Values</strong></p>
|
||||
<ul>
|
||||
<td style="white-space: nowrap"><a href="/docs/tasks/configure-pod-container/configure-runasusername/"><tt>securityContext.windowsOptions.runAsUsername</tt></a></td>
|
||||
<td>
|
||||
<p>Specification of which user the HostProcess container should run as is required for the pod spec.</p>
|
||||
<p><strong>Allowed Values</strong></p>
|
||||
<ul>
|
||||
<li><code>NT AUTHORITY\SYSTEM</code></li>
|
||||
<li><code>NT AUTHORITY\Local service</code></li>
|
||||
<li><code>NT AUTHORITY\NetworkService</code></li>
|
||||
</ul>
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td style="white-space: nowrap"><a href="/docs/concepts/security/pod-security-standards"><tt>runAsNonRoot</tt></a></td>
|
||||
<td>
|
||||
<p>Because HostProcess containers have privileged access to the host, the <tt>runAsNonRoot</tt> field cannot be set to true.</p>
|
||||
<p><strong>Allowed Values</strong></p>
|
||||
<ul>
|
||||
<li>Undefined/Nil</li>
|
||||
<li><code>false</code></li>
|
||||
</ul>
|
||||
</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
<li><code>false</code></li>
|
||||
</ul>
|
||||
</td>
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
|
||||
### Example Manifest (excerpt)
|
||||
### Example manifest (excerpt) {#manifest-example}
|
||||
|
||||
```yaml
|
||||
spec:
|
||||
|
@ -163,13 +175,13 @@ spec:
|
|||
"kubernetes.io/os": windows
|
||||
```
|
||||
|
||||
## Volume Mounts
|
||||
## Volume mounts
|
||||
|
||||
HostProcess containers support the ability to mount volumes within the container volume space.
|
||||
Applications running inside the container can access volume mounts directly via relative or
|
||||
absolute paths. An environment variable `$CONTAINER_SANDBOX_MOUNT_POINT` is set upon container
|
||||
creation and provides the absolute host path to the container volume. Relative paths are based
|
||||
upon the `Pod.containers.volumeMounts.mountPath` configuration.
|
||||
upon the `.spec.containers.volumeMounts.mountPath` configuration.
|
||||
|
||||
### Example {#volume-mount-example}
|
||||
|
||||
|
@ -179,7 +191,7 @@ To access service account tokens the following path structures are supported wit
|
|||
|
||||
`$CONTAINER_SANDBOX_MOUNT_POINT\var\run\secrets\kubernetes.io\serviceaccount\`
|
||||
|
||||
## Resource Limits
|
||||
## Resource limits
|
||||
|
||||
Resource limits (disk, memory, cpu count) are applied to the job and are job wide.
|
||||
For example, with a limit of 10MB set, the memory allocated for any HostProcess job object
|
||||
|
@ -188,7 +200,7 @@ These limits would be specified the same way they are currently for whatever orc
|
|||
or runtime is being used. The only difference is in the disk resource usage calculation
|
||||
used for resource tracking due to the difference in how HostProcess containers are bootstrapped.
|
||||
|
||||
## Choosing a User Account
|
||||
## Choosing a user account
|
||||
|
||||
HostProcess containers support the ability to run as one of three supported Windows service accounts:
|
||||
|
||||
|
|
|
@ -15,10 +15,52 @@ You can configure this admission controller to set cluster-wide defaults and [ex
|
|||
|
||||
{{% version-check %}}
|
||||
|
||||
- Enable the `PodSecurity` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/#feature-gates-for-alpha-or-beta-features).
|
||||
- Ensure the `PodSecurity` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/#feature-gates-for-alpha-or-beta-features) is enabled.
|
||||
|
||||
## Configure the Admission Controller
|
||||
|
||||
{{< tabs name="PodSecurityConfiguration_example_1" >}}
|
||||
{{% tab name="pod-security.admission.config.k8s.io/v1beta1" %}}
|
||||
```yaml
|
||||
apiVersion: apiserver.config.k8s.io/v1
|
||||
kind: AdmissionConfiguration
|
||||
plugins:
|
||||
- name: PodSecurity
|
||||
configuration:
|
||||
apiVersion: pod-security.admission.config.k8s.io/v1beta1
|
||||
kind: PodSecurityConfiguration
|
||||
# Defaults applied when a mode label is not set.
|
||||
#
|
||||
# Level label values must be one of:
|
||||
# - "privileged" (default)
|
||||
# - "baseline"
|
||||
# - "restricted"
|
||||
#
|
||||
# Version label values must be one of:
|
||||
# - "latest" (default)
|
||||
# - specific version like "v{{< skew latestVersion >}}"
|
||||
defaults:
|
||||
enforce: "privileged"
|
||||
enforce-version: "latest"
|
||||
audit: "privileged"
|
||||
audit-version: "latest"
|
||||
warn: "privileged"
|
||||
warn-version: "latest"
|
||||
exemptions:
|
||||
# Array of authenticated usernames to exempt.
|
||||
usernames: []
|
||||
# Array of runtime class names to exempt.
|
||||
runtimeClassNames: []
|
||||
# Array of namespaces to exempt.
|
||||
namespaces: []
|
||||
```
|
||||
|
||||
{{< note >}}
|
||||
v1beta1 configuration requires v1.23+. For v1.22, use v1alpha1.
|
||||
{{< /note >}}
|
||||
|
||||
{{% /tab %}}
|
||||
{{% tab name="pod-security.admission.config.k8s.io/v1alpha1" %}}
|
||||
```yaml
|
||||
apiVersion: apiserver.config.k8s.io/v1
|
||||
kind: AdmissionConfiguration
|
||||
|
@ -51,4 +93,6 @@ plugins:
|
|||
runtimeClasses: []
|
||||
# Array of namespaces to exempt.
|
||||
namespaces: []
|
||||
```
|
||||
```
|
||||
{{% /tab %}}
|
||||
{{< /tabs >}}
|
||||
|
|
|
@ -13,7 +13,7 @@ Namespaces can be labeled to enforce the [Pod Security Standards](/docs/concepts
|
|||
|
||||
{{% version-check %}}
|
||||
|
||||
- Enable the `PodSecurity` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/#feature-gates-for-alpha-or-beta-features).
|
||||
- Ensure the `PodSecurity` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/#feature-gates-for-alpha-or-beta-features) is enabled.
|
||||
|
||||
## Requiring the `baseline` Pod Security Standard with namespace labels
|
||||
|
||||
|
|
|
@ -17,7 +17,7 @@ admission controller. This can be done effectively using a combination of dry-ru
|
|||
|
||||
{{% version-check %}}
|
||||
|
||||
- Enable the `PodSecurity` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/#feature-gates-for-alpha-or-beta-features).
|
||||
- Ensure the `PodSecurity` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/#feature-gates-for-alpha-or-beta-features) is enabled.
|
||||
|
||||
<!-- body -->
|
||||
|
||||
|
|
|
@ -149,7 +149,7 @@ exit
|
|||
|
||||
## Configure volume permission and ownership change policy for Pods
|
||||
|
||||
{{< feature-state for_k8s_version="v1.20" state="beta" >}}
|
||||
{{< feature-state for_k8s_version="v1.23" state="stable" >}}
|
||||
|
||||
By default, Kubernetes recursively changes ownership and permissions for the contents of each
|
||||
volume to match the `fsGroup` specified in a Pod's `securityContext` when that volume is
|
||||
|
@ -186,7 +186,7 @@ and [`emptydir`](/docs/concepts/storage/volumes/#emptydir).
|
|||
|
||||
## Delegating volume permission and ownership change to CSI driver
|
||||
|
||||
{{< feature-state for_k8s_version="v1.22" state="alpha" >}}
|
||||
{{< feature-state for_k8s_version="v1.23" state="beta" >}}
|
||||
|
||||
If you deploy a [Container Storage Interface (CSI)](https://github.com/container-storage-interface/spec/blob/master/spec.md)
|
||||
driver which supports the `VOLUME_MOUNT_GROUP` `NodeServiceCapability`, the
|
||||
|
|
|
@ -73,7 +73,7 @@ For more details, see [Get a Shell to a Running Container](
|
|||
|
||||
## Debugging with an ephemeral debug container {#ephemeral-container}
|
||||
|
||||
{{< feature-state state="alpha" for_k8s_version="v1.22" >}}
|
||||
{{< feature-state state="beta" for_k8s_version="v1.23" >}}
|
||||
|
||||
{{< glossary_tooltip text="Ephemeral containers" term_id="ephemeral-container" >}}
|
||||
are useful for interactive troubleshooting when `kubectl exec` is insufficient
|
||||
|
@ -83,12 +83,6 @@ https://github.com/GoogleContainerTools/distroless).
|
|||
|
||||
### Example debugging using ephemeral containers {#ephemeral-container-example}
|
||||
|
||||
{{< note >}}
|
||||
The examples in this section require the `EphemeralContainers` [feature gate](
|
||||
/docs/reference/command-line-tools-reference/feature-gates/) enabled in your
|
||||
cluster and `kubectl` version v1.22 or later.
|
||||
{{< /note >}}
|
||||
|
||||
You can use the `kubectl debug` command to add ephemeral containers to a
|
||||
running Pod. First, create a pod for the example:
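A minimal sketch of that flow (the pod name and images are illustrative) might be:

```shell
# create a demo pod from an image that has no shell or debugging utilities
kubectl run ephemeral-demo --image=k8s.gcr.io/pause:3.5 --restart=Never

# add an interactive ephemeral debug container, targeting the existing container
# so the debug container can share its process namespace (if the runtime allows it)
kubectl debug -it ephemeral-demo --image=busybox:1.28 --target=ephemeral-demo
```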
|
||||
|
||||
|
|
|
@ -65,3 +65,12 @@ Metrics Server collects metrics from the Summary API, exposed by
|
|||
|
||||
Learn more about the metrics server in
|
||||
[the design doc](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/instrumentation/metrics-server.md).
|
||||
|
||||
### Summary API Source
|
||||
The [Kubelet](/docs/reference/command-line-tools-reference/kubelet/) gathers stats at node, volume, pod and container level, and emits
|
||||
them in the [Summary API](https://github.com/kubernetes/kubernetes/blob/7d309e0104fedb57280b261e5677d919cb2a0e2d/staging/src/k8s.io/kubelet/pkg/apis/stats/v1alpha1/types.go)
|
||||
for consumers to read.
|
||||
|
||||
Before 1.23, these stats were primarily gathered from [cAdvisor](https://github.com/google/cadvisor). However, in 1.23, with the
|
||||
introduction of the `PodAndContainerStatsFromCRI` FeatureGate, container and pod level stats can be gathered by the CRI implementation.
|
||||
Note: this also requires support from the CRI implementations (containerd >= 1.6.0, CRI-O >= 1.23.0).
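As a quick check (assuming a node named `my-node` and `kubectl` access to the cluster), you can read the Summary API through the API server's node proxy:

```shell
kubectl get --raw "/api/v1/nodes/my-node/proxy/stats/summary"
```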
|
||||
|
|
|
@ -556,8 +556,9 @@ deleted by Kubernetes.
|
|||
### Validation
|
||||
|
||||
Custom resources are validated via
|
||||
[OpenAPI v3 schemas](https://github.com/OAI/OpenAPI-Specification/blob/master/versions/3.0.0.md#schemaObject)
|
||||
and you can add additional validation using
|
||||
[OpenAPI v3 schemas](https://github.com/OAI/OpenAPI-Specification/blob/master/versions/3.0.0.md#schemaObject),
|
||||
by x-kubernetes-validations when the [Validation Rules feature](#validation-rules) is enabled, and you
|
||||
can add additional validation using
|
||||
[admission webhooks](/docs/reference/access-authn-authz/admission-controllers/#validatingadmissionwebhook).
|
||||
|
||||
Additionally, the following restrictions are applied to the schema:
|
||||
|
@ -577,6 +578,11 @@ Additionally, the following restrictions are applied to the schema:
|
|||
- The field `additionalProperties` cannot be set to `false`.
|
||||
- The field `additionalProperties` is mutually exclusive with `properties`.
|
||||
|
||||
The `x-kubernetes-validations` extension can be used to validate custom resources using [Common
|
||||
Expression Language (CEL)](https://github.com/google/cel-spec) expressions when the [Validation
|
||||
rules](#validation-rules) feature is enabled and the CustomResourceDefinition schema is a
|
||||
[structural schema](#specifying-a-structural-schema).
|
||||
|
||||
The `default` field can be set when the [Defaulting feature](#defaulting) is enabled,
|
||||
which is the case with `apiextensions.k8s.io/v1` CustomResourceDefinitions.
|
||||
Defaulting is in GA since 1.17 (beta since 1.16 with the `CustomResourceDefaulting`
|
||||
|
@ -693,6 +699,303 @@ kubectl apply -f my-crontab.yaml
|
|||
crontab "my-new-cron-object" created
|
||||
```
|
||||
|
||||
## Validation rules
|
||||
|
||||
{{< feature-state state="alpha" for_k8s_version="v1.23" >}}
|
||||
|
||||
Validation rules are in alpha since 1.23 and validate custom resources when the
|
||||
`CustomResourceValidationExpressions` [feature
|
||||
gate](/docs/reference/command-line-tools-reference/feature-gates/) is enabled.
|
||||
This feature is only available if the schema is a
|
||||
[structural schema](#specifying-a-structural-schema).
|
||||
|
||||
Validation rules use the [Common Expression Language (CEL)](https://github.com/google/cel-spec)
|
||||
to validate custom resource values. Validation rules are included in
|
||||
CustomResourceDefinition schemas using the `x-kubernetes-validations` extension.
|
||||
|
||||
The Rule is scoped to the location of the `x-kubernetes-validations` extension in the schema.
|
||||
The `self` variable in the CEL expression is bound to the scoped value.
|
||||
|
||||
For example:
|
||||
|
||||
```yaml
|
||||
...
|
||||
openAPIV3Schema:
|
||||
type: object
|
||||
properties:
|
||||
spec:
|
||||
type: object
|
||||
x-kubernetes-validations:
|
||||
- rule: "self.minReplicas <= self.replicas"
|
||||
message: "replicas should be greater than or equal to minReplicas."
|
||||
- rule: "self.replicas <= self.maxReplicas"
|
||||
message: "replicas should be smaller than or equal to maxReplicas."
|
||||
properties:
|
||||
...
|
||||
minReplicas:
|
||||
type: integer
|
||||
replicas:
|
||||
type: integer
|
||||
maxReplicas:
|
||||
type: integer
|
||||
required:
|
||||
- minReplicas
|
||||
- replicas
|
||||
- maxReplicas
|
||||
```
|
||||
|
||||
will reject a request to create this custom resource:
|
||||
|
||||
```yaml
|
||||
apiVersion: "stable.example.com/v1"
|
||||
kind: CronTab
|
||||
metadata:
|
||||
name: my-new-cron-object
|
||||
spec:
|
||||
minReplicas: 0
|
||||
replicas: 20
|
||||
maxReplicas: 10
|
||||
```
|
||||
|
||||
with the response:
|
||||
|
||||
```
|
||||
The CronTab "my-new-cron-object" is invalid:
|
||||
* spec: Invalid value: map[string]interface {}{"maxReplicas":10, "minReplicas":0, "replicas":20}: replicas should be smaller than or equal to maxReplicas.
|
||||
```
|
||||
|
||||
`x-kubernetes-validations` can contain multiple rules.
|
||||
|
||||
The `rule` under `x-kubernetes-validations` represents the expression which will be evaluated by CEL.
|
||||
|
||||
The `message` represents the message displayed when validation fails. If message is unset, the above response would be:
|
||||
```
|
||||
The CronTab "my-new-cron-object" is invalid:
|
||||
* spec: Invalid value: map[string]interface {}{"maxReplicas":10, "minReplicas":0, "replicas":20}: failed rule: self.replicas <= self.maxReplicas
|
||||
```
|
||||
|
||||
Validation rules are compiled when CRDs are created/updated.
|
||||
The request to create or update a CRD will fail if compilation of the validation rules fails.
|
||||
The compilation process also includes type checking.
|
||||
|
||||
Possible compilation failures include:
|
||||
- `no_matching_overload`: this function has no overload for the types of the arguments.
|
||||
|
||||
For example, a rule like `self == true` against a field of integer type will produce the error:
|
||||
```
|
||||
Invalid value: apiextensions.ValidationRule{Rule:"self == true", Message:""}: compilation failed: ERROR: \<input>:1:6: found no matching overload for '_==_' applied to '(int, bool)'
|
||||
```
|
||||
|
||||
- `no_such_field`: does not contain the desired field.
|
||||
|
||||
For example, a rule like `self.nonExistingField > 0` against a non-existing field will return the error:
|
||||
```
|
||||
Invalid value: apiextensions.ValidationRule{Rule:"self.nonExistingField > 0", Message:""}: compilation failed: ERROR: \<input>:1:5: undefined field 'nonExistingField'
|
||||
```
|
||||
|
||||
- `invalid argument`: invalid argument to macros.
|
||||
|
||||
For example, a rule like `has(self)` will return the error:
|
||||
```
|
||||
Invalid value: apiextensions.ValidationRule{Rule:"has(self)", Message:""}: compilation failed: ERROR: <input>:1:4: invalid argument to has() macro
|
||||
```
|
||||
|
||||
|
||||
Validation Rules Examples:
|
||||
|
||||
| Rule | Purpose |
|
||||
| ---------------- | ------------ |
|
||||
| `self.minReplicas <= self.replicas && self.replicas <= self.maxReplicas` | Validate that the three fields defining replicas are ordered appropriately |
|
||||
| `'Available' in self.stateCounts` | Validate that an entry with the 'Available' key exists in a map |
|
||||
| `(size(self.list1) == 0) != (size(self.list2) == 0)` | Validate that one of two lists is non-empty, but not both |
|
||||
| <code>!('MY_KEY' in self.map1) || self['MY_KEY'].matches('^[a-zA-Z]*$')</code> | Validate the value of a map for a specific key, if it is in the map |
|
||||
| `self.envars.filter(e, e.name == 'MY_ENV').all(e, e.value.matches('^[a-zA-Z]*$'))` | Validate the 'value' field of a listMap entry where key field 'name' is 'MY_ENV' |
|
||||
| `has(self.expired) && self.created + self.ttl < self.expired` | Validate that 'expired' date is after a 'create' date plus a 'ttl' duration |
|
||||
| `self.health.startsWith('ok')` | Validate a 'health' string field has the prefix 'ok' |
|
||||
| `self.widgets.exists(w, w.key == 'x' && w.foo < 10)` | Validate that the 'foo' property of a listMap item with a key 'x' is less than 10 |
|
||||
| `type(self) == string ? self == '100%' : self == 1000` | Validate an int-or-string field for both the int and string cases |
|
||||
| `self.metadata.name.startsWith(self.prefix)` | Validate that an object's name has the prefix of another field value |
|
||||
| `self.set1.all(e, !(e in self.set2))` | Validate that two listSets are disjoint |
|
||||
| `size(self.names) == size(self.details) && self.names.all(n, n in self.details)` | Validate the 'details' map is keyed by the items in the 'names' listSet |
|
||||
|
||||
Xref: [Supported evaluation on CEL](https://github.com/google/cel-spec/blob/v0.6.0/doc/langdef.md#evaluation)
|
||||
|
||||
|
||||
- If the Rule is scoped to the root of a resource, it may make field selection into any fields
|
||||
declared in the OpenAPIv3 schema of the CRD as well as `apiVersion`, `kind`, `metadata.name` and
|
||||
`metadata.generateName`. This includes selection of fields in both the `spec` and `status` in the
|
||||
same expression:
|
||||
```yaml
|
||||
...
|
||||
openAPIV3Schema:
|
||||
type: object
|
||||
x-kubernetes-validations:
|
||||
- rule: "self.status.availableReplicas >= self.spec.minReplicas"
|
||||
properties:
|
||||
spec:
|
||||
type: object
|
||||
properties:
|
||||
minReplicas:
|
||||
type: integer
|
||||
...
|
||||
status:
|
||||
type: object
|
||||
properties:
|
||||
availableReplicas:
|
||||
type: integer
|
||||
```
|
||||
|
||||
- If the Rule is scoped to an object with properties, the accessible properties of the object are field selectable
|
||||
via `self.field` and field presence can be checked via `has(self.field)`. Null valued fields are treated as
|
||||
absent fields in CEL expressions.
|
||||
|
||||
```yaml
|
||||
...
|
||||
openAPIV3Schema:
|
||||
type: object
|
||||
properties:
|
||||
spec:
|
||||
type: object
|
||||
x-kubernetes-validation-rules:
|
||||
- rule: "has(self.foo)"
|
||||
properties:
|
||||
...
|
||||
foo:
|
||||
type: integer
|
||||
```
|
||||
|
||||
- If the Rule is scoped to an object with additionalProperties (i.e. a map), the values of the map
|
||||
are accessible via `self[mapKey]`, map containment can be checked via `mapKey in self` and all entries of the map
|
||||
are accessible via CEL macros and functions such as `self.all(...)`.
|
||||
```yaml
|
||||
...
|
||||
openAPIV3Schema:
|
||||
type: object
|
||||
properties:
|
||||
spec:
|
||||
type: object
|
||||
x-kubernetes-validation-rules:
|
||||
- rule: "self['xyz'].foo > 0"
|
||||
additionalProperties:
|
||||
...
|
||||
type: object
|
||||
properties:
|
||||
foo:
|
||||
type: integer
|
||||
```
|
||||
|
||||
- If the Rule is scoped to an array, the elements of the array are accessible via `self[i]` and also by macros and
|
||||
functions.
|
||||
```yaml
|
||||
...
|
||||
openAPIV3Schema:
|
||||
type: object
|
||||
properties:
|
||||
...
|
||||
foo:
|
||||
type: array
|
||||
x-kubernetes-validation-rules:
|
||||
- rule: "size(self) == 1"
|
||||
items:
|
||||
type: string
|
||||
```
|
||||
|
||||
- If the Rule is scoped to a scalar, `self` is bound to the scalar value.
|
||||
```yaml
|
||||
...
|
||||
openAPIV3Schema:
|
||||
type: object
|
||||
properties:
|
||||
spec:
|
||||
type: object
|
||||
properties:
|
||||
...
|
||||
foo:
|
||||
type: integer
|
||||
x-kubernetes-validation-rules:
|
||||
- rule: "self > 0"
|
||||
```
|
||||
Examples:
|
||||
|
||||
| Type of the field the rule is scoped to | Rule example |
|
||||
| -----------------------| -----------------------|
|
||||
| root object | `self.status.actual <= self.spec.maxDesired`|
|
||||
| map of objects | `self.components['Widget'].priority < 10`|
|
||||
| list of integers | `self.values.all(value, value >= 0 && value < 100)`|
|
||||
| string | `self.startsWith('kube')`|
|
||||
|
||||
|
||||
The `apiVersion`, `kind`, `metadata.name` and `metadata.generateName` are always accessible from the root of the
|
||||
object and from any x-kubernetes-embedded-resource annotated objects. No other metadata properties are accessible.
|
||||
|
||||
Unknown data preserved in custom resources via `x-kubernetes-preserve-unknown-fields` is not accessible in CEL
|
||||
expressions. This includes:
|
||||
- Unknown field values that are preserved by object schemas with x-kubernetes-preserve-unknown-fields.
|
||||
- Object properties where the property schema is of an "unknown type". An "unknown type" is recursively defined as:
|
||||
- A schema with no type and x-kubernetes-preserve-unknown-fields set to true
|
||||
- An array where the items schema is of an "unknown type"
|
||||
- An object where the additionalProperties schema is of an "unknown type"
|
||||
|
||||
|
||||
Only property names of the form `[a-zA-Z_.-/][a-zA-Z0-9_.-/]*` are accessible.
|
||||
Accessible property names are escaped according to the following rules when accessed in the expression:
|
||||
|
||||
| escape sequence | property name equivalent |
|
||||
| ----------------------- | -----------------------|
|
||||
| `__underscores__` | `__` |
|
||||
| `__dot__` | `.` |
|
||||
|`__dash__` | `-` |
|
||||
| `__slash__` | `/` |
|
||||
| `__{keyword}__` | [CEL RESERVED keyword](https://github.com/google/cel-spec/blob/v0.6.0/doc/langdef.md#syntax) |
|
||||
|
||||
Note: a CEL reserved keyword is only escaped when it matches the exact property name (for example, `int` within the word `sprint` would not be escaped).
|
||||
|
||||
Examples on escaping:
|
||||
|
||||
|property name | rule with escaped property name |
|
||||
| ----------------| ----------------------- |
|
||||
| namespace | `self.__namespace__ > 0` |
|
||||
| x-prop | `self.x__dash__prop > 0` |
|
||||
| redact__d | `self.redact__underscores__d > 0` |
|
||||
| string | `self.startsWith('kube')` |
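To illustrate the escaping, a hypothetical property named `x-prop` could be validated through its escaped form roughly like this (a sketch; the surrounding schema is assumed):

```yaml
# Sketch only: the x-prop property is hypothetical.
openAPIV3Schema:
  type: object
  properties:
    spec:
      type: object
      x-kubernetes-validation-rules:
      - rule: "self.x__dash__prop > 0"   # escaped form of the x-prop property
      properties:
        x-prop:
          type: integer
```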
|
||||
|
||||
|
||||
Equality on arrays with `x-kubernetes-list-type` of `set` or `map` ignores element order, i.e. [1, 2] == [2, 1].
|
||||
Concatenation on arrays with `x-kubernetes-list-type` uses the semantics of the list type:
|
||||
- `set`: `X + Y` performs a union where the array positions of all elements in `X` are preserved and
|
||||
non-intersecting elements in `Y` are appended, retaining their partial order.
|
||||
- `map`: `X + Y` performs a merge where the array positions of all keys in `X` are preserved but the values
|
||||
are overwritten by values in `Y` when the key sets of `X` and `Y` intersect. Elements in `Y` with
|
||||
non-intersecting keys are appended, retaining their partial order.
|
||||
|
||||
|
||||
Here is the type mapping between OpenAPIv3 declarations and CEL types:
|
||||
|
||||
| OpenAPIv3 type | CEL type |
|
||||
| -------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------- |
|
||||
| 'object' with Properties | object / "message type" |
|
||||
| 'object' with AdditionalProperties | map |
|
||||
| 'object' with x-kubernetes-embedded-type | object / "message type", 'apiVersion', 'kind', 'metadata.name' and 'metadata.generateName' are implicitly included in schema |
|
||||
| 'object' with x-kubernetes-preserve-unknown-fields | object / "message type", unknown fields are NOT accessible in CEL expression |
|
||||
| x-kubernetes-int-or-string | dynamic object that is either an int or a string, `type(value)` can be used to check the type |
|
||||
| 'array' | list |
|
||||
| 'array' with x-kubernetes-list-type=map | list with map based Equality & unique key guarantees |
|
||||
| 'array' with x-kubernetes-list-type=set | list with set based Equality & unique entry guarantees |
|
||||
| 'boolean' | boolean |
|
||||
| 'number' (all formats) | double |
|
||||
| 'integer' (all formats) | int (64) |
|
||||
| 'null' | null_type |
|
||||
| 'string' | string |
|
||||
| 'string' with format=byte (base64 encoded) | bytes |
|
||||
| 'string' with format=date | timestamp (google.protobuf.Timestamp) |
|
||||
| 'string' with format=datetime | timestamp (google.protobuf.Timestamp) |
|
||||
| 'string' with format=duration | duration (google.protobuf.Duration) |
|
||||
|
||||
xref: [CEL types](https://github.com/google/cel-spec/blob/v0.6.0/doc/langdef.md#values), [OpenAPI
|
||||
types](https://swagger.io/specification/#data-types), [Kubernetes Structural Schemas](https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/#specifying-a-structural-schema).
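As a sketch of the int-or-string mapping (the field name and rule placement are illustrative), the rule from the examples table could be attached to a field declared with `x-kubernetes-int-or-string`:

```yaml
# Sketch only: the maxUnavailable field and its placement are illustrative.
openAPIV3Schema:
  type: object
  properties:
    spec:
      type: object
      properties:
        maxUnavailable:
          x-kubernetes-int-or-string: true
          x-kubernetes-validation-rules:
          - rule: "type(self) == string ? self == '100%' : self == 1000"
```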
|
||||
|
||||
|
||||
|
||||
### Defaulting
|
||||
|
||||
{{< note >}}
|
||||
|
|
|
@ -3,7 +3,7 @@ reviewers:
|
|||
- lachie83
|
||||
- khenidak
|
||||
- bridgetkromhout
|
||||
min-kubernetes-server-version: v1.20
|
||||
min-kubernetes-server-version: v1.23
|
||||
title: Validate IPv4/IPv6 dual-stack
|
||||
content_type: task
|
||||
---
|
||||
|
@ -21,6 +21,9 @@ This document shares how to validate IPv4/IPv6 dual-stack enabled Kubernetes clu
|
|||
|
||||
{{< version-check >}}
|
||||
|
||||
{{< note >}}
|
||||
While you can validate with an earlier version, the feature is only GA and officially supported since v1.23.
|
||||
{{< /note >}}
|
||||
|
||||
|
||||
<!-- steps -->
|
||||
|
|
|
@ -4,42 +4,59 @@ reviewers:
|
|||
- jszczepkowski
|
||||
- justinsb
|
||||
- directxman12
|
||||
title: Horizontal Pod Autoscaler Walkthrough
|
||||
title: HorizontalPodAutoscaler Walkthrough
|
||||
content_type: task
|
||||
weight: 100
|
||||
min-kubernetes-server-version: 1.23
|
||||
---
|
||||
|
||||
<!-- overview -->
|
||||
|
||||
Horizontal Pod Autoscaler automatically scales the number of Pods
|
||||
in a replication controller, deployment, replica set or stateful set based on observed CPU utilization
|
||||
(or, with beta support, on some other, application-provided metrics).
|
||||
A [HorizontalPodAutoscaler](/docs/tasks/run-application/horizontal-pod-autoscale/)
|
||||
(HPA for short)
|
||||
automatically updates a workload resource (such as
|
||||
a {{< glossary_tooltip text="Deployment" term_id="deployment" >}} or
|
||||
{{< glossary_tooltip text="StatefulSet" term_id="statefulset" >}}), with the
|
||||
aim of automatically scaling the workload to match demand.
|
||||
|
||||
This document walks you through an example of enabling Horizontal Pod Autoscaler for the php-apache server.
|
||||
For more information on how Horizontal Pod Autoscaler behaves, see the
|
||||
[Horizontal Pod Autoscaler user guide](/docs/tasks/run-application/horizontal-pod-autoscale/).
|
||||
Horizontal scaling means that the response to increased load is to deploy more
|
||||
{{< glossary_tooltip text="Pods" term_id="pod" >}}.
|
||||
This is different from _vertical_ scaling, which for Kubernetes would mean
|
||||
assigning more resources (for example: memory or CPU) to the Pods that are already
|
||||
running for the workload.
|
||||
|
||||
If the load decreases, and the number of Pods is above the configured minimum,
|
||||
the HorizontalPodAutoscaler instructs the workload resource (the Deployment, StatefulSet,
|
||||
or other similar resource) to scale back down.
|
||||
|
||||
This document walks you through an example of enabling HorizontalPodAutoscaler to
|
||||
automatically manage scale for an example web app. This example workload is Apache
|
||||
httpd running some PHP code.
|
||||
|
||||
## {{% heading "prerequisites" %}}
|
||||
|
||||
This example requires a running Kubernetes cluster and kubectl, version 1.2 or later.
|
||||
[Metrics server](https://github.com/kubernetes-sigs/metrics-server) monitoring needs to be deployed
|
||||
in the cluster to provide metrics through the [Metrics API](https://github.com/kubernetes/metrics).
|
||||
Horizontal Pod Autoscaler uses this API to collect metrics. To learn how to deploy the metrics-server,
|
||||
see the [metrics-server documentation](https://github.com/kubernetes-sigs/metrics-server#deployment).
|
||||
{{< include "task-tutorial-prereqs.md" >}} {{< version-check >}} If you're running an older
|
||||
release of Kubernetes, refer to the version of the documentation for that release (see
|
||||
[available documentation versions](/docs/home/supported-doc-versions/)).
|
||||
|
||||
To specify multiple resource metrics for a Horizontal Pod Autoscaler, you must have a
|
||||
Kubernetes cluster and kubectl at version 1.6 or later. To make use of custom metrics, your cluster
|
||||
must be able to communicate with the API server providing the custom Metrics API.
|
||||
Finally, to use metrics not related to any Kubernetes object you must have a
|
||||
Kubernetes cluster at version 1.10 or later, and you must be able to communicate
|
||||
with the API server that provides the external Metrics API.
|
||||
See the [Horizontal Pod Autoscaler user guide](/docs/tasks/run-application/horizontal-pod-autoscale/#support-for-custom-metrics) for more details.
|
||||
To follow this walkthrough, you also need to use a cluster that has a
|
||||
[Metrics Server](https://github.com/kubernetes-sigs/metrics-server#readme) deployed and configured.
|
||||
The Kubernetes Metrics Server collects resource metrics from
|
||||
the {{<glossary_tooltip term_id="kubelet" text="kubelets">}} in your cluster, and exposes those metrics
|
||||
through the [Kubernetes API](/docs/concepts/overview/kubernetes-api/),
|
||||
using an [APIService](/docs/concepts/extend-kubernetes/api-extension/apiserver-aggregation/) to add
|
||||
new kinds of resource that represent metric readings.
|
||||
|
||||
To learn how to deploy the Metrics Server, see the
|
||||
[metrics-server documentation](https://github.com/kubernetes-sigs/metrics-server#deployment).
|
||||
|
||||
<!-- steps -->
|
||||
|
||||
## Run and expose php-apache server
|
||||
|
||||
To demonstrate Horizontal Pod Autoscaler we will use a custom docker image based on the php-apache image. The Dockerfile has the following content:
|
||||
To demonstrate a HorizontalPodAutoscaler, you will first make a custom container image that uses
|
||||
the `php-apache` image from Docker Hub as its starting point. The `Dockerfile` is ready-made for you,
|
||||
and has the following content:
|
||||
|
||||
```dockerfile
|
||||
FROM php:5-apache
|
||||
|
@ -47,7 +64,8 @@ COPY index.php /var/www/html/index.php
|
|||
RUN chmod a+rx index.php
|
||||
```
|
||||
|
||||
It defines an index.php page which performs some CPU intensive computations:
|
||||
This code defines a simple `index.php` page that performs some CPU intensive computations,
|
||||
in order to simulate load in your cluster.
|
||||
|
||||
```php
|
||||
<?php
|
||||
|
@ -59,12 +77,13 @@ It defines an index.php page which performs some CPU intensive computations:
|
|||
?>
|
||||
```
|
||||
|
||||
First, we will start a deployment running the image and expose it as a service
|
||||
using the following configuration:
|
||||
Once you have made that container image, start a Deployment that runs a container using the
|
||||
image you made, and expose it as a {{< glossary_tooltip term_id="service">}}
|
||||
using the following manifest:
|
||||
|
||||
{{< codenew file="application/php-apache.yaml" >}}
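If you'd like to see roughly what that manifest contains without downloading it, it is approximately a Deployment plus a Service along these lines (a sketch; check the referenced example file for the authoritative image name and values):

```yaml
# Sketch of the referenced manifest; the published example file is authoritative.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-apache
spec:
  selector:
    matchLabels:
      run: php-apache
  replicas: 1
  template:
    metadata:
      labels:
        run: php-apache
    spec:
      containers:
      - name: php-apache
        image: k8s.gcr.io/hpa-example   # image name assumed for this era of the example
        ports:
        - containerPort: 80
        resources:
          limits:
            cpu: 500m
          requests:
            cpu: 200m   # the CPU request the HPA utilization target is measured against
---
apiVersion: v1
kind: Service
metadata:
  name: php-apache
  labels:
    run: php-apache
spec:
  ports:
  - port: 80
  selector:
    run: php-apache
```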
|
||||
|
||||
Run the following command:
|
||||
To do so, run the following command:
|
||||
|
||||
```shell
|
||||
kubectl apply -f https://k8s.io/examples/application/php-apache.yaml
|
||||
|
@ -75,16 +94,27 @@ deployment.apps/php-apache created
|
|||
service/php-apache created
|
||||
```
|
||||
|
||||
## Create Horizontal Pod Autoscaler
|
||||
## Create the HorizontalPodAutoscaler {#create-horizontal-pod-autoscaler}
|
||||
|
||||
Now that the server is running, create the autoscaler using `kubectl`. There is a
|
||||
[`kubectl autoscale`](/docs/reference/generated/kubectl/kubectl-commands#autoscale) subcommand,
|
||||
part of `kubectl`, that helps you do this.
|
||||
|
||||
You will shortly run a command that creates a HorizontalPodAutoscaler that maintains
|
||||
between 1 and 10 replicas of the Pods controlled by the php-apache Deployment that
|
||||
you created in the first step of these instructions.
|
||||
|
||||
Roughly speaking, the HPA {{<glossary_tooltip text="controller" term_id="controller">}} will increase and decrease
|
||||
the number of replicas (by updating the Deployment) to maintain an average CPU utilization across all Pods of 50%.
|
||||
The Deployment then updates the ReplicaSet - this is part of how all Deployments work in Kubernetes -
|
||||
and then the ReplicaSet either adds or removes Pods based on the change to its `.spec`.
|
||||
|
||||
Now that the server is running, we will create the autoscaler using
|
||||
[kubectl autoscale](/docs/reference/generated/kubectl/kubectl-commands#autoscale).
|
||||
The following command will create a Horizontal Pod Autoscaler that maintains between 1 and 10 replicas of the Pods
|
||||
controlled by the php-apache deployment we created in the first step of these instructions.
|
||||
Roughly speaking, HPA will increase and decrease the number of replicas
|
||||
(via the deployment) to maintain an average CPU utilization across all Pods of 50%.
|
||||
Since each pod requests 200 milli-cores by `kubectl run`, this means an average CPU usage of 100 milli-cores.
|
||||
See [here](/docs/tasks/run-application/horizontal-pod-autoscale/#algorithm-details) for more details on the algorithm.
|
||||
See [Algorithm details](/docs/tasks/run-application/horizontal-pod-autoscale/#algorithm-details) for more details
|
||||
on the algorithm.
|
||||
|
||||
|
||||
Create the HorizontalPodAutoscaler:
|
||||
|
||||
```shell
|
||||
kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10
|
||||
|
@ -94,47 +124,64 @@ kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10
|
|||
horizontalpodautoscaler.autoscaling/php-apache autoscaled
|
||||
```
|
||||
|
||||
We may check the current status of autoscaler by running:
|
||||
You can check the current status of the newly-made HorizontalPodAutoscaler, by running:
|
||||
|
||||
```shell
|
||||
# You can use "hpa" or "horizontalpodautoscaler"; either name works OK.
|
||||
kubectl get hpa
|
||||
```
|
||||
|
||||
The output is similar to:
|
||||
```
|
||||
NAME REFERENCE TARGET MINPODS MAXPODS REPLICAS AGE
|
||||
php-apache Deployment/php-apache/scale 0% / 50% 1 10 1 18s
|
||||
```
|
||||
|
||||
Please note that the current CPU consumption is 0% as we are not sending any requests to the server
|
||||
(the ``TARGET`` column shows the average across all the pods controlled by the corresponding deployment).
|
||||
(if you see other HorizontalPodAutoscalers with different names, that means they already existed,
|
||||
and that isn't usually a problem).
|
||||
|
||||
## Increase load
|
||||
Please note that the current CPU consumption is 0% as there are no clients sending requests to the server
|
||||
(the ``TARGET`` column shows the average across all the Pods controlled by the corresponding deployment).
|
||||
|
||||
Now, we will see how the autoscaler reacts to increased load.
|
||||
We will start a container, and send an infinite loop of queries to the php-apache service (please run it in a different terminal):
|
||||
## Increase the load {#increase-load}
|
||||
|
||||
Next, see how the autoscaler reacts to increased load.
|
||||
To do this, you'll start a different Pod to act as a client. The container within the client Pod
|
||||
runs in an infinite loop, sending queries to the php-apache service.
|
||||
|
||||
```shell
|
||||
# Run this in a separate terminal
|
||||
# so that the load generation continues and you can carry on with the rest of the steps
|
||||
kubectl run -i --tty load-generator --rm --image=busybox --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://php-apache; done"
|
||||
```
|
||||
|
||||
Within a minute or so, we should see the higher CPU load by executing:
|
||||
|
||||
Now run:
|
||||
```shell
|
||||
kubectl get hpa
|
||||
# type Ctrl+C to end the watch when you're ready
|
||||
kubectl get hpa php-apache --watch
|
||||
```
|
||||
|
||||
Within a minute or so, you should see the higher CPU load; for example:
|
||||
|
||||
```
|
||||
NAME REFERENCE TARGET MINPODS MAXPODS REPLICAS AGE
|
||||
php-apache Deployment/php-apache/scale 305% / 50% 1 10 1 3m
|
||||
```
|
||||
|
||||
and then, more replicas. For example:
|
||||
```
|
||||
NAME REFERENCE TARGET MINPODS MAXPODS REPLICAS AGE
|
||||
php-apache Deployment/php-apache/scale 305% / 50% 1 10 7 3m
|
||||
```
|
||||
|
||||
Here, CPU consumption has increased to 305% of the request.
|
||||
As a result, the deployment was resized to 7 replicas:
|
||||
As a result, the Deployment was resized to 7 replicas:
|
||||
|
||||
```shell
|
||||
kubectl get deployment php-apache
|
||||
```
|
||||
|
||||
You should see the replica count matching the figure from the HorizontalPodAutoscaler
|
||||
```
|
||||
NAME READY UP-TO-DATE AVAILABLE AGE
|
||||
php-apache 7/7 7 7 19m
|
||||
|
@ -146,24 +193,29 @@ of load is not controlled in any way it may happen that the final number of repl
|
|||
will differ from this example.
|
||||
{{< /note >}}
|
||||
|
||||
## Stop load
|
||||
## Stop generating load {#stop-load}
|
||||
|
||||
We will finish our example by stopping the user load.
|
||||
To finish the example, stop sending the load.
|
||||
|
||||
In the terminal where we created the container with `busybox` image, terminate
|
||||
In the terminal where you created the Pod that runs a `busybox` image, terminate
|
||||
the load generation by typing `<Ctrl> + C`.
|
||||
|
||||
Then we will verify the result state (after a minute or so):
|
||||
Then verify the result state (after a minute or so):
|
||||
|
||||
```shell
|
||||
kubectl get hpa
|
||||
# type Ctrl+C to end the watch when you're ready
|
||||
kubectl get hpa php-apache --watch
|
||||
```
|
||||
|
||||
The output is similar to:
|
||||
|
||||
```
|
||||
NAME REFERENCE TARGET MINPODS MAXPODS REPLICAS AGE
|
||||
php-apache Deployment/php-apache/scale 0% / 50% 1 10 1 11m
|
||||
```
|
||||
|
||||
and the Deployment also shows that it has scaled down:
|
||||
|
||||
```shell
|
||||
kubectl get deployment php-apache
|
||||
```
|
||||
|
@ -173,20 +225,18 @@ NAME READY UP-TO-DATE AVAILABLE AGE
|
|||
php-apache 1/1 1 1 27m
|
||||
```
|
||||
|
||||
Here CPU utilization dropped to 0, and so HPA autoscaled the number of replicas back down to 1.
|
||||
Once CPU utilization dropped to 0, the HPA automatically scaled the number of replicas back down to 1.
|
||||
|
||||
{{< note >}}
|
||||
Autoscaling the replicas may take a few minutes.
|
||||
{{< /note >}}
|
||||
|
||||
<!-- discussion -->
|
||||
|
||||
## Autoscaling on multiple metrics and custom metrics
|
||||
|
||||
You can introduce additional metrics to use when autoscaling the `php-apache` Deployment
|
||||
by making use of the `autoscaling/v2beta2` API version.
|
||||
by making use of the `autoscaling/v2` API version.
|
||||
|
||||
First, get the YAML of your HorizontalPodAutoscaler in the `autoscaling/v2beta2` form:
|
||||
First, get the YAML of your HorizontalPodAutoscaler in the `autoscaling/v2` form:
|
||||
|
||||
```shell
|
||||
kubectl get hpa php-apache -o yaml > /tmp/hpa-v2.yaml
|
||||
|
@ -195,7 +245,7 @@ kubectl get hpa php-apache -o yaml > /tmp/hpa-v2.yaml
|
|||
Open the `/tmp/hpa-v2.yaml` file in an editor, and you should see YAML which looks like this:
|
||||
|
||||
```yaml
|
||||
apiVersion: autoscaling/v2beta2
|
||||
apiVersion: autoscaling/v2
|
||||
kind: HorizontalPodAutoscaler
|
||||
metadata:
|
||||
name: php-apache
|
||||
|
@ -287,7 +337,7 @@ For example, if you had your monitoring system collecting metrics about network
|
|||
you could update the definition above using `kubectl edit` to look like this:
|
||||
|
||||
```yaml
|
||||
apiVersion: autoscaling/v2beta2
|
||||
apiVersion: autoscaling/v2
|
||||
kind: HorizontalPodAutoscaler
|
||||
metadata:
|
||||
name: php-apache
|
||||
|
@ -411,7 +461,7 @@ access to any metric, so cluster administrators should take care when exposing i
|
|||
|
||||
## Appendix: Horizontal Pod Autoscaler Status Conditions
|
||||
|
||||
When using the `autoscaling/v2beta2` form of the HorizontalPodAutoscaler, you will be able to see
|
||||
When using the `autoscaling/v2` form of the HorizontalPodAutoscaler, you will be able to see
|
||||
*status conditions* set by Kubernetes on the HorizontalPodAutoscaler. These status conditions indicate
|
||||
whether or not the HorizontalPodAutoscaler is able to scale, and whether or not it is currently restricted
|
||||
in any way.
|
||||
|
@ -444,7 +494,7 @@ Conditions:
|
|||
Events:
|
||||
```
|
||||
|
||||
For this HorizontalPodAutoscaler, we can see several conditions in a healthy state. The first,
|
||||
For this HorizontalPodAutoscaler, you can see several conditions in a healthy state. The first,
|
||||
`AbleToScale`, indicates whether or not the HPA is able to fetch and update scales, as well as
|
||||
whether or not any backoff-related conditions would prevent scaling. The second, `ScalingActive`,
|
||||
indicates whether or not the HPA is enabled (i.e. the replica count of the target is not zero) and
|
||||
|
@ -454,7 +504,7 @@ was capped by the maximum or minimum of the HorizontalPodAutoscaler. This is an
|
|||
you may wish to raise or lower the minimum or maximum replica count constraints on your
|
||||
HorizontalPodAutoscaler.
|
||||
|
||||
## Appendix: Quantities
|
||||
## Quantities
|
||||
|
||||
All metrics in the HorizontalPodAutoscaler and metrics APIs are specified using
|
||||
a special whole-number notation known in Kubernetes as a
|
||||
|
@ -464,16 +514,16 @@ will return whole numbers without a suffix when possible, and will generally ret
|
|||
quantities in milli-units otherwise. This means you might see your metric value fluctuate
|
||||
between `1` and `1500m`, or `1` and `1.5` when written in decimal notation.
|
||||
|
||||
## Appendix: Other possible scenarios
|
||||
## Other possible scenarios
|
||||
|
||||
### Creating the autoscaler declaratively
|
||||
|
||||
Instead of using the `kubectl autoscale` command to create a HorizontalPodAutoscaler imperatively, we
|
||||
can use the following file to create it declaratively:
|
||||
can use the following manifest to create it declaratively:
|
||||
|
||||
{{< codenew file="application/hpa/php-apache.yaml" >}}
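That manifest is roughly the declarative equivalent of the `kubectl autoscale` command used earlier; a sketch of its likely contents (the example file is authoritative):

```yaml
# Sketch of the referenced manifest.
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 50
```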
|
||||
|
||||
We will create the autoscaler by executing the following command:
|
||||
Then, create the autoscaler by executing the following command:
|
||||
|
||||
```shell
|
||||
kubectl create -f https://k8s.io/examples/application/hpa/php-apache.yaml
|
||||
|
|
|
@ -3,41 +3,59 @@ reviewers:
|
|||
- fgrzadkowski
|
||||
- jszczepkowski
|
||||
- directxman12
|
||||
title: Horizontal Pod Autoscaler
|
||||
title: Horizontal Pod Autoscaling
|
||||
feature:
|
||||
title: Horizontal scaling
|
||||
description: >
|
||||
Scale your application up and down with a simple command, with a UI, or automatically based on CPU usage.
|
||||
|
||||
content_type: concept
|
||||
weight: 90
|
||||
---
|
||||
|
||||
<!-- overview -->
|
||||
|
||||
The Horizontal Pod Autoscaler automatically scales the number of Pods
|
||||
in a replication controller, deployment, replica set or stateful set based on observed CPU utilization (or, with
|
||||
[custom metrics](https://git.k8s.io/community/contributors/design-proposals/instrumentation/custom-metrics-api.md)
|
||||
support, on some other application-provided metrics). Note that Horizontal
|
||||
Pod Autoscaling does not apply to objects that can't be scaled, for example, DaemonSets.
|
||||
In Kubernetes, a _HorizontalPodAutoscaler_ automatically updates a workload resource (such as
|
||||
a {{< glossary_tooltip text="Deployment" term_id="deployment" >}} or
|
||||
{{< glossary_tooltip text="StatefulSet" term_id="statefulset" >}}), with the
|
||||
aim of automatically scaling the workload to match demand.
|
||||
|
||||
The Horizontal Pod Autoscaler is implemented as a Kubernetes API resource and a controller.
|
||||
Horizontal scaling means that the response to increased load is to deploy more
|
||||
{{< glossary_tooltip text="Pods" term_id="pod" >}}.
|
||||
This is different from _vertical_ scaling, which for Kubernetes would mean
|
||||
assigning more resources (for example: memory or CPU) to the Pods that are already
|
||||
running for the workload.
|
||||
|
||||
If the load decreases, and the number of Pods is above the configured minimum,
|
||||
the HorizontalPodAutoscaler instructs the workload resource (the Deployment, StatefulSet,
|
||||
or other similar resource) to scale back down.
|
||||
|
||||
Horizontal pod autoscaling does not apply to objects that can't be scaled (for example:
|
||||
a {{< glossary_tooltip text="DaemonSet" term_id="daemonset" >}}.)
|
||||
|
||||
The HorizontalPodAutoscaler is implemented as a Kubernetes API resource and a
|
||||
{{< glossary_tooltip text="controller" term_id="controller" >}}.
|
||||
The resource determines the behavior of the controller.
|
||||
The controller periodically adjusts the number of replicas in a replication controller or deployment to match the observed metrics such as average CPU utilisation, average memory utilisation or any other custom metric to the target specified by the user.
|
||||
|
||||
The horizontal pod autoscaling controller, running within the Kubernetes
|
||||
{{< glossary_tooltip text="control plane" term_id="control-plane" >}}, periodically adjusts the
|
||||
desired scale of its target (for example, a Deployment) to match observed metrics such as average
|
||||
CPU utilization, average memory utilization, or any other custom metric you specify.
|
||||
|
||||
There is [walkthrough example](/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/) of using
|
||||
horizontal pod autoscaling.
|
||||
|
||||
<!-- body -->
|
||||
|
||||
## How does the Horizontal Pod Autoscaler work?
|
||||
## How does a HorizontalPodAutoscaler work?
|
||||
|
||||

|
||||
{{< figure src="/images/docs/horizontal-pod-autoscaler.svg" caption="HorizontalPodAutoscaler controls the scale of a Deployment and its ReplicaSet" class="diagram-medium">}}
|
||||
|
||||
The Horizontal Pod Autoscaler is implemented as a control loop, with a period controlled
|
||||
by the controller manager's `--horizontal-pod-autoscaler-sync-period` flag (with a default
|
||||
value of 15 seconds).
|
||||
Kubernetes implements horizontal pod autoscaling as a control loop that runs intermittently
|
||||
(it is not a continuous process). The interval is set by the
|
||||
`--horizontal-pod-autoscaler-sync-period` parameter to the
|
||||
[`kube-controller-manager`](/docs/reference/command-line-tools-reference/kube-controller-manager/)
|
||||
(and the default interval is 15 seconds).
|
||||
|
||||
During each period, the controller manager queries the resource utilization against the
|
||||
Once during each period, the controller manager queries the resource utilization against the
|
||||
metrics specified in each HorizontalPodAutoscaler definition. The controller manager
|
||||
obtains the metrics from either the resource metrics API (for per-pod resource metrics),
|
||||
or the custom metrics API (for all other metrics).
|
||||
|
@ -45,41 +63,46 @@ or the custom metrics API (for all other metrics).
|
|||
* For per-pod resource metrics (like CPU), the controller fetches the metrics
|
||||
from the resource metrics API for each Pod targeted by the HorizontalPodAutoscaler.
|
||||
Then, if a target utilization value is set, the controller calculates the utilization
|
||||
value as a percentage of the equivalent resource request on the containers in
|
||||
each Pod. If a target raw value is set, the raw metric values are used directly.
|
||||
value as a percentage of the equivalent
|
||||
[resource request](/docs/concepts/configuration/manage-resources-containers/#requests-and-limits)
|
||||
on the containers in each Pod. If a target raw value is set, the raw metric values are used directly.
|
||||
The controller then takes the mean of the utilization or the raw value (depending on the type
|
||||
of target specified) across all targeted Pods, and produces a ratio used to scale
|
||||
the number of desired replicas.
|
||||
|
||||
Please note that if some of the Pod's containers do not have the relevant resource request set,
|
||||
CPU utilization for the Pod will not be defined and the autoscaler will
|
||||
not take any action for that metric. See the [algorithm
|
||||
details](#algorithm-details) section below for more information about
|
||||
how the autoscaling algorithm works.
|
||||
not take any action for that metric. See the [algorithm details](#algorithm-details) section below
|
||||
for more information about how the autoscaling algorithm works.
|
||||
|
||||
* For per-pod custom metrics, the controller functions similarly to per-pod resource metrics,
|
||||
except that it works with raw values, not utilization values.
|
||||
|
||||
* For object metrics and external metrics, a single metric is fetched, which describes
|
||||
the object in question. This metric is compared to the target
|
||||
value, to produce a ratio as above. In the `autoscaling/v2beta2` API
|
||||
value, to produce a ratio as above. In the `autoscaling/v2` API
|
||||
version, this value can optionally be divided by the number of Pods before the
|
||||
comparison is made.
|
||||
|
||||
The HorizontalPodAutoscaler normally fetches metrics from a series of aggregated APIs (`metrics.k8s.io`,
|
||||
`custom.metrics.k8s.io`, and `external.metrics.k8s.io`). The `metrics.k8s.io` API is usually provided by
|
||||
metrics-server, which needs to be launched separately. For more information about resource metrics, see [Metrics Server](/docs/tasks/debug-application-cluster/resource-metrics-pipeline/#metrics-server).
|
||||
The common use for HorizontalPodAutoscaler is to configure it to fetch metrics from
|
||||
{{< glossary_tooltip text="aggregated APIs" term_id="aggregation-layer" >}}
|
||||
(`metrics.k8s.io`, `custom.metrics.k8s.io`, or `external.metrics.k8s.io`). The `metrics.k8s.io` API is
|
||||
usually provided by an add-on named Metrics Server, which needs to be launched separately.
|
||||
For more information about resource metrics, see
|
||||
[Metrics Server](/docs/tasks/debug-application-cluster/resource-metrics-pipeline/#metrics-server).
|
||||
|
||||
See [Support for metrics APIs](#support-for-metrics-apis) for more details.
|
||||
[Support for metrics APIs](#support-for-metrics-apis) explains the stability guarantees and support status for these
|
||||
different APIs.
|
||||
|
||||
The autoscaler accesses corresponding scalable controllers (such as replication controllers, deployments, and replica sets)
|
||||
by using the scale sub-resource. Scale is an interface that allows you to dynamically set the number of replicas and examine
|
||||
each of their current states. More details on scale sub-resource can be found
|
||||
[here](https://git.k8s.io/community/contributors/design-proposals/autoscaling/horizontal-pod-autoscaler.md#scale-subresource).
|
||||
The HorizontalPodAutoscaler controller accesses corresponding workload resources that support scaling (such as Deployments
|
||||
and StatefulSet). These resources each have a subresource named `scale`, an interface that allows you to dynamically set the
|
||||
number of replicas and examine each of their current states.
|
||||
For general information about subresources in the Kubernetes API, see
|
||||
[Kubernetes API Concepts](/docs/reference/using-api/api-concepts/).
|
||||
|
||||
### Algorithm Details
|
||||
### Algorithm details
|
||||
|
||||
From the most basic perspective, the Horizontal Pod Autoscaler controller
|
||||
From the most basic perspective, the HorizontalPodAutoscaler controller
|
||||
operates on the ratio between desired metric value and current metric
|
||||
value:
|
||||
|
||||
|
@ -89,26 +112,28 @@ desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricVal
|
|||
|
||||
For example, if the current metric value is `200m`, and the desired value
|
||||
is `100m`, the number of replicas will be doubled, since `200.0 / 100.0 ==
|
||||
2.0` If the current value is instead `50m`, we'll halve the number of
|
||||
replicas, since `50.0 / 100.0 == 0.5`. We'll skip scaling if the ratio is
|
||||
sufficiently close to 1.0 (within a globally-configurable tolerance, from
|
||||
the `--horizontal-pod-autoscaler-tolerance` flag, which defaults to 0.1).
|
||||
2.0`. If the current value is instead `50m`, you'll halve the number of
|
||||
replicas, since `50.0 / 100.0 == 0.5`. The control plane skips any scaling
|
||||
action if the ratio is sufficiently close to 1.0 (within a globally-configurable
|
||||
tolerance, 0.1 by default).
|
||||
|
||||
When a `targetAverageValue` or `targetAverageUtilization` is specified,
|
||||
the `currentMetricValue` is computed by taking the average of the given
|
||||
metric across all Pods in the HorizontalPodAutoscaler's scale target.
|
||||
Before checking the tolerance and deciding on the final values, we take
|
||||
pod readiness and missing metrics into consideration, however.
|
||||
|
||||
All Pods with a deletion timestamp set (i.e. Pods in the process of being
|
||||
shut down) and all failed Pods are discarded.
|
||||
Before checking the tolerance and deciding on the final values, the control
|
||||
plane also considers whether any metrics are missing, and how many Pods
|
||||
are [`Ready`](/docs/concepts/workloads/pods/pod-lifecycle/#pod-conditions).
|
||||
All Pods with a deletion timestamp set (objects with a deletion timestamp are
|
||||
in the process of being shut down / removed) are ignored, and all failed Pods
|
||||
are discarded.
|
||||
|
||||
If a particular Pod is missing metrics, it is set aside for later; Pods
|
||||
with missing metrics will be used to adjust the final scaling amount.
|
||||
|
||||
When scaling on CPU, if any pod has yet to become ready (i.e. it's still
|
||||
initializing) *or* the most recent metric point for the pod was before it
|
||||
became ready, that pod is set aside as well.
|
||||
When scaling on CPU, if any pod has yet to become ready (it's still
|
||||
initializing, or possibly is unhealthy) *or* the most recent metric point for
|
||||
the pod was before it became ready, that pod is set aside as well.
|
||||
|
||||
Due to technical constraints, the HorizontalPodAutoscaler controller
|
||||
cannot exactly determine the first time a pod becomes ready when
|
||||
|
@ -124,20 +149,21 @@ default is 5 minutes.
|
|||
The `currentMetricValue / desiredMetricValue` base scale ratio is then
|
||||
calculated using the remaining pods not set aside or discarded from above.
|
||||
|
||||
If there were any missing metrics, we recompute the average more
|
||||
If there were any missing metrics, the control plane recomputes the average more
|
||||
conservatively, assuming those pods were consuming 100% of the desired
|
||||
value in case of a scale down, and 0% in case of a scale up. This dampens
|
||||
the magnitude of any potential scale.
|
||||
|
||||
Furthermore, if any not-yet-ready pods were present, and we would have
|
||||
scaled up without factoring in missing metrics or not-yet-ready pods, we
|
||||
conservatively assume the not-yet-ready pods are consuming 0% of the
|
||||
desired metric, further dampening the magnitude of a scale up.
|
||||
Furthermore, if any not-yet-ready pods were present, and the workload would have
|
||||
scaled up without factoring in missing metrics or not-yet-ready pods,
|
||||
the controller conservatively assumes that the not-yet-ready pods are consuming 0%
|
||||
of the desired metric, further dampening the magnitude of a scale up.
|
||||
|
||||
After factoring in the not-yet-ready pods and missing metrics, we
|
||||
recalculate the usage ratio. If the new ratio reverses the scale
|
||||
direction, or is within the tolerance, we skip scaling. Otherwise, we use
|
||||
the new ratio to scale.
|
||||
After factoring in the not-yet-ready pods and missing metrics, the
|
||||
controller recalculates the usage ratio. If the new ratio reverses the scale
|
||||
direction, or is within the tolerance, the controller doesn't take any scaling
|
||||
action. In other cases, the new ratio is used to decide any change to the
|
||||
number of Pods.
|
||||
|
||||
Note that the *original* value for the average utilization is reported
|
||||
back via the HorizontalPodAutoscaler status, without factoring in the
|
||||
|
@ -161,32 +187,25 @@ fluctuating metric values.
|
|||
|
||||
## API Object
|
||||
|
||||
The Horizontal Pod Autoscaler is an API resource in the Kubernetes `autoscaling` API group.
|
||||
The current stable version, which only includes support for CPU autoscaling,
|
||||
can be found in the `autoscaling/v1` API version.
|
||||
|
||||
The beta version, which includes support for scaling on memory and custom metrics,
|
||||
can be found in `autoscaling/v2beta2`. The new fields introduced in `autoscaling/v2beta2`
|
||||
are preserved as annotations when working with `autoscaling/v1`.
|
||||
The Horizontal Pod Autoscaler is an API resource in the Kubernetes
|
||||
`autoscaling` API group. The current stable version can be found in
|
||||
the `autoscaling/v2` API version which includes support for scaling on
|
||||
memory and custom metrics. The new fields introduced in
|
||||
`autoscaling/v2` are preserved as annotations when working with
|
||||
`autoscaling/v1`.
|
||||
|
||||
When you create a HorizontalPodAutoscaler API object, make sure the name specified is a valid
|
||||
[DNS subdomain name](/docs/concepts/overview/working-with-objects/names#dns-subdomain-names).
|
||||
More details about the API object can be found at
|
||||
[HorizontalPodAutoscaler Object](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#horizontalpodautoscaler-v1-autoscaling).
|
||||
[HorizontalPodAutoscaler Object](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#horizontalpodautoscaler-v2-autoscaling).
|
||||
|
||||
## Stability of workload scale {#flapping}
|
||||
|
||||
## Support for Horizontal Pod Autoscaler in kubectl
|
||||
When managing the scale of a group of replicas using the HorizontalPodAutoscaler,
|
||||
it is possible that the number of replicas keeps fluctuating frequently due to the
|
||||
dynamic nature of the metrics evaluated. This is sometimes referred to as *thrashing*,
|
||||
or *flapping*. It's similar to the concept of *hysteresis* in cybernetics.
|
||||
|
||||
Horizontal Pod Autoscaler, like every API resource, is supported in a standard way by `kubectl`.
|
||||
We can create a new autoscaler using `kubectl create` command.
|
||||
We can list autoscalers by `kubectl get hpa` and get detailed description by `kubectl describe hpa`.
|
||||
Finally, we can delete an autoscaler using `kubectl delete hpa`.
|
||||
|
||||
In addition, there is a special `kubectl autoscale` command for creating a HorizontalPodAutoscaler object.
|
||||
For instance, executing `kubectl autoscale rs foo --min=2 --max=5 --cpu-percent=80`
|
||||
will create an autoscaler for replication set *foo*, with target CPU utilization set to `80%`
|
||||
and the number of replicas between 2 and 5.
|
||||
The detailed documentation of `kubectl autoscale` can be found [here](/docs/reference/generated/kubectl/kubectl-commands/#autoscale).
|
||||
|
||||
|
||||
## Autoscaling during rolling update
|
||||
|
@ -203,31 +222,6 @@ If you perform a rolling update of a StatefulSet that has an autoscaled number o
|
|||
replicas, the StatefulSet directly manages its set of Pods (there is no intermediate resource
|
||||
similar to ReplicaSet).
|
||||
|
||||
## Support for cooldown/delay
|
||||
|
||||
When managing the scale of a group of replicas using the Horizontal Pod Autoscaler,
|
||||
it is possible that the number of replicas keeps fluctuating frequently due to the
|
||||
dynamic nature of the metrics evaluated. This is sometimes referred to as *thrashing*.
|
||||
|
||||
Starting from v1.6, a cluster operator can mitigate this problem by tuning
|
||||
the global HPA settings exposed as flags for the `kube-controller-manager` component:
|
||||
|
||||
Starting from v1.12, a new algorithmic update removes the need for the
|
||||
upscale delay.
|
||||
|
||||
- `--horizontal-pod-autoscaler-downscale-stabilization`: Specifies the duration of the
|
||||
downscale stabilization time window. Horizontal Pod Autoscaler remembers
|
||||
the historical recommended sizes and only acts on the largest size within this time window.
|
||||
The default value is 5 minutes (`5m0s`).
|
||||
|
||||
{{< note >}}
|
||||
When tuning these parameter values, a cluster operator should be aware of the possible
|
||||
consequences. If the delay (cooldown) value is set too long, there could be complaints
|
||||
that the Horizontal Pod Autoscaler is not responsive to workload changes. However, if
|
||||
the delay value is set too short, the scale of the replicas set may keep thrashing as
|
||||
usual.
|
||||
{{< /note >}}
|
||||
|
||||
## Support for resource metrics
|
||||
|
||||
Any HPA target can be scaled based on the resource usage of the pods in the scaling target.
|
||||
|
@ -256,11 +250,11 @@ a single container might be running with high usage and the HPA will not scale o
|
|||
pod usage is still within acceptable limits.
|
||||
{{< /note >}}
|
||||
|
||||
### Container Resource Metrics
|
||||
### Container resource metrics
|
||||
|
||||
{{< feature-state for_k8s_version="v1.20" state="alpha" >}}
|
||||
|
||||
`HorizontalPodAutoscaler` also supports a container metric source where the HPA can track the
|
||||
The HorizontalPodAutoscaler API also supports a container metric source where the HPA can track the
|
||||
resource usage of individual containers across a set of Pods, in order to scale the target resource.
|
||||
This lets you configure scaling thresholds for the containers that matter most in a particular Pod.
|
||||
For example, if you have a web application and a logging sidecar, you can scale based on the resource
|
||||
|
@ -272,6 +266,7 @@ scaling. If the specified container in the metric source is not present or only
|
|||
of the pods then those pods are ignored and the recommendation is recalculated. See [Algorithm](#algorithm-details)
|
||||
for more details about the calculation. To use container resources for autoscaling define a metric
|
||||
source as follows:
|
||||
|
||||
```yaml
|
||||
type: ContainerResource
|
||||
containerResource:
|
||||
|
@ -297,28 +292,32 @@ Once you have rolled out the container name change to the workload resource, tid
|
|||
the old container name from the HPA specification.
|
||||
{{< /note >}}
|
||||
|
||||
## Support for multiple metrics
|
||||
|
||||
Kubernetes 1.6 adds support for scaling based on multiple metrics. You can use the `autoscaling/v2beta2` API
|
||||
version to specify multiple metrics for the Horizontal Pod Autoscaler to scale on. Then, the Horizontal Pod
|
||||
Autoscaler controller will evaluate each metric, and propose a new scale based on that metric. The largest of the
|
||||
proposed scales will be used as the new scale.
|
||||
## Scaling on custom metrics
|
||||
|
||||
## Support for custom metrics
|
||||
{{< feature-state for_k8s_version="v1.23" state="stable" >}}
|
||||
|
||||
{{< note >}}
|
||||
Kubernetes 1.2 added alpha support for scaling based on application-specific metrics using special annotations.
|
||||
Support for these annotations was removed in Kubernetes 1.6 in favor of the new autoscaling API. While the old method for collecting
|
||||
custom metrics is still available, these metrics will not be available for use by the Horizontal Pod Autoscaler, and the former
|
||||
annotations for specifying which custom metrics to scale on are no longer honored by the Horizontal Pod Autoscaler controller.
|
||||
{{< /note >}}
|
||||
(the `autoscaling/v2beta2` API version previously provided this ability as a beta feature)
|
||||
|
||||
Kubernetes 1.6 adds support for making use of custom metrics in the Horizontal Pod Autoscaler.
|
||||
You can add custom metrics for the Horizontal Pod Autoscaler to use in the `autoscaling/v2beta2` API.
|
||||
Kubernetes then queries the new custom metrics API to fetch the values of the appropriate custom metrics.
|
||||
Provided that you use the `autoscaling/v2` API version, you can configure a HorizontalPodAutoscaler
|
||||
to scale based on a custom metric (that is not built in to Kubernetes or any Kubernetes component).
|
||||
The HorizontalPodAutoscaler controller then queries for these custom metrics from the Kubernetes
|
||||
API.
|
||||
|
||||
See [Support for metrics APIs](#support-for-metrics-apis) for the requirements.
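For example, a HorizontalPodAutoscaler that scales on a hypothetical per-Pod custom metric named `packets-per-second` might look like this (a sketch; the workload name, metric name, and target value are illustrative):

```yaml
# Sketch only: names and values are illustrative.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: packets-per-second   # hypothetical custom metric served by an adapter
      target:
        type: AverageValue
        averageValue: 1k
```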
|
||||
|
||||
## Scaling on multiple metrics
|
||||
|
||||
{{< feature-state for_k8s_version="v1.23" state="stable" >}}
|
||||
|
||||
(the `autoscaling/v2beta2` API version previously provided this ability as a beta feature)
|
||||
|
||||
Provided that you use the `autoscaling/v2` API version, you can specify multiple metrics for a
|
||||
HorizontalPodAutoscaler to scale on. Then, the HorizontalPodAutoscaler controller evaluates each metric,
|
||||
and proposes a new scale based on that metric. The HorizontalPodAutoscaler takes the maximum scale
|
||||
recommended for each metric and sets the workload to that size (provided that this isn't larger than the
|
||||
overall maximum that you configured).
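As an illustrative fragment of the HPA `.spec`, the `metrics` list might combine a resource metric with a custom metric (values are examples, not recommendations):

```yaml
# Illustrative fragment of an HPA .spec.
metrics:
- type: Resource
  resource:
    name: cpu
    target:
      type: Utilization
      averageUtilization: 50
- type: Pods
  pods:
    metric:
      name: packets-per-second   # hypothetical custom metric
    target:
      type: AverageValue
      averageValue: 1k
```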
|
||||
|
||||
## Support for metrics APIs
|
||||
|
||||
By default, the HorizontalPodAutoscaler controller retrieves metrics from a series of APIs. In order for it to access these
|
||||
|
@ -332,8 +331,7 @@ APIs, cluster administrators must ensure that:
|
|||
It can be launched as a cluster addon.
|
||||
|
||||
* For custom metrics, this is the `custom.metrics.k8s.io` API. It's provided by "adapter" API servers provided by metrics solution vendors.
|
||||
Check with your metrics pipeline, or the [list of known solutions](https://github.com/kubernetes/metrics/blob/master/IMPLEMENTATIONS.md#custom-metrics-api).
|
||||
If you would like to write your own, check out the [boilerplate](https://github.com/kubernetes-sigs/custom-metrics-apiserver) to get started.
|
||||
Check with your metrics pipeline to see if there is a Kubernetes metrics adapter available.
|
||||
|
||||
* For external metrics, this is the `external.metrics.k8s.io` API. It may be provided by the custom metrics adapters provided above.
|
||||
|
||||
|
@ -345,18 +343,23 @@ and [external.metrics.k8s.io](https://github.com/kubernetes/community/blob/maste
|
|||
For examples of how to use them see [the walkthrough for using custom metrics](/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/#autoscaling-on-multiple-metrics-and-custom-metrics)
|
||||
and [the walkthrough for using external metrics](/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/#autoscaling-on-metrics-not-related-to-kubernetes-objects).
|
||||
|
||||
## Support for configurable scaling behavior
|
||||
## Configurable scaling behavior
|
||||
|
||||
Starting from
|
||||
[v1.18](https://github.com/kubernetes/enhancements/blob/master/keps/sig-autoscaling/853-configurable-hpa-scale-velocity/README.md)
|
||||
the `v2beta2` API allows scaling behavior to be configured through the HPA
|
||||
`behavior` field. Behaviors are specified separately for scaling up and down in
|
||||
`scaleUp` or `scaleDown` section under the `behavior` field. A stabilization
|
||||
window can be specified for both directions which prevents the flapping of the
|
||||
number of the replicas in the scaling target. Similarly specifying scaling
|
||||
policies controls the rate of change of replicas while scaling.
|
||||
{{< feature-state for_k8s_version="v1.23" state="stable" >}}
|
||||
|
||||
### Scaling Policies
|
||||
(the `autoscaling/v2beta2` API version previously provided this ability as a beta feature)
|
||||
|
||||
If you use the `v2` HorizontalPodAutoscaler API, you can use the `behavior` field
|
||||
(see the [API reference](/docs/reference/kubernetes-api/workload-resources/horizontal-pod-autoscaler-v2/#HorizontalPodAutoscalerSpec))
|
||||
to configure separate scale-up and scale-down behaviors.
|
||||
You specify these behaviours by setting `scaleUp` and / or `scaleDown`
|
||||
under the `behavior` field.
|
||||
|
||||
You can specify a _stabilization window_ that prevents [flapping](#flapping) of
|
||||
the replica count for a scaling target. Scaling policies also let you control the
|
||||
rate of change of replicas while scaling.
|
||||
|
||||
### Scaling policies
|
||||
|
||||
One or more scaling policies can be specified in the `behavior` section of the spec.
|
||||
When multiple policies are specified the policy which allows the highest amount of
|
||||
|
@ -393,21 +396,27 @@ direction. By setting the value to `Min` which would select the policy which all
|
|||
smallest change in the replica count. Setting the value to `Disabled` completely disables
|
||||
scaling in that direction.
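As a sketch (the values are illustrative, not recommendations), a `scaleDown` configuration with policies might look like this:

```yaml
# Illustrative values only.
behavior:
  scaleDown:
    policies:
    - type: Pods
      value: 4          # remove at most 4 Pods per period
      periodSeconds: 60
    - type: Percent
      value: 10         # or at most 10% of current replicas per period
      periodSeconds: 60
    selectPolicy: Min   # apply the policy that allows the smallest change
```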
|
||||
|
||||
### Stabilization Window
|
||||
### Stabilization window
|
||||
|
||||
The stabilization window is used to restrict the flapping of replicas when the metrics
|
||||
used for scaling keep fluctuating. The stabilization window is used by the autoscaling
|
||||
algorithm to consider the computed desired state from the past to prevent scaling. In
|
||||
the following example the stabilization window is specified for `scaleDown`.
|
||||
The stabilization window is used to restrict the [flapping](#flapping) of
|
||||
replica count when the metrics used for scaling keep fluctuating. The autoscaling algorithm
|
||||
uses this window to infer a previous desired state and avoid unwanted changes to workload
|
||||
scale.
|
||||
|
||||
For example, in the following example snippet, a stabilization window is specified for `scaleDown`.
|
||||
|
||||
```yaml
|
||||
scaleDown:
|
||||
stabilizationWindowSeconds: 300
|
||||
behavior:
|
||||
scaleDown:
|
||||
stabilizationWindowSeconds: 300
|
||||
```
|
||||
|
||||
When the metrics indicate that the target should be scaled down the algorithm looks
|
||||
into previously computed desired states and uses the highest value from the specified
|
||||
interval. In above example all desired states from the past 5 minutes will be considered.
|
||||
into previously computed desired states, and uses the highest value from the specified
|
||||
interval. In the above example, all desired states from the past 5 minutes will be considered.
|
||||
|
||||
This approximates a rolling maximum, and avoids having the scaling algorithm frequently
|
||||
remove Pods only to trigger recreating an equivalent Pod just moments later.
|
||||
|
||||
### Default Behavior
|
||||
|
||||
|
@ -495,6 +504,18 @@ behavior:
|
|||
selectPolicy: Disabled
|
||||
```
|
||||
|
||||
## Support for HorizontalPodAutoscaler in kubectl
|
||||
|
||||
HorizontalPodAutoscaler, like every API resource, is supported in a standard way by `kubectl`.
|
||||
You can create a new autoscaler using `kubectl create` command.
|
||||
You can list autoscalers by `kubectl get hpa` or get detailed description by `kubectl describe hpa`.
|
||||
Finally, you can delete an autoscaler using `kubectl delete hpa`.
|
||||
|
||||
In addition, there is a special `kubectl autoscale` command for creating a HorizontalPodAutoscaler object.
|
||||
For instance, executing `kubectl autoscale rs foo --min=2 --max=5 --cpu-percent=80`
|
||||
will create an autoscaler for replication set *foo*, with target CPU utilization set to `80%`
|
||||
and the number of replicas between 2 and 5.
|
||||
|
||||
## Implicit maintenance-mode deactivation
|
||||
|
||||
You can implicitly deactivate the HPA for a target without the
|
||||
|
@ -545,6 +566,13 @@ guidelines, which cover this exact use case.
|
|||
|
||||
## {{% heading "whatsnext" %}}
|
||||
|
||||
* Design documentation: [Horizontal Pod Autoscaling](https://git.k8s.io/community/contributors/design-proposals/autoscaling/horizontal-pod-autoscaler.md).
|
||||
* kubectl autoscale command: [kubectl autoscale](/docs/reference/generated/kubectl/kubectl-commands/#autoscale).
|
||||
* Usage example of [Horizontal Pod Autoscaler](/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/).
|
||||
If you configure autoscaling in your cluster, you may also want to consider running a
|
||||
cluster-level autoscaler such as [Cluster Autoscaler](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler).
|
||||
|
||||
For more information on HorizontalPodAutoscaler:
|
||||
|
||||
* Read a [walkthrough example](/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/) for horizontal pod autoscaling.
|
||||
* Read documentation for [`kubectl autoscale`](/docs/reference/generated/kubectl/kubectl-commands/#autoscale).
|
||||
* If you would like to write your own custom metrics adapter, check out the
|
||||
[boilerplate](https://github.com/kubernetes-sigs/custom-metrics-apiserver) to get started.
|
||||
* Read the [API reference](https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/horizontal-pod-autoscaler/) for HorizontalPodAutoscaler.
|
||||
|
|
|
@ -0,0 +1,23 @@
|
|||
---
|
||||
title: "PowerShell auto-completion"
|
||||
description: "Some optional configuration for powershell auto-completion."
|
||||
headless: true
|
||||
---
|
||||
|
||||
The kubectl completion script for PowerShell can be generated with the command `kubectl completion powershell`.
|
||||
|
||||
To do so in all your shell sessions, add the following line to your `$PROFILE` file:
|
||||
|
||||
```powershell
|
||||
kubectl completion powershell | Out-String | Invoke-Expression
|
||||
```
|
||||
|
||||
This command will regenerate the auto-completion script on every PowerShell start up. You can also add the generated script directly to your `$PROFILE` file.
|
||||
|
||||
To add the generated script to your `$PROFILE` file, run the following line in your PowerShell prompt:
|
||||
|
||||
```powershell
|
||||
kubectl completion powershell >> $PROFILE
|
||||
```
|
||||
|
||||
After reloading your shell, kubectl autocompletion should be working.
|
|
@ -176,7 +176,7 @@ kubectl version --client

### Enable shell autocompletion

kubectl provides autocompletion support for Bash and Zsh, which can save you a lot of typing.
kubectl provides autocompletion support for Bash, Zsh, Fish, and PowerShell, which can save you a lot of typing.

Below are the procedures to set up autocompletion for Bash and Zsh.

@ -159,7 +159,7 @@ If you are on macOS and using [Macports](https://macports.org/) package manager,

### Enable shell autocompletion

kubectl provides autocompletion support for Bash and Zsh, which can save you a lot of typing.
kubectl provides autocompletion support for Bash, Zsh, Fish, and PowerShell, which can save you a lot of typing.

Below are the procedures to set up autocompletion for Bash and Zsh.
@ -22,7 +22,6 @@ The following methods exist for installing kubectl on Windows:

- [Install kubectl binary with curl on Windows](#install-kubectl-binary-with-curl-on-windows)
- [Install on Windows using Chocolatey or Scoop](#install-on-windows-using-chocolatey-or-scoop)

### Install kubectl binary with curl on Windows

1. Download the [latest release {{< param "fullversion" >}}](https://dl.k8s.io/release/{{< param "fullversion" >}}/bin/windows/amd64/kubectl.exe).

@ -134,11 +133,11 @@ Edit the config file with a text editor of your choice, such as Notepad.

### Enable shell autocompletion

kubectl provides autocompletion support for Bash and Zsh, which can save you a lot of typing.
kubectl provides autocompletion support for Bash, Zsh, Fish, and PowerShell, which can save you a lot of typing.

Below are the procedures to set up autocompletion for Zsh, if you are running that on Windows.
Below are the procedures to set up autocompletion for PowerShell.

{{< include "included/optional-kubectl-configs-zsh.md" >}}
{{< include "included/optional-kubectl-configs-pwsh.md" >}}

### Install `kubectl convert` plugin
@ -17,7 +17,7 @@ Seccomp stands for secure computing mode and has been a feature of the Linux

kernel since version 2.6.12. It can be used to sandbox the privileges of a
process, restricting the calls it is able to make from userspace into the
kernel. Kubernetes lets you automatically apply seccomp profiles loaded onto a
Node to your Pods and containers.
{{< glossary_tooltip text="node" term_id="node" >}} to your Pods and containers.

Identifying the privileges required for your workloads can be difficult. In this
tutorial, you will go through how to load seccomp profiles into a local
@ -36,16 +36,18 @@ profiles that give only the necessary privileges to your container processes.

## {{% heading "prerequisites" %}}

{{< version-check >}}

In order to complete all steps in this tutorial, you must install
[kind](https://kind.sigs.k8s.io/docs/user/quick-start/) and
[kubectl](/docs/tasks/tools/). This tutorial will show examples of
both alpha (new in v1.22) and generally available seccomp functionality. You should
make sure that your cluster is [configured
correctly](https://kind.sigs.k8s.io/docs/user/quick-start/#setting-kubernetes-version)
[kind](/docs/tasks/tools/#kind) and [kubectl](/docs/tasks/tools/#kubectl).

This tutorial shows some examples that are still alpha (since v1.22) and
others that use only generally available seccomp functionality. You should
make sure that your cluster is
[configured correctly](https://kind.sigs.k8s.io/docs/user/quick-start/#setting-kubernetes-version)
for the version you are using.

The tutorial also uses the `curl` tool for downloading examples to your computer.
You can adapt the steps to use a different tool if you prefer.

{{< note >}}
It is not possible to apply a seccomp profile to a container running with
`privileged: true` set in the container's `securityContext`. Privileged containers always
@ -54,6 +56,107 @@ run as `Unconfined`.

<!-- steps -->

## Download example seccomp profiles {#download-profiles}

The contents of these profiles will be explored later on, but for now go ahead
and download them into a directory named `profiles/` so that they can be loaded
into the cluster.

{{< tabs name="tab_with_code" >}}
{{< tab name="audit.json" >}}
{{< codenew file="pods/security/seccomp/profiles/audit.json" >}}
{{< /tab >}}
{{< tab name="violation.json" >}}
{{< codenew file="pods/security/seccomp/profiles/violation.json" >}}
{{< /tab >}}
{{< tab name="fine-grained.json" >}}
{{< codenew file="pods/security/seccomp/profiles/fine-grained.json" >}}
{{< /tab >}}
{{< /tabs >}}

Run these commands:

```shell
mkdir ./profiles
curl -L -o profiles/audit.json https://k8s.io/examples/pods/security/seccomp/profiles/audit.json
curl -L -o profiles/violation.json https://k8s.io/examples/pods/security/seccomp/profiles/violation.json
curl -L -o profiles/fine-grained.json https://k8s.io/examples/pods/security/seccomp/profiles/fine-grained.json
ls profiles
```

You should see three profiles listed at the end of the final step:

```
audit.json  fine-grained.json  violation.json
```
## Create a local Kubernetes cluster with kind

For simplicity, [kind](https://kind.sigs.k8s.io/) can be used to create a single
node cluster with the seccomp profiles loaded. Kind runs Kubernetes in Docker,
so each node of the cluster is a container. This allows for files
to be mounted in the filesystem of each container similar to loading files
onto a node.

{{< codenew file="pods/security/seccomp/kind.yaml" >}}

Download that example kind configuration, and save it to a file named `kind.yaml`:

```shell
curl -L -O https://k8s.io/examples/pods/security/seccomp/kind.yaml
```

You can set a specific Kubernetes version by setting the node's container image.
See [Nodes](https://kind.sigs.k8s.io/docs/user/configuration/#nodes) in the
kind configuration documentation for more details on this.
This tutorial assumes you are using Kubernetes {{< param "version" >}}.
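The referenced `kind.yaml` file is not reproduced in this diff. A configuration along the following lines, with the host path assumed to be the `./profiles` directory created above, would mount the profiles into the node at the kubelet's default seccomp path:

```yaml
# Sketch of a kind cluster configuration; the hostPath is an assumption
# based on the ./profiles directory used earlier in this tutorial.
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  extraMounts:
  - hostPath: "./profiles"
    containerPath: "/var/lib/kubelet/seccomp/profiles"
```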
As an alpha feature, you can configure Kubernetes to use the profile that the
{{< glossary_tooltip text="container runtime" term_id="container-runtime" >}}
prefers by default, rather than falling back to `Unconfined`.
If you want to try that, see
[enable the use of `RuntimeDefault` as the default seccomp profile for all workloads](#enable-the-use-of-runtimedefault-as-the-default-seccomp-profile-for-all-workloads)
before you continue.

Once you have a kind configuration in place, create the kind cluster with
that configuration:

```shell
kind create cluster --config=kind.yaml
```

After the new Kubernetes cluster is ready, identify the Docker container running
as the single node cluster:

```shell
docker ps
```

You should see output indicating that a container is running with name
`kind-control-plane`. The output is similar to:

```
CONTAINER ID   IMAGE                  COMMAND                  CREATED          STATUS          PORTS                       NAMES
6a96207fed4b   kindest/node:v1.18.2   "/usr/local/bin/entr…"   27 seconds ago   Up 24 seconds   127.0.0.1:42223->6443/tcp   kind-control-plane
```

If you look at the filesystem of that container, you should see that the
`profiles/` directory has been successfully loaded into the default seccomp path
of the kubelet. Use `docker exec` to run a command in the kind node container:

```shell
# Change 6a96207fed4b to the container ID you saw from "docker ps"
docker exec -it 6a96207fed4b ls /var/lib/kubelet/seccomp/profiles
```

```
audit.json  fine-grained.json  violation.json
```

You have verified that these seccomp profiles are available to the kubelet
running within kind.

## Enable the use of `RuntimeDefault` as the default seccomp profile for all workloads

{{< feature-state state="alpha" for_k8s_version="v1.22" >}}
@ -64,8 +167,8 @@ well as corresponding `--seccomp-default`

[command line flag](/docs/reference/command-line-tools-reference/kubelet).
Both have to be enabled simultaneously to use the feature.

If enabled, the kubelet will use the `RuntimeDefault` seccomp profile by default, which is
defined by the container runtime, instead of using the `Unconfined` (seccomp disabled) mode.
The default profiles aim to provide a strong set
of security defaults while preserving the functionality of the workload. It is
possible that the default profiles differ between container runtimes and their

@ -102,96 +205,33 @@ featureGates:

  SeccompDefault: true
```
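That fragment is the tail of a larger configuration example. Purely as a sketch, and not taken from this change, enabling the same behavior directly in a kubelet configuration file might look like the following, assuming the `KubeletConfiguration` v1beta1 API and its `seccompDefault` field:

```yaml
# Sketch: enable the SeccompDefault feature gate and the RuntimeDefault
# behavior in a kubelet configuration file (field availability assumed
# for the alpha feature described above).
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  SeccompDefault: true
seccompDefault: true
```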

## Create Seccomp Profiles

The contents of these profiles will be explored later on, but for now go ahead
and download them into a directory named `profiles/` so that they can be loaded
into the cluster.

{{< tabs name="tab_with_code" >}}
{{< tab name="audit.json" >}}
{{< codenew file="pods/security/seccomp/profiles/audit.json" >}}
{{< /tab >}}
{{< tab name="violation.json" >}}
{{< codenew file="pods/security/seccomp/profiles/violation.json" >}}
{{< /tab >}}
{{< tab name="fine-grained.json" >}}
{{< codenew file="pods/security/seccomp/profiles/fine-grained.json" >}}
{{< /tab >}}
{{< /tabs >}}

## Create a Local Kubernetes Cluster with Kind

For simplicity, [kind](https://kind.sigs.k8s.io/) can be used to create a single
node cluster with the seccomp profiles loaded. Kind runs Kubernetes in Docker,
so each node of the cluster is a container. This allows for files
to be mounted in the filesystem of each container similar to loading files
onto a node.

{{< codenew file="pods/security/seccomp/kind.yaml" >}}
<br>

Download the example above, and save it to a file named `kind.yaml`. Then create
the cluster with the configuration.

```
kind create cluster --config=kind.yaml
```

Once the cluster is ready, identify the container running as the single node
cluster:

```
docker ps
```

You should see output indicating that a container is running with name
`kind-control-plane`.

```
CONTAINER ID   IMAGE                  COMMAND                  CREATED          STATUS          PORTS                       NAMES
6a96207fed4b   kindest/node:v1.18.2   "/usr/local/bin/entr…"   27 seconds ago   Up 24 seconds   127.0.0.1:42223->6443/tcp   kind-control-plane
```

If observing the filesystem of that container, one should see that the
`profiles/` directory has been successfully loaded into the default seccomp path
of the kubelet. Use `docker exec` to run a command in the Pod:

```
docker exec -it 6a96207fed4b ls /var/lib/kubelet/seccomp/profiles
```

```
audit.json  fine-grained.json  violation.json
```
## Create a Pod with a seccomp profile for syscall auditing

To start off, apply the `audit.json` profile, which will log all syscalls of the
process, to a new Pod.

Download the correct manifest for your Kubernetes version:
Here's a manifest for that Pod:

{{< tabs name="audit_pods" >}}
{{< tab name="v1.19 or Later (GA)" >}}
{{< codenew file="pods/security/seccomp/ga/audit-pod.yaml" >}}
{{< /tab >}}
{{< tab name="Pre-v1.19 (alpha)" >}}
{{< codenew file="pods/security/seccomp/alpha/audit-pod.yaml" >}}
{{< /tab >}}
{{< /tabs >}}
<br>
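The referenced GA manifest is not reproduced in this diff; based on the `default-pod.yaml` fragments shown at the end of this change, it looks roughly like the sketch below. The key detail is the `Localhost` seccomp profile pointing at the `audit.json` file loaded onto the node earlier; the remaining fields are assumptions.

```yaml
# Sketch of the audit Pod; details beyond the seccompProfile are assumptions
# based on fragments that appear later in this diff.
apiVersion: v1
kind: Pod
metadata:
  name: audit-pod
  labels:
    app: audit-pod
spec:
  securityContext:
    seccompProfile:
      type: Localhost
      localhostProfile: profiles/audit.json
  containers:
  - name: test-container
    image: hashicorp/http-echo:0.2.3
    args:
    - "-text=just made some syscalls!"
    securityContext:
      allowPrivilegeEscalation: false
```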

{{< note >}}
The functional support for the already deprecated seccomp annotations
`seccomp.security.alpha.kubernetes.io/pod` (for the whole pod) and
`container.seccomp.security.alpha.kubernetes.io/[name]` (for a single container)
is going to be removed with the release of Kubernetes v1.25. Always use
the native API fields instead of the annotations.
{{< /note >}}

Create the Pod in the cluster:

```
kubectl apply -f audit-pod.yaml
```shell
kubectl apply -f https://k8s.io/examples/pods/security/seccomp/ga/audit-pod.yaml
```

This profile does not restrict any syscalls, so the Pod should start
successfully.

```
```shell
kubectl get pod/audit-pod
```
@ -201,28 +241,31 @@ audit-pod 1/1 Running 0 30s

```

In order to be able to interact with this endpoint exposed by this
container, create a NodePort Service that allows access to the endpoint from
inside the kind control plane container.
container, create a NodePort {{< glossary_tooltip text="Service" term_id="service" >}}
that allows access to the endpoint from inside the kind control plane container.

```
kubectl expose pod/audit-pod --type NodePort --port 5678
```shell
kubectl expose pod audit-pod --type NodePort --port 5678
```
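Running `kubectl expose` against the Pod creates a Service that selects the Pod's labels. As a sketch, the resulting object is roughly equivalent to the following manifest; the `app: audit-pod` selector is assumed from the Pod's label shown elsewhere in this diff:

```yaml
# Sketch of the Service produced by the kubectl expose command above;
# the selector is an assumption based on the Pod's app: audit-pod label.
apiVersion: v1
kind: Service
metadata:
  name: audit-pod
spec:
  type: NodePort
  selector:
    app: audit-pod
  ports:
  - port: 5678
    targetPort: 5678
```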

Check what port the Service has been assigned on the node.

```
kubectl get svc/audit-pod
```shell
kubectl get service audit-pod
```

The output is similar to:
```
NAME        TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
audit-pod   NodePort   10.111.36.142   <none>        5678:32373/TCP   72s
```

Now you can `curl` the endpoint from inside the kind control plane container at
the port exposed by this Service. Use `docker exec` to run a command in the Pod:
Now you can use `curl` to access that endpoint from inside the kind control plane container,
at the port exposed by this Service. Use `docker exec` to run the `curl` command from within
the control plane container:

```
```shell
# Change 6a96207fed4b to the control plane container ID you saw from "docker ps"
docker exec -it 6a96207fed4b curl localhost:32373
```
@ -235,13 +278,14 @@ Because this Pod is running in a local cluster, you should be able to see those

in `/var/log/syslog`. Open up a new terminal window and `tail` the output for
calls from `http-echo`:

```
```shell
tail -f /var/log/syslog | grep 'http-echo'
```

You should already see some logs of syscalls made by `http-echo`, and if you
`curl` the endpoint in the control plane container you will see more written.

For example:
```
Jul  6 15:37:40 my-machine kernel: [369128.669452] audit: type=1326 audit(1594067860.484:14536): auid=4294967295 uid=0 gid=0 ses=4294967295 pid=29064 comm="http-echo" exe="/http-echo" sig=0 arch=c000003e syscall=51 compat=0 ip=0x46fe1f code=0x7ffc0000
Jul  6 15:37:40 my-machine kernel: [369128.669453] audit: type=1326 audit(1594067860.484:14537): auid=4294967295 uid=0 gid=0 ses=4294967295 pid=29064 comm="http-echo" exe="/http-echo" sig=0 arch=c000003e syscall=54 compat=0 ip=0x46fdba code=0x7ffc0000
@ -260,37 +304,30 @@ for this container.

Clean up that Pod and Service before moving to the next section:

```
kubectl delete pod/audit-pod
kubectl delete svc/audit-pod
```shell
kubectl delete service audit-pod --wait
kubectl delete pod audit-pod --wait --now
```

## Create Pod with seccomp Profile that Causes Violation
## Create Pod with seccomp profile that causes violation

For demonstration, apply a profile to the Pod that does not allow for any
syscalls.

Download the correct manifest for your Kubernetes version:
The manifest for this demonstration is:

{{< tabs name="violation_pods" >}}
{{< tab name="v1.19 or Later (GA)" >}}
{{< codenew file="pods/security/seccomp/ga/violation-pod.yaml" >}}
{{< /tab >}}
{{< tab name="Pre-v1.19 (alpha)" >}}
{{< codenew file="pods/security/seccomp/alpha/violation-pod.yaml" >}}
{{< /tab >}}
{{< /tabs >}}
<br>

Create the Pod in the cluster:
Attempt to create the Pod in the cluster:

```
kubectl apply -f violation-pod.yaml
```shell
kubectl apply -f https://k8s.io/examples/pods/security/seccomp/ga/violation-pod.yaml
```

The Pod is created, but there is an issue.
If you check the status of the Pod, you should see that it failed to start.

```
```shell
kubectl get pod/violation-pod
```
@ -307,12 +344,12 @@ only the privileges they need.

Clean up that Pod and Service before moving to the next section:

```
kubectl delete pod/violation-pod
kubectl delete svc/violation-pod
```shell
kubectl delete service violation-pod --wait
kubectl delete pod violation-pod --wait --now
```

## Create Pod with seccomp Profile that Only Allows Necessary Syscalls
## Create Pod with seccomp profile that only allows necessary syscalls

If you take a look at the `fine-grained.json` profile, you will notice some of the syscalls
seen in the first example where the profile set `"defaultAction":
@ -321,61 +358,56 @@ but explicitly allowing a set of syscalls in the `"action": "SCMP_ACT_ALLOW"`

block. Ideally, the container will run successfully and you will see no messages
sent to `syslog`.

Download the correct manifest for your Kubernetes version:
The manifest for this example is:

{{< tabs name="fine_pods" >}}
{{< tab name="v1.19 or Later (GA)" >}}
{{< codenew file="pods/security/seccomp/ga/fine-pod.yaml" >}}
{{< /tab >}}
{{< tab name="Pre-v1.19 (alpha)" >}}
{{< codenew file="pods/security/seccomp/alpha/fine-pod.yaml" >}}
{{< /tab >}}
{{< /tabs >}}
<br>

Create the Pod in your cluster:

```
kubectl apply -f fine-pod.yaml
```shell
kubectl apply -f https://k8s.io/examples/pods/security/seccomp/ga/fine-pod.yaml
```

The Pod should start successfully.

```
kubectl get pod/fine-pod
```shell
kubectl get pod fine-pod
```

The Pod should be showing as having started successfully:
```
NAME       READY   STATUS    RESTARTS   AGE
fine-pod   1/1     Running   0          30s
```

Open up a new terminal window and `tail` the output for calls from `http-echo`:
Open up a new terminal window and use `tail` to monitor for log entries that
mention calls from `http-echo`:

```
```shell
# The log path on your computer might be different from "/var/log/syslog"
tail -f /var/log/syslog | grep 'http-echo'
```

Expose the Pod with a NodePort Service:
Next, expose the Pod with a NodePort Service:

```
kubectl expose pod/fine-pod --type NodePort --port 5678
```shell
kubectl expose pod fine-pod --type NodePort --port 5678
```

Check what port the Service has been assigned on the node:

```
kubectl get svc/fine-pod
```shell
kubectl get service fine-pod
```

The output is similar to:
```
NAME       TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
fine-pod   NodePort   10.111.36.142   <none>        5678:32373/TCP   72s
```

`curl` the endpoint from inside the kind control plane container:
Use `curl` to access that endpoint from inside the kind control plane container:

```
```shell
# Change 6a96207fed4b to the control plane container ID you saw from "docker ps"
docker exec -it 6a96207fed4b curl localhost:32373
```
@ -383,7 +415,7 @@ docker exec -it 6a96207fed4b curl localhost:32373

just made some syscalls!
```

You should see no output in the `syslog` because the profile allowed all
You should see no output in the `syslog`. This is because the profile allowed all
necessary syscalls and specified that an error should occur if one outside of
the list is invoked. This is an ideal situation from a security perspective, but
required some effort in analyzing the program. It would be nice if there was a

@ -391,35 +423,51 @@ simple way to get closer to this security without requiring as much effort.

Clean up that Pod and Service before moving to the next section:

```
kubectl delete pod/fine-pod
kubectl delete svc/fine-pod
```shell
kubectl delete service fine-pod --wait
kubectl delete pod fine-pod --wait --now
```

## Create Pod that uses the Container Runtime Default seccomp Profile
## Create Pod that uses the container runtime default seccomp profile

Most container runtimes provide a sane set of default syscalls that are allowed
or not. The defaults can easily be applied in Kubernetes by using the
`runtime/default` annotation or setting the seccomp type in the security context
of a pod or container to `RuntimeDefault`.
or not. You can adopt these defaults for your workload by setting the seccomp
type in the security context of a pod or container to `RuntimeDefault`.

Download the correct manifest for your Kubernetes version:
{{< note >}}
If you have the `SeccompDefault` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) enabled, then Pods use the `RuntimeDefault` seccomp profile whenever
no other seccomp profile is specified. Otherwise, the default is `Unconfined`.
{{< /note >}}

Here's a manifest for a Pod that requests the `RuntimeDefault` seccomp profile
for all its containers:

{{< tabs name="default_pods" >}}
{{< tab name="v1.19 or Later (GA)" >}}
{{< codenew file="pods/security/seccomp/ga/default-pod.yaml" >}}
{{< /tab >}}
{{< tab name="Pre-v1.19 (alpha)" >}}
{{< codenew file="pods/security/seccomp/alpha/default-pod.yaml" >}}
{{< /tab >}}
{{< /tabs >}}
<br>
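As with the earlier examples, the referenced GA manifest is not reproduced in this diff; combining the `default-pod.yaml` fragments that appear at the end of this change, it looks roughly like this sketch:

```yaml
# Sketch of the RuntimeDefault Pod, assembled from the default-pod.yaml
# fragments shown later in this diff; treat the details as assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: default-pod
  labels:
    app: default-pod
spec:
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: test-container
    image: hashicorp/http-echo:0.2.3
    args:
    - "-text=just made some more syscalls!"
    securityContext:
      allowPrivilegeEscalation: false
```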

The default seccomp profile should provide adequate access for most workloads.
Create that Pod:

```shell
kubectl apply -f https://k8s.io/examples/pods/security/seccomp/ga/default-pod.yaml
```

```shell
kubectl get pod default-pod
```

The Pod should be showing as having started successfully:
```
NAME          READY   STATUS    RESTARTS   AGE
default-pod   1/1     Running   0          20s
```

Finally, now that you saw that work OK, clean up:

```shell
kubectl delete pod default-pod --wait --now
```

## {{% heading "whatsnext" %}}

Additional resources:
You can learn more about Linux seccomp:

* [A seccomp Overview](https://lwn.net/Articles/656307/)
* [Seccomp Security Profiles for Docker](https://docs.docker.com/engine/security/seccomp/)
@ -0,0 +1,15 @@
apiVersion: v1
kind: Pod
metadata:
  name: etcd-with-grpc
spec:
  containers:
  - name: etcd
    image: k8s.gcr.io/etcd:3.5.1-0
    command: [ "/usr/local/bin/etcd", "--data-dir", "/var/lib/etcd", "--listen-client-urls", "http://0.0.0.0:2379", "--advertise-client-urls", "http://127.0.0.1:2379", "--log-level", "debug"]
    ports:
    - containerPort: 2379
    livenessProbe:
      grpc:
        port: 2379
      initialDelaySeconds: 10
@ -1,9 +1,9 @@
apiVersion: v1
kind: Pod
metadata:
  name: audit-pod
  name: default-pod
  labels:
    app: audit-pod
    app: default-pod
spec:
  securityContext:
    seccompProfile:

@ -12,6 +12,6 @@ spec:
  - name: test-container
    image: hashicorp/http-echo:0.2.3
    args:
    - "-text=just made some syscalls!"
    - "-text=just made some more syscalls!"
    securityContext:
      allowPrivilegeEscalation: false