Merge pull request #44710 from kubernetes/dev-1.30

Official 1.30 Release Docs
pull/45904/head snapshot-initial-v1.30
Drew Hagen 2024-04-17 15:42:56 -05:00 committed by GitHub
commit 0471ca1257
108 changed files with 2345 additions and 453 deletions

View File

@ -141,7 +141,7 @@ until disk usage reaches the `LowThresholdPercent` value.
{{< feature-state feature_gate_name="ImageMaximumGCAge" >}}
As an alpha feature, you can specify the maximum time a local image can be unused for,
As a beta feature, you can specify the maximum time a local image can be unused for,
regardless of disk usage. This is a kubelet setting that you configure for each node.
To configure the setting, enable the `ImageMaximumGCAge`
@ -151,6 +151,13 @@ and also set a value for the `ImageMaximumGCAge` field in the kubelet configurat
The value is specified as a Kubernetes _duration_; for example, you can set the configuration
field to `3d12h`, which means 3 days and 12 hours.
{{< note >}}
This feature does not track image usage across kubelet restarts. If the kubelet
is restarted, the tracked image age is reset, causing the kubelet to wait the full
`ImageMaximumGCAge` duration before qualifying images for garbage collection
based on image age.
{{< /note >}}
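As an illustration, the configuration might look like the following kubelet configuration sketch. The lower-camel-case field name `imageMaximumGCAge` and the gate default are assumptions here; verify both against the [kubelet configuration reference](/docs/reference/config-api/kubelet-config.v1beta1/) for your release.

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  ImageMaximumGCAge: true    # beta gate; may already be on by default in your release
imageMaximumGCAge: "3d12h"   # images unused for longer than this become eligible for garbage collection
```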
### Container garbage collection {#container-image-garbage-collection}
The kubelet garbage collects unused containers based on the following variables,

View File

@ -516,14 +516,44 @@ During a non-graceful shutdown, Pods are terminated in the two phases:
recovered since the user was the one who originally added the taint.
{{< /note >}}
### Forced storage detach on timeout {#storage-force-detach-on-timeout}
In any situation where a pod deletion has not succeeded for 6 minutes, Kubernetes will
force detach volumes being unmounted if the node is unhealthy at that instant. Any
workload still running on the node that uses a force-detached volume will cause a
violation of the
[CSI specification](https://github.com/container-storage-interface/spec/blob/master/spec.md#controllerunpublishvolume),
which states that `ControllerUnpublishVolume` "**must** be called after all
`NodeUnstageVolume` and `NodeUnpublishVolume` on the volume are called and succeed".
In such circumstances, volumes on the node in question might encounter data corruption.
The forced storage detach behaviour is optional; users might opt to use the "Non-graceful
node shutdown" feature instead.
Force storage detach on timeout can be disabled by setting the `disable-force-detach-on-timeout`
config field in `kube-controller-manager`. Disabling the force detach on timeout feature means
that a volume that is hosted on a node that is unhealthy for more than 6 minutes will not have
its associated
[VolumeAttachment](/docs/reference/kubernetes-api/config-and-storage-resources/volume-attachment-v1/)
deleted.
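For illustration only, here is roughly how that setting could appear in a kube-controller-manager static Pod manifest. The manifest path, image tag, and flag spelling are assumptions based on the text above, so verify them for your control plane before relying on this.

```yaml
# Hypothetical excerpt from /etc/kubernetes/manifests/kube-controller-manager.yaml
spec:
  containers:
  - name: kube-controller-manager
    image: registry.k8s.io/kube-controller-manager:v1.30.0   # example tag
    command:
    - kube-controller-manager
    - --disable-force-detach-on-timeout=true   # turn off the 6-minute forced detach described above
    # ...other flags unchanged
```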
After this setting has been applied, unhealthy pods still attached to volumes must be recovered
via the [Non-Graceful Node Shutdown](#non-graceful-node-shutdown) procedure mentioned above.
{{< note >}}
- Caution must be taken while using the [Non-Graceful Node Shutdown](#non-graceful-node-shutdown) procedure.
- Deviation from the steps documented above can result in data corruption.
{{< /note >}}
## Swap memory management {#swap-memory}
{{< feature-state feature_gate_name="NodeSwap" >}}
To enable swap on a node, the `NodeSwap` feature gate must be enabled on
the kubelet, and the `--fail-swap-on` command line flag or `failSwapOn`
the kubelet (default is true), and the `--fail-swap-on` command line flag or `failSwapOn`
[configuration setting](/docs/reference/config-api/kubelet-config.v1beta1/)
must be set to false.
To allow Pods to utilize swap, `swapBehavior` should not be set to `NoSwap` (which is the default behavior) in the kubelet config.
{{< warning >}}
When the memory swap feature is turned on, Kubernetes data such as the content
@ -535,17 +565,16 @@ specify how a node will use swap memory. For example,
```yaml
memorySwap:
swapBehavior: UnlimitedSwap
swapBehavior: LimitedSwap
```
- `UnlimitedSwap` (default): Kubernetes workloads can use as much swap memory as they
request, up to the system limit.
- `NoSwap` (default): Kubernetes workloads will not use swap.
- `LimitedSwap`: The utilization of swap memory by Kubernetes workloads is subject to limitations.
Only Pods of Burstable QoS are permitted to employ swap.
If configuration for `memorySwap` is not specified and the feature gate is
enabled, by default the kubelet will apply the same behaviour as the
`UnlimitedSwap` setting.
`NoSwap` setting.
With `LimitedSwap`, Pods that do not fall under the Burstable QoS classification (i.e.
`BestEffort`/`Guaranteed` QoS Pods) are prohibited from utilizing swap memory.
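Putting these settings together, a kubelet configuration that lets Burstable Pods use swap might look like the following sketch; the values are illustrative, so check them against the kubelet configuration reference for your version.

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
failSwapOn: false           # required so the kubelet starts on a node that has swap enabled
featureGates:
  NodeSwap: true            # beta gate; enabled by default in recent releases
memorySwap:
  swapBehavior: LimitedSwap # only Burstable QoS Pods may use swap
```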

View File

@ -108,6 +108,15 @@ using the [kubelet configuration file](/docs/tasks/administer-cluster/kubelet-co
These settings let you configure the maximum size for each log file and the maximum number of
files allowed for each container respectively.
To perform efficient log rotation in clusters where the workload generates a large volume of logs,
the kubelet also lets you tune how logs are rotated: how many log rotations can run
concurrently, and the interval at which logs are monitored and rotated as required.
You can configure two kubelet [configuration settings](/docs/reference/config-api/kubelet-config.v1beta1/),
`containerLogMaxWorkers` and `containerLogMonitorInterval`, using the
[kubelet configuration file](/docs/tasks/administer-cluster/kubelet-config-file/).
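For example, a kubelet configuration that tunes both the size-based settings and the rotation workers might look like this sketch; the values shown are illustrative, not recommendations.

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
containerLogMaxSize: "10Mi"        # rotate a container log file once it reaches this size
containerLogMaxFiles: 5            # keep at most this many log files per container
containerLogMaxWorkers: 2          # number of log rotations that may run concurrently
containerLogMonitorInterval: "10s" # how often the kubelet checks whether logs need rotating
```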
When you run [`kubectl logs`](/docs/reference/generated/kubectl/kubectl-commands#logs) as in
the basic logging example, the kubelet on the node handles the request and
reads directly from the log file. The kubelet returns the content of the log file.
@ -148,7 +157,7 @@ If systemd is not present, the kubelet and container runtime write to `.log` fil
run the kubelet via a helper tool, `kube-log-runner`, and use that tool to redirect
kubelet logs to a directory that you choose.
The kubelet always directs your container runtime to write logs into directories within
By default, kubelet directs your container runtime to write logs into directories within
`/var/log/pods`.
For more information on `kube-log-runner`, read [System Logs](/docs/concepts/cluster-administration/system-logs/#klog).
@ -166,7 +175,7 @@ If you want to have logs written elsewhere, you can indirectly
run the kubelet via a helper tool, `kube-log-runner`, and use that tool to redirect
kubelet logs to a directory that you choose.
However, the kubelet always directs your container runtime to write logs within the
However, by default, kubelet directs your container runtime to write logs within the
directory `C:\var\log\pods`.
For more information on `kube-log-runner`, read [System Logs](/docs/concepts/cluster-administration/system-logs/#klog).
@ -180,6 +189,22 @@ the `/var/log` directory, bypassing the default logging mechanism (the component
do not write to the systemd journal). You can use Kubernetes' storage mechanisms
to map persistent storage into the container that runs the component.
The kubelet allows changing the pod logs directory from the default `/var/log/pods`
to a custom path. You can make this adjustment by configuring the `podLogsDir`
parameter in the kubelet's configuration file.
{{< caution >}}
It's important to note that the default location `/var/log/pods` has been in use for
an extended period and certain processes might implicitly assume this path.
Therefore, altering this parameter must be approached with caution and at your own risk.
Another caveat to keep in mind is that the kubelet supports the location only when it is on the same
disk as `/var`. If the logs are on a separate filesystem from `/var`,
the kubelet will not track that filesystem's usage, potentially leading to issues if
it fills up.
{{< /caution >}}
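A minimal sketch of that configuration change follows; the custom path is purely illustrative.

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
podLogsDir: "/var/data/pod-logs"   # illustrative path; staying under /var keeps it on the filesystem the kubelet tracks
```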
For details about etcd and its logs, view the [etcd documentation](https://etcd.io/docs/).
Again, you can use Kubernetes' storage mechanisms to map persistent storage into
the container that runs the component.

View File

@ -122,7 +122,7 @@ second line.}
### Contextual Logging
{{< feature-state for_k8s_version="v1.24" state="alpha" >}}
{{< feature-state for_k8s_version="v1.30" state="beta" >}}
Contextual logging builds on top of structured logging. It is primarily about
how developers use logging calls: code based on that concept is more flexible
@ -133,8 +133,9 @@ If developers use additional functions like `WithValues` or `WithName` in
their components, then log entries contain additional information that gets
passed into functions by their caller.
Currently this is gated behind the `StructuredLogging` feature gate and
disabled by default. The infrastructure for this was added in 1.24 without
For Kubernetes {{< skew currentVersion >}}, this is gated behind the `ContextualLogging`
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/) and is
enabled by default. The infrastructure for this was added in 1.24 without
modifying components. The
[`component-base/logs/example`](https://github.com/kubernetes/kubernetes/blob/v1.24.0-beta.0/staging/src/k8s.io/component-base/logs/example/cmd/logger.go)
command demonstrates how to use the new logging calls and how a component
@ -147,14 +148,14 @@ $ go run . --help
--feature-gates mapStringBool A set of key=value pairs that describe feature gates for alpha/experimental features. Options are:
AllAlpha=true|false (ALPHA - default=false)
AllBeta=true|false (BETA - default=false)
ContextualLogging=true|false (ALPHA - default=false)
ContextualLogging=true|false (BETA - default=true)
$ go run . --feature-gates ContextualLogging=true
...
I0404 18:00:02.916429 451895 logger.go:94] "example/myname: runtime" foo="bar" duration="1m0s"
I0404 18:00:02.916447 451895 logger.go:95] "example: another runtime" foo="bar" duration="1m0s"
I0222 15:13:31.645988 197901 example.go:54] "runtime" logger="example.myname" foo="bar" duration="1m0s"
I0222 15:13:31.646007 197901 example.go:55] "another runtime" logger="example" foo="bar" duration="1h0m0s" duration="1m0s"
```
The `example` prefix and `foo="bar"` were added by the caller of the function
The `logger` key and `foo="bar"` were added by the caller of the function
which logs the `runtime` message and `duration="1m0s"` value, without having to
modify that function.
@ -165,8 +166,8 @@ is not in the log output anymore:
```console
$ go run . --feature-gates ContextualLogging=false
...
I0404 18:03:31.171945 452150 logger.go:94] "runtime" duration="1m0s"
I0404 18:03:31.171962 452150 logger.go:95] "another runtime" duration="1m0s"
I0222 15:14:40.497333 198174 example.go:54] "runtime" duration="1m0s"
I0222 15:14:40.497346 198174 example.go:55] "another runtime" duration="1h0m0s" duration="1m0s"
```
### JSON log format
@ -244,11 +245,11 @@ To help with debugging issues on nodes, Kubernetes v1.27 introduced a feature th
running on the node. To use the feature, ensure that the `NodeLogQuery`
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/) is enabled for that node, and that the
kubelet configuration options `enableSystemLogHandler` and `enableSystemLogQuery` are both set to true. On Linux
we assume that service logs are available via journald. On Windows we assume that service logs are available
in the application log provider. On both operating systems, logs are also available by reading files within
the assumption is that service logs are available via journald. On Windows the assumption is that service logs are
available in the application log provider. On both operating systems, logs are also available by reading files within
`/var/log/`.
Provided you are authorized to interact with node objects, you can try out this alpha feature on all your nodes or
Provided you are authorized to interact with node objects, you can try out this feature on all your nodes or
just a subset. Here is an example to retrieve the kubelet service logs from a node:
```shell
@ -293,4 +294,4 @@ kubectl get --raw "/api/v1/nodes/node-1.example/proxy/logs/?query=kubelet&patter
* Read about [Contextual Logging](https://github.com/kubernetes/enhancements/tree/master/keps/sig-instrumentation/3077-contextual-logging)
* Read about [deprecation of klog flags](https://github.com/kubernetes/enhancements/tree/master/keps/sig-instrumentation/2845-deprecate-klog-specific-flags-in-k8s-components)
* Read about the [Conventions for logging severity](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-instrumentation/logging.md)
* Read about [Log Query](https://kep.k8s.io/2258)

View File

@ -208,6 +208,42 @@ ConfigMaps consumed as environment variables are not updated automatically and r
A container using a ConfigMap as a [subPath](/docs/concepts/storage/volumes#using-subpath) volume mount will not receive ConfigMap updates.
{{< /note >}}
### Using ConfigMaps as environment variables
To use a ConfigMap in an {{< glossary_tooltip text="environment variable" term_id="container-env-variables" >}}
in a Pod:
1. For each container in your Pod specification, add an environment variable
   for each ConfigMap key that you want to use to the
   `env[].valueFrom.configMapKeyRef` field.
1. Modify your image and/or command line so that the program looks for values
   in the specified environment variables.
This is an example of defining a ConfigMap as a pod environment variable:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: env-configmap
spec:
  containers:
  - name: envars-test-container
    image: nginx
    env:
    - name: CONFIGMAP_USERNAME
      valueFrom:
        configMapKeyRef:
          name: myconfigmap
          key: username
```
It's important to note that the range of characters allowed for environment
variable names in pods is [restricted](/docs/tasks/inject-data-application/define-environment-variable-container/#using-environment-variables-inside-of-your-config).
If any keys do not meet the rules, those keys are not made available to your container, though
the Pod is allowed to start.
## Immutable ConfigMaps {#configmap-immutable}
{{< feature-state for_k8s_version="v1.21" state="stable" >}}

View File

@ -567,25 +567,10 @@ in a Pod:
For instructions, refer to
[Define container environment variables using Secret data](/docs/tasks/inject-data-application/distribute-credentials-secure/#define-container-environment-variables-using-secret-data).
#### Invalid environment variables {#restriction-env-from-invalid}
If your environment variable definitions in your Pod specification are
considered to be invalid environment variable names, those keys aren't made
available to your container. The Pod is allowed to start.
Kubernetes adds an Event with the reason set to `InvalidVariableNames` and a
message that lists the skipped invalid keys. The following example shows a Pod that refers to a Secret named `mysecret`, where `mysecret` contains 2 invalid keys: `1badkey` and `2alsobad`.
```shell
kubectl get events
```
The output is similar to:
```
LASTSEEN FIRSTSEEN COUNT NAME KIND SUBOBJECT TYPE REASON
0s 0s 1 dapi-test-pod Pod Warning InvalidEnvironmentVariableNames kubelet, 127.0.0.1 Keys [1badkey, 2alsobad] from the EnvFrom secret default/mysecret were skipped since they are considered invalid environment variable names.
```
It's important to note that the range of characters allowed for environment variable
names in pods is [restricted](/docs/tasks/inject-data-application/define-environment-variable-container/#using-environment-variables-inside-of-your-config).
If any keys do not meet the rules, those keys are not made available to your container, though
the Pod is allowed to start.
### Container image pull Secrets {#using-imagepullsecrets}

View File

@ -56,8 +56,7 @@ There are three types of hook handlers that can be implemented for Containers:
Resources consumed by the command are counted against the Container.
* HTTP - Executes an HTTP request against a specific endpoint on the Container.
* Sleep - Pauses the container for a specified duration.
The "Sleep" action is available when the [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
`PodLifecycleSleepAction` is enabled.
This is a beta-level feature, enabled by default through the `PodLifecycleSleepAction` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/).
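As a quick illustration, a `preStop` sleep hook might be declared as in the following sketch. The container name and image are placeholders, and the `sleep.seconds` field layout is an assumption, so confirm it against the Pod lifecycle API reference.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sleep-hook-example   # hypothetical name
spec:
  containers:
  - name: app
    image: nginx
    lifecycle:
      preStop:
        sleep:
          seconds: 5         # pause the container for 5 seconds before termination continues
```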
### Hook handler execution

View File

@ -295,6 +295,50 @@ When you add a custom resource, you can access it using:
(generating one is an advanced undertaking, but some projects may provide a client along with
the CRD or AA).
## Custom resource field selectors
[Field Selectors](/docs/concepts/overview/working-with-objects/field-selectors/)
let clients select custom resources based on the value of one or more resource
fields.
All custom resources support the `metadata.name` and `metadata.namespace` field
selectors.
Fields declared in a {{< glossary_tooltip term_id="CustomResourceDefinition" text="CustomResourceDefinition" >}}
may also be used with field selectors when included in the `spec.versions[*].selectableFields` field of the
{{< glossary_tooltip term_id="CustomResourceDefinition" text="CustomResourceDefinition" >}}.
### Selectable fields for custom resources {#crd-selectable-fields}
{{< feature-state feature_gate_name="CustomResourceFieldSelectors" >}}
You need to enable the `CustomResourceFieldSelectors`
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/) to
use this behavior, which then applies to all CustomResourceDefinitions in your
cluster.
The `spec.versions[*].selectableFields` field of a {{< glossary_tooltip term_id="CustomResourceDefinition" text="CustomResourceDefinition" >}} may be used to
declare which other fields in a custom resource may be used in field selectors.
The following example adds the `.spec.color` and `.spec.size` fields as
selectable fields.
{{% code_sample file="customresourcedefinition/shirt-resource-definition.yaml" %}}
Field selectors can then be used to get only resources with a `color` of `blue`:
```shell
kubectl get shirts.stable.example.com --field-selector spec.color=blue
```
The output should be:
```
NAME       COLOR   SIZE
example1   blue    S
example2   blue    M
```
## {{% heading "whatsnext" %}}
* Learn how to [Extend the Kubernetes API with the aggregation layer](/docs/concepts/extend-kubernetes/api-extension/apiserver-aggregation/).

View File

@ -54,19 +54,6 @@ that plugin or [networking provider](/docs/concepts/cluster-administration/netwo
## Network Plugin Requirements
For plugin developers and users who regularly build or deploy Kubernetes, the plugin may also need
specific configuration to support kube-proxy. The iptables proxy depends on iptables, and the
plugin may need to ensure that container traffic is made available to iptables. For example, if
the plugin connects containers to a Linux bridge, the plugin must set the
`net/bridge/bridge-nf-call-iptables` sysctl to `1` to ensure that the iptables proxy functions
correctly. If the plugin does not use a Linux bridge, but uses something like Open vSwitch or
some other mechanism instead, it should ensure container traffic is appropriately routed for the
proxy.
By default, if no kubelet network plugin is specified, the `noop` plugin is used, which sets
`net/bridge/bridge-nf-call-iptables=1` to ensure simple configurations (like Docker with a bridge)
work correctly with the iptables proxy.
### Loopback CNI
In addition to the CNI plugin installed on the nodes for implementing the Kubernetes network

View File

@ -71,22 +71,22 @@ separate endpoint for each group version.
### Aggregated discovery
{{< feature-state state="beta" for_k8s_version="v1.27" >}}
{{< feature-state feature_gate_name="AggregatedDiscoveryEndpoint" >}}
Kubernetes offers beta support for aggregated discovery, publishing
Kubernetes offers stable support for _aggregated discovery_, publishing
all resources supported by a cluster through two endpoints (`/api` and
`/apis`). Requesting this
endpoint drastically reduces the number of requests sent to fetch the
discovery data from the cluster. You can access the data by
requesting the respective endpoints with an `Accept` header indicating
the aggregated discovery resource:
`Accept: application/json;v=v2beta1;g=apidiscovery.k8s.io;as=APIGroupDiscoveryList`.
`Accept: application/json;v=v2;g=apidiscovery.k8s.io;as=APIGroupDiscoveryList`.
Without indicating the resource type using the `Accept` header, the default
response for the `/api` and `/apis` endpoint is an unaggregated discovery
document.
The [discovery document](https://github.com/kubernetes/kubernetes/blob/release-{{< skew currentVersion >}}/api/discovery/aggregated_v2beta1.json)
The [discovery document](https://github.com/kubernetes/kubernetes/blob/release-{{< skew currentVersion >}}/api/discovery/aggregated_v2.json)
for the built-in resources can be found in the Kubernetes GitHub repository.
This Github document can be used as a reference of the base set of the available resources
if a Kubernetes cluster is not available to query.
@ -282,30 +282,6 @@ packages that define the API objects.
Kubernetes stores the serialized state of objects by writing them into
{{< glossary_tooltip term_id="etcd" >}}.
## API Discovery
A list of all group versions supported by a cluster is published at
the `/api` and `/apis` endpoints. Each group version also advertises
the list of resources supported via `/apis/<group>/<version>` (for
example: `/apis/rbac.authorization.k8s.io/v1alpha1`). These endpoints
are used by kubectl to fetch the list of resources supported by a
cluster.
### Aggregated Discovery
{{< feature-state feature_gate_name="AggregatedDiscoveryEndpoint" >}}
Kubernetes offers beta support for aggregated discovery, publishing
all resources supported by a cluster through two endpoints (`/api` and
`/apis`) compared to one for every group version. Requesting this
endpoint drastically reduces the number of requests sent to fetch the
discovery for the average Kubernetes cluster. This may be accessed by
requesting the respective endpoints with an Accept header indicating
the aggregated discovery resource:
`Accept: application/json;v=v2beta1;g=apidiscovery.k8s.io;as=APIGroupDiscoveryList`.
The endpoint also supports ETag and protobuf encoding.
## API groups and versioning
To make it easier to eliminate fields or restructure resource representations,

View File

@ -9,13 +9,16 @@ weight: 65
<!-- overview -->
{{< feature-state for_k8s_version="v1.27" state="alpha" >}}
{{< feature-state feature_gate_name="DynamicResourceAllocation" >}}
Dynamic resource allocation is an API for requesting and sharing resources
between pods and containers inside a pod. It is a generalization of the
persistent volumes API for generic resources. Third-party resource drivers are
responsible for tracking and allocating resources. Different kinds of
resources support arbitrary parameters for defining requirements and
responsible for tracking and allocating resources, with additional support
provided by Kubernetes via _structured parameters_ (introduced in Kubernetes 1.30).
When a driver uses structured parameters, Kubernetes handles scheduling
and resource allocation without having to communicate with the driver.
Different kinds of resources support arbitrary parameters for defining requirements and
initialization.
## {{% heading "prerequisites" %}}
@ -56,11 +59,39 @@ PodSchedulingContext
to coordinate pod scheduling when ResourceClaims need to be allocated
for a Pod.
ResourceSlice
: Used with structured parameters to publish information about resources
that are available in the cluster.
ResourceClaimParameters
: Contain the parameters for a ResourceClaim which influence scheduling,
in a format that is understood by Kubernetes (the "structured parameter
model"). Additional parameters may be embedded in an opaque
extension, for use by the vendor driver when setting up the underlying
resource.
ResourceClassParameters
: Similar to ResourceClaimParameters, the ResourceClassParameters provides
a type for ResourceClass parameters which is understood by Kubernetes.
Parameters for ResourceClass and ResourceClaim are stored in separate objects,
typically using the type defined by a {{< glossary_tooltip
term_id="CustomResourceDefinition" text="CRD" >}} that was created when
installing a resource driver.
The developer of a resource driver decides whether they want to handle these
parameters in their own external controller or instead rely on Kubernetes to
handle them through the use of structured parameters. A
custom controller provides more flexibility, but cluster autoscaling is not
going to work reliably for node-local resources. Structured parameters enable
cluster autoscaling, but might not satisfy all use-cases.
When a driver uses structured parameters, it is still possible to let the
end-user specify parameters with vendor-specific CRDs. When doing so, the
driver needs to translate those
custom parameters into the in-tree types. Alternatively, a driver may also
document how to use the in-tree types directly.
The `core/v1` `PodSpec` defines ResourceClaims that are needed for a Pod in a
`resourceClaims` field. Entries in that list reference either a ResourceClaim
or a ResourceClaimTemplate. When referencing a ResourceClaim, all Pods using
@ -129,8 +160,11 @@ spec:
## Scheduling
### Without structured parameters
In contrast to native resources (CPU, RAM) and extended resources (managed by a
device plugin, advertised by kubelet), the scheduler has no knowledge of what
device plugin, advertised by kubelet), without structured parameters
the scheduler has no knowledge of what
dynamic resources are available in a cluster or how they could be split up to
satisfy the requirements of a specific ResourceClaim. Resource drivers are
responsible for that. They mark ResourceClaims as "allocated" once resources
@ -172,6 +206,27 @@ ResourceClaims, and thus scheduling the next pod gets delayed.
{{< /note >}}
### With structured parameters
When a driver uses structured parameters, the scheduler takes over the
responsibility of allocating resources to a ResourceClaim whenever a pod needs
them. It does so by retrieving the full list of available resources from
ResourceSlice objects, tracking which of those resources have already been
allocated to existing ResourceClaims, and then selecting from those resources
that remain. The exact resources selected are subject to the constraints
provided in any ResourceClaimParameters or ResourceClassParameters associated
with the ResourceClaim.
The chosen resource is recorded in the ResourceClaim status together with any
vendor-specific parameters, so when a pod is about to start on a node, the
resource driver on the node has all the information it needs to prepare the
resource.
By using structured parameters, the scheduler is able to reach a decision
without communicating with any DRA resource drivers. It is also able to
schedule multiple pods quickly by keeping information about ResourceClaim
allocations in memory and writing this information to the ResourceClaim objects
in the background while concurrently binding the pod to a node.
## Monitoring resources
@ -193,7 +248,13 @@ was not enabled in the scheduler at the time when the Pod got scheduled
detects this and tries to make the Pod runnable by triggering allocation and/or
reserving the required ResourceClaims.
However, it is better to avoid this because a Pod that is assigned to a node
{{< note >}}
This only works with resource drivers that don't use structured parameters.
{{< /note >}}
It is better to avoid bypassing the scheduler because a Pod that is assigned to a node
blocks normal resources (RAM, CPU) that then cannot be used for other Pods
while the Pod is stuck. To make a Pod run on a specific node while still going
through the normal scheduling flow, create the Pod with a node selector that
@ -255,4 +316,5 @@ be installed. Please refer to the driver's documentation for details.
## {{% heading "whatsnext" %}}
- For more information on the design, see the
[Dynamic Resource Allocation KEP](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/3063-dynamic-resource-allocation/README.md).
[Dynamic Resource Allocation KEP](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/3063-dynamic-resource-allocation/README.md)
and the [Structured Parameters KEP](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/4381-dra-structured-parameters).

View File

@ -169,6 +169,7 @@ The kubelet has the following default hard eviction thresholds:
- `nodefs.available<10%`
- `imagefs.available<15%`
- `nodefs.inodesFree<5%` (Linux nodes)
- `imagefs.inodesFree<5%` (Linux nodes)
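If you need different thresholds, you set them explicitly in the kubelet configuration, as in the sketch below, which simply restates the defaults; note the caveat that follows it, namely that changing any one parameter means the remaining defaults are no longer applied automatically.

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  memory.available: "100Mi"   # also a default on Linux, shown here for completeness
  nodefs.available: "10%"
  imagefs.available: "15%"
  nodefs.inodesFree: "5%"
  imagefs.inodesFree: "5%"
```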
These default values of hard eviction thresholds will only be set if none
of the parameters is changed. If you change the value of any parameter,

View File

@ -6,7 +6,7 @@ weight: 40
<!-- overview -->
{{< feature-state for_k8s_version="v1.27" state="beta" >}}
{{< feature-state for_k8s_version="v1.30" state="stable" >}}
Pods were considered ready for scheduling once created. Kubernetes scheduler
does its due diligence to find nodes to place all pending Pods. However, in a
@ -89,9 +89,7 @@ The metric `scheduler_pending_pods` comes with a new label `"gated"` to distingu
has been tried scheduling but claimed as unschedulable, or explicitly marked as not ready for
scheduling. You can use `scheduler_pending_pods{queue="gated"}` to check the metric result.
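For reference, a Pod is gated by listing entries under `spec.schedulingGates`, roughly as in the sketch below; the gate name is a placeholder chosen for illustration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gated-pod                        # hypothetical name
spec:
  schedulingGates:
  - name: example.com/wait-for-quota     # placeholder gate; remove it to let scheduling proceed
  containers:
  - name: app
    image: nginx
```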
## Mutable Pod Scheduling Directives
{{< feature-state for_k8s_version="v1.27" state="beta" >}}
## Mutable Pod scheduling directives
You can mutate scheduling directives of Pods while they have scheduling gates, with certain constraints.
At a high level, you can only tighten the scheduling directives of a Pod. In other words, the updated

View File

@ -60,7 +60,7 @@ spec:
# Configure a topology spread constraint
topologySpreadConstraints:
- maxSkew: <integer>
minDomains: <integer> # optional; beta since v1.25
minDomains: <integer> # optional
topologyKey: <string>
whenUnsatisfiable: <string>
labelSelector: <object>
@ -96,11 +96,11 @@ your cluster. Those fields are:
A domain is a particular instance of a topology. An eligible domain is a domain whose
nodes match the node selector.
<!-- OK to remove this note once v1.29 Kubernetes is out of support -->
{{< note >}}
The `MinDomainsInPodTopologySpread` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
enables `minDomains` for pod topology spread. Starting from v1.28,
the `MinDomainsInPodTopologySpread` gate
is enabled by default. In older Kubernetes clusters it might be explicitly
Before Kubernetes v1.30, the `minDomains` field was only available if the
`MinDomainsInPodTopologySpread` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
was enabled (default since v1.28). In older Kubernetes clusters it might be explicitly
disabled or the field might not be available.
{{< /note >}}

View File

@ -143,7 +143,7 @@ To protect your compute at runtime, you can:
Pods with different trust contexts are run on separate sets of nodes.
1. Use a {{< glossary_tooltip text="container runtime" term_id="container-runtime" >}}
that provides security restrictions.
1. On Linux nodes, use a Linux security module such as [AppArmor](/docs/tutorials/security/apparmor/) (beta)
1. On Linux nodes, use a Linux security module such as [AppArmor](/docs/tutorials/security/apparmor/)
or [seccomp](/docs/tutorials/security/seccomp/).
### Runtime protection: storage {#protection-runtime-storage}
@ -223,4 +223,3 @@ logs are both tamper-proof and confidential.
* [Network policies](/docs/concepts/services-networking/network-policies/) for Pods
* [Pod security standards](/docs/concepts/security/pod-security-standards/)
* [RuntimeClasses](/docs/concepts/containers/runtime-class)

View File

@ -121,7 +121,7 @@ current policy level:
- Any metadata updates **except** changes to the seccomp or AppArmor annotations:
- `seccomp.security.alpha.kubernetes.io/pod` (deprecated)
- `container.seccomp.security.alpha.kubernetes.io/*` (deprecated)
- `container.apparmor.security.beta.kubernetes.io/*`
- `container.apparmor.security.beta.kubernetes.io/*` (deprecated)
- Valid updates to `.spec.activeDeadlineSeconds`
- Valid updates to `.spec.tolerations`

View File

@ -170,8 +170,21 @@ fail validation.
<tr>
<td style="white-space: nowrap">AppArmor</td>
<td>
<p>On supported hosts, the <code>runtime/default</code> AppArmor profile is applied by default. The baseline policy should prevent overriding or disabling the default AppArmor profile, or restrict overrides to an allowed set of profiles.</p>
<p>On supported hosts, the <code>RuntimeDefault</code> AppArmor profile is applied by default. The baseline policy should prevent overriding or disabling the default AppArmor profile, or restrict overrides to an allowed set of profiles.</p>
<p><strong>Restricted Fields</strong></p>
<ul>
<li><code>spec.securityContext.appArmorProfile.type</code></li>
<li><code>spec.containers[*].securityContext.appArmorProfile.type</code></li>
<li><code>spec.initContainers[*].securityContext.appArmorProfile.type</code></li>
<li><code>spec.ephemeralContainers[*].securityContext.appArmorProfile.type</code></li>
</ul>
<p><strong>Allowed Values</strong></p>
<ul>
<li>Undefined/nil</li>
<li><code>RuntimeDefault</code></li>
<li><code>Localhost</code></li>
</ul>
<hr />
<ul>
<li><code>metadata.annotations["container.apparmor.security.beta.kubernetes.io/*"]</code></li>
</ul>
@ -532,4 +545,3 @@ kernel. This allows for workloads requiring heightened permissions to still be i
Additionally, the protection of sandboxed workloads is highly dependent on the method of
sandboxing. As such, no single recommended profile is recommended for all sandboxed workloads.

View File

@ -177,10 +177,10 @@ Seccomp is only available on Linux nodes.
#### AppArmor
[AppArmor](https://apparmor.net/) is a Linux kernel security module that can
[AppArmor](/docs/tutorials/security/apparmor/) is a Linux kernel security module that can
provide an easy way to implement Mandatory Access Control (MAC) and better
auditing through system logs. To [enable AppArmor in Kubernetes](/docs/tutorials/security/apparmor/),
at least version 1.4 is required. Like seccomp, AppArmor is also configured
auditing through system logs. A default AppArmor profile is enforced on nodes that support it, or a custom profile can be configured.
Like seccomp, AppArmor is also configured
through profiles, where each profile is either running in enforcing mode, which
blocks access to disallowed resources or complain mode, which only reports
violations. AppArmor profiles are enforced on a per-container basis, with an

View File

@ -622,6 +622,16 @@ You can integrate with [Gateway](https://gateway-api.sigs.k8s.io/) rather than S
can define your own (provider specific) annotations on the Service that specify the equivalent detail.
{{< /note >}}
#### Node liveness impact on load balancer traffic
Load balancer health checks are critical to modern applications. They are used to
determine which server (virtual machine or IP address) the load balancer should
dispatch traffic to. The Kubernetes APIs do not define how health checks have to be
implemented for Kubernetes-managed load balancers; instead, it's the cloud providers
(and the people implementing the integration code) who decide on the behavior. Load
balancer health checks are extensively used within the context of supporting the
`externalTrafficPolicy` field for Services.
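For example, in the sketch below (all names are placeholders), setting `externalTrafficPolicy: Local` on a `LoadBalancer` Service typically causes the cloud provider's health checks to pass only on nodes that host a ready endpoint for the Service; the exact behaviour still depends on the provider's integration.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: example-lb              # hypothetical name
spec:
  type: LoadBalancer
  selector:
    app: example                # hypothetical label
  ports:
  - port: 80
    targetPort: 8080
  externalTrafficPolicy: Local  # health checks should only pass on nodes with local, ready endpoints
```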
#### Load balancers with mixed protocol types
{{< feature-state feature_gate_name="MixedProtocolLBService" >}}
@ -675,7 +685,7 @@ Unprefixed names are reserved for end-users.
{{< feature-state feature_gate_name="LoadBalancerIPMode" >}}
Starting as Alpha in Kubernetes 1.29,
As a Beta feature in Kubernetes 1.30,
a [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
named `LoadBalancerIPMode` allows you to set the `.status.loadBalancer.ingress.ipMode`
for a Service with `type` set to `LoadBalancer`.
@ -980,6 +990,35 @@ to control how Kubernetes routes traffic to healthy (“ready”) backends.
See [Traffic Policies](/docs/reference/networking/virtual-ips/#traffic-policies) for more details.
### Traffic distribution
{{< feature-state feature_gate_name="ServiceTrafficDistribution" >}}
The `.spec.trafficDistribution` field provides another way to influence traffic
routing within a Kubernetes Service. While traffic policies focus on strict
semantic guarantees, traffic distribution allows you to express _preferences_
(such as routing to topologically closer endpoints). This can help optimize for
performance, cost, or reliability. This optional field can be used if you have
enabled the `ServiceTrafficDistribution` [feature
gate](/docs/reference/command-line-tools-reference/feature-gates/) for your
cluster and all of its nodes. In Kubernetes {{< skew currentVersion >}}, the
following field value is supported:
`PreferClose`
: Indicates a preference for routing traffic to endpoints that are topologically
proximate to the client. The interpretation of "topologically proximate" may
vary across implementations and could encompass endpoints within the same
node, rack, zone, or even region. Setting this value gives implementations
permission to make different tradeoffs, e.g. optimizing for proximity rather
than equal distribution of load. Users should not set this value if such
tradeoffs are not acceptable.
If the field is not set, the implementation will apply its default routing strategy.
See [Traffic
Distribution](/docs/reference/networking/virtual-ips/#traffic-distribution) for
more details.
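A minimal sketch of a Service that uses this field (names are placeholders):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: example-svc                   # hypothetical name
spec:
  selector:
    app: example                      # hypothetical label
  ports:
  - port: 80
    targetPort: 8080
  trafficDistribution: PreferClose    # prefer topologically closer endpoints where supported
```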
### Session stickiness
If you want to make sure that connections from a particular client are passed to

View File

@ -198,3 +198,8 @@ yet cover some relevant and plausible situations.
## {{% heading "whatsnext" %}}
* Follow the [Connecting Applications with Services](/docs/tutorials/services/connect-applications-service/) tutorial
* Learn about the
[trafficDistribution](/docs/concepts/services-networking/service/#traffic-distribution)
field, which is closely related to the `service.kubernetes.io/topology-mode`
annotation and provides flexible options for traffic routing within
Kubernetes.

View File

@ -509,30 +509,33 @@ PersistentVolume types are implemented as plugins. Kubernetes currently supports
mounted on nodes.
* [`nfs`](/docs/concepts/storage/volumes/#nfs) - Network File System (NFS) storage
The following types of PersistentVolume are deprecated.
This means that support is still available but will be removed in a future Kubernetes release.
The following types of PersistentVolume are deprecated but still available.
If you are using one of these volume types (except for `flexVolume`, `cephfs`, and `rbd`),
please install the corresponding CSI driver.
* [`awsElasticBlockStore`](/docs/concepts/storage/volumes/#awselasticblockstore) - AWS Elastic Block Store (EBS)
(**migration on by default** starting v1.23)
* [`azureDisk`](/docs/concepts/storage/volumes/#azuredisk) - Azure Disk
(**migration on by default** starting v1.23)
* [`azureFile`](/docs/concepts/storage/volumes/#azurefile) - Azure File
(**deprecated** in v1.21)
* [`flexVolume`](/docs/concepts/storage/volumes/#flexvolume) - FlexVolume
(**deprecated** in v1.23)
* [`portworxVolume`](/docs/concepts/storage/volumes/#portworxvolume) - Portworx volume
(**deprecated** in v1.25)
* [`vsphereVolume`](/docs/concepts/storage/volumes/#vspherevolume) - vSphere VMDK volume
(**deprecated** in v1.19)
(**migration on by default** starting v1.24)
* [`cephfs`](/docs/concepts/storage/volumes/#cephfs) - CephFS volume
(**deprecated** in v1.28)
(**deprecated** starting v1.28, no migration plan, support will be removed in a future release)
* [`cinder`](/docs/concepts/storage/volumes/#cinder) - Cinder (OpenStack block storage)
(**migration on by default** starting v1.21)
* [`flexVolume`](/docs/concepts/storage/volumes/#flexvolume) - FlexVolume
(**deprecated** starting v1.23, no migration plan and no plan to remove support)
* [`gcePersistentDisk`](/docs/concepts/storage/volumes/#gcePersistentDisk) - GCE Persistent Disk
(**migration on by default** starting v1.23)
* [`portworxVolume`](/docs/concepts/storage/volumes/#portworxvolume) - Portworx volume
(**deprecated** starting v1.25)
* [`rbd`](/docs/concepts/storage/volumes/#rbd) - Rados Block Device (RBD) volume
(**deprecated** in v1.28)
(**deprecated** starting v1.28, no migration plan, support will be removed in a future release)
* [`vsphereVolume`](/docs/concepts/storage/volumes/#vspherevolume) - vSphere VMDK volume
(**migration on by default** starting v1.25)
Older versions of Kubernetes also supported the following in-tree PersistentVolume types:
* [`awsElasticBlockStore`](/docs/concepts/storage/volumes/#awselasticblockstore) - AWS Elastic Block Store (EBS)
(**not available** in v1.27)
* [`azureDisk`](/docs/concepts/storage/volumes/#azuredisk) - Azure Disk
(**not available** in v1.27)
* [`cinder`](/docs/concepts/storage/volumes/#cinder) - Cinder (OpenStack block storage)
(**not available** in v1.26)
* `photonPersistentDisk` - Photon controller persistent disk.
(**not available** starting v1.15)
* `scaleIO` - ScaleIO volume.

View File

@ -65,12 +65,14 @@ a different volume.
Kubernetes supports several types of volumes.
### awsElasticBlockStore (removed) {#awselasticblockstore}
### awsElasticBlockStore (deprecated) {#awselasticblockstore}
<!-- maintenance note: OK to remove all mention of awsElasticBlockStore once the v1.27 release of
Kubernetes has gone out of support -->
Kubernetes {{< skew currentVersion >}} does not include a `awsElasticBlockStore` volume type.
In Kubernetes {{< skew currentVersion >}}, all operations for the in-tree `awsElasticBlockStore` type
are redirected to the `ebs.csi.aws.com` {{< glossary_tooltip text="CSI" term_id="csi" >}} driver.
The AWSElasticBlockStore in-tree storage driver was deprecated in the Kubernetes v1.19 release
and then removed entirely in the v1.27 release.
@ -78,12 +80,13 @@ and then removed entirely in the v1.27 release.
The Kubernetes project suggests that you use the [AWS EBS](https://github.com/kubernetes-sigs/aws-ebs-csi-driver) third party
storage driver instead.
### azureDisk (removed) {#azuredisk}
### azureDisk (deprecated) {#azuredisk}
<!-- maintenance note: OK to remove all mention of azureDisk once the v1.27 release of
Kubernetes has gone out of support -->
Kubernetes {{< skew currentVersion >}} does not include a `azureDisk` volume type.
In Kubernetes {{< skew currentVersion >}}, all operations for the in-tree `azureDisk` type
are redirected to the `disk.csi.azure.com` {{< glossary_tooltip text="CSI" term_id="csi" >}} driver.
The AzureDisk in-tree storage driver was deprecated in the Kubernetes v1.19 release
and then removed entirely in the v1.27 release.
@ -121,7 +124,7 @@ Azure File CSI driver does not support using same volume with different fsgroups
To disable the `azureFile` storage plugin from being loaded by the controller manager
and the kubelet, set the `InTreePluginAzureFileUnregister` flag to `true`.
### cephfs
### cephfs (deprecated) {#cephfs}
{{< feature-state for_k8s_version="v1.28" state="deprecated" >}}
{{< note >}}
@ -142,12 +145,13 @@ You must have your own Ceph server running with the share exported before you ca
See the [CephFS example](https://github.com/kubernetes/examples/tree/master/volumes/cephfs/) for more details.
### cinder (removed) {#cinder}
### cinder (deprecated) {#cinder}
<!-- maintenance note: OK to remove all mention of cinder once the v1.26 release of
Kubernetes has gone out of support -->
Kubernetes {{< skew currentVersion >}} does not include a `cinder` volume type.
In Kubernetes {{< skew currentVersion >}}, all operations for the in-tree `cinder` type
are redirected to the `cinder.csi.openstack.org` {{< glossary_tooltip text="CSI" term_id="csi" >}} driver.
The OpenStack Cinder in-tree storage driver was deprecated in the Kubernetes v1.11 release
and then removed entirely in the v1.26 release.
@ -298,9 +302,10 @@ beforehand so that Kubernetes hosts can access them.
See the [fibre channel example](https://github.com/kubernetes/examples/tree/master/staging/volumes/fibre_channel)
for more details.
### gcePersistentDisk (removed) {#gcepersistentdisk}
### gcePersistentDisk (deprecated) {#gcepersistentdisk}
Kubernetes {{< skew currentVersion >}} does not include a `gcePersistentDisk` volume type.
In Kubernetes {{< skew currentVersion >}}, all operations for the in-tree `gcePersistentDisk` type
are redirected to the `pd.csi.storage.gke.io` {{< glossary_tooltip text="CSI" term_id="csi" >}} driver.
The `gcePersistentDisk` in-tree storage driver was deprecated in the Kubernetes v1.17 release
and then removed entirely in the v1.28 release.
@ -1225,7 +1230,65 @@ in `containers[*].volumeMounts`. Its values are:
(unmounted) by the containers on termination.
{{< /warning >}}
## Read-only mounts
A mount can be made read-only by setting the `.spec.containers[].volumeMounts[].readOnly`
field to `true`.
This does not make the volume itself read-only, but that specific container will
not be able to write to it.
Other containers in the Pod may mount the same volume as read-write.
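For example, here is a sketch of a Pod in which one container mounts a shared volume read-write while another mounts the same volume read-only; the names, image, and commands are placeholders.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: readonly-mount-example    # hypothetical name
spec:
  containers:
  - name: writer
    image: busybox                # placeholder image
    command: ["sh", "-c", "while true; do date > /data/out; sleep 5; done"]
    volumeMounts:
    - name: data
      mountPath: /data            # mounted read-write in this container
  - name: reader
    image: busybox                # placeholder image
    command: ["sleep", "infinity"]
    volumeMounts:
    - name: data
      mountPath: /data
      readOnly: true              # this container cannot write to the volume
  volumes:
  - name: data
    emptyDir: {}
```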
On Linux, read-only mounts are not recursively read-only by default.
For example, consider a Pod which mounts the host's `/mnt` as a `hostPath` volume. If
there is another filesystem mounted read-write on `/mnt/<SUBMOUNT>` (such as tmpfs,
NFS, or USB storage), the volume mounted into the container(s) will also have a writeable
`/mnt/<SUBMOUNT>`, even if the mount itself was specified as read-only.
### Recursive read-only mounts
{{< feature-state feature_gate_name="RecursiveReadOnlyMounts" >}}
Recursive read-only mounts can be enabled by setting the
`RecursiveReadOnlyMounts` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
for kubelet and kube-apiserver, and setting the `.spec.containers[].volumeMounts[].recursiveReadOnly`
field for a pod.
The allowed values are:
* `Disabled` (default): no effect.
* `Enabled`: makes the mount recursively read-only.
  Needs all the following requirements to be satisfied:
  * `readOnly` is set to `true`
  * `mountPropagation` is unset, or set to `None`
  * The host is running with Linux kernel v5.12 or later
  * The [CRI-level](/docs/concepts/architecture/cri) container runtime supports recursive read-only mounts
  * The OCI-level container runtime supports recursive read-only mounts.

  It will fail if any of these is not true.
* `IfPossible`: attempts to apply `Enabled`, and falls back to `Disabled`
  if the feature is not supported by the kernel or the runtime class.
Example:
{{% code_sample file="storage/rro.yaml" %}}
When this property is recognized by kubelet and kube-apiserver,
the `.status.containerStatuses[].volumeMounts[].recursiveReadOnly` field is set to either
`Enabled` or `Disabled`.
#### Implementations {#implementations-rro}
{{% thirdparty-content %}}
The following container runtimes are known to support recursive read-only mounts.
CRI-level:
- [containerd](https://containerd.io/), since v2.0
OCI-level:
- [runc](https://runc.io/), since v1.1
- [crun](https://github.com/containers/crun), since v1.8.6
## {{% heading "whatsnext" %}}

View File

@ -553,6 +553,62 @@ terminating Pods only once these Pods reach the terminal `Failed` phase. This be
to `podReplacementPolicy: Failed`. For more information, see [Pod replacement policy](#pod-replacement-policy).
{{< /note >}}
## Success policy {#success-policy}
{{< feature-state feature_gate_name="JobSuccessPolicy" >}}
{{< note >}}
You can only configure a success policy for an Indexed Job if you have the
`JobSuccessPolicy` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
enabled in your cluster.
{{< /note >}}
When creating an Indexed Job, you can define when a Job can be declared as succeeded using a `.spec.successPolicy`,
based on the pods that succeeded.
By default, a Job succeeds when the number of succeeded Pods equals `.spec.completions`.
These are some situations where you might want additional control for declaring a Job succeeded:
* When running simulations with different parameters,
you might not need all the simulations to succeed for the overall Job to be successful.
* When following a leader-worker pattern, only the success of the leader determines the success or
failure of a Job. Examples of this include frameworks like MPI and PyTorch.
You can configure a success policy, in the `.spec.successPolicy` field,
to meet the above use cases. This policy can handle Job success based on the
succeeded pods. After the Job meets the success policy, the job controller terminates the lingering Pods.
A success policy is defined by rules. Each rule can take one of the following forms:
* When you specify the `succeededIndexes` only,
once all indexes specified in the `succeededIndexes` succeed, the job controller marks the Job as succeeded.
The `succeededIndexes` must be a list of intervals between 0 and `.spec.completions-1`.
* When you specify the `succeededCount` only,
once the number of succeeded indexes reaches the `succeededCount`, the job controller marks the Job as succeeded.
* When you specify both `succeededIndexes` and `succeededCount`,
once the number of succeeded indexes from the subset of indexes specified in the `succeededIndexes` reaches the `succeededCount`,
the job controller marks the Job as succeeded.
Note that when you specify multiple rules in the `.spec.successPolicy.rules`,
the job controller evaluates the rules in order. Once the Job meets a rule, the job controller ignores remaining rules.
Here is a manifest for a Job with `successPolicy`:
{{% code_sample file="/controllers/job-success-policy.yaml" %}}
In the example above, both `succeededIndexes` and `succeededCount` have been specified.
Therefore, the job controller will mark the Job as succeeded and terminate the lingering Pods
when any one of the specified indexes (0, 2, or 3) succeeds.
The Job that meets the success policy gets the `SuccessCriteriaMet` condition.
After the removal of the lingering Pods is issued, the Job gets the `Complete` condition.
Note that `succeededIndexes` is expressed as a list of intervals; each interval is written
as its first and last index, separated by a hyphen.
{{< note >}}
When you specify both a success policy and some terminating policies such as `.spec.backoffLimit` and `.spec.podFailurePolicy`,
once the Job meets either policy, the job controller respects the terminating policy and ignores the success policy.
{{< /note >}}
## Job termination and cleanup
When a Job completes, no more Pods are created, but the Pods are [usually](#pod-backoff-failure-policy) not deleted either.
@ -1009,6 +1065,50 @@ status:
terminating: 3 # three Pods are terminating and have not yet reached the Failed phase
```
### Delegation of managing a Job object to external controller
{{< feature-state feature_gate_name="JobManagedBy" >}}
{{< note >}}
You can only set the `managedBy` field on Jobs if you enable the `JobManagedBy`
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
(disabled by default).
{{< /note >}}
This feature allows you to disable the built-in Job controller, for a specific
Job, and delegate reconciliation of the Job to an external controller.
You indicate the controller that reconciles the Job by setting a custom value
for the `spec.managedBy` field - any value
other than `kubernetes.io/job-controller`. The value of the field is immutable.
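A sketch of what that can look like (the controller value and image are placeholders):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: delegated-job            # hypothetical name
spec:
  managedBy: example.com/custom-job-controller   # any value other than kubernetes.io/job-controller
  completions: 1
  parallelism: 1
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: main
        image: registry.example/worker:latest    # hypothetical image
```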
{{< note >}}
When using this feature, make sure the controller indicated by the field is
installed, otherwise the Job may not be reconciled at all.
{{< /note >}}
{{< note >}}
When developing an external Job controller be aware that your controller needs
to operate in a fashion conformant with the definitions of the API spec and
status fields of the Job object.
Please review these in detail in the [Job API](/docs/reference/kubernetes-api/workload-resources/job-v1/).
We also recommend that you run the e2e conformance tests for the Job object to
verify your implementation.
Finally, when developing an external Job controller make sure it does not use the
`batch.kubernetes.io/job-tracking` finalizer, reserved for the built-in controller.
{{< /note >}}
{{< warning >}}
If you are considering disabling the `JobManagedBy` feature gate, or
downgrading the cluster to a version without the feature gate enabled, check whether
there are jobs with a custom value of the `spec.managedBy` field. If there
are such jobs, there is a risk that they might be reconciled by two controllers
after the operation: the built-in Job controller and the external controller
indicated by the field value.
{{< /warning >}}
## Alternatives
### Bare Pods

View File

@ -77,7 +77,6 @@ The following information is available through environment variables
`status.hostIPs`
: the IP addresses are a dual-stack version of `status.hostIP`; the first is always the same as `status.hostIP`.
The field is available if you enable the `PodHostIPs` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/).
`status.podIP`
: the pod's primary IP address (usually, its IPv4 address)

View File

@ -331,8 +331,10 @@ for resource usage apply:
Quota and limits are applied based on the effective Pod request and
limit.
Pod level control groups (cgroups) are based on the effective Pod request and
limit, the same as the scheduler.
### Init containers and Linux cgroups {#cgroups}
On Linux, resource allocations for Pod level control groups (cgroups) are based on the effective Pod
request and limit, the same as the scheduler.
{{< comment >}}
This section also present under [sidecar containers](/docs/concepts/workloads/pods/sidecar-containers/) page.

View File

@ -9,21 +9,43 @@ weight: 50
Sidecar containers are the secondary containers that run along with the main
application container within the same {{< glossary_tooltip text="Pod" term_id="pod" >}}.
These containers are used to enhance or to extend the functionality of the main application
container by providing additional services, or functionality such as logging, monitoring,
These containers are used to enhance or to extend the functionality of the primary _app
container_ by providing additional services, or functionality such as logging, monitoring,
security, or data synchronization, without directly altering the primary application code.
Typically, you only have one app container in a Pod. For example, if you have a web
application that requires a local webserver, the local webserver is a sidecar and the
web application itself is the app container.
<!-- body -->
## Enabling sidecar containers
## Sidecar containers in Kubernetes {#pod-sidecar-containers}
Enabled by default with Kubernetes 1.29, a
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/) named
`SidecarContainers` allows you to specify a `restartPolicy` for containers listed in a
Pod's `initContainers` field. These restartable _sidecar_ containers are independent with
other [init containers](/docs/concepts/workloads/pods/init-containers/) and main
application container within the same pod. These can be started, stopped, or restarted
without affecting the main application container and other init containers.
Kubernetes implements sidecar containers as a special case of
[init containers](/docs/concepts/workloads/pods/init-containers/); sidecar containers remain
running after Pod startup. This document uses the term _regular init containers_ to clearly
refer to containers that only run during Pod startup.
Provided that your cluster has the `SidecarContainers`
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/) enabled
(the feature is active by default since Kubernetes v1.29), you can specify a `restartPolicy`
for containers listed in a Pod's `initContainers` field.
These restartable _sidecar_ containers are independent from other init containers and from
the main application container(s) within the same pod.
These can be started, stopped, or restarted without affecting the main application container
and other init containers.
You can also run a Pod with multiple containers that are not marked as init or sidecar
containers. This is appropriate if the containers within the Pod are required for the
Pod to work overall, but you don't need to control which containers start or stop first.
You could also do this if you need to support older versions of Kubernetes that don't
support a container-level `restartPolicy` field.
### Example application {#sidecar-example}
Here's an example of a Deployment with two containers, one of which is a sidecar:
{{% code_sample language="yaml" file="application/deployment-sidecar.yaml" %}}
## Sidecar containers and Pod lifecycle
@ -35,8 +57,8 @@ If a `readinessProbe` is specified for this init container, its result will be u
to determine the `ready` state of the Pod.
Since these containers are defined as init containers, they benefit from the same
ordering and sequential guarantees as other init containers, allowing them to
be mixed with other init containers into complex Pod initialization flows.
ordering and sequential guarantees as regular init containers, allowing you to mix
sidecar containers with regular init containers for complex Pod initialization flows.
Compared to regular init containers, sidecars defined within `initContainers` continue to
run after they have started. This is important when there is more than one entry inside
@ -46,30 +68,28 @@ next init container from the ordered `.spec.initContainers` list.
That status either becomes true because there is a process running in the
container and no startup probe defined, or as a result of its `startupProbe` succeeding.
Here's an example of a Deployment with two containers, one of which is a sidecar:
### Jobs with sidecar containers
{{% code_sample language="yaml" file="application/deployment-sidecar.yaml" %}}
This feature is also useful for running Jobs with sidecars, as the sidecar
container will not prevent the Job from completing after the main container
has finished.
If you define a Job that uses a sidecar specified as a Kubernetes-style init container,
the sidecar container in each Pod does not prevent the Job from completing after the
main container has finished.
Here's an example of a Job with two containers, one of which is a sidecar:
{{% code_sample language="yaml" file="application/job/job-sidecar.yaml" %}}
## Differences from regular containers
## Differences from application containers
Sidecar containers run alongside regular containers in the same pod. However, they do not
Sidecar containers run alongside _app containers_ in the same pod. However, they do not
execute the primary application logic; instead, they provide supporting functionality to
the main application.
Sidecar containers have their own independent lifecycles. They can be started, stopped,
and restarted independently of regular containers. This means you can update, scale, or
and restarted independently of app containers. This means you can update, scale, or
maintain sidecar containers without affecting the primary application.
Sidecar containers share the same network and storage namespaces with the primary
container This co-location allows them to interact closely and share resources.
container. This co-location allows them to interact closely and share resources.
## Differences from init containers
@ -112,8 +132,10 @@ for resource usage apply:
Quota and limits are applied based on the effective Pod request and
limit.
Pod level control groups (cgroups) are based on the effective Pod request and
limit, the same as the scheduler.
### Sidecar containers and Linux cgroups {#cgroups}
On Linux, resource allocations for Pod level control groups (cgroups) are based on the effective Pod
request and limit, the same as the scheduler.
## {{% heading "whatsnext" %}}

View File

@ -7,7 +7,7 @@ min-kubernetes-server-version: v1.25
---
<!-- overview -->
{{< feature-state for_k8s_version="v1.25" state="alpha" >}}
{{< feature-state for_k8s_version="v1.30" state="beta" >}}
This page explains how user namespaces are used in Kubernetes pods. A user
namespace isolates the user running inside the container from the one
@ -46,7 +46,26 @@ tmpfs, Secrets use a tmpfs, etc.)
Some popular filesystems that support idmap mounts in Linux 6.3 are: btrfs,
ext4, xfs, fat, tmpfs, overlayfs.
In addition, support is needed in the
In addition, the container runtime and its underlying OCI runtime must support
user namespaces. The following OCI runtimes offer support:
* [crun](https://github.com/containers/crun) version 1.9 or greater (version 1.13+ is recommended).
<!-- ideally, update this if a newer minor release of runc comes out, whether or not it includes the idmap support -->
{{< note >}}
Many OCI runtimes do not include the support needed for using user namespaces in
Linux pods. If you use a managed Kubernetes, or have downloaded it from packages
and set it up, it's likely that nodes in your cluster use a runtime that doesn't
include this support. For example, the most widely used OCI runtime is `runc`,
and version `1.1.z` of runc doesn't support all the features needed by the
Kubernetes implementation of user namespaces.
If there is a newer release of runc than 1.1 available for use, check its
documentation and release notes for compatibility (look for idmap mounts support
in particular, because that is the missing feature).
{{< /note >}}
To use user namespaces with Kubernetes pods, you also need a CRI
{{< glossary_tooltip text="container runtime" term_id="container-runtime" >}}
that supports this feature:
@ -137,20 +156,67 @@ use, see `man 7 user_namespaces`.
## Set up a node to support user namespaces
It is recommended that the host's files and host's processes use UIDs/GIDs in
the range of 0-65535.
By default, the kubelet assigns pods UIDs/GIDs above the range 0-65535, based on
the assumption that the host's files and processes use UIDs/GIDs within this
range, which is standard for most Linux distributions. This approach prevents
any overlap between the UIDs/GIDs of the host and those of the pods.
The kubelet will assign UIDs/GIDs higher than that to pods. Therefore, to
guarantee as much isolation as possible, the UIDs/GIDs used by the host's files
and host's processes should be in the range 0-65535.
Avoiding the overlap is important to mitigate the impact of vulnerabilities such
as [CVE-2021-25741][CVE-2021-25741], where a pod can potentially read arbitrary
files in the host. If the UIDs/GIDs of the pod and the host don't overlap, it is
limited what a pod would be able to do: the pod UID/GID won't match the host's
file owner/group.
Note that this recommendation is important to mitigate the impact of CVEs like
[CVE-2021-25741][CVE-2021-25741], where a pod can potentially read arbitrary
files in the hosts. If the UIDs/GIDs of the pod and the host don't overlap, it
is limited what a pod would be able to do: the pod UID/GID won't match the
host's file owner/group.
The kubelet can use a custom range for user IDs and group IDs for pods. To
configure a custom range, the node needs to have:
* A user `kubelet` in the system (you cannot use any other username here)
* The binary `getsubids` installed (part of [shadow-utils][shadow-utils]) and
in the `PATH` for the kubelet binary.
* A configuration of subordinate UIDs/GIDs for the `kubelet` user (see
[`man 5 subuid`](https://man7.org/linux/man-pages/man5/subuid.5.html) and
[`man 5 subgid`](https://man7.org/linux/man-pages/man5/subgid.5.html)).
This setting only gathers the UID/GID range configuration and does not change
the user executing the `kubelet`.
You must follow some constraints for the subordinate ID range that you assign
to the `kubelet` user:
* The subordinate user ID, that starts the UID range for Pods, **must** be a
multiple of 65536 and must also be greater than or equal to 65536. In other
words, you cannot use any ID from the range 0-65535 for Pods; the kubelet
imposes this restriction to make it difficult to create an accidentally insecure
configuration.
* The subordinate ID count must be a multiple of 65536
* The subordinate ID count must be at least `65536 x <maxPods>` where `<maxPods>`
is the maximum number of pods that can run on the node.
* You must assign the same range for both user IDs and for group IDs. It doesn't
matter if other users have user ID ranges that don't align with the group ID
ranges.
* None of the assigned ranges should overlap with any other assignment.
* The subordinate configuration must be only one line. In other words, you can't
have multiple ranges.
For example, you could define `/etc/subuid` and `/etc/subgid` to both have
these entries for the `kubelet` user:
```
# The format is
# name:firstID:count of IDs
# where
# - firstID is 65536 (the minimum value possible)
# - count of IDs is 110 (the default limit for the number of pods on a node) * 65536
kubelet:65536:7208960
```
[CVE-2021-25741]: https://github.com/kubernetes/kubernetes/issues/104980
[shadow-utils]: https://github.com/shadow-maint/shadow
## Integration with Pod security admission checks

View File

@ -792,49 +792,6 @@ defined in the corresponding RuntimeClass.
See also [Pod Overhead](/docs/concepts/scheduling-eviction/pod-overhead/)
for more information.
### SecurityContextDeny {#securitycontextdeny}
**Type**: Validating.
{{< feature-state for_k8s_version="v1.27" state="deprecated" >}}
{{< caution >}}
The Kubernetes project recommends that you **do not use** the
`SecurityContextDeny` admission controller.
The `SecurityContextDeny` admission controller plugin is deprecated and disabled
by default. It will be removed in a future version. If you choose to enable the
`SecurityContextDeny` admission controller plugin, you must enable the
`SecurityContextDeny` feature gate as well.
The `SecurityContextDeny` admission plugin is deprecated because it is outdated
and incomplete; it may be unusable or not do what you would expect. As
implemented, this plugin is unable to restrict all security-sensitive attributes
of the Pod API. For example, the `privileged` and `ephemeralContainers` fields
were never restricted by this plugin.
The [Pod Security Admission](/docs/concepts/security/pod-security-admission/)
plugin enforcing the [Pod Security Standards](/docs/concepts/security/pod-security-standards/)
`Restricted` profile captures what this plugin was trying to achieve in a better
and up-to-date way.
{{< /caution >}}
This admission controller will deny any Pod that attempts to set the following
[SecurityContext](/docs/reference/kubernetes-api/workload-resources/pod-v1/#security-context)
fields:
- `.spec.securityContext.supplementalGroups`
- `.spec.securityContext.seLinuxOptions`
- `.spec.securityContext.runAsUser`
- `.spec.securityContext.fsGroup`
- `.spec.(init)Containers[*].securityContext.seLinuxOptions`
- `.spec.(init)Containers[*].securityContext.runAsUser`
For more historical context on this plugin, see
[The birth of PodSecurityPolicy](/blog/2022/08/23/podsecuritypolicy-the-historical-context/#the-birth-of-podsecuritypolicy)
from the Kubernetes blog article about PodSecurityPolicy and its removal. The
article details the PodSecurityPolicy historical context and the birth of the
`securityContext` field for Pods.
### ServiceAccount {#serviceaccount}
**Type**: Mutating and Validating.

View File

@ -329,19 +329,42 @@ To enable the plugin, configure the following flags on the API server:
| `--oidc-ca-file` | The path to the certificate for the CA that signed your identity provider's web certificate. Defaults to the host's root CAs. | `/etc/kubernetes/ssl/kc-ca.pem` | No |
| `--oidc-signing-algs` | The signing algorithms accepted. Default is "RS256". | `RS512` | No |
##### Using Authentication Configuration
##### Authentication configuration from a file {#using-authentication-configuration}
{{< feature-state for_k8s_version="v1.29" state="alpha" >}}
{{< feature-state feature_gate_name="StructuredAuthenticationConfiguration" >}}
The JWT authenticator authenticates Kubernetes users using JWT-compliant tokens. The authenticator attempts to
parse a raw ID token and verify that it was signed by the configured issuer. The public key used to verify the signature is discovered from the issuer's public endpoint using OIDC discovery.
The API server can be configured to use a JWT authenticator via the `--authentication-config` flag. This flag takes a path to a file containing the `AuthenticationConfiguration`. An example configuration is provided below.
To use this config, the `StructuredAuthenticationConfiguration` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
has to be enabled.
The minimum valid JWT payload must contain the following claims:
```yaml
{
"iss": "https://example.com", // must match the issuer.url
"aud": ["my-app"], // at least one of the entries in issuer.audiences must match the "aud" claim in presented JWTs.
"exp": 1234567890, // token expiration as Unix time (the number of seconds elapsed since January 1, 1970 UTC)
"<username-claim>": "user" // this is the username claim configured in the claimMappings.username.claim or claimMappings.username.expression
}
```
The configuration file approach allows you to configure multiple JWT authenticators, each with a unique `issuer.url` and `issuer.discoveryURL`. The configuration file even allows you to specify [CEL](/docs/reference/using-api/cel/)
expressions to map claims to user attributes, and to validate claims and user information. The API server also automatically reloads the authenticators when the configuration file is modified. You can use
`apiserver_authentication_config_controller_automatic_reload_last_timestamp_seconds` metric to monitor the last time the configuration was reloaded by the API server.
You must specify the path to the authentication configuration using the `--authentication-config` flag on the API server. If you want to use command line flags instead of the configuration file, those will continue to work as-is.
To access new capabilities such as configuring multiple authenticators or setting multiple audiences for an issuer, switch to using the configuration file.
For Kubernetes v{{< skew currentVersion >}}, the structured authentication configuration file format
is beta-level, and the mechanism for using that configuration is also beta. Provided you didn't specifically
disable the `StructuredAuthenticationConfiguration`
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/) for your cluster,
you can turn on structured authentication by specifying the `--authentication-config` command line
argument to the kube-apiserver. An example of the structured authentication configuration file is shown below.
{{< note >}}
When the feature is enabled, setting both `--authentication-config` and any of the `--oidc-*` flags will result in an error. If you want to use the feature, you have to remove the `--oidc-*` flags and use the configuration file instead.
If you specify `--authentication-config` along with any of the `--oidc-*` command line arguments, this is
a misconfiguration. In this situation, the API server reports an error and then immediately exits.
If you want to switch to using structured authentication configuration, you have to remove the `--oidc-*`
command line arguments, and use the configuration file instead.
{{< /note >}}
```yaml
@ -350,18 +373,37 @@ When the feature is enabled, setting both `--authentication-config` and any of t
# CAUTION: this is an example configuration.
# Do not use this for your own cluster!
#
apiVersion: apiserver.config.k8s.io/v1alpha1
apiVersion: apiserver.config.k8s.io/v1beta1
kind: AuthenticationConfiguration
# list of authenticators to authenticate Kubernetes users using JWT compliant tokens.
# the maximum number of allowed authenticators is 64.
jwt:
- issuer:
# url must be unique across all authenticators.
# url must not conflict with issuer configured in --service-account-issuer.
url: https://example.com # Same as --oidc-issuer-url.
# discoveryURL, if specified, overrides the URL used to fetch discovery
# information instead of using "{url}/.well-known/openid-configuration".
# The exact value specified is used, so "/.well-known/openid-configuration"
# must be included in discoveryURL if needed.
#
# The "issuer" field in the fetched discovery information must match the "issuer.url" field
# in the AuthenticationConfiguration and will be used to validate the "iss" claim in the presented JWT.
# This is for scenarios where the well-known and jwks endpoints are hosted at a different
# location than the issuer (such as locally in the cluster).
# discoveryURL must be different from url if specified and must be unique across all authenticators.
discoveryURL: https://discovery.example.com/.well-known/openid-configuration
# PEM encoded CA certificates used to validate the connection when fetching
# discovery information. If not set, the system verifier will be used.
# Same value as the content of the file referenced by the --oidc-ca-file flag.
certificateAuthority: <PEM encoded CA certificates>
# audiences is the set of acceptable audiences the JWT must be issued to.
# At least one of the entries must match the "aud" claim in presented JWTs.
audiences:
- my-app # Same as --oidc-client-id.
- my-other-app
# this is required to be set to "MatchAny" when multiple audiences are specified.
audienceMatchPolicy: MatchAny
# rules applied to validate token claims to authenticate users.
claimValidationRules:
# Same as --oidc-required-claim key=value.
@ -387,6 +429,13 @@ jwt:
prefix: ""
# Mutually exclusive with username.claim and username.prefix.
# expression is a CEL expression that evaluates to a string.
#
# 1. If username.expression uses 'claims.email', then 'claims.email_verified' must be used in
# username.expression or extra[*].valueExpression or claimValidationRules[*].expression.
# An example claim validation rule expression that matches the validation automatically
# applied when username.claim is set to 'email' is 'claims.?email_verified.orValue(true)'.
# 2. If the username asserted based on username.expression is the empty string, the authentication
# request will fail.
expression: 'claims.username + ":external-user"'
# groups represents an option for the groups attribute.
groups:
@ -446,7 +495,7 @@ jwt:
{{< tabs name="example_configuration" >}}
{{% tab name="Valid token" %}}
```yaml
apiVersion: apiserver.config.k8s.io/v1alpha1
apiVersion: apiserver.config.k8s.io/v1beta1
kind: AuthenticationConfiguration
jwt:
- issuer:
@ -506,7 +555,7 @@ jwt:
{{% /tab %}}
{{% tab name="Fails claim validation" %}}
```yaml
apiVersion: apiserver.config.k8s.io/v1alpha1
apiVersion: apiserver.config.k8s.io/v1beta1
kind: AuthenticationConfiguration
jwt:
- issuer:
@ -554,7 +603,7 @@ jwt:
{{% /tab %}}
{{% tab name="Fails user validation" %}}
```yaml
apiVersion: apiserver.config.k8s.io/v1alpha1
apiVersion: apiserver.config.k8s.io/v1beta1
kind: AuthenticationConfiguration
jwt:
- issuer:
@ -618,12 +667,10 @@ jwt:
{{% /tab %}}
{{< /tabs >}}
Importantly, the API server is not an OAuth2 client, rather it can only be
configured to trust a single issuer. This allows the use of public providers,
such as Google, without trusting credentials issued to third parties. Admins who
wish to utilize multiple OAuth clients should explore providers which support the
`azp` (authorized party) claim, a mechanism for allowing one client to issue
tokens on behalf of another.
###### Limitations
1. Distributed claims do not work via [CEL](/docs/reference/using-api/cel/) expressions.
1. Egress selector configuration is not supported for calls to `issuer.url` and `issuer.discoveryURL`.
Kubernetes does not provide an OpenID Connect Identity Provider.
You can use an existing public OpenID Connect Identity Provider (such as Google, or
@ -635,9 +682,15 @@ Tremolo Security's [OpenUnison](https://openunison.github.io/).
For an identity provider to work with Kubernetes it must:
1. Support [OpenID connect discovery](https://openid.net/specs/openid-connect-discovery-1_0.html); not all do.
1. Run in TLS with non-obsolete ciphers
1. Have a CA signed certificate (even if the CA is not a commercial CA or is self signed)
1. Support [OpenID connect discovery](https://openid.net/specs/openid-connect-discovery-1_0.html)
The public key to verify the signature is discovered from the issuer's public endpoint using OIDC discovery.
If you're using the authentication configuration file, the identity provider doesn't need to publicly expose the discovery endpoint.
You can host the discovery endpoint at a different location than the issuer (such as locally in the cluster) and specify the
`issuer.discoveryURL` in the configuration file.
2. Run in TLS with non-obsolete ciphers
3. Have a CA signed certificate (even if the CA is not a commercial CA or is self signed)
A note about requirement #3 above, requiring a CA signed certificate. If you deploy your own
identity provider (as opposed to one of the cloud providers like Google or Microsoft) you MUST

View File

@ -211,33 +211,31 @@ so an earlier module has higher priority to allow or deny a request.
## Configuring the API Server using an Authorization Config File
{{< feature-state state="alpha" for_k8s_version="v1.29" >}}
{{< feature-state feature_gate_name="StructuredAuthorizationConfiguration" >}}
The Kubernetes API server's authorizer chain can be configured using a
configuration file.
You specify the path to that authorization configuration using the
`--authorization-config` command line argument. This feature enables
creation of authorization chains with multiple webhooks with well-defined
parameters that validate requests in a certain order and enables fine grained
control - such as explicit Deny on failures. An example configuration with
all possible values is provided below.
This feature enables the creation of authorization chains with multiple webhooks with well-defined parameters that validate requests in a particular order and allows fine-grained control such as explicit Deny on failures. The configuration file approach even allows you to specify [CEL](/docs/reference/using-api/cel/) rules to pre-filter requests before they are dispatched to webhooks, helping you to prevent unnecessary invocations. The API server also automatically reloads the authorizer chain when the configuration file is modified. An example configuration with all possible values is provided below.
In order to customise the authorizer chain, you need to enable the
`StructuredAuthorizationConfiguration` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/).
You must specify the path to the authorization configuration using the `--authorization-config` command line argument. If you want to keep using command line flags instead of a configuration file, those will continue to work as-is. To gain access to new authorization webhook capabilities like multiple webhooks, failure policy, and pre-filter rules, switch to putting options in an `--authorization-config` file.
Note: When the feature is enabled, setting both `--authorization-config` and
Starting with Kubernetes v{{< skew currentVersion >}}, the configuration file format is
beta-level, and only requires specifying `--authorization-config` since the `StructuredAuthorizationConfiguration` feature gate is enabled by default.
{{< caution >}}
If you want to keep using command line flags to configure authorization instead of a configuration file, those will continue to work as-is.
When the feature is enabled, setting both `--authorization-config` and
configuring an authorization webhook using the `--authorization-mode` and
`--authorization-webhook-*` command line flags is not allowed. If done, there
will be an error and API Server would exit right away.
{{< caution >}}
While the feature is in Alpha/Beta, there is no change if you want to keep on
using command line flags. When the feature goes Beta, the feature flag would
be turned on by default. The feature flag would be removed when feature goes GA.
The authorization configuration file is reloaded when a file change event is observed, or on a 1 minute poll interval. All non-webhook authorizer types are required to remain unchanged in the file on reload. A reload must not add or remove Node or RBAC
authorizers; they can be reordered, but cannot be added or removed.
When configuring the authorizer chain using a config file, make sure all the
apiserver nodes have the file. Also, take a note of the apiserver configuration
apiserver nodes have the file. Take a note of the apiserver configuration
when upgrading/downgrading the clusters. For example, if upgrading to v1.29+
clusters and using the config file, you would need to make sure the config file
exists before upgrading the cluster. When downgrading to v1.28, you would need
@ -248,9 +246,8 @@ to add the flags back to their bootstrap mechanism.
#
# DO NOT USE THE CONFIG AS IS. THIS IS AN EXAMPLE.
#
apiVersion: apiserver.config.k8s.io/v1alpha1
apiVersion: apiserver.config.k8s.io/v1beta1
kind: AuthorizationConfiguration
# authorizers are defined in order of precedence
authorizers:
- type: Webhook
# Name used to describe the authorizer
@ -283,7 +280,7 @@ authorizers:
# MatchConditionSubjectAccessReviewVersion specifies the SubjectAccessReview
# version the CEL expressions are evaluated against
# Valid values: v1
# Required only if matchConditions are specified, no default value
# Required, no default value
matchConditionSubjectAccessReviewVersion: v1
# Controls the authorization decision when a webhook request fails to
# complete or returns a malformed response or errors evaluating

View File

@ -721,7 +721,7 @@ The `matchPolicy` for an admission webhooks defaults to `Equivalent`.
### Matching requests: `matchConditions`
{{< feature-state state="beta" for_k8s_version="v1.28" >}}
{{< feature-state feature_gate_name="AdmissionWebhookMatchConditions" >}}
You can define _match conditions_ for webhooks if you need fine-grained request filtering. These
conditions are useful if you find that match rules, `objectSelectors` and `namespaceSelectors` still

View File

@ -60,6 +60,102 @@ for a number of reasons:
without many constraints and have namespaced names, such configuration is
usually portable.
## Bound service account tokens
ServiceAccount tokens can be bound to API objects that exist in the kube-apiserver.
This can be used to tie the validity of a token to the existence of another API object.
Supported object types are as follows:
* Pod (used for projected volume mounts, see below)
* Secret (can be used to allow revoking a token by deleting the Secret)
* Node (in v1.30, creating new node-bound tokens is alpha, using existing node-bound tokens is beta)
When a token is bound to an object, the object's `metadata.name` and `metadata.uid` are
stored as extra 'private claims' in the issued JWT.
When a bound token is presented to the kube-apiserver, the service account authenticator
will extract and verify these claims.
If the referenced object no longer exists (or its `metadata.uid` does not match),
the request will not be authenticated.
### Additional metadata in Pod bound tokens
{{< feature-state feature_gate_name="ServiceAccountTokenPodNodeInfo" >}}
When a service account token is bound to a Pod object, additional metadata is also
embedded into the token that indicates the value of the bound pod's `spec.nodeName` field,
and the uid of that Node, if available.
This node information is **not** verified by the kube-apiserver when the token is used for authentication.
It is included so integrators do not have to fetch Pod or Node API objects to check the associated Node name
and uid when inspecting a JWT.
### Verifying and inspecting private claims
The `TokenReview` API can be used to verify and extract private claims from a token:
1. First, assume you have a pod named `test-pod` and a service account named `my-sa`.
2. Create a token that is bound to this Pod:
```shell
kubectl create token my-sa --bound-object-kind="Pod" --bound-object-name="test-pod"
```
3. Copy this token into a new file named `tokenreview.yaml`:
```yaml
apiVersion: authentication.k8s.io/v1
kind: TokenReview
spec:
token: <token from step 2>
```
4. Submit this resource to the apiserver for review:
```shell
kubectl create -o yaml -f tokenreview.yaml # we use '-o yaml' so we can inspect the output
```
You should see an output like below:
```yaml
apiVersion: authentication.k8s.io/v1
kind: TokenReview
metadata:
creationTimestamp: null
spec:
token: <token>
status:
audiences:
- https://kubernetes.default.svc.cluster.local
authenticated: true
user:
extra:
authentication.kubernetes.io/credential-id:
- JTI=7ee52be0-9045-4653-aa5e-0da57b8dccdc
authentication.kubernetes.io/node-name:
- kind-control-plane
authentication.kubernetes.io/node-uid:
- 497e9d9a-47aa-4930-b0f6-9f2fb574c8c6
authentication.kubernetes.io/pod-name:
- test-pod
authentication.kubernetes.io/pod-uid:
- e87dbbd6-3d7e-45db-aafb-72b24627dff5
groups:
- system:serviceaccounts
- system:serviceaccounts:default
- system:authenticated
uid: f8b4161b-2e2b-11e9-86b7-2afc33b31a7e
username: system:serviceaccount:default:my-sa
```
{{< note >}}
Despite using `kubectl create -f` to create this resource, and defining it similarly to
other resource types in Kubernetes, TokenReview is a special type and the kube-apiserver
does not actually persist the TokenReview object into etcd.
Hence `kubectl get tokenreview` is not a valid command.
{{< /note >}}
## Bound service account token volume mechanism {#bound-service-account-token-volume}
{{< feature-state feature_gate_name="BoundServiceAccountTokenVolume" >}}

View File

@ -9,7 +9,7 @@ content_type: concept
<!-- overview -->
{{< feature-state state="beta" for_k8s_version="v1.28" >}}
{{< feature-state state="stable" for_k8s_version="v1.30" >}}
This page provides an overview of Validating Admission Policy.

View File

@ -13,6 +13,10 @@ stages:
- stage: beta
defaultValue: true
fromVersion: "1.28"
toVersion: "1.29"
- stage: stable
defaultValue: true
fromVersion: "1.30"
---
Enable [match conditions](/docs/reference/access-authn-authz/extensible-admission-controllers/#matching-requests-matchconditions)
on mutating & validating admission webhooks.

View File

@ -13,6 +13,10 @@ stages:
- stage: beta
defaultValue: true
fromVersion: "1.27"
toVersion: "1.29"
- stage: stable
defaultValue: true
fromVersion: "1.30"
---
Enable a single HTTP endpoint `/discovery/<version>` which
supports native HTTP caching with ETags containing all APIResources known to the API server.

View File

@ -1,4 +1,5 @@
---
# Removed from Kubernetes
title: APISelfSubjectReview
content_type: feature_gate
_build:

View File

@ -13,6 +13,11 @@ stages:
- stage: beta
defaultValue: true
fromVersion: "1.29"
toVersion: "1.29"
- stage: stable
defaultValue: true
fromVersion: "1.30"
---
Enables dual-stack `kubelet --node-ip` with external cloud providers.
See [Configure IPv4/IPv6 dual-stack](/docs/concepts/services-networking/dual-stack/#configure-ipv4-ipv6-dual-stack)

View File

@ -9,6 +9,9 @@ stages:
- stage: alpha
defaultValue: false
fromVersion: "1.24"
- stage: beta
defaultValue: true
fromVersion: "1.30"
---
When you enable this feature gate, Kubernetes components that support
contextual logging add extra detail to log output.
Enables extra details in log output of Kubernetes components that support
contextual logging.

View File

@ -9,6 +9,10 @@ stages:
- stage: alpha
defaultValue: false
fromVersion: "1.28"
toVersion: "1.29"
- stage: beta
defaultValue: true
fromVersion: "1.30"
---
Enable updates to custom resources to contain
violations of their OpenAPI schema if the offending portions of the resource

View File

@ -0,0 +1,16 @@
---
title: CustomResourceFieldSelectors
content_type: feature_gate
_build:
list: never
render: false
stages:
- stage: alpha
defaultValue: false
fromVersion: "1.30"
---
Enable `selectableFields` in the
{{< glossary_tooltip term_id="CustomResourceDefinition" text="CustomResourceDefinition" >}} API to allow filtering
of custom resource **list**, **watch** and **deletecollection** requests.
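For illustration only, a sketch of a CustomResourceDefinition (a hypothetical `shirts.stable.example.com` resource) that declares a selectable field might look like this:

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: shirts.stable.example.com   # hypothetical custom resource
spec:
  group: stable.example.com
  scope: Namespaced
  names:
    plural: shirts
    singular: shirt
    kind: Shirt
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              color:
                type: string
    # with the feature gate enabled, spec.color can be used in field selectors,
    # for example: kubectl get shirts --field-selector spec.color=blue
    selectableFields:
    - jsonPath: .spec.color
```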

View File

@ -13,6 +13,10 @@ stages:
- stage: beta
defaultValue: true
fromVersion: "1.27"
toVersion: "1.29"
- stage: stable
defaultValue: true
fromVersion: "1.30"
---
Enable the `HorizontalPodAutoscaler` to scale based on
metrics from individual containers in target pods.
Allow {{< glossary_tooltip text="HorizontalPodAutoscalers" term_id="horizontal-pod-autoscaler" >}}
to scale based on metrics from individual containers within target pods.

View File

@ -9,5 +9,9 @@ stages:
- stage: alpha
defaultValue: false
fromVersion: "1.29"
toVersion: "1.29"
- stage: beta
defaultValue: true
fromVersion: "1.30"
---
Enables the kubelet configuration field `imageMaximumGCAge`, allowing an administrator to specify the age after which an image will be garbage collected.
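As a sketch, the corresponding kubelet configuration might look like the following (the duration value is just an example):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  ImageMaximumGCAge: true
# images unused for 3 days and 12 hours become eligible for garbage collection
imageMaximumGCAge: "3d12h"
```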

View File

@ -0,0 +1,14 @@
---
title: JobManagedBy
content_type: feature_gate
_build:
list: never
render: false
stages:
- stage: alpha
defaultValue: false
fromVersion: "1.30"
---
Allows delegating reconciliation of a Job object to an external controller.

View File

@ -0,0 +1,14 @@
---
title: JobSuccessPolicy
content_type: feature_gate
_build:
list: never
render: false
stages:
- stage: alpha
defaultValue: false
fromVersion: "1.30"
---
Allow users to specify when a Job can be declared as succeeded based on the set of succeeded pods.
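A hedged sketch of what such a Job might look like (the name and rule values are illustrative; a success policy requires an Indexed Job):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: job-with-success-policy   # illustrative name
spec:
  completionMode: Indexed          # successPolicy only works with Indexed Jobs
  completions: 5
  parallelism: 5
  successPolicy:
    rules:
    # the Job is declared succeeded once any one of indexes 0, 2 or 3 succeeds
    - succeededIndexes: "0,2-3"
      succeededCount: 1
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: main
        image: busybox
        command: ["sh", "-c", "exit 0"]
```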

View File

@ -9,6 +9,10 @@ stages:
- stage: alpha
defaultValue: false
fromVersion: "1.28"
toVersion: "1.30"
- stage: beta
defaultValue: true
fromVersion: "1.30"
---
Implement connection draining for
terminating nodes for `externalTrafficPolicy: Cluster` services.

View File

@ -13,6 +13,10 @@ stages:
- stage: beta
defaultValue: true
fromVersion: "1.29"
toVersion: "1.29"
- stage: stable
defaultValue: true
fromVersion: "1.30"
---
Enable cleaning up Secret-based
[service account tokens](/docs/concepts/security/service-accounts/#get-a-token)

View File

@ -9,6 +9,10 @@ stages:
- stage: alpha
defaultValue: false
fromVersion: "1.29"
toVersion: "1.30"
- stage: beta
defaultValue: true
fromVersion: "1.30"
---
Allows setting `ipMode` for Services where `type` is set to `LoadBalancer`.
See [Specifying IPMode of load balancer status](/docs/concepts/services-networking/service/#load-balancer-ip-mode)

View File

@ -17,6 +17,10 @@ stages:
- stage: beta
defaultValue: true
fromVersion: "1.27"
toVersion: "1.29"
- stage: stable
defaultValue: true
fromVersion: "1.30"
---
Enable `minDomains` in
[Pod topology spread constraints](/docs/concepts/scheduling-eviction/topology-spread-constraints/).
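For example, a Pod spec using `minDomains` might look like this sketch (labels and image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: spread-example            # illustrative name
  labels:
    app: web
spec:
  topologySpreadConstraints:
  - maxSkew: 1
    minDomains: 3                 # fewer than 3 matching zones counts as skew against empty domains
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule   # minDomains only takes effect with DoNotSchedule
    labelSelector:
      matchLabels:
        app: web
  containers:
  - name: web
    image: registry.k8s.io/pause:3.9
```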

View File

@ -0,0 +1,19 @@
---
title: NameGenerationRetries
content_type: feature_gate
_build:
list: never
render: false
stages:
- stage: alpha
defaultValue: false
fromVersion: "1.30"
---
Enables retrying of object creation when the
{{< glossary_tooltip text="API server" term_id="kube-apiserver" >}}
is expected to generate a [name](/docs/concepts/overview/working-with-objects/names/#names).
When this feature is enabled, requests using `generateName` are retried automatically in case the
control plane detects a name conflict with an existing object, up to a limit of 8 total attempts.

View File

@ -13,16 +13,15 @@ stages:
- stage: beta
defaultValue: true
fromVersion: "1.28"
toVersion: "1.29"
- stage: stable
defaultValue: true
fromVersion: "1.30"
---
Enables improved discovery of mounted volumes during kubelet
startup. Since this code has been significantly refactored, we allow to opt-out in case kubelet
gets stuck at the startup or is not unmounting volumes from terminated Pods. Note that this
refactoring was behind `SELinuxMountReadWriteOncePod` alpha feature gate in Kubernetes 1.25.
startup. Since the associated code had been significantly refactored, Kubernetes versions 1.25 to 1.29
allowed you to opt out in case the kubelet got stuck at startup, or did not unmount volumes
from terminated Pods.
<!-- remove next 2 paragraphs when feature graduates to GA -->
Before Kubernetes v1.25, the kubelet used different default behavior for discovering mounted
volumes during the kubelet startup. If you disable this feature gate (it's enabled by default), you select
the legacy discovery behavior.
In Kubernetes v1.25 and v1.26, this behavior toggle was part of the `SELinuxMountReadWriteOncePod`
feature gate.
This refactoring was behind the `SELinuxMountReadWriteOncePod` feature gate in Kubernetes
releases 1.25 and 1.26.

View File

@ -9,5 +9,9 @@ stages:
- stage: alpha
defaultValue: false
fromVersion: "1.27"
toVersion: "1.29"
- stage: beta
defaultValue: false
fromVersion: "1.30"
---
Enables querying logs of node services using the `/logs` endpoint.
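As a sketch, enabling this on a node involves the feature gate plus the related kubelet configuration fields (a minimal example, assuming you also query the endpoint with appropriate credentials):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  NodeLogQuery: true
enableSystemLogHandler: true   # serves the kubelet's /logs endpoint
enableSystemLogQuery: true     # allows querying service logs through that endpoint
```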

View File

@ -13,6 +13,10 @@ stages:
- stage: beta
defaultValue: true
fromVersion: "1.29"
toVersion: "1.30"
- stage: stable
defaultValue: true
fromVersion: "1.30"
---
Enable the `status.hostIPs` field for pods and the {{< glossary_tooltip term_id="downward-api" text="downward API" >}}.
The field lets you expose host IP addresses to workloads.

View File

@ -9,5 +9,9 @@ stages:
- stage: alpha
defaultValue: false
fromVersion: "1.29"
toVersion: "1.29"
- stage: beta
defaultValue: true
fromVersion: "1.30"
---
Enables the `sleep` action in Container lifecycle hooks.
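A minimal sketch of a container using the `sleep` action in a `preStop` hook (names and durations are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sleep-prestop-example     # illustrative name
spec:
  containers:
  - name: nginx
    image: nginx:1.25
    lifecycle:
      preStop:
        sleep:
          seconds: 5              # pause for 5 seconds before the container is terminated
```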

View File

@ -13,5 +13,9 @@ stages:
- stage: beta
defaultValue: true
fromVersion: "1.27"
toVersion: "1.29"
- stage: stable
defaultValue: true
fromVersion: "1.30"
---
Enable setting `schedulingGates` field to control a Pod's [scheduling readiness](/docs/concepts/scheduling-eviction/pod-scheduling-readiness).

View File

@ -0,0 +1,15 @@
---
title: PortForwardWebsockets
content_type: feature_gate
_build:
list: never
render: false
stages:
- stage: alpha
defaultValue: false
fromVersion: "1.30"
---
Allow WebSocket streaming of the
portforward sub-protocol (`port-forward`) from clients requesting
version v2 (`v2.portforward.k8s.io`) of the sub-protocol.

View File

@ -0,0 +1,14 @@
---
title: RecursiveReadOnlyMounts
content_type: feature_gate
_build:
list: never
render: false
stages:
- stage: alpha
defaultValue: false
fromVersion: "1.30"
---
Enables support for recursive read-only mounts.
For more details, see [read-only mounts](/docs/concepts/storage/volumes/#read-only-mounts).
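A hedged sketch of a Pod using this field (path and image are illustrative; `readOnly: true` is required alongside `recursiveReadOnly`):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: rro-example               # illustrative name
spec:
  volumes:
  - name: mnt
    hostPath:
      path: /mnt                  # a host path that may contain sub-mounts
  containers:
  - name: busybox
    image: busybox
    command: ["sleep", "infinity"]
    volumeMounts:
    - name: mnt
      mountPath: /mnt
      readOnly: true              # must be true when recursiveReadOnly is Enabled
      recursiveReadOnly: Enabled  # makes sub-mounts read-only as well
```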

View File

@ -0,0 +1,13 @@
---
title: RelaxedEnvironmentVariableValidation
content_type: feature_gate
_build:
list: never
render: false
stages:
- stage: alpha
defaultValue: false
fromVersion: "1.30"
---
Allow almost all printable ASCII characters in environment variables.

View File

@ -1,4 +1,5 @@
---
removed: true
title: RemoveSelfLink
content_type: feature_gate
_build:
@ -17,6 +18,7 @@ stages:
- stage: stable
defaultValue: true
fromVersion: "1.24"
toVersion: "1.29"
---
Sets the `.metadata.selfLink` field to blank (empty string) for all
objects and collections. This field has been deprecated since the Kubernetes v1.16

View File

@ -0,0 +1,20 @@
---
title: SELinuxMount
content_type: feature_gate
_build:
list: never
render: false
stages:
- stage: alpha
defaultValue: false
fromVersion: "1.30"
---
Speeds up container startup by allowing kubelet to mount volumes
for a Pod directly with the correct SELinux label instead of changing each file on the volumes
recursively.
It widens the performance improvements behind the `SELinuxMountReadWriteOncePod`
feature gate by extending the implementation to all volumes.
Enabling the `SELinuxMount` feature gate requires the feature gate `SELinuxMountReadWriteOncePod` to
be enabled.

View File

@ -9,6 +9,10 @@ stages:
- stage: alpha
defaultValue: false
fromVersion: "1.29"
toVersion: "1.29"
- stage: beta
defaultValue: true
fromVersion: "1.30"
---
Controls whether JTIs (UUIDs) are embedded into generated service account tokens,
and whether these JTIs are recorded into the Kubernetes audit log for future requests made by these tokens.

View File

@ -9,6 +9,10 @@ stages:
- stage: alpha
defaultValue: false
fromVersion: "1.29"
toVersion: "1.29"
- stage: beta
defaultValue: true
fromVersion: "1.30"
---
Controls whether the apiserver will validate a Node reference in service account tokens.

View File

@ -9,6 +9,10 @@ stages:
- stage: alpha
defaultValue: false
fromVersion: "1.29"
toVersion: "1.29"
- stage: beta
defaultValue: true
fromVersion: "1.30"
---
Controls whether the apiserver embeds the node name and uid
for the associated node when issuing service account tokens bound to Pod objects.

View File

@ -0,0 +1,16 @@
---
title: ServiceTrafficDistribution
content_type: feature_gate
_build:
list: never
render: false
stages:
- stage: alpha
defaultValue: false
fromVersion: "1.30"
---
Allows usage of the optional `spec.trafficDistribution` field in Services. The
field offers a way to express preferences for how traffic is distributed to
Service endpoints.

View File

@ -9,6 +9,10 @@ stages:
- stage: beta
defaultValue: true
fromVersion: "1.27"
toVersion: "1.29"
- stage: stable
defaultValue: true
fromVersion: "1.30"
---
Enables less load balancer re-configurations by
the service controller (KCCM) as an effect of changing node state.

View File

@ -0,0 +1,14 @@
---
title: StorageVersionMigrator
content_type: feature_gate
_build:
list: never
render: false
stages:
- stage: alpha
defaultValue: false
fromVersion: "1.30"
toVersion: "1.32"
---
Enables storage version migration. See [Migrate Kubernetes Objects Using Storage Version Migration](/docs/tasks/manage-kubernetes-objects/storage-version-migration) for more details.

View File

@ -9,6 +9,10 @@ stages:
- stage: alpha
defaultValue: false
fromVersion: "1.29"
toVersion: "1.29"
- stage: beta
defaultValue: true
fromVersion: "1.30"
---
Enable [structured authentication configuration](/docs/reference/access-authn-authz/authentication/#configuring-the-api-server)
for the API server.

View File

@ -9,6 +9,10 @@ stages:
- stage: alpha
defaultValue: false
fromVersion: "1.29"
toVersion: "1.29"
- stage: beta
defaultValue: true
fromVersion: "1.30"
---
Enable structured authorization configuration, so that cluster administrators
can specify more than one [authorization webhook](/docs/reference/access-authn-authz/webhook/)

View File

@ -6,9 +6,9 @@ _build:
render: false
stages:
- stage: alpha
defaultValue: false
fromVersion: "1.29"
- stage: beta
defaultValue: true
fromVersion: "1.30"
---
Allow WebSocket streaming of the
remote command sub-protocol (`exec`, `cp`, `attach`) from clients requesting

View File

@ -9,5 +9,9 @@ stages:
- stage: alpha
defaultValue: false
fromVersion: "1.28"
toVersion: "1.29"
- stage: beta
defaultValue: false
fromVersion: "1.30"
---
Enable user namespace support for Pods.

View File

@ -13,5 +13,9 @@ stages:
- stage: beta
defaultValue: false
fromVersion: "1.28"
toVersion: "1.29"
- stage: stable
defaultValue: true
fromVersion: "1.30"
---
Enable [ValidatingAdmissionPolicy](/docs/reference/access-authn-authz/validating-admission-policy/) support for CEL validations be used in Admission Control.

View File

@ -416,7 +416,7 @@ KubeletPodResourcesGet=true|false (ALPHA - default=false)<br/>
KubeletSeparateDiskGC=true|false (ALPHA - default=false)<br/>
KubeletTracing=true|false (BETA - default=true)<br/>
LegacyServiceAccountTokenCleanUp=true|false (BETA - default=true)<br/>
LoadBalancerIPMode=true|false (ALPHA - default=false)<br/>
LoadBalancerIPMode=true|false (BETA - default=true)<br/>
LocalStorageCapacityIsolationFSQuotaMonitoring=true|false (ALPHA - default=false)<br/>
LogarithmicScaleDown=true|false (BETA - default=true)<br/>
LoggingAlphaOptions=true|false (ALPHA - default=false)<br/>

View File

@ -350,6 +350,14 @@ kubectl [flags]
<td></td><td style="line-height: 130%; word-wrap: break-word;">When set to false, turns off extra HTTP headers detailing invoked kubectl command (Kubernetes version v1.22 or later)</td>
</tr>
<tr>
<td colspan="2">KUBECTL_DEBUG_CUSTOM_PROFILE</td>
</tr>
<tr>
<td></td><td style="line-height: 130%; word-wrap: break-word;">When set to true, the custom flag will be enabled in kubectl debug. This flag is used to customize the pre-defined profiles.
</td>
</tr>
<tr>
<td colspan="2">KUBECTL_EXPLAIN_OPENAPIV3</td>
</tr>
@ -366,6 +374,14 @@ kubectl [flags]
</td>
</tr>
<tr>
<td colspan="2">KUBECTL_PORT_FORWARD_WEBSOCKETS</td>
</tr>
<tr>
<td></td><td style="line-height: 130%; word-wrap: break-word;">When set to true, the kubectl port-forward command will attempt to stream using the websockets protocol. If the upgrade to websockets fails, the command will fall back to using the current SPDY protocol.
</td>
</tr>
<tr>
<td colspan="2">KUBECTL_REMOTE_COMMAND_WEBSOCKETS</td>
</tr>

View File

@ -300,7 +300,7 @@ which is used by Kustomize and similar third-party tools.
For example, Kustomize removes objects with this annotation from its final build output.
### container.apparmor.security.beta.kubernetes.io/* (beta) {#container-apparmor-security-beta-kubernetes-io}
### container.apparmor.security.beta.kubernetes.io/* (deprecated) {#container-apparmor-security-beta-kubernetes-io}
Type: Annotation
@ -309,7 +309,7 @@ Example: `container.apparmor.security.beta.kubernetes.io/my-container: my-custom
Used on: Pods
This annotation allows you to specify the AppArmor security profile for a container within a
Kubernetes pod.
Kubernetes pod. As of Kubernetes v1.30, this should be set with the `appArmorProfile` field instead.
To learn more, see the [AppArmor](/docs/tutorials/security/apparmor/) tutorial.
The tutorial illustrates using AppArmor to restrict a container's abilities and access.
@ -1106,13 +1106,11 @@ Example: `kubernetes.io/legacy-token-invalid-since: 2023-10-27`
Used on: Secret
The control plane automatically adds this label to auto-generated Secrets that
have the type `kubernetes.io/service-account-token`, provided that you have the
`LegacyServiceAccountTokenCleanUp` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
enabled. Kubernetes {{< skew currentVersion >}} enables that behavior by default.
This label marks the Secret-based token as invalid for authentication. The value
of this label records the date (ISO 8601 format, UTC time zone) when the control
plane detects that the auto-generated Secret has not been used for a specified
duration (defaults to one year).
have the type `kubernetes.io/service-account-token`. This label marks the
Secret-based token as invalid for authentication. The value of this label
records the date (ISO 8601 format, UTC time zone) when the control plane detects
that the auto-generated Secret has not been used for a specified duration
(defaults to one year).
### endpointslice.kubernetes.io/managed-by {#endpointslicekubernetesiomanaged-by}

View File

@ -27,6 +27,7 @@ etcd cluster externally or on custom ports.
| Protocol | Direction | Port Range | Purpose | Used By |
|----------|-----------|-------------|-----------------------|-------------------------|
| TCP | Inbound | 10250 | Kubelet API | Self, Control plane |
| TCP | Inbound | 10256 | kube-proxy | Self, Load balancers |
| TCP | Inbound | 30000-32767 | NodePort Services† | All |
† Default port range for [NodePort Services](/docs/concepts/services-networking/service/).

View File

@ -488,6 +488,67 @@ route to ready node-local endpoints. If the traffic policy is `Local` and there
are no node-local endpoints, the kube-proxy does not forward any traffic for the
relevant Service.
If `Cluster` is specified, all nodes are eligible load balancing targets _as long as_
the node is not being deleted and kube-proxy is healthy. In this mode, load balancer
health checks are configured to target the service proxy's readiness port and path.
For kube-proxy, this evaluates to `${NODE_IP}:10256/healthz`. kube-proxy
returns either an HTTP status code 200 or 503. kube-proxy's load balancer health check
endpoint returns 200 if:
1. kube-proxy is healthy, meaning:
- it's able to progress programming the network and isn't timing out while doing
so (the timeout is defined to be: **2 × `iptables.syncPeriod`**); and
2. the node is not being deleted (there is no deletion timestamp set for the Node).
kube-proxy returns 503 and marks the node as not eligible while it is being
deleted because kube-proxy supports connection draining for terminating nodes.
A couple of important things occur from the point
of view of a Kubernetes-managed load balancer when a node _is being_ / _is_ deleted.
While deleting:
* kube-proxy will start failing its readiness probe and essentially mark the
node as not eligible for load balancer traffic. The load balancer health
check failing causes load balancers which support connection draining to
allow existing connections to terminate, and block new connections from
establishing.
When deleted:
* The service controller in the Kubernetes cloud controller manager removes the
node from the referenced set of eligible targets. Removing any instance from
the load balancer's set of backend targets immediately terminates all
connections. This is also the reason kube-proxy first fails the health check
while the node is deleting.
It's important for Kubernetes vendors to note that if a vendor configures the
kube-proxy readiness probe as a liveness probe, kube-proxy will restart
continuously while a node is being deleted, until the node has been fully deleted.
kube-proxy exposes a `/livez` path which, as opposed to the `/healthz` one, does
**not** consider the Node's deleting state and only its progress programming the
network. `/livez` is therefore the recommended path for anyone looking to define
a livenessProbe for kube-proxy.
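For illustration, a livenessProbe for a self-managed kube-proxy container might be sketched as follows (the port assumes kube-proxy's default health check port; probe timings are illustrative):

```yaml
# excerpt from a kube-proxy container spec in a self-managed DaemonSet (illustrative)
livenessProbe:
  httpGet:
    path: /livez        # unlike /healthz, ignores the Node's deletion state
    port: 10256         # kube-proxy's default health check port
  initialDelaySeconds: 10
  periodSeconds: 10
```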
Users deploying kube-proxy can inspect both the readiness / liveness state by
evaluating the metrics: `proxy_livez_total` / `proxy_healthz_total`. Both
metrics publish two series, one with the 200 label and one with the 503 one.
For `Local` Services, kube-proxy returns 200 if:
1. kube-proxy is healthy/ready, and
2. it has a local endpoint on the node in question.
Node deletion does **not** affect kube-proxy's return code for load balancer
health checks. The reason for this is that deleting nodes could otherwise cause
an ingress outage if all endpoints happened to be running on those nodes.
The Kubernetes project recommends that cloud provider integration code
configures load balancer health checks that target the service proxy's healthz
port. If you are using or implementing your own virtual IP implementation
that people can use instead of kube-proxy, you should set up a similar health
checking port with logic that matches the kube-proxy implementation.
### Traffic to terminating endpoints
{{< feature-state for_k8s_version="v1.28" state="stable" >}}
@ -513,6 +574,94 @@ those terminating Pods. By the time the Pod completes termination, the external
should have seen the node's health check failing and fully removed the node from the backend
pool.
## Traffic Distribution
The `spec.trafficDistribution` field within a Kubernetes Service allows you to
express preferences for how traffic should be routed to Service endpoints.
Implementations like kube-proxy use the `spec.trafficDistribution` field as a
guideline. The behavior associated with a given preference may subtly differ
between implementations.
`PreferClose` with kube-proxy
: For kube-proxy, this means prioritizing sending traffic to endpoints within
the same zone as the client. The EndpointSlice controller updates
EndpointSlices with `hints` to communicate this preference, which kube-proxy
then uses for routing decisions. If a client's zone does not have any
available endpoints, traffic will be routed cluster-wide for that client.
In the absence of any value for `trafficDistribution`, the default routing
strategy for kube-proxy is to distribute traffic to any endpoint in the cluster.
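As a sketch, a Service opting into this behaviour might look like the following (name, selector, and ports are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service                   # illustrative name
spec:
  selector:
    app.kubernetes.io/name: MyApp
  ports:
  - port: 80
    targetPort: 8080
  trafficDistribution: PreferClose   # prefer endpoints in the same zone as the client
```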
### Comparison with `service.kubernetes.io/topology-mode: Auto`
The `trafficDistribution` field with `PreferClose` and the
`service.kubernetes.io/topology-mode: Auto` annotation both aim to prioritize
same-zone traffic. However, there are key differences in their approaches:
* `service.kubernetes.io/topology-mode: Auto`: Attempts to distribute traffic
proportionally across zones based on allocatable CPU resources. This heuristic
includes safeguards (such as the [fallback
behavior](/docs/concepts/services-networking/topology-aware-routing/#three-or-more-endpoints-per-zone)
for small numbers of endpoints) and could lead to the feature being disabled
in certain scenarios for load-balancing reasons. This approach sacrifices some
predictability in favor of potential load balancing.
* `trafficDistribution: PreferClose`: This approach aims to be slightly simpler
and more predictable: "If there are endpoints in the zone, they will receive
all traffic for that zone; if there are no endpoints in a zone, the traffic
will be distributed to other zones". While the approach may offer more
predictability, it does mean that you are in control of managing a [potential
overload](#considerations-for-using-traffic-distribution-control).
If the `service.kubernetes.io/topology-mode` annotation is set to `Auto`, it
will take precedence over `trafficDistribution`. (The annotation may be deprecated
in the future in favour of the `trafficDistribution` field).
### Interaction with Traffic Policies
When compared to the `trafficDistribution` field, the traffic policy fields
(`externalTrafficPolicy` and `internalTrafficPolicy`) are meant to offer
stricter traffic locality requirements. Here's how `trafficDistribution`
interacts with them:
* Precedence of Traffic Policies: For a given Service, if a traffic policy
(`externalTrafficPolicy` or `internalTrafficPolicy`) is set to `Local`, it
takes precedence over `trafficDistribution: PreferClose` for the corresponding
traffic type (external or internal, respectively).
* `trafficDistribution` Influence: For a given Service, if a traffic policy
(`externalTrafficPolicy` or `internalTrafficPolicy`) is set to `Cluster` (the
default), or if the fields are not set, then `trafficDistribution:
PreferClose` guides the routing behavior for the corresponding traffic type
(external or internal, respectively). This means that an attempt will be made
to route traffic to an endpoint that is in the same zone as the client.
### Considerations for using traffic distribution control
* **Increased Probability of Overloaded Endpoints:** The `PreferClose`
heuristic will attempt to route traffic to the closest healthy endpoints
instead of spreading that traffic evenly across all endpoints. If you do not
have a sufficient number of endpoints within a zone, they may become
overloaded. This is especially likely if incoming traffic is not
proportionally distributed across zones. To mitigate this, consider the
following strategies:
* [Pod Topology Spread
Constraints](/docs/concepts/scheduling-eviction/topology-spread-constraints/):
Use Pod Topology Spread Constraints to distribute your pods more evenly
across zones.
* Zone-specific Deployments: If you expect to see skewed traffic patterns,
create a separate Deployment for each zone. This approach allows the
separate workloads to scale independently. There are also workload
management addons available from the ecosystem, outside the Kubernetes
project itself, that can help here.
* **Implementation-specific behavior:** Each dataplane implementation may handle
this field slightly differently. If you're using an implementation other than
kube-proxy, refer to the documentation specific to that implementation to
understand how this field is being handled.
## {{% heading "whatsnext" %}}
To learn more about Services,

View File

@ -0,0 +1,155 @@
---
content_type: "reference"
title: Kubelet Configuration Directory Merging
weight: 50
---
When using the kubelet's `--config-dir` flag to specify a drop-in directory for
configuration, there is specific behavior for how different data types are
merged.
Here are some examples of how different data types behave during configuration merging:
### Structure Fields
There are two types of structure fields in a YAML structure: singular (or a
scalar type) and embedded (structures that contain scalar types).
The configuration merging process handles the overriding of singular and embedded struct fields to create a resulting kubelet configuration.
For instance, you may want a baseline kubelet configuration for all nodes, but you may want to customize the `address` and `authorization` fields.
This can be done as follows:
Main kubelet configuration file contents:
```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
port: 20250
authorization:
mode: Webhook
webhook:
cacheAuthorizedTTL: "5m"
cacheUnauthorizedTTL: "30s"
serializeImagePulls: false
address: "192.168.0.1"
```
Contents of a file in `--config-dir` directory:
```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
authorization:
mode: AlwaysAllow
webhook:
cacheAuthorizedTTL: "8m"
cacheUnauthorizedTTL: "45s"
address: "192.168.0.8"
```
The resulting configuration will be as follows:
```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
port: 20250
serializeImagePulls: false
authorization:
mode: AlwaysAllow
webhook:
cacheAuthorizedTTL: "8m"
cacheUnauthorizedTTL: "45s"
address: "192.168.0.8"
```
### Lists
You can override the slice/list values of the kubelet configuration.
However, the entire list gets overridden during the merging process.
For example, you can override the `clusterDNS` list as follows:
Main kubelet configuration file contents:
```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
port: 20250
serializeImagePulls: false
clusterDNS:
- "192.168.0.9"
- "192.168.0.8"
```
Contents of a file in `--config-dir` directory:
```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
clusterDNS:
- "192.168.0.2"
- "192.168.0.3"
- "192.168.0.5"
```
The resulting configuration will be as follows:
```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
port: 20250
serializeImagePulls: false
clusterDNS:
- "192.168.0.2"
- "192.168.0.3"
- "192.168.0.5"
```
### Maps, including Nested Structures
Individual fields in maps, regardless of their value types (boolean, string, etc.), can be selectively overridden.
However, for `map[string][]string`, the entire list associated with a specific field gets overridden.
Let's understand this better with an example that uses the `featureGates` and `staticPodURLHeader` fields:
Main kubelet configuration file contents:
```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
port: 20250
serializeImagePulls: false
featureGates:
  AllAlpha: false
  MemoryQoS: true
staticPodURLHeader:
  kubelet-api-support:
    - "Authorization: 234APSDFA"
    - "X-Custom-Header: 123"
  custom-static-pod:
    - "Authorization: 223EWRWER"
    - "X-Custom-Header: 456"
```
Contents of a file in `--config-dir` directory:
```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  MemoryQoS: false
  KubeletTracing: true
  DynamicResourceAllocation: true
staticPodURLHeader:
  custom-static-pod:
    - "Authorization: 223EWRWER"
    - "X-Custom-Header: 345"
```
The resulting configuration will be as follows:
```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
port: 20250
serializeImagePulls: false
featureGates:
  AllAlpha: false
  MemoryQoS: false
  KubeletTracing: true
  DynamicResourceAllocation: true
staticPodURLHeader:
  kubelet-api-support:
    - "Authorization: 234APSDFA"
    - "X-Custom-Header: 123"
  custom-static-pod:
    - "Authorization: 223EWRWER"
    - "X-Custom-Header: 345"
```

View File

@ -109,8 +109,6 @@ The user can skip specific preflight checks or all of them with the `--ignore-pr
- [warning] if firewalld is active
- [error] if API server bindPort or ports 10250/10251/10252 are used
- [Error] if `/etc/kubernetes/manifest` folder already exists and it is not empty
- [Error] if `/proc/sys/net/bridge/bridge-nf-call-iptables` file does not exist/does not contain 1
- [Error] if advertise address is ipv6 and `/proc/sys/net/bridge/bridge-nf-call-ip6tables` does not exist/does not contain 1.
- [Error] if swap is on
- [Error] if `conntrack`, `ip`, `iptables`, `mount`, `nsenter` commands are not present in the command path
- [warning] if `ebtables`, `ethtool`, `socat`, `tc`, `touch`, `crictl` commands are not present in the command path

View File

@ -1,7 +1,4 @@
---
reviewers:
- luxas
- jbeda
title: kubeadm init
content_type: concept
weight: 20
@ -161,6 +158,7 @@ Feature | Default | Alpha | Beta | GA
`EtcdLearnerMode` | `true` | 1.27 | 1.29 | -
`PublicKeysECDSA` | `false` | 1.19 | - | -
`RootlessControlPlane` | `false` | 1.22 | - | -
`WaitForAllControlPlaneComponents` | `false` | 1.30 | - | -
{{< /table >}}
{{< note >}}
@ -184,6 +182,16 @@ for `kube-apiserver`, `kube-controller-manager`, `kube-scheduler` and `etcd` to
If the flag is not set, those components run as root. You can change the value of this feature gate before
you upgrade to a newer version of Kubernetes.
`WaitForAllControlPlaneComponents`
: With this feature gate enabled, kubeadm will wait for all control plane components (kube-apiserver,
kube-controller-manager, kube-scheduler) on a control plane node to report status 200 on their `/healthz`
endpoints. These checks are performed on `https://127.0.0.1:PORT/healthz`, where `PORT` is taken from
`--secure-port` of a component. If you specify custom `--secure-port` values in the kubeadm configuration
they will be respected. Without the feature gate enabled, kubeadm will only wait for the kube-apiserver
on a control plane node to become ready. The wait process starts right after the kubelet on the host
is started by kubeadm. You are advised to enable this feature gate in case you wish to observe a ready
state from all control plane components during the `kubeadm init` or `kubeadm join` command execution.
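As a sketch, you could enable this feature gate through the kubeadm configuration passed to `kubeadm init --config` (field names follow the `kubeadm.k8s.io/v1beta3` API; adjust the rest of the configuration to your environment):
```yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
featureGates:
  # wait for kube-apiserver, kube-controller-manager and kube-scheduler /healthz
  WaitForAllControlPlaneComponents: true
```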
List of deprecated feature gates:
{{< table caption="kubeadm deprecated feature gates" >}}

View File

@ -47,31 +47,22 @@ check the documentation for that version.
<!-- body -->
## Install and configure prerequisites
The following steps apply common settings for Kubernetes nodes on Linux.
### Network configuration
You can skip a particular setting if you're certain you don't need it.
By default, the Linux kernel does not allow IPv4 packets to be routed
between interfaces. Most Kubernetes cluster networking implementations
will change this setting (if needed), but some might expect the
administrator to do it for them. (Some might also expect other sysctl
parameters to be set, kernel modules to be loaded, etc; consult the
documentation for your specific network implementation.)
For more information, see
[Network Plugin Requirements](/docs/concepts/extend-kubernetes/compute-storage-net/network-plugins/#network-plugin-requirements)
or the documentation for your specific container runtime.
### Enable IPv4 packet forwarding {#prerequisite-ipv4-forwarding-optional}

To manually enable IPv4 packet forwarding:
```bash
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
sudo modprobe overlay
sudo modprobe br_netfilter
# sysctl params required by setup, params persist across reboots
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF
@ -79,18 +70,10 @@ EOF
sudo sysctl --system
```
Verify that the `br_netfilter`, `overlay` modules are loaded by running the following commands:
Verify that `net.ipv4.ip_forward` is set to 1 with:
```bash
lsmod | grep br_netfilter
lsmod | grep overlay
```
Verify that the `net.bridge.bridge-nf-call-iptables`, `net.bridge.bridge-nf-call-ip6tables`, and
`net.ipv4.ip_forward` system variables are set to `1` in your `sysctl` config by running the following command:
```bash
sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables net.ipv4.ip_forward
sysctl net.ipv4.ip_forward
```
## cgroup drivers

View File

@ -7,6 +7,16 @@ content_type: task
weight: 330
---
## {{% heading "prerequisites" %}}
Some steps in this page use the `jq` tool. If you don't have `jq`, you can
install it via your operating system's software sources, or fetch it from
[https://jqlang.github.io/jq/](https://jqlang.github.io/jq/).
Some steps also use `curl`, which you can install via your operating
system's software sources.
<!-- overview -->
A subset of the kubelet's configuration parameters may be
@ -86,46 +96,195 @@ In the above example, this version is `kubelet.config.k8s.io/v1beta1`.
## Drop-in directory for kubelet configuration files {#kubelet-conf-d}
As of Kubernetes v1.28.0, the kubelet has been extended to support a drop-in configuration directory. You can specify its location with the
`--config-dir` flag, which defaults to `""` (disabled).
{{<feature-state for_k8s_version="v1.30" state="beta" >}}
You can only set `--config-dir` if you set the environment variable `KUBELET_CONFIG_DROPIN_DIR_ALPHA` for the kubelet process (the value of that variable does not matter).
For Kubernetes v{{< skew currentVersion >}}, the kubelet returns an error if you specify `--config-dir` without that variable set, and startup fails.
You cannot specify the drop-in configuration directory using the kubelet configuration file; only the CLI argument `--config-dir` can set it.
You can specify a drop-in configuration directory for the kubelet. By default, the kubelet does not look
for drop-in configuration files anywhere - you must specify a path.
For example: `--config-dir=/etc/kubernetes/kubelet.conf.d`
For Kubernetes v1.28 to v1.29, you can only specify `--config-dir` if you also set
the environment variable `KUBELET_CONFIG_DROPIN_DIR_ALPHA` for the kubelet process (the value
of that variable does not matter).
You can use the kubelet configuration directory in a similar way to the kubelet config file.
{{< note >}}
The suffix of a valid kubelet drop-in configuration file **must** be `.conf`. For instance: `99-kubelet-address.conf`
{{< /note >}}
For instance, you may want a baseline kubelet configuration for all nodes, but you may want to customize the `address` field. This can be done as follows:
The kubelet processes files in its config drop-in directory by sorting the **entire file name** alphanumerically.
For instance, `00-kubelet.conf` is processed first, and then overridden with a file named `01-kubelet.conf`.
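For illustration, a drop-in directory that uses this naming scheme might look like the following (a sketch; the path matches the earlier example):
```shell
ls /etc/kubernetes/kubelet.conf.d
```
```none
00-kubelet.conf   # applied first
01-kubelet.conf   # applied second; its fields replace matching fields from 00-kubelet.conf
```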
Main kubelet configuration file contents:
```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
port: 20250
serializeImagePulls: false
evictionHard:
memory.available: "200Mi"
```
These files may contain partial configurations and might not be valid config files by themselves.
Validation is only performed on the final resulting configuration structure
stored internally in the kubelet.
This offers you flexibility in how you manage and combine kubelet configuration that comes from different sources.
However, it's important to note that the behavior varies based on the data type of the configuration fields.
Contents of a file in `--config-dir` directory:
```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
address: "192.168.0.8"
```
Different data types in the kubelet configuration structure merge differently.
See the [reference document](/docs/reference/node/kubelet-config-directory-merging/) for more
information.
### Kubelet configuration merging order
On startup, the kubelet merges configuration from:
* Feature gates specified over the command line (lowest precedence).
* The kubelet configuration.
* Drop-in configuration files, according to sort order.
* Command line arguments excluding feature gates (highest precedence).
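To illustrate those precedence rules, consider a kubelet started as follows (the paths and the label are examples only): the `--feature-gates` value is applied first, then the main configuration file, then the drop-in files in sorted order, and finally the remaining command line flags such as `--node-labels` take precedence over everything else.
```shell
kubelet --feature-gates=NodeSwap=true \
  --config=/etc/kubernetes/kubelet-config.yaml \
  --config-dir=/etc/kubernetes/kubelet.conf.d \
  --node-labels=example.com/tier=edge
```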
This produces the same outcome as if you used the [single configuration file](#create-the-config-file) used in the earlier example.
{{< note >}}
The config drop-in dir mechanism for the kubelet is similar to, but different from, how the `kubeadm` tool allows you to patch configuration.
The `kubeadm` tool uses a specific [patching strategy](/docs/setup/production-environment/tools/kubeadm/control-plane-flags/#patches) for its configuration,
whereas the only patch strategy for kubelet configuration drop-in files is `replace`. The kubelet determines the order of merges based on sorting the **file names** alphanumerically,
and replaces every field present in a higher priority file.
{{< /note >}}
## Viewing the kubelet configuration
Since the configuration can now be spread over multiple files with this feature,
you can follow these steps to inspect the final actuated kubelet configuration:
1. Start a proxy server using [`kubectl proxy`](/docs/reference/kubectl/generated/kubectl-commands#proxy) in your terminal.
```bash
kubectl proxy
```
Which gives output like:
```bash
Starting to serve on 127.0.0.1:8001
```
2. Open another terminal window and use `curl` to fetch the kubelet configuration.
Replace `<node-name>` with the actual name of your node:
```bash
curl -X GET http://127.0.0.1:8001/api/v1/nodes/<node-name>/proxy/configz | jq .
```
```bash
{
"kubeletconfig": {
"enableServer": true,
"staticPodPath": "/var/run/kubernetes/static-pods",
"syncFrequency": "1m0s",
"fileCheckFrequency": "20s",
"httpCheckFrequency": "20s",
"address": "192.168.1.16",
"port": 10250,
"readOnlyPort": 10255,
"tlsCertFile": "/var/lib/kubelet/pki/kubelet.crt",
"tlsPrivateKeyFile": "/var/lib/kubelet/pki/kubelet.key",
"rotateCertificates": true,
"authentication": {
"x509": {
"clientCAFile": "/var/run/kubernetes/client-ca.crt"
},
"webhook": {
"enabled": true,
"cacheTTL": "2m0s"
},
"anonymous": {
"enabled": true
}
},
"authorization": {
"mode": "AlwaysAllow",
"webhook": {
"cacheAuthorizedTTL": "5m0s",
"cacheUnauthorizedTTL": "30s"
}
},
"registryPullQPS": 5,
"registryBurst": 10,
"eventRecordQPS": 50,
"eventBurst": 100,
"enableDebuggingHandlers": true,
"healthzPort": 10248,
"healthzBindAddress": "127.0.0.1",
"oomScoreAdj": -999,
"clusterDomain": "cluster.local",
"clusterDNS": [
"10.0.0.10"
],
"streamingConnectionIdleTimeout": "4h0m0s",
"nodeStatusUpdateFrequency": "10s",
"nodeStatusReportFrequency": "5m0s",
"nodeLeaseDurationSeconds": 40,
"imageMinimumGCAge": "2m0s",
"imageMaximumGCAge": "0s",
"imageGCHighThresholdPercent": 85,
"imageGCLowThresholdPercent": 80,
"volumeStatsAggPeriod": "1m0s",
"cgroupsPerQOS": true,
"cgroupDriver": "systemd",
"cpuManagerPolicy": "none",
"cpuManagerReconcilePeriod": "10s",
"memoryManagerPolicy": "None",
"topologyManagerPolicy": "none",
"topologyManagerScope": "container",
"runtimeRequestTimeout": "2m0s",
"hairpinMode": "promiscuous-bridge",
"maxPods": 110,
"podPidsLimit": -1,
"resolvConf": "/run/systemd/resolve/resolv.conf",
"cpuCFSQuota": true,
"cpuCFSQuotaPeriod": "100ms",
"nodeStatusMaxImages": 50,
"maxOpenFiles": 1000000,
"contentType": "application/vnd.kubernetes.protobuf",
"kubeAPIQPS": 50,
"kubeAPIBurst": 100,
"serializeImagePulls": true,
"evictionHard": {
"imagefs.available": "15%",
"memory.available": "100Mi",
"nodefs.available": "10%",
"nodefs.inodesFree": "5%"
},
"evictionPressureTransitionPeriod": "1m0s",
"enableControllerAttachDetach": true,
"makeIPTablesUtilChains": true,
"iptablesMasqueradeBit": 14,
"iptablesDropBit": 15,
"featureGates": {
"AllAlpha": false
},
"failSwapOn": false,
"memorySwap": {},
"containerLogMaxSize": "10Mi",
"containerLogMaxFiles": 5,
"configMapAndSecretChangeDetectionStrategy": "Watch",
"enforceNodeAllocatable": [
"pods"
],
"volumePluginDir": "/usr/libexec/kubernetes/kubelet-plugins/volume/exec/",
"logging": {
"format": "text",
"flushFrequency": "5s",
"verbosity": 3,
"options": {
"json": {
"infoBufferSize": "0"
}
}
},
"enableSystemLogHandler": true,
"enableSystemLogQuery": false,
"shutdownGracePeriod": "0s",
"shutdownGracePeriodCriticalPods": "0s",
"enableProfilingHandler": true,
"enableDebugFlagsHandler": true,
"seccompDefault": false,
"memoryThrottlingFactor": 0.9,
"registerNode": true,
"localStorageCapacityIsolation": true,
"containerRuntimeEndpoint": "unix:///var/run/crio/crio.sock"
}
}
```
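If you prefer not to keep a proxy running, the same data can usually be fetched in a single step with `kubectl get --raw` (a sketch; replace `<node-name>` with the actual name of your node):
```shell
kubectl get --raw "/api/v1/nodes/<node-name>/proxy/configz" | jq .
```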
<!-- discussion -->
@ -134,3 +293,5 @@ This produces the same outcome as if you used the [single configuration file](#c
- Learn more about kubelet configuration by checking the
[`KubeletConfiguration`](/docs/reference/config-api/kubelet-config.v1beta1/)
reference.
- Learn more about kubelet configuration merging in the
[reference document](/docs/reference/node/kubelet-config-directory-merging/).

View File

@ -440,7 +440,17 @@ To assign SELinux labels, the SELinux security module must be loaded on the host
### Efficient SELinux volume relabeling
{{< feature-state for_k8s_version="v1.27" state="beta" >}}
{{< feature-state feature_gate_name="SELinuxMountReadWriteOncePod" >}}
{{< note >}}
Kubernetes v1.27 introduced an early limited form of this behavior that was only applicable
to volumes (and PersistentVolumeClaims) using the `ReadWriteOncePod` access mode.
As an alpha feature, you can enable the `SELinuxMount`
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/) to widen that
performance improvement to other kinds of PersistentVolumeClaims, as explained in detail
below.
{{< /note >}}
By default, the container runtime recursively assigns SELinux labels to all
files on all Pod volumes. To speed up this process, Kubernetes can change the
@ -451,7 +461,9 @@ To benefit from this speedup, all these conditions must be met:
* The [feature gates](/docs/reference/command-line-tools-reference/feature-gates/) `ReadWriteOncePod`
and `SELinuxMountReadWriteOncePod` must be enabled.
* Pod must use PersistentVolumeClaim with `accessModes: ["ReadWriteOncePod"]`.
* Pod must use PersistentVolumeClaim with applicable `accessModes` and [feature gates](/docs/reference/command-line-tools-reference/feature-gates/):
* Either the volume has `accessModes: ["ReadWriteOncePod"]`, and feature gate `SELinuxMountReadWriteOncePod` is enabled.
* Or the volume can use any other access modes and both feature gates `SELinuxMountReadWriteOncePod` and `SELinuxMount` must be enabled.
* Pod (or all its Containers that use the PersistentVolumeClaim) must
have `seLinuxOptions` set.
* The corresponding PersistentVolume must be either:
@ -465,13 +477,56 @@ runtime recursively changes the SELinux label for all inodes (files and directo
in the volume.
The more files and directories in the volume, the longer that relabelling takes.
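For orientation, the Pod side of those conditions might look like the following sketch (the Pod name, PVC name and SELinux level are examples; the PersistentVolume must still meet the requirements listed above):
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: selinux-mount-demo        # example name
spec:
  securityContext:
    seLinuxOptions:
      level: "s0:c123,c456"       # example MCS label shared by all containers
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: my-rwop-claim    # example PVC with a supported access mode
```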
## Managing access to the `/proc` filesystem {#proc-access}
{{< feature-state feature_gate_name="ProcMountType" >}}
For runtimes that follow the OCI runtime specification, containers default to running in a mode where
there are multiple paths that are both masked and read-only.
The result is that these paths are present inside the container's mount namespace, and they can function as if
the container were an isolated host, but the container process cannot write to
them. The list of masked and read-only paths are as follows:
- Masked Paths:
- `/proc/asound`
- `/proc/acpi`
- `/proc/kcore`
- `/proc/keys`
- `/proc/latency_stats`
- `/proc/timer_list`
- `/proc/timer_stats`
- `/proc/sched_debug`
- `/proc/scsi`
- `/sys/firmware`
- Read-Only Paths:
- `/proc/bus`
- `/proc/fs`
- `/proc/irq`
- `/proc/sys`
- `/proc/sysrq-trigger`
For some Pods, you might want to bypass that default masking of paths.
The most common context for wanting this is if you are trying to run containers within
a Kubernetes container (within a pod).
The `securityContext` field `procMount` allows a user to request a container's `/proc`
be `Unmasked`, or be mounted as read-write by the container process. This also
applies to `/sys/firmware` which is not in `/proc`.
```yaml
...
securityContext:
  procMount: Unmasked
```
{{< note >}}
<!-- remove after Kubernetes v1.30 is released -->
If you are running Kubernetes v1.25, refer to the v1.25 version of this task page:
[Configure a Security Context for a Pod or Container](https://v1-25.docs.kubernetes.io/docs/tasks/configure-pod-container/security-context/) (v1.25).
There is an important note in that documentation about a situation where the kubelet
can lose track of volume labels after restart. This deficiency has been fixed
in Kubernetes 1.26.
Setting `procMount` to Unmasked requires the `spec.hostUsers` value in the pod
spec to be `false`. In other words: a container that wishes to have an Unmasked
`/proc` or unmasked `/sys` must also be in a
[user namespace](/docs/concepts/workloads/pods/user-namespaces/).
Kubernetes v1.12 to v1.29 did not enforce that requirement.
{{< /note >}}
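Putting this together, a minimal sketch of a Pod that requests an unmasked `/proc` could look like this (the names are examples; it assumes the `ProcMountType` feature gate and a container runtime with user namespace support):
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: unmasked-proc-demo        # example name
spec:
  hostUsers: false                # required: the Pod must run in a user namespace
  containers:
  - name: nested
    image: registry.k8s.io/pause:3.9
    securityContext:
      procMount: Unmasked         # do not mask /proc or /sys/firmware for this container
```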
## Discussion
@ -520,3 +575,7 @@ kubectl delete pod security-context-demo-4
* For more information about security mechanisms in Linux, see
[Overview of Linux Kernel Security Features](https://www.linux.com/learn/overview-linux-kernel-security-features)
(Note: Some information is out of date)
* Read about [User Namespaces](/docs/concepts/workloads/pods/user-namespaces/)
for Linux pods.
* [Masked Paths in the OCI Runtime
Specification](https://github.com/opencontainers/runtime-spec/blob/f66aad47309/config-linux.md#masked-paths)

View File

@ -7,7 +7,7 @@ min-kubernetes-server-version: v1.25
---
<!-- overview -->
{{< feature-state for_k8s_version="v1.25" state="alpha" >}}
{{< feature-state for_k8s_version="v1.30" state="beta" >}}
This page shows how to configure a user namespace for pods. This allows you to
isolate the user running inside the container from the one in the host.
@ -57,10 +57,6 @@ If you have a mixture of nodes and only some of the nodes provide user namespace
Pods, you also need to ensure that the user namespace Pods are
[scheduled](/docs/concepts/scheduling-eviction/assign-pod-node/) to suitable nodes.
Please note that **if your container runtime doesn't support user namespaces, the
`hostUsers` field in the pod spec will be silently ignored and the pod will be
created without user namespaces.**
<!-- steps -->
## Run a Pod that uses a user namespace {#create-pod}
@ -82,27 +78,42 @@ to `false`. For example:
kubectl attach -it userns bash
```
And run the command. The output is similar to this:
Run this command:
```shell
readlink /proc/self/ns/user
user:[4026531837]
cat /proc/self/uid_map
0 0 4294967295
```
Then, open a shell in the host and run the same command.
The output is similar to:
The output must be different. This means the host and the pod are using a
different user namespace. When user namespaces are not enabled, the host and the
pod use the same user namespace.
```shell
user:[4026531837]
```
Also run:
```shell
cat /proc/self/uid_map
```
The output is similar to:
```shell
0 833617920 65536
```
Then, open a shell in the host and run the same commands.
The `readlink` command shows the user namespace the process is running in. It
should be different when it is run on the host and inside the container.
The last number of the `uid_map` file inside the container must be 65536; on the
host, it must be a bigger number.
If you are running the kubelet inside a user namespace, you need to compare the
output from running the command in the pod to the output of running in the host:
```none
```shell
readlink /proc/$pid/ns/user
user:[4026534732]
```
replacing `$pid` with the kubelet PID.

View File

@ -203,7 +203,7 @@ status:
type: PIDPressure
- lastHeartbeatTime: "2022-02-17T22:20:15Z"
lastTransitionTime: "2022-02-17T22:15:15Z"
message: kubelet is posting ready status. AppArmor enabled
message: kubelet is posting ready status
reason: KubeletReady
status: "True"
type: Ready
@ -330,4 +330,3 @@ This is an incomplete list of things that could go wrong, and how to adjust your
* Use `crictl` to [debug Kubernetes nodes](/docs/tasks/debug/debug-cluster/crictl/)
* Get more information about [Kubernetes auditing](/docs/tasks/debug/debug-cluster/audit/)
* Use `telepresence` to [develop and debug services locally](/docs/tasks/debug/debug-cluster/local-debugging/)

View File

@ -719,9 +719,10 @@ crontab "my-new-cron-object" created
```
### Validation ratcheting
{{< feature-state state="alpha" for_k8s_version="v1.28" >}}
{{< feature-state feature_gate_name="CRDValidationRatcheting" >}}
If you are using a version of Kubernetes older than v1.30, you need to explicitly
enable the `CRDValidationRatcheting`
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/) to
use this behavior, which then applies to all CustomResourceDefinitions in your
cluster.
@ -751,10 +752,12 @@ validations are not supported by ratcheting under the implementation in Kubernet
- `x-kubernetes-validations`
For Kubernetes 1.28, CRD [validation rules](#validation-rules) are ignored by
ratcheting. Starting with Alpha 2 in Kubernetes 1.29, `x-kubernetes-validations`
are ratcheted only if they do not refer to `oldSelf`.
Transition Rules are never ratcheted: only errors raised by rules that do not
use `oldSelf` will be automatically ratcheted if their values are unchanged.
To write custom ratcheting logic for CEL expressions, check out [optionalOldSelf](#field-optional-oldself).
- `x-kubernetes-list-type`
Errors arising from changing the list type of a subschema will not be
ratcheted. For example adding `set` onto a list with duplicates will always
@ -772,8 +775,10 @@ validations are not supported by ratcheting under the implementation in Kubernet
To remove a previously specified `additionalProperties` validation will not be
ratcheted.
- `metadata`
Errors that come from Kubernetes' built-in validation of an object's `metadata`
are not ratcheted (such as object name, or characters in a label value).
If you specify your own additional rules for the metadata of a custom resource,
that additional validation will be ratcheted.
### Validation rules
@ -1177,10 +1182,11 @@ Setting `fieldPath` is optional.
#### The `optionalOldSelf` field {#field-optional-oldself}
{{< feature-state state="alpha" for_k8s_version="v1.29" >}}
{{< feature-state feature_gate_name="CRDValidationRatcheting" >}}
The feature [CRDValidationRatcheting](#validation-ratcheting) must be enabled in order to
make use of this field.
If your cluster does not have [CRD validation ratcheting](#validation-ratcheting) enabled,
the CustomResourceDefinition API doesn't include this field, and trying to set it may result
in an error.
The `optionalOldSelf` field is a boolean field that alters the behavior of [Transition Rules](#transition-rules) described
below. Normally, a transition rule will not evaluate if `oldSelf` cannot be determined:
@ -1624,6 +1630,96 @@ my-new-cron-object * * * * * 1 7s
The `NAME` column is implicit and does not need to be defined in the CustomResourceDefinition.
{{< /note >}}
### Field selectors
[Field Selectors](/docs/concepts/overview/working-with-objects/field-selectors/)
let clients select custom resources based on the value of one or more resource
fields.
All custom resources support the `metadata.name` and `metadata.namespace` field
selectors.
Fields declared in a {{< glossary_tooltip term_id="CustomResourceDefinition" text="CustomResourceDefinition" >}}
may also be used with field selectors when included in the `spec.versions[*].selectableFields` field of the
{{< glossary_tooltip term_id="CustomResourceDefinition" text="CustomResourceDefinition" >}}.
#### Selectable fields for custom resources {#crd-selectable-fields}
{{< feature-state state="alpha" for_k8s_version="v1.30" >}}
{{< feature-state feature_gate_name="CustomResourceFieldSelectors" >}}
You need to enable the `CustomResourceFieldSelectors`
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/) to
use this behavior, which then applies to all CustomResourceDefinitions in your
cluster.
The `spec.versions[*].selectableFields` field of a {{< glossary_tooltip term_id="CustomResourceDefinition" text="CustomResourceDefinition" >}} may be used to
declare which other fields in a custom resource may be used in field selectors.
The following example adds the `.spec.color` and `.spec.size` fields as
selectable fields.
Save the CustomResourceDefinition to `shirt-resource-definition.yaml`:
{{% code_sample file="customresourcedefinition/shirt-resource-definition.yaml" %}}
Create the CustomResourceDefinition:
```shell
kubectl apply -f https://k8s.io/examples/customresourcedefinition/shirt-resource-definition.yaml
```
Define some Shirts by editing `shirt-resources.yaml`; for example:
{{% code_sample file="customresourcedefinition/shirt-resources.yaml" %}}
Create the custom resources:
```shell
kubectl apply -f https://k8s.io/examples/customresourcedefinition/shirt-resources.yaml
```
Get all the resources:
```shell
kubectl get shirts.stable.example.com
```
The output is:
```
NAME COLOR SIZE
example1 blue S
example2 blue M
example3 green M
```
Fetch blue shirts (retrieve Shirts with a `color` of `blue`):
```shell
kubectl get shirts.stable.example.com --field-selector spec.color=blue
```
Should output:
```
NAME COLOR SIZE
example1 blue S
example2 blue M
```
Get only resources with a `color` of `green` and a `size` of `M`:
```shell
kubectl get shirts.stable.example.com --field-selector spec.color=green,spec.size=M
```
Should output:
```
NAME COLOR SIZE
example3   green  M
```
#### Priority
Each column includes a `priority` field. Currently, the priority

View File

@ -102,6 +102,11 @@ Honorable`, and `Kubernetes`, respectively. The environment variable
`MESSAGE` combines the set of all these environment variables and then uses it
as a CLI argument passed to the `env-print-demo` container.
Environment variable names consist of letters, numbers, underscores,
dots, or hyphens, but the first character cannot be a digit.
If the `RelaxedEnvironmentVariableValidation` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) is enabled,
all [printable ASCII characters](https://www.ascii-code.com/characters/printable-characters) except "=" may be used for environment variable names.
```yaml
apiVersion: v1
kind: Pod

View File

@ -0,0 +1,313 @@
---
title: Migrate Kubernetes Objects Using Storage Version Migration
reviewers:
- deads2k
- jpbetz
- enj
- nilekhc
content_type: task
min-kubernetes-server-version: v1.30
weight: 60
---
<!-- overview -->
{{< feature-state feature_gate_name="StorageVersionMigrator" >}}
Kubernetes relies on API data being actively rewritten to support some
maintenance activities related to storage at rest. Two prominent examples are
the versioned schema of stored resources (that is, the preferred storage schema
changing from v1 to v2 for a given resource) and encryption at rest
(that is, rewriting stale data based on a change in how the data should be encrypted).
## {{% heading "prerequisites" %}}
Install [`kubectl`](/docs/tasks/tools/#kubectl).
{{< include "task-tutorial-prereqs.md" >}} {{< version-check >}}
<!-- steps -->
## Re-encrypt Kubernetes secrets using storage version migration
- To begin with, [configure KMS provider](/docs/tasks/administer-cluster/kms-provider/)
to encrypt data at rest in etcd using the following encryption configuration.
```yaml
kind: EncryptionConfiguration
apiVersion: apiserver.config.k8s.io/v1
resources:
  - resources:
      - secrets
    providers:
      - aescbc:
          keys:
            - name: key1
              secret: c2VjcmV0IGlzIHNlY3VyZQ==
```
Make sure to enable automatic reload of the encryption
configuration file by setting `--encryption-provider-config-automatic-reload` to `true`.
- Create a Secret using kubectl.
```shell
kubectl create secret generic my-secret --from-literal=key1=supersecret
```
- [Verify](/docs/tasks/administer-cluster/kms-provider/#verifying-that-the-data-is-encrypted)
the serialized data for that Secret object is prefixed with `k8s:enc:aescbc:v1:key1`.
- Update the encryption configuration file as follows to rotate the encryption key.
```yaml
kind: EncryptionConfiguration
apiVersion: apiserver.config.k8s.io/v1
resources:
  - resources:
      - secrets
    providers:
      - aescbc:
          keys:
            - name: key2
              secret: c2VjcmV0IGlzIHNlY3VyZSwgaXMgaXQ/
      - aescbc:
          keys:
            - name: key1
              secret: c2VjcmV0IGlzIHNlY3VyZQ==
```
- To ensure that the previously created Secret `my-secret` is re-encrypted
with the new key `key2`, you will use _Storage Version Migration_.
- Create a StorageVersionMigration manifest named `migrate-secret.yaml` as follows:
```yaml
kind: StorageVersionMigration
apiVersion: storagemigration.k8s.io/v1alpha1
metadata:
  name: secrets-migration
spec:
  resource:
    group: ""
    version: v1
    resource: secrets
```
Create the object using _kubectl_ as follows:
```shell
kubectl apply -f migrate-secret.yaml
```
- Monitor migration of Secrets by checking the `.status` of the StorageVersionMigration.
A successful migration should have its
`Succeeded` condition set to true. Get the StorageVersionMigration object
as follows:
```shell
kubectl get storageversionmigration.storagemigration.k8s.io/secrets-migration -o yaml
```
The output is similar to:
```yaml
kind: StorageVersionMigration
apiVersion: storagemigration.k8s.io/v1alpha1
metadata:
  name: secrets-migration
  uid: 628f6922-a9cb-4514-b076-12d3c178967c
  resourceVersion: '90'
  creationTimestamp: '2024-03-12T20:29:45Z'
spec:
  resource:
    group: ""
    version: v1
    resource: secrets
status:
  conditions:
    - type: Running
      status: 'False'
      lastUpdateTime: '2024-03-12T20:29:46Z'
      reason: StorageVersionMigrationInProgress
    - type: Succeeded
      status: 'True'
      lastUpdateTime: '2024-03-12T20:29:46Z'
      reason: StorageVersionMigrationSucceeded
  resourceVersion: '84'
```
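To check only the `Succeeded` condition, a JSONPath query such as the following can help (a sketch):
```shell
kubectl get storageversionmigration.storagemigration.k8s.io/secrets-migration \
  -o jsonpath='{.status.conditions[?(@.type=="Succeeded")].status}'
```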
- [Verify](/docs/tasks/administer-cluster/kms-provider/#verifying-that-the-data-is-encrypted)
the stored secret is now prefixed with `k8s:enc:aescbc:v1:key2`.
## Update the preferred storage schema of a CRD
Consider a scenario where a {{< glossary_tooltip term_id="CustomResourceDefinition" text="CustomResourceDefinition" >}}
(CRD) is created to serve custom resources (CRs) and is set as the preferred storage schema. When it's time
to introduce v2 of the CRD, it can be added for serving only with a conversion
webhook. This enables a smoother transition where users can create CRs using
either the v1 or v2 schema, with the webhook in place to perform the necessary
schema conversion between them. Before setting v2 as the preferred storage schema
version, it's important to ensure that all existing CRs stored as v1 are migrated to v2.
This migration can be achieved through _Storage Version Migration_ to migrate all CRs from v1 to v2.
- Create a manifest for the CRD, named `test-crd.yaml`, as follows:
```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: selfierequests.stable.example.com
spec:
  group: stable.example.com
  names:
    plural: selfierequests
    singular: selfierequest
    kind: SelfieRequest
    listKind: SelfieRequestList
  scope: Namespaced
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            hostPort:
              type: string
  conversion:
    strategy: Webhook
    webhook:
      clientConfig:
        url: https://127.0.0.1:9443/crdconvert
        caBundle: <CABundle info>
      conversionReviewVersions:
        - v1
        - v2
```
Create the CRD using kubectl:
```shell
kubectl apply -f test-crd.yaml
```
- Create a manifest for an example SelfieRequest. Name the manifest `cr1.yaml` and use these contents:
```yaml
apiVersion: stable.example.com/v1
kind: SelfieRequest
metadata:
  name: cr1
  namespace: default
```
Create the CR using kubectl:
```shell
kubectl apply -f cr1.yaml
```
- Verify that the CR is written and stored as v1 by getting the object from etcd.
```shell
ETCDCTL_API=3 etcdctl get /kubernetes.io/stable.example.com/selfierequests/default/cr1 [...] | hexdump -C
```
where `[...]` contains the additional arguments for connecting to the etcd server.
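For example, on a kubeadm-style control plane those additional arguments typically include the etcd endpoint and client certificates (the paths below are illustrative and depend on your setup):
```shell
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/apiserver-etcd-client.crt \
  --key=/etc/kubernetes/pki/apiserver-etcd-client.key \
  get /kubernetes.io/stable.example.com/selfierequests/default/cr1 | hexdump -C
```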
- Update the CRD `test-crd.yaml` to include the v2 version for serving and storage,
and v1 for serving only, as follows:
```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: selfierequests.stable.example.com
spec:
  group: stable.example.com
  names:
    plural: selfierequests
    singular: selfierequest
    kind: SelfieRequest
    listKind: SelfieRequestList
  scope: Namespaced
  versions:
    - name: v2
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            host:
              type: string
            port:
              type: string
    - name: v1
      served: true
      storage: false
      schema:
        openAPIV3Schema:
          type: object
          properties:
            hostPort:
              type: string
  conversion:
    strategy: Webhook
    webhook:
      clientConfig:
        url: 'https://127.0.0.1:9443/crdconvert'
        caBundle: <CABundle info>
      conversionReviewVersions:
        - v1
        - v2
```
Update the CRD using kubectl:
```shell
kubectl apply -f test-crd.yaml
```
- Create a CR manifest named `cr2.yaml` as follows:
```yaml
apiVersion: stable.example.com/v2
kind: SelfieRequest
metadata:
  name: cr2
  namespace: default
```
- Create the CR using kubectl:
```shell
kubectl apply -f cr2.yaml
```
- Verify that the CR is written and stored as v2 by getting the object from etcd.
```shell
ETCDCTL_API=3 etcdctl get /kubernetes.io/stable.example.com/selfierequests/default/cr2 [...] | hexdump -C
```
where `[...]` contains the additional arguments for connecting to the etcd server.
- Create a StorageVersionMigration manifest named `migrate-crd.yaml`, with the contents as follows:
```yaml
kind: StorageVersionMigration
apiVersion: storagemigration.k8s.io/v1alpha1
metadata:
  name: crdsvm
spec:
  resource:
    group: stable.example.com
    version: v1
    resource: selfierequests
```
Create the object using _kubectl_ as follows:
```shell
kubectl apply -f migrate-crd.yaml
```
- Monitor the migration of the custom resources by checking the `.status` of the StorageVersionMigration.
A successful migration should have its `Succeeded` condition set to "True" in the status field. Get the migration resource
as follows:
```shell
kubectl get storageversionmigration.storagemigration.k8s.io/crdsvm -o yaml
```
The output is similar to:
```yaml
kind: StorageVersionMigration
apiVersion: storagemigration.k8s.io/v1alpha1
metadata:
  name: crdsvm
  uid: 13062fe4-32d7-47cc-9528-5067fa0c6ac8
  resourceVersion: '111'
  creationTimestamp: '2024-03-12T22:40:01Z'
spec:
  resource:
    group: stable.example.com
    version: v1
    resource: selfierequests
status:
  conditions:
    - type: Running
      status: 'False'
      lastUpdateTime: '2024-03-12T22:40:03Z'
      reason: StorageVersionMigrationInProgress
    - type: Succeeded
      status: 'True'
      lastUpdateTime: '2024-03-12T22:40:03Z'
      reason: StorageVersionMigrationSucceeded
  resourceVersion: '106'
```
- Verify that the previously created cr1 is now written and stored as v2 by getting the object from etcd.
```shell
ETCDCTL_API=3 etcdctl get /kubernetes.io/stable.example.com/selfierequests/default/cr1 [...] | hexdump -C
```
where `[...]` contains the additional arguments for connecting to the etcd server.

View File

@ -278,12 +278,12 @@ pod usage is still within acceptable limits.
### Container resource metrics
{{< feature-state for_k8s_version="v1.27" state="beta" >}}
{{< feature-state feature_gate_name="HPAContainerMetrics" >}}
The HorizontalPodAutoscaler API also supports a container metric source where the HPA can track the
resource usage of individual containers across a set of Pods, in order to scale the target resource.
This lets you configure scaling thresholds for the containers that matter most in a particular Pod.
For example, if you have a web application and a logging sidecar, you can scale based on the resource
For example, if you have a web application and a sidecar container that provides logging, you can scale based on the resource
use of the web application, ignoring the sidecar container and its resource use.
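A sketch of such a metric entry in an `autoscaling/v2` HorizontalPodAutoscaler is shown below (the container name `application` is an example):
```yaml
metrics:
- type: ContainerResource
  containerResource:
    name: cpu
    container: application     # scale on this container only, ignoring the sidecar
    target:
      type: Utilization
      averageUtilization: 60
```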
If you revise the target resource to have a new Pod specification with a different set of containers,

View File

@ -8,7 +8,7 @@ weight: 30
<!-- overview -->
{{< feature-state for_k8s_version="v1.4" state="beta" >}}
{{< feature-state feature_gate_name="AppArmor" >}}
[AppArmor](https://apparmor.net/) is a Linux kernel security module that supplements the standard Linux user and group based
@ -54,7 +54,7 @@ Nodes before proceeding:
Y
```
The kubelet verifies that AppArmor is enabled on the host before admitting a pod with AppArmor
explicitly configured.
3. Container runtime supports AppArmor -- All common Kubernetes-supported container
@ -64,7 +64,7 @@ Nodes before proceeding:
4. Profile is loaded -- AppArmor is applied to a Pod by specifying an AppArmor profile that each
container should be run with. If any of the specified profiles are not loaded in the
kernel, the kubelet will reject the Pod. You can view which profiles are loaded on a
node by checking the `/sys/kernel/security/apparmor/profiles` file. For example:
```shell
@ -85,25 +85,26 @@ Nodes before proceeding:
## Securing a Pod
{{< note >}}
Prior to Kubernetes v1.30, AppArmor was specified through annotations. Use the documentation version
selector to view the documentation with this deprecated API.
{{< /note >}}
AppArmor profiles can be specified at the pod level or container level. The container AppArmor
profile takes precedence over the pod profile.

```yaml
securityContext:
  appArmorProfile:
    type: <profile_type>
```

Where `<profile_type>` is one of:

* `RuntimeDefault` to use the runtime's default profile
* `Localhost` to use a profile loaded on the host (see below)
* `Unconfined` to run without AppArmor

See the [API Reference](#api-reference) for the full details on the AppArmor profile API.
To verify that the profile was applied, you can check that the container's root process is
running with the correct profile by examining its proc attr:
@ -115,14 +116,14 @@ kubectl exec <pod_name> -- cat /proc/1/attr/current
The output should look something like this:
```
k8s-apparmor-example-deny-write (enforce)
cri-containerd.apparmor.d (enforce)
```
## Example
*This example assumes you have already set up a cluster with AppArmor support.*
First, load the profile you want to use onto your Nodes. This profile denies all file writes:
First, load the profile you want to use onto your Nodes. This profile blocks all file write operations:
```
#include <tunables/global>
@ -197,9 +198,11 @@ apiVersion: v1
kind: Pod
metadata:
name: hello-apparmor-2
annotations:
container.apparmor.security.beta.kubernetes.io/hello: localhost/k8s-apparmor-example-allow-write
spec:
securityContext:
appArmorProfile:
type: Localhost
localhostProfile: k8s-apparmor-example-allow-write
containers:
- name: hello
image: busybox:1.28
@ -243,7 +246,7 @@ An Event provides the error message with the reason, the specific wording is run
### Setting up Nodes with profiles
Kubernetes {{< skew currentVersion >}} does not provide any built-in mechanisms for loading AppArmor profiles onto
Nodes. Profiles can be loaded through custom infrastructure or tools like the
[Kubernetes Security Profiles Operator](https://github.com/kubernetes-sigs/security-profiles-operator).
@ -270,29 +273,31 @@ logs or through `journalctl`. More information is provided in
[AppArmor failures](https://gitlab.com/apparmor/apparmor/wikis/AppArmor_Failures).
## Specifying AppArmor confinement

{{< caution >}}
Prior to Kubernetes v1.30, AppArmor was specified through annotations. Use the documentation version
selector to view the documentation with this deprecated API.
{{< /caution >}}

### AppArmor profile within security context {#appArmorProfile}

You can specify the `appArmorProfile` on either a container's `securityContext` or on a Pod's
`securityContext`. If the profile is set at the pod level, it will be used as the default profile
for all containers in the pod (including init, sidecar, and ephemeral containers). If both a pod & container
AppArmor profile are set, the container's profile will be used.

An AppArmor profile has 2 fields:

`type` _(required)_ - indicates which kind of AppArmor profile will be applied. Valid options are:

- `Localhost` - a profile pre-loaded on the node (specified by `localhostProfile`).
- `RuntimeDefault` - the container runtime's default profile. In practice, many container runtimes
  use the same OCI default profile, defined here:
  https://github.com/containers/common/blob/main/pkg/apparmor/apparmor_linux_template.go
- `Unconfined` - no AppArmor enforcement.

`localhostProfile` - The name of a profile loaded on the node that should be used.
The profile must be preconfigured on the node to work. The possible profile names are detailed in the
[core policy reference](https://gitlab.com/apparmor/apparmor/wikis/AppArmor_Core_Policy_Reference#profile-names-and-attachment-specifications).
This option must be provided if and only if the `type` is `Localhost`.
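For example, to confine a single container with a profile loaded on the node (the Pod name is an example; the profile name matches the one used earlier on this page):
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hello-apparmor-container   # example name
spec:
  containers:
  - name: hello
    image: busybox:1.28
    command: ["sh", "-c", "echo 'Hello AppArmor!' && sleep 1h"]
    securityContext:
      appArmorProfile:
        type: Localhost
        localhostProfile: k8s-apparmor-example-deny-write
```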
## {{% heading "whatsnext" %}}

View File

@ -1,4 +1,4 @@
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: "deploy-replica-policy.example.com"

View File

@ -2,7 +2,7 @@
# Except for "exempt" deployments, or any containers that do not belong to the "example.com" organization (e.g. common sidecars).
# For example, if the namespace has a label of {"environment": "staging"}, all container images must be either staging.example.com/*
# or do not contain "example.com" at all, unless the deployment has {"exempt": "true"} label.
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: "image-matches-namespace-environment.policy.example.com"

View File

@ -1,4 +1,4 @@
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: "demo-policy.example.com"

View File

@ -1,4 +1,4 @@
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: "demo-policy.example.com"

View File

@ -0,0 +1,25 @@
apiVersion: batch/v1
kind: Job
spec:
  parallelism: 10
  completions: 10
  completionMode: Indexed  # Required for the success policy
  successPolicy:
    rules:
      - succeededIndexes: 0,2-3
        succeededCount: 1
  template:
    spec:
      restartPolicy: Never  # Job Pods must use Never or OnFailure
      containers:
      - name: main
        image: python
        command:  # Provided that at least one of the Pods with 0, 2, and 3 indexes has succeeded,
                  # the overall Job is a success.
          - python3
          - -c
          - |
            import os, sys
            if os.environ.get("JOB_COMPLETION_INDEX") == "2":
              sys.exit(0)
            else:
              sys.exit(1)

View File

@ -0,0 +1,36 @@
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: shirts.stable.example.com
spec:
  group: stable.example.com
  scope: Namespaced
  names:
    plural: shirts
    singular: shirt
    kind: Shirt
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              color:
                type: string
              size:
                type: string
    selectableFields:
    - jsonPath: .spec.color
    - jsonPath: .spec.size
    additionalPrinterColumns:
    - jsonPath: .spec.color
      name: Color
      type: string
    - jsonPath: .spec.size
      name: Size
      type: string

View File

@ -0,0 +1,24 @@
---
apiVersion: stable.example.com/v1
kind: Shirt
metadata:
name: example1
spec:
color: blue
size: S
---
apiVersion: stable.example.com/v1
kind: Shirt
metadata:
name: example2
spec:
color: blue
size: M
---
apiVersion: stable.example.com/v1
kind: Shirt
metadata:
name: example3
spec:
color: green
size: M

View File

@ -2,10 +2,11 @@ apiVersion: v1
kind: Pod
metadata:
  name: hello-apparmor
spec:
  securityContext:
    appArmorProfile:
      type: Localhost
      localhostProfile: k8s-apparmor-example-deny-write
  containers:
  - name: hello
    image: busybox:1.28

View File

@ -0,0 +1,28 @@
apiVersion: v1
kind: Pod
metadata:
  name: rro
spec:
  volumes:
    - name: mnt
      hostPath:
        # tmpfs is mounted on /mnt/tmpfs
        path: /mnt
  containers:
    - name: busybox
      image: busybox
      args: ["sleep", "infinity"]
      volumeMounts:
        # /mnt-rro/tmpfs is not writable
        - name: mnt
          mountPath: /mnt-rro
          readOnly: true
          mountPropagation: None
          recursiveReadOnly: Enabled
        # /mnt-ro/tmpfs is writable
        - name: mnt
          mountPath: /mnt-ro
          readOnly: true
        # /mnt-rw/tmpfs is writable
        - name: mnt
          mountPath: /mnt-rw

View File

@ -1,4 +1,4 @@
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: "demo-binding-test.example.com"

View File

@ -1,4 +1,4 @@
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: "demo-policy.example.com"

View File

@ -1,4 +1,4 @@
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: "replicalimit-binding-nontest"

Some files were not shown because too many files have changed in this diff.