Merge pull request #43082 from kubernetes/dev-1.29

Official v1.29 Release Docs
pull/44328/head snapshot-initial-v1.29
Kat Cosgrove 2023-12-13 10:52:55 -06:00 committed by GitHub
commit fd224a167f
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
58 changed files with 5243 additions and 3431 deletions

View File

@ -392,52 +392,63 @@ footer {
}
main {
.td-content table code,
.td-content>table td {
word-break: break-word;
/* SCSS Related to the Metrics list */
div.metric:nth-of-type(odd) { // look & feel, aesthetics
background-color: $light-grey;
}
/* SCSS Related to the Metrics Table */
div.metrics {
@media (max-width: 767px) { // for mobile devices, display the names, stability levels & types
table.metrics {
th:nth-child(n + 4),
td:nth-child(n + 4) {
.metric {
div:empty {
display: none;
}
td.metric_type {
min-width: 7em;
display: flex;
flex-direction: column;
flex-wrap: wrap;
gap: .75em;
padding: .75em;
.metric_name {
font-size: large;
font-weight: bold;
word-break: break-word;
}
td.metric_stability_level {
min-width: 6em;
label {
font-weight: bold;
margin-right: .5em;
}
}
ul {
li:empty {
display: none;
}
display: flex;
flex-direction: column;
gap: .75em;
flex-wrap: wrap;
li.metric_labels_varying {
span {
display: inline-block;
background-color: rgb(240, 239, 239);
padding: 0 0.5em;
margin-right: .35em;
font-family: monospace;
border: 1px solid rgb(230, 230, 230);
border-radius: 5%;
margin-bottom: .35em;
}
}
}
}
table.metrics tbody { // tested dimensions to improve the overall aesthetics of the table
tr {
td {
font-size: smaller;
}
td.metric_labels_varying {
min-width: 9em;
}
td.metric_type {
min-width: 9em;
}
td.metric_description {
min-width: 10em;
}
}
}
table.no-word-break td,
table.no-word-break code {
word-break: normal;
}
}
}
// blockquotes and callouts

View File

@ -137,6 +137,20 @@ collection, which deletes images in order based on the last time they were used,
starting with the oldest first. The kubelet deletes images
until disk usage reaches the `LowThresholdPercent` value.
#### Garbage collection for unused container images {#image-maximum-age-gc}
{{< feature-state for_k8s_version="v1.29" state="alpha" >}}
As an alpha feature, you can specify the maximum time a local image can be unused for,
regardless of disk usage. This is a kubelet setting that you configure for each node.
To configure the setting, enable the `ImageMaximumGCAge`
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/) for the kubelet,
and also set a value for the `imageMaximumGCAge` field in the kubelet configuration file.
The value is specified as a Kubernetes _duration_; for example, you can set the configuration
field to `3d12h`, which means 3 days and 12 hours.
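For illustration, here is a minimal sketch of the corresponding kubelet configuration, assuming the `KubeletConfiguration` v1beta1 file format (the duration value is the example from above):
```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  # alpha gate; required for the field below to take effect
  ImageMaximumGCAge: true
# remove local images that have been unused for 3 days and 12 hours,
# regardless of disk usage
imageMaximumGCAge: "3d12h"
```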
### Container garbage collection {#container-image-garbage-collection}
The kubelet garbage collects unused containers based on the following variables,
@ -178,4 +192,4 @@ configure garbage collection:
* Learn more about [ownership of Kubernetes objects](/docs/concepts/overview/working-with-objects/owners-dependents/).
* Learn more about Kubernetes [finalizers](/docs/concepts/overview/working-with-objects/finalizers/).
* Learn about the [TTL controller](/docs/concepts/workloads/controllers/ttlafterfinished/) that cleans up finished Jobs.

View File

@ -7,7 +7,7 @@ weight: 110
<!-- overview -->
{{< feature-state state="beta" for_k8s_version="v1.20" >}}
{{< feature-state state="stable" for_k8s_version="v1.29" >}}
Controlling the behavior of the Kubernetes API server in an overload situation
is a key task for cluster administrators. The {{< glossary_tooltip
@ -45,30 +45,27 @@ are not subject to the `--max-requests-inflight` limit.
## Enabling/Disabling API Priority and Fairness
The API Priority and Fairness feature is controlled by a feature gate
and is enabled by default. See [Feature
Gates](/docs/reference/command-line-tools-reference/feature-gates/)
for a general explanation of feature gates and how to enable and
disable them. The name of the feature gate for APF is
"APIPriorityAndFairness". This feature also involves an {{<
glossary_tooltip term_id="api-group" text="API Group" >}} with: (a) a
`v1alpha1` version and a `v1beta1` version, disabled by default, and
(b) `v1beta2` and `v1beta3` versions, enabled by default. You can
disable the feature gate and API group beta versions by adding the
The API Priority and Fairness feature is controlled by a command-line flag
and is enabled by default. See
[Options](/docs/reference/command-line-tools-reference/kube-apiserver/options/)
for a general explanation of the available kube-apiserver command-line
options and how to enable and disable them. The name of the
command-line option for APF is "--enable-priority-and-fairness". This feature
also involves an {{< glossary_tooltip term_id="api-group" text="API Group" >}}
with: (a) a stable `v1` version, introduced in 1.29, and
enabled by default, and (b) a `v1beta3` version, enabled by default and
deprecated in v1.29. You can
disable the API group beta version `v1beta3` by adding the
following command-line flags to your `kube-apiserver` invocation:
```shell
kube-apiserver \
--feature-gates=APIPriorityAndFairness=false \
--runtime-config=flowcontrol.apiserver.k8s.io/v1beta2=false,flowcontrol.apiserver.k8s.io/v1beta3=false \
--runtime-config=flowcontrol.apiserver.k8s.io/v1beta3=false \
# …and other flags as usual
```
Alternatively, you can enable the v1alpha1 and v1beta1 versions of the API group
with `--runtime-config=flowcontrol.apiserver.k8s.io/v1alpha1=true,flowcontrol.apiserver.k8s.io/v1beta1=true`.
The command-line flag `--enable-priority-and-fairness=false` will disable the
API Priority and Fairness feature, even if other flags have enabled it.
API Priority and Fairness feature.
## Concepts
@ -178,14 +175,12 @@ server.
## Resources
The flow control API involves two kinds of resources.
[PriorityLevelConfigurations](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#prioritylevelconfiguration-v1beta2-flowcontrol-apiserver-k8s-io)
[PriorityLevelConfigurations](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#prioritylevelconfiguration-v1-flowcontrol-apiserver-k8s-io)
define the available priority levels, the share of the available concurrency
budget that each can handle, and allow for fine-tuning queuing behavior.
[FlowSchemas](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#flowschema-v1beta2-flowcontrol-apiserver-k8s-io)
[FlowSchemas](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#flowschema-v1-flowcontrol-apiserver-k8s-io)
are used to classify individual inbound requests, matching each to a
single PriorityLevelConfiguration. There is also a `v1alpha1` version
of the same API group, and it has the same Kinds with the same syntax and
semantics.
single PriorityLevelConfiguration.
### PriorityLevelConfiguration

View File

@ -202,10 +202,23 @@ Here is an example:
--allow-label-value number_count_metric,odd_number='1,3,5', number_count_metric,even_number='2,4,6', date_gauge_metric,weekend='Saturday,Sunday'
```
In addition to specifying this from the CLI, this can also be done within a configuration file. You
can specify the path to that configuration file using the `--allow-metric-labels-manifest` command
line argument to a component. Here's an example of the contents of that configuration file:
```yaml
allow-list:
- "metric1,label2": "v1,v2,v3"
- "metric2,label1": "v1,v2,v3"
```
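As a sketch, pointing a component at that file might look like the following, assuming the component is the kube-apiserver and using a placeholder path:
```shell
kube-apiserver \
--allow-metric-labels-manifest=/etc/kubernetes/metrics-allow-list.yaml \
# …and other flags as usual
```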
Additionally, the `cardinality_enforcement_unexpected_categorizations_total` meta-metric records the
count of unexpected categorizations during cardinality enforcement, that is, whenever a label value
is encountered that is not allowed with respect to the allow-list constraints.
## {{% heading "whatsnext" %}}
* Read about the [Prometheus text format](https://github.com/prometheus/docs/blob/master/content/docs/instrumenting/exposition_formats.md#text-based-format)
for metrics
* See the list of [stable Kubernetes metrics](https://github.com/kubernetes/kubernetes/blob/master/test/instrumentation/testdata/stable-metrics-list.yaml)
* Read about the [Kubernetes deprecation policy](/docs/reference/using-api/deprecation-policy/#deprecating-a-feature-or-behavior)

View File

@ -55,12 +55,15 @@ There are two types of hook handlers that can be implemented for Containers:
* Exec - Executes a specific command, such as `pre-stop.sh`, inside the cgroups and namespaces of the Container.
Resources consumed by the command are counted against the Container.
* HTTP - Executes an HTTP request against a specific endpoint on the Container.
* Sleep - Pauses the container for a specified duration.
The "Sleep" action is available when the [feature gate](/docs/reference/command-line-tool-reference/feagure-gates/)
`PodLifecycleSleepAction` is enabled.
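For example, a `preStop` hook that uses the `sleep` action might look like the following sketch (the Pod and container names are placeholders):
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sleep-hook-demo # placeholder name
spec:
  containers:
  - name: app
    image: nginx
    lifecycle:
      preStop:
        # pause for 5 seconds before the container receives the TERM signal,
        # for example to let load balancers drain connections
        sleep:
          seconds: 5
```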
### Hook handler execution
When a Container lifecycle management hook is called,
the Kubernetes management system executes the handler according to the hook action,
`httpGet` and `tcpSocket` are executed by the kubelet process, and `exec` is executed in the container.
`httpGet`, `tcpSocket`, and `sleep` are executed by the kubelet process, and `exec` is executed in the container.
Hook handler calls are synchronous within the context of the Pod containing the Container.
This means that for a `PostStart` hook,

View File

@ -159,6 +159,17 @@ that Kubernetes will keep trying to pull the image, with an increasing back-off
Kubernetes raises the delay between each attempt until it reaches a compiled-in limit,
which is 300 seconds (5 minutes).
### Image pull per runtime class
{{< feature-state for_k8s_version="v1.29" state="alpha" >}}
Kubernetes includes alpha support for performing image pulls based on the RuntimeClass of a Pod.
If you enable the `RuntimeClassInImageCriApi` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/),
the kubelet references container images by a tuple of (image name, runtime handler) rather than just the
image name or digest. Your {{< glossary_tooltip text="container runtime" term_id="container-runtime" >}}
may adapt its behavior based on the selected runtime handler.
Pulling images based on runtime class is helpful for VM-based containers, such as Windows Hyper-V containers.
## Serial and parallel image pulls
By default, kubelet pulls images serially. In other words, kubelet sends only

View File

@ -159,8 +159,8 @@ The general workflow of a device plugin includes the following steps:
{{< note >}}
The processing of the fully-qualified CDI device names by the Device Manager requires
that the `DevicePluginCDIDevices` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
is enabled for the kubelet and the kube-apiserver. This was added as an alpha feature in Kubernetes
v1.28.
is enabled for both the kubelet and the kube-apiserver. This was added as an alpha feature in Kubernetes
v1.28 and graduated to beta in v1.29.
{{< /note >}}
### Handling kubelet restarts

View File

@ -358,6 +358,108 @@ The affinity term is applied to namespaces selected by both `namespaceSelector`
Note that an empty `namespaceSelector` ({}) matches all namespaces, while a null or empty `namespaces` list and
null `namespaceSelector` matches the namespace of the Pod where the rule is defined.
#### matchLabelKeys
{{< feature-state for_k8s_version="v1.29" state="alpha" >}}
{{< note >}}
<!-- UPDATE THIS WHEN PROMOTING TO BETA -->
The `matchLabelKeys` field is an alpha-level field and is disabled by default in
Kubernetes {{< skew currentVersion >}}.
To use it, you have to enable it via the
`MatchLabelKeysInPodAffinity` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/).
{{< /note >}}
Kubernetes includes an optional `matchLabelKeys` field for Pod affinity
or anti-affinity. The field specifies keys for the labels that should match with the incoming Pod's labels,
when satisfying the Pod (anti)affinity.
The keys are used to look up values from the pod labels; those key-value labels are combined
(using `AND`) with the match restrictions defined using the `labelSelector` field. The combined
filtering selects the set of existing pods that will be taken into Pod (anti)affinity calculation.
A common use case is to use `matchLabelKeys` with `pod-template-hash` (set on Pods
managed as part of a Deployment, where the value is unique for each revision).
Using `pod-template-hash` in `matchLabelKeys` allows you to target the Pods that belong
to the same revision as the incoming Pod, so that a rolling upgrade won't break affinity.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: application-server
...
spec:
  template:
    affinity:
      podAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - database
          topologyKey: topology.kubernetes.io/zone
          # Only Pods from a given rollout are taken into consideration when calculating pod affinity.
          # If you update the Deployment, the replacement Pods follow their own affinity rules
          # (if there are any defined in the new Pod template)
          matchLabelKeys:
          - pod-template-hash
```
#### mismatchLabelKeys
{{< feature-state for_k8s_version="v1.29" state="alpha" >}}
{{< note >}}
<!-- UPDATE THIS WHEN PROMOTING TO BETA -->
The `mismatchLabelKeys` field is an alpha-level field and is disabled by default in
Kubernetes {{< skew currentVersion >}}.
To use it, you have to enable it via the
`MatchLabelKeysInPodAffinity` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/).
{{< /note >}}
Kubernetes includes an optional `mismatchLabelKeys` field for Pod affinity
or anti-affinity. The field specifies keys for the labels that should **not** match with the incoming Pod's labels,
when satisfying the Pod (anti)affinity.
One example use case is to ensure Pods go to a topology domain (node, zone, etc.) where only Pods from the same tenant or team are scheduled.
In other words, you want to avoid running Pods from two different tenants in the same topology domain at the same time.
```yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    # Assume that all relevant Pods have a "tenant" label set
    tenant: tenant-a
...
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      # ensure that pods associated with this tenant land on the correct node pool
      - matchLabelKeys:
        - tenant
        topologyKey: node-pool
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      # ensure that pods associated with this tenant can't schedule to nodes used for another tenant
      - mismatchLabelKeys:
        - tenant # whatever the value of the "tenant" label for this Pod, prevent
                 # scheduling to nodes in any pool where any Pod from a different
                 # tenant is running.
        labelSelector:
          # We have to have the labelSelector which selects only Pods with the tenant label,
          # otherwise this Pod would hate Pods from DaemonSets as well, for example,
          # which aren't supposed to have the tenant label.
          matchExpressions:
          - key: tenant
            operator: Exists
        topologyKey: node-pool
```
#### More practical use-cases
Inter-pod affinity and anti-affinity can be even more useful when they are used with higher

View File

@ -162,6 +162,17 @@ gets scheduled onto one node and then cannot run there, which is bad because
such a pending Pod also blocks all other resources like RAM or CPU that were
set aside for it.
{{< note >}}
Scheduling of Pods that use ResourceClaims is going to be slower because of
the additional communication that is required. Beware that this can also affect
Pods that don't use ResourceClaims: because only one Pod at a time gets
scheduled and blocking API calls are made while handling a Pod with
ResourceClaims, scheduling the next Pod gets delayed.
{{< /note >}}
## Monitoring resources
The kubelet provides a gRPC service to enable discovery of dynamic resources of

View File

@ -84,8 +84,12 @@ the Pod is put into the active queue or the backoff queue
so that the scheduler will retry the scheduling of the Pod.
{{< note >}}
QueueingHint evaluation during scheduling is a beta-level feature and is enabled by default in 1.28.
You can disable it via the
QueueingHint evaluation during scheduling is a beta-level feature.
The v1.28 release series initially enabled the associated feature gate; however, after the
discovery of an excessive memory footprint, the Kubernetes project set that feature gate
to be disabled by default. In Kubernetes {{< skew currentVersion >}}, this feature gate is
disabled and you need to enable it manually.
You can enable it via the
`SchedulerQueueingHints` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/).
{{< /note >}}
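As a sketch, re-enabling the feature gate follows the usual pattern for scheduler flags:
```shell
kube-scheduler \
--feature-gates=SchedulerQueueingHints=true \
# …and other flags as usual
```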

View File

@ -486,6 +486,12 @@ Restrictions on the following controls are only required if `.spec.os.name` is n
- Seccomp
- Linux Capabilities
## User namespaces
User Namespaces are a Linux-only feature to run workloads with increased
isolation. How they work together with Pod Security Standards is described in
the [documentation](/docs/concepts/workloads/pods/user-namespaces#integration-with-pod-security-admission-checks) for Pods that use user namespaces.
## FAQ
### Why isn't there a profile between privileged and baseline?

View File

@ -247,7 +247,8 @@ request. The API server checks the validity of that bearer token as follows:
The TokenRequest API produces _bound tokens_ for a ServiceAccount. This
binding is linked to the lifetime of the client, such as a Pod, that is acting
as that ServiceAccount.
as that ServiceAccount. See [Token Volume Projection](/docs/tasks/configure-pod-container/configure-service-account/#serviceaccount-token-volume-projection)
for an example of a bound pod service account token's JWT schema and payload.
For tokens issued using the `TokenRequest` API, the API server also checks that
the specific object reference that is using the ServiceAccount still exists,
@ -269,7 +270,7 @@ account credentials, you can use the following methods:
The Kubernetes project recommends that you use the TokenReview API, because
this method invalidates tokens that are bound to API objects such as Secrets,
ServiceAccounts, and Pods when those objects are deleted. For example, if you
ServiceAccounts, Pods, or Nodes when those objects are deleted. For example, if you
delete the Pod that contains a projected ServiceAccount token, the cluster
invalidates that token immediately and a TokenReview immediately fails.
If you use OIDC validation instead, your clients continue to treat the token

View File

@ -65,12 +65,12 @@ To configure IPv4/IPv6 dual-stack, set dual-stack cluster network assignments:
* kube-proxy:
* `--cluster-cidr=<IPv4 CIDR>,<IPv6 CIDR>`
* kubelet:
* when there is no `--cloud-provider` the administrator can pass a comma-separated pair of IP
addresses via `--node-ip` to manually configure dual-stack `.status.addresses` for that Node.
If a Pod runs on that node in HostNetwork mode, the Pod reports these IP addresses in its
`.status.podIPs` field.
All `podIPs` in a node match the IP family preference defined by the `.status.addresses`
field for that Node.
* `--node-ip=<IPv4 IP>,<IPv6 IP>`
* This option is required for bare metal dual-stack nodes (nodes that do not define a
cloud provider with the `--cloud-provider` flag). If you are using a cloud provider
and choose to override the node IPs chosen by the cloud provider, set the
`--node-ip` option.
* (The legacy built-in cloud providers do not support dual-stack `--node-ip`.)
{{< note >}}
An example of an IPv4 CIDR: `10.244.0.0/16` (though you would supply your own address range)
@ -79,13 +79,6 @@ An example of an IPv6 CIDR: `fdXY:IJKL:MNOP:15::/64` (this shows the format but
address - see [RFC 4193](https://tools.ietf.org/html/rfc4193))
{{< /note >}}
{{< feature-state for_k8s_version="v1.27" state="alpha" >}}
When using an external cloud provider, you can pass a dual-stack `--node-ip` value to
kubelet if you enable the `CloudDualStackNodeIPs` feature gate in both kubelet and the
external cloud provider. This is only supported for cloud providers that support dual
stack clusters.
## Services
You can create {{< glossary_tooltip text="Services" term_id="service" >}} which can use IPv4, IPv6, or both.

View File

@ -520,16 +520,15 @@ spec:
#### Reserve Nodeport ranges to avoid collisions {#avoid-nodeport-collisions}
{{< feature-state for_k8s_version="v1.28" state="beta" >}}
{{< feature-state for_k8s_version="v1.29" state="stable" >}}
The policy for assigning ports to NodePort services applies to both the auto-assignment and
the manual assignment scenarios. When a user wants to create a NodePort service that
uses a specific port, the target port may conflict with another port that has already been assigned.
In this case, you can enable the feature gate `ServiceNodePortStaticSubrange`, which allows you
to use a different port allocation strategy for NodePort Services. The port range for NodePort services
is divided into two bands. Dynamic port assignment uses the upper band by default, and it may use
the lower band once the upper band has been exhausted. Users can then allocate from the lower band
with a lower risk of port collision.
To avoid this problem, the port range for NodePort services is divided into two bands.
Dynamic port assignment uses the upper band by default, and it may use the lower band once the
upper band has been exhausted. Users can then allocate from the lower band with a lower risk of port collision.
#### Custom IP address configuration for `type: NodePort` Services {#service-nodeport-custom-listen-address}
@ -669,6 +668,28 @@ The value of `spec.loadBalancerClass` must be a label-style identifier,
with an optional prefix such as "`internal-vip`" or "`example.com/internal-vip`".
Unprefixed names are reserved for end-users.
#### Specifying IPMode of load balancer status {#load-balancer-ip-mode}
{{< feature-state for_k8s_version="v1.29" state="alpha" >}}
Starting as Alpha in Kubernetes 1.29,
a [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
named `LoadBalancerIPMode` allows you to set the `.status.loadBalancer.ingress.ipMode`
for a Service with `type` set to `LoadBalancer`.
The `.status.loadBalancer.ingress.ipMode` specifies how the load-balancer IP behaves.
It may be specified only when the `.status.loadBalancer.ingress.ip` field is also specified.
There are two possible values for `.status.loadBalancer.ingress.ipMode`: "VIP" and "Proxy".
The default value is "VIP" meaning that traffic is delivered to the node
with the destination set to the load-balancer's IP and port.
There are two cases when setting this to "Proxy", depending on how the load balancer
from the cloud provider delivers the traffic:
- If the traffic is delivered to the node then DNATed to the pod, the destination would be set to the node's IP and node port;
- If the traffic is delivered directly to the pod, the destination would be set to the pod's IP and port.
Service implementations may use this information to adjust traffic routing.
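For illustration, here is a sketch of the status that a cloud provider's controller might set on a Service when it delivers traffic in proxy fashion (the IP address is a placeholder from a documentation range):
```yaml
status:
  loadBalancer:
    ingress:
    - ip: 192.0.2.127
      # traffic is proxied; consumers should not treat this
      # address as a virtual IP held by the nodes
      ipMode: Proxy
```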
#### Internal load balancer
In a mixed environment it is sometimes necessary to route traffic from Services inside the same

View File

@ -17,7 +17,8 @@ weight: 20
<!-- overview -->
This document describes _persistent volumes_ in Kubernetes. Familiarity with
[volumes](/docs/concepts/storage/volumes/) is suggested.
[volumes](/docs/concepts/storage/volumes/), [StorageClasses](/docs/concepts/storage/storage-classes/)
and [VolumeAttributesClasses](/docs/concepts/storage/volume-attributes-classes/) is suggested.
<!-- body -->
@ -39,8 +40,8 @@ NFS, iSCSI, or a cloud-provider-specific storage system.
A _PersistentVolumeClaim_ (PVC) is a request for storage by a user. It is similar
to a Pod. Pods consume node resources and PVCs consume PV resources. Pods can
request specific levels of resources (CPU and Memory). Claims can request specific
size and access modes (e.g., they can be mounted ReadWriteOnce, ReadOnlyMany or
ReadWriteMany, see [AccessModes](#access-modes)).
size and access modes (e.g., they can be mounted ReadWriteOnce, ReadOnlyMany,
ReadWriteMany, or ReadWriteOncePod, see [AccessModes](#access-modes)).
While PersistentVolumeClaims allow a user to consume abstract storage resources,
it is common that users need PersistentVolumes with varying properties, such as
@ -618,7 +619,8 @@ The access modes are:
`ReadWriteOnce`
: the volume can be mounted as read-write by a single node. ReadWriteOnce access
mode still can allow multiple pods to access the volume when the pods are running on the same node.
mode still can allow multiple pods to access the volume when the pods are
running on the same node. For single pod access, please see ReadWriteOncePod.
`ReadOnlyMany`
: the volume can be mounted as read-only by many nodes.
@ -627,15 +629,22 @@ The access modes are:
: the volume can be mounted as read-write by many nodes.
`ReadWriteOncePod`
: {{< feature-state for_k8s_version="v1.27" state="beta" >}}
: {{< feature-state for_k8s_version="v1.29" state="stable" >}}
the volume can be mounted as read-write by a single Pod. Use ReadWriteOncePod
access mode if you want to ensure that only one pod across the whole cluster can
read that PVC or write to it. This is only supported for CSI volumes and
Kubernetes version 1.22+.
read that PVC or write to it.
The blog article
[Introducing Single Pod Access Mode for PersistentVolumes](/blog/2021/09/13/read-write-once-pod-access-mode-alpha/)
covers this in more detail.
{{< note >}}
The `ReadWriteOncePod` access mode is only supported for
{{< glossary_tooltip text="CSI" term_id="csi" >}} volumes and Kubernetes version
1.22+. To use this feature you will need to update the following
[CSI sidecars](https://kubernetes-csi.github.io/docs/sidecar-containers.html)
to these versions or greater:
* [csi-provisioner:v3.0.0+](https://github.com/kubernetes-csi/external-provisioner/releases/tag/v3.0.0)
* [csi-attacher:v3.3.0+](https://github.com/kubernetes-csi/external-attacher/releases/tag/v3.3.0)
* [csi-resizer:v1.3.0+](https://github.com/kubernetes-csi/external-resizer/releases/tag/v1.3.0)
{{< /note >}}
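For example, a minimal PersistentVolumeClaim requesting single-Pod access could look like the following sketch (the claim name and storage class are placeholders; the class must be backed by a CSI driver):
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: single-writer-claim # placeholder name
spec:
  accessModes:
  - ReadWriteOncePod # only one Pod across the whole cluster may use the volume
  resources:
    requests:
      storage: 1Gi
  storageClassName: example-csi-class # placeholder CSI-backed class
```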
In the CLI, the access modes are abbreviated to:
@ -753,7 +762,7 @@ You can see the name of the PVC bound to the PV using `kubectl describe persiste
#### Phase transition timestamp
{{< feature-state for_k8s_version="v1.28" state="alpha" >}}
{{< feature-state for_k8s_version="v1.29" state="beta" >}}
The `.status` field for a PersistentVolume can include a beta `lastPhaseTransitionTime` field. This field records
the timestamp of when the volume last transitioned its phase. For newly created

View File

@ -24,6 +24,7 @@ Currently, the following types of volume sources can be projected:
* [`downwardAPI`](/docs/concepts/storage/volumes/#downwardapi)
* [`configMap`](/docs/concepts/storage/volumes/#configmap)
* [`serviceAccountToken`](#serviceaccounttoken)
* [`clusterTrustBundle`](#clustertrustbundle)
All sources are required to be in the same namespace as the Pod. For more details,
see the [all-in-one volume](https://git.k8s.io/design-proposals-archive/node/all-in-one-volume.md) design document.
@ -70,6 +71,31 @@ A container using a projected volume source as a [`subPath`](/docs/concepts/stor
volume mount will not receive updates for those volume sources.
{{< /note >}}
## clusterTrustBundle projected volumes {#clustertrustbundle}
{{< feature-state for_k8s_version="v1.29" state="alpha" >}}
{{< note >}}
To use this feature in Kubernetes {{< skew currentVersion >}}, you must enable support for ClusterTrustBundle objects with the `ClusterTrustBundle` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) and the `--runtime-config=certificates.k8s.io/v1alpha1/clustertrustbundles=true` kube-apiserver flag, and then enable the `ClusterTrustBundleProjection` feature gate.
{{< /note >}}
The `clusterTrustBundle` projected volume source injects the contents of one or more [ClusterTrustBundle](/docs/reference/access-authn-authz/certificate-signing-requests#cluster-trust-bundles) objects as an automatically-updating file in the container filesystem.
ClusterTrustBundles can be selected either by [name](/docs/reference/access-authn-authz/certificate-signing-requests#ctb-signer-unlinked) or by [signer name](/docs/reference/access-authn-authz/certificate-signing-requests#ctb-signer-linked).
To select by name, use the `name` field to designate a single ClusterTrustBundle object.
To select by signer name, use the `signerName` field (and optionally the
`labelSelector` field) to designate a set of ClusterTrustBundle objects that use
the given signer name. If `labelSelector` is not present, then all
ClusterTrustBundles for that signer are selected.
The kubelet deduplicates the certificates in the selected ClusterTrustBundle objects, normalizes the PEM representations (discarding comments and headers), reorders the certificates, and writes them into the file named by `path`. As the set of selected ClusterTrustBundles or their content changes, kubelet keeps the file up-to-date.
By default, the kubelet will prevent the pod from starting if the named ClusterTrustBundle is not found, or if `signerName` / `labelSelector` do not match any ClusterTrustBundles. If this behavior is not what you want, then set the `optional` field to `true`, and the pod will start up with an empty file at `path`.
{{% code_sample file="pods/storage/projected-clustertrustbundle.yaml" %}}
## SecurityContext interactions
The [proposal](https://git.k8s.io/enhancements/keps/sig-storage/2451-service-account-token-volumes#proposal) for file permission handling in projected service account volume enhancement introduced the projected files having the correct owner permissions set.

View File

@ -17,8 +17,6 @@ with [volumes](/docs/concepts/storage/volumes/) and
<!-- body -->
## Introduction
A StorageClass provides a way for administrators to describe the "classes" of
storage they offer. Different classes might map to quality-of-service levels,
or to backup policies, or to arbitrary policies determined by the cluster
@ -26,7 +24,7 @@ administrators. Kubernetes itself is unopinionated about what classes
represent. This concept is sometimes called "profiles" in other storage
systems.
## The StorageClass Resource
## The StorageClass API
Each StorageClass contains the fields `provisioner`, `parameters`, and
`reclaimPolicy`, which are used when a PersistentVolume belonging to the

View File

@ -0,0 +1,131 @@
---
reviewers:
- msau42
- xing-yang
title: Volume Attributes Classes
content_type: concept
weight: 40
---
<!-- overview -->
{{< feature-state for_k8s_version="v1.29" state="alpha" >}}
This page assumes that you are familiar with [StorageClasses](/docs/concepts/storage/storage-classes/),
[volumes](/docs/concepts/storage/volumes/) and [PersistentVolumes](/docs/concepts/storage/persistent-volumes/)
in Kubernetes.
<!-- body -->
A VolumeAttributesClass provides a way for administrators to describe the mutable
"classes" of storage they offer. Different classes might map to different quality-of-service levels.
Kubernetes itself is unopinionated about what these classes represent.
This is an alpha feature and disabled by default.
If you want to test the feature whilst it's alpha, you need to enable the `VolumeAttributesClass`
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/) for the kube-controller-manager and the kube-apiserver. You use the `--feature-gates` command line argument:
```
--feature-gates="...,VolumeAttributesClass=true"
```
You can also only use VolumeAttributesClasses with storage backed by
{{< glossary_tooltip text="Container Storage Interface" term_id="csi" >}}, and only where the
relevant CSI driver implements the `ModifyVolume` API.
## The VolumeAttributesClass API
Each VolumeAttributesClass contains the `driverName` and `parameters`, which are
used when a PersistentVolume (PV) belonging to the class needs to be dynamically provisioned
or modified.
The name of a VolumeAttributesClass object is significant and is how users can request a particular class.
Administrators set the name and other parameters of a class when first creating VolumeAttributesClass objects.
While the name of a VolumeAttributesClass object in a `PersistentVolumeClaim` is mutable, the parameters in an existing class are immutable.
```yaml
apiVersion: storage.k8s.io/v1alpha1
kind: VolumeAttributesClass
metadata:
  name: silver
driverName: pd.csi.storage.gke.io
parameters:
  provisioned-iops: "3000"
  provisioned-throughput: "50"
```
### Provisioner
Each VolumeAttributesClass has a provisioner that determines what volume plugin is used for provisioning PVs. The field `driverName` must be specified.
The feature support for VolumeAttributesClass is implemented in [kubernetes-csi/external-provisioner](https://github.com/kubernetes-csi/external-provisioner).
You are not restricted to specifying the [kubernetes-csi/external-provisioner](https://github.com/kubernetes-csi/external-provisioner). You can also run and specify external provisioners,
which are independent programs that follow a specification defined by Kubernetes.
Authors of external provisioners have full discretion over where their code lives, how
the provisioner is shipped, how it needs to be run, what volume plugin it uses, etc.
### Resizer
Each VolumeAttributesClass has a resizer that determines what volume plugin is used for modifying PVs. The field `driverName` must be specified.
The modifying volume feature support for VolumeAttributesClass is implemented in [kubernetes-csi/external-resizer](https://github.com/kubernetes-csi/external-resizer).
For example, an existing PersistentVolumeClaim is using a VolumeAttributesClass named silver:
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pv-claim
spec:
  volumeAttributesClassName: silver
```
A new VolumeAttributesClass gold is available in the cluster:
```yaml
apiVersion: storage.k8s.io/v1alpha1
kind: VolumeAttributesClass
metadata:
  name: gold
driverName: pd.csi.storage.gke.io
parameters:
  iops: "4000"
  throughput: "60"
```
The end user can update the PVC to the new VolumeAttributesClass gold and apply it:
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pv-claim
spec:
  volumeAttributesClassName: gold
```
## Parameters
VolumeAttributesClasses have parameters that describe volumes belonging to them. Different parameters may be accepted
depending on the provisioner or the resizer. For example, the parameter `iops` (with a value such as `4000`)
and the parameter `throughput` are specific to GCE PD.
When a parameter is omitted, the default is used at volume provisioning.
If a user applies a PVC with a different VolumeAttributesClass that omits some parameters, the default values of
those parameters may be used, depending on the CSI driver implementation.
Refer to the related CSI driver documentation for more details.
There can be at most 512 parameters defined for a VolumeAttributesClass.
The total length of the parameters object including its keys and values cannot exceed 256 KiB.

View File

@ -181,15 +181,14 @@ A time zone database from the Go standard library is included in the binaries an
### Unsupported TimeZone specification
The implementation of the CronJob API in Kubernetes {{< skew currentVersion >}} lets you set
the `.spec.schedule` field to include a timezone; for example: `CRON_TZ=UTC * * * * *`
or `TZ=UTC * * * * *`.
Specifying a timezone using `CRON_TZ` or `TZ` variables inside `.spec.schedule`
is **not officially supported** (and never has been).
Specifying a timezone that way is **not officially supported** (and never has been).
If you try to set a schedule that includes `TZ` or `CRON_TZ` timezone specification,
Kubernetes reports a [warning](/blog/2020/09/03/warnings/) to the client.
Future versions of Kubernetes will prevent setting the unofficial timezone mechanism entirely.
Starting with Kubernetes 1.29 if you try to set a schedule that includes `TZ` or `CRON_TZ`
timezone specification, Kubernetes will fail to create the resource with a validation
error.
Updates to CronJobs already using `TZ` or `CRON_TZ` will continue to report a
[warning](/blog/2020/09/03/warnings/) to the client.
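For reference, the supported mechanism is the `.spec.timeZone` field; a sketch (the name, schedule, and workload are placeholders):
```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: daily-report # placeholder name
spec:
  schedule: "0 9 * * *" # no CRON_TZ or TZ prefix in the schedule itself
  timeZone: "Etc/UTC"   # the supported way to select a time zone
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: report
            image: busybox:1.36
            command: ["date"]
          restartPolicy: OnFailure
```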
### Modifying a CronJob

View File

@ -382,7 +382,7 @@ from failed Jobs is not lost inadvertently.
### Backoff limit per index {#backoff-limit-per-index}
{{< feature-state for_k8s_version="v1.28" state="alpha" >}}
{{< feature-state for_k8s_version="v1.29" state="beta" >}}
{{< note >}}
You can only configure the backoff limit per index for an [Indexed](#completion-mode) Job, if you
scaling an indexed Job, such as MPI, Horovod, Ray, and PyTorch training jobs.
### Delayed creation of replacement pods {#pod-replacement-policy}
{{< feature-state for_k8s_version="v1.28" state="alpha" >}}
{{< feature-state for_k8s_version="v1.29" state="beta" >}}
{{< note >}}
You can only set `podReplacementPolicy` on Jobs if you enable the `JobPodReplacementPolicy`
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/).
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
(enabled by default).
{{< /note >}}
By default, the Job controller recreates Pods as soon as they either fail or are terminating (have a deletion timestamp).
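A sketch of a Job that opts into delayed replacement (the name and workload are placeholders):
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: replacement-demo # placeholder name
spec:
  # create a replacement only once a Pod is fully terminated
  # (reaches the Failed phase), not as soon as it starts terminating
  podReplacementPolicy: Failed
  template:
    spec:
      containers:
      - name: worker
        image: busybox:1.36
        command: ["sh", "-c", "echo working"]
      restartPolicy: Never
```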

View File

@ -116,6 +116,12 @@ spec:
storage: 1Gi
```
{{< note >}}
This example uses the `ReadWriteOnce` access mode, for simplicity. For
production use, the Kubernetes project recommends using the `ReadWriteOncePod`
access mode instead.
{{< /note >}}
In the above example:
* A Headless Service, named `nginx`, is used to control the network domain.

View File

@ -111,9 +111,9 @@ Some Pods have {{< glossary_tooltip text="init containers" term_id="init-contain
as well as {{< glossary_tooltip text="app containers" term_id="app-container" >}}.
By default, init containers run and complete before the app containers are started.
{{< feature-state for_k8s_version="v1.28" state="alpha" >}}
{{< feature-state for_k8s_version="v1.29" state="beta" >}}
Enabling the `SidecarContainers` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
Enabled by default, the `SidecarContainers` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
allows you to specify `restartPolicy: Always` for init containers.
Setting the `Always` restart policy ensures that the init containers where you set it are
kept running during the entire lifetime of the Pod.
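For illustration, a sketch of a Pod with a sidecar-style init container (the names and images are placeholders):
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-sidecar # placeholder name
spec:
  initContainers:
  - name: log-shipper
    image: busybox:1.36 # stand-in for a real logging agent
    command: ["sh", "-c", "tail -f /dev/null"]
    restartPolicy: Always # keeps this init container running as a sidecar
  containers:
  - name: app
    image: nginx
```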

View File

@ -175,7 +175,7 @@ through which the Pod has or has not passed. Kubelet manages the following
PodConditions:
* `PodScheduled`: the Pod has been scheduled to a node.
* `PodReadyToStartContainers`: (alpha feature; must be [enabled explicitly](#pod-has-network)) the
* `PodReadyToStartContainers`: (beta feature; enabled by [default](#pod-has-network)) the
Pod sandbox has been successfully created and networking configured.
* `ContainersReady`: all containers in the Pod are ready.
* `Initialized`: all [init containers](/docs/concepts/workloads/pods/init-containers/)
@ -253,19 +253,21 @@ When a Pod's containers are Ready but at least one custom condition is missing o
### Pod network readiness {#pod-has-network}
{{< feature-state for_k8s_version="v1.25" state="alpha" >}}
{{< feature-state for_k8s_version="v1.29" state="beta" >}}
{{< note >}}
This condition was renamed from PodHasNetwork to PodReadyToStartContainers.
During its early development, this condition was named `PodHasNetwork`.
{{< /note >}}
After a Pod gets scheduled on a node, it needs to be admitted by the Kubelet and
have any volumes mounted. Once these phases are complete, the Kubelet works with
After a Pod gets scheduled on a node, it needs to be admitted by the kubelet and
to have any required storage volumes mounted. Once these phases are complete,
the kubelet works with
a container runtime (using {{< glossary_tooltip term_id="cri" >}}) to set up a
runtime sandbox and configure networking for the Pod. If the
`PodReadyToStartContainersCondition` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) is enabled,
Kubelet reports whether a pod has reached this initialization milestone through
the `PodReadyToStartContainers` condition in the `status.conditions` field of a Pod.
`PodReadyToStartContainersCondition`
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/) is enabled
(it is enabled by default for Kubernetes {{< skew currentVersion >}}), the
`PodReadyToStartContainers` condition will be added to the `status.conditions` field of a Pod.
The `PodReadyToStartContainers` condition is set to `False` by the Kubelet when it detects a
Pod does not have a runtime sandbox with networking configured. This occurs in
@ -515,6 +517,22 @@ termination grace period _begins_. The behavior above is described when the
feature gate `EndpointSliceTerminatingCondition` is enabled.
{{</note>}}
{{<note>}}
Beginning with Kubernetes 1.29, if your Pod includes one or more sidecar containers
(init containers with an `Always` restart policy), the kubelet will delay sending
the TERM signal to these sidecar containers until the last main container has fully terminated.
The sidecar containers will be terminated in the reverse order they are defined in the Pod spec.
This ensures that sidecar containers continue serving the other containers in the Pod until they are no longer needed.
Note that slow termination of a main container will also delay the termination of the sidecar containers.
If the grace period expires before the termination process is complete, the Pod may enter emergency termination.
In this case, all remaining containers in the Pod will be terminated simultaneously with a short grace period.
Similarly, if the Pod has a preStop hook that exceeds the termination grace period, emergency termination may occur.
In general, if you have used preStop hooks to control the termination order without sidecar containers, you can now
remove them and allow the kubelet to manage sidecar termination automatically.
{{</note>}}
1. When the grace period expires, the kubelet triggers forcible shutdown. The container runtime sends
`SIGKILL` to any processes still running in any container in the Pod.
The kubelet also cleans up a hidden `pause` container if that container runtime uses one.
@ -597,4 +615,4 @@ for more details.
* For detailed information about Pod and container status in the API, see
the API reference documentation covering
[`status`](/docs/reference/kubernetes-api/workload-resources/pod-v1/#PodStatus) for Pod.

View File

@ -5,7 +5,7 @@ weight: 50
---
<!-- overview -->
{{< feature-state for_k8s_version="v1.28" state="alpha" >}}
{{< feature-state for_k8s_version="v1.29" state="beta" >}}
Sidecar containers are the secondary containers that run along with the main
application container within the same {{< glossary_tooltip text="Pod" term_id="pod" >}}.

View File

@ -152,6 +152,35 @@ host's file owner/group.
[CVE-2021-25741]: https://github.com/kubernetes/kubernetes/issues/104980
## Integration with Pod security admission checks
{{< feature-state state="alpha" for_k8s_version="v1.29" >}}
For Linux Pods that enable user namespaces, Kubernetes relaxes the application of
[Pod Security Standards](/docs/concepts/security/pod-security-standards) in a controlled way.
This behavior can be controlled by the [feature
gate](/docs/reference/command-line-tools-reference/feature-gates/)
`UserNamespacesPodSecurityStandards`, which allows an early opt-in for end
users. Admins have to ensure that user namespaces are enabled on all nodes
within the cluster if using the feature gate.
If you enable the associated feature gate and create a Pod that uses user
namespaces, the following fields won't be constrained even in contexts that enforce the
_Baseline_ or _Restricted_ Pod Security Standard. This behavior does not
present a security concern because `root` inside a Pod with user namespaces
actually refers to the user inside the container, which is never mapped to a
privileged user on the host. Here's the list of fields that are **not** checked for Pods in those
circumstances (a minimal example follows the list):
- `spec.securityContext.runAsNonRoot`
- `spec.containers[*].securityContext.runAsNonRoot`
- `spec.initContainers[*].securityContext.runAsNonRoot`
- `spec.ephemeralContainers[*].securityContext.runAsNonRoot`
- `spec.securityContext.runAsUser`
- `spec.containers[*].securityContext.runAsUser`
- `spec.initContainers[*].securityContext.runAsUser`
- `spec.ephemeralContainers[*].securityContext.runAsUser`
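For illustration, a sketch of a Pod that opts into a user namespace and is therefore exempt from the checks above (the name and image are placeholders):
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: userns-demo # placeholder name
spec:
  hostUsers: false # request a user namespace for this Pod
  containers:
  - name: app
    image: nginx
    securityContext:
      runAsUser: 0 # root inside the container only; never privileged on the host
```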
## Limitations
When using a user namespace for the pod, it is disallowed to use other host

View File

@ -242,7 +242,7 @@ and are assigned to the groups `system:serviceaccounts` and `system:serviceaccou
{{< warning >}}
Because service account tokens can also be stored in Secret API objects, any user with
write access to Secrets can request a token, and any user with read access to those
Secrets can authenticate as the service account. Be cautious when granting permissions
to service accounts and read or write capabilities for Secrets.
{{< /warning >}}
@ -293,8 +293,9 @@ sequenceDiagram
1. Your identity provider will provide you with an `access_token`, `id_token` and a `refresh_token`
1. When using `kubectl`, use your `id_token` with the `--token` flag or add it directly to your `kubeconfig`
1. `kubectl` sends your `id_token` in a header called Authorization to the API server
1. The API server will make sure the JWT signature is valid by checking against the certificate named in the configuration
1. The API server will make sure the JWT signature is valid
1. Check to make sure the `id_token` hasn't expired
1. Perform claim and/or user validation if CEL expressions are configured with `AuthenticationConfiguration`.
1. Make sure the user is authorized
1. Once authorized the API server returns a response to `kubectl`
1. `kubectl` provides feedback to the user
@ -312,6 +313,8 @@ very scalable solution for authentication. It does offer a few challenges:
#### Configuring the API Server
##### Using flags
To enable the plugin, configure the following flags on the API server:
| Parameter | Description | Example | Required |
@ -326,6 +329,291 @@ To enable the plugin, configure the following flags on the API server:
| `--oidc-ca-file` | The path to the certificate for the CA that signed your identity provider's web certificate. Defaults to the host's root CAs. | `/etc/kubernetes/ssl/kc-ca.pem` | No |
| `--oidc-signing-algs` | The signing algorithms accepted. Default is "RS256". | `RS512` | No |
##### Using Authentication Configuration
{{< feature-state for_k8s_version="v1.29" state="alpha" >}}
JWT Authenticator is an authenticator that authenticates Kubernetes users using JWT-compliant tokens. The authenticator attempts to
parse a raw ID token and verifies that it was signed by the configured issuer. The public key to verify the signature is discovered from the issuer's public endpoint using OIDC discovery.
The API server can be configured to use a JWT authenticator via the `--authentication-config` flag. This flag takes a path to a file containing the `AuthenticationConfiguration`. An example configuration is provided below.
To use this config, the `StructuredAuthenticationConfiguration` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
has to be enabled.
{{< note >}}
When the feature is enabled, setting both `--authentication-config` and any of the `--oidc-*` flags will result in an error. If you want to use the feature, you have to remove the `--oidc-*` flags and use the configuration file instead.
{{< /note >}}
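As a sketch, wiring this together on the kube-apiserver might look like the following (the configuration file path is a placeholder):
```shell
kube-apiserver \
--feature-gates=StructuredAuthenticationConfiguration=true \
--authentication-config=/etc/kubernetes/authentication-config.yaml \
# …and other flags as usual
```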
```yaml
---
#
# CAUTION: this is an example configuration.
#          Do not use this for your own cluster!
#
apiVersion: apiserver.config.k8s.io/v1alpha1
kind: AuthenticationConfiguration
# list of authenticators to authenticate Kubernetes users using JWT compliant tokens.
jwt:
- issuer:
    url: https://example.com # Same as --oidc-issuer-url.
    audiences:
    - my-app # Same as --oidc-client-id.
  # rules applied to validate token claims to authenticate users.
  claimValidationRules:
    # Same as --oidc-required-claim key=value.
  - claim: hd
    requiredValue: example.com
    # Instead of claim and requiredValue, you can use expression to validate the claim.
    # expression is a CEL expression that evaluates to a boolean.
    # all the expressions must evaluate to true for validation to succeed.
  - expression: 'claims.hd == "example.com"'
    # Message customizes the error message seen in the API server logs when the validation fails.
    message: the hd claim must be set to example.com
  - expression: 'claims.exp - claims.nbf <= 86400'
    message: total token lifetime must not exceed 24 hours
  claimMappings:
    # username represents an option for the username attribute.
    # This is the only required attribute.
    username:
      # Same as --oidc-username-claim. Mutually exclusive with username.expression.
      claim: "sub"
      # Same as --oidc-username-prefix. Mutually exclusive with username.expression.
      # if username.claim is set, username.prefix is required.
      # Explicitly set it to "" if no prefix is desired.
      prefix: ""
      # Mutually exclusive with username.claim and username.prefix.
      # expression is a CEL expression that evaluates to a string.
      expression: 'claims.username + ":external-user"'
    # groups represents an option for the groups attribute.
    groups:
      # Same as --oidc-groups-claim. Mutually exclusive with groups.expression.
      claim: "sub"
      # Same as --oidc-groups-prefix. Mutually exclusive with groups.expression.
      # if groups.claim is set, groups.prefix is required.
      # Explicitly set it to "" if no prefix is desired.
      prefix: ""
      # Mutually exclusive with groups.claim and groups.prefix.
      # expression is a CEL expression that evaluates to a string or a list of strings.
      expression: 'claims.roles.split(",")'
    # uid represents an option for the uid attribute.
    uid:
      # Mutually exclusive with uid.expression.
      claim: 'sub'
      # Mutually exclusive with uid.claim
      # expression is a CEL expression that evaluates to a string.
      expression: 'claims.sub'
    # extra attributes to be added to the UserInfo object. Keys must be domain-prefixed paths and must be unique.
    extra:
    - key: 'example.com/tenant'
      # valueExpression is a CEL expression that evaluates to a string or a list of strings.
      valueExpression: 'claims.tenant'
  # validation rules applied to the final user object.
  userValidationRules:
  # expression is a CEL expression that evaluates to a boolean.
  # all the expressions must evaluate to true for the user to be valid.
  - expression: "!user.username.startsWith('system:')"
    # Message customizes the error message seen in the API server logs when the validation fails.
    message: 'username cannot use reserved system: prefix'
  - expression: "user.groups.all(group, !group.startsWith('system:'))"
    message: 'groups cannot use reserved system: prefix'
```
* Claim validation rule expression
`jwt.claimValidationRules[i].expression` represents the expression which will be evaluated by CEL.
CEL expressions have access to the contents of the token payload, organized into `claims` CEL variable.
`claims` is a map of claim names (as strings) to claim values (of any type).
* User validation rule expression
`jwt.userValidationRules[i].expression` represents the expression which will be evaluated by CEL.
CEL expressions have access to the contents of `userInfo`, organized into `user` CEL variable.
Refer to the [UserInfo](/docs/reference/generated/kubernetes-api/v{{< skew currentVersion >}}/#userinfo-v1-authentication-k8s-io) API documentation for the schema of `user`.
* Claim mapping expression
`jwt.claimMappings.username.expression`, `jwt.claimMappings.groups.expression`, `jwt.claimMappings.uid.expression`,
and `jwt.claimMappings.extra[i].valueExpression` represent the expressions which will be evaluated by CEL.
CEL expressions have access to the contents of the token payload, organized into `claims` CEL variable.
`claims` is a map of claim names (as strings) to claim values (of any type).
To learn more, see the [Documentation on CEL](/docs/reference/using-api/cel/)
Here are examples of the `AuthenticationConfiguration` with different token payloads.
{{< tabs name="example_configuration" >}}
{{% tab name="Valid token" %}}
```yaml
apiVersion: apiserver.config.k8s.io/v1alpha1
kind: AuthenticationConfiguration
jwt:
- issuer:
    url: https://example.com
    audiences:
    - my-app
  claimMappings:
    username:
      expression: 'claims.username + ":external-user"'
    groups:
      expression: 'claims.roles.split(",")'
    uid:
      expression: 'claims.sub'
    extra:
    - key: 'example.com/tenant'
      valueExpression: 'claims.tenant'
  userValidationRules:
  - expression: "!user.username.startsWith('system:')" # the expression will evaluate to true, so validation will succeed.
    message: 'username cannot use reserved system: prefix'
```
```bash
TOKEN=eyJhbGciOiJSUzI1NiIsImtpZCI6ImY3dF9tOEROWmFTQk1oWGw5QXZTWGhBUC04Y0JmZ0JVbFVpTG5oQkgxdXMiLCJ0eXAiOiJKV1QifQ.eyJhdWQiOiJrdWJlcm5ldGVzIiwiZXhwIjoxNzAzMjMyOTQ5LCJpYXQiOjE3MDExMDcyMzMsImlzcyI6Imh0dHBzOi8vZXhhbXBsZS5jb20iLCJqdGkiOiI3YzMzNzk0MjgwN2U3M2NhYTJjMzBjODY4YWMwY2U5MTBiY2UwMmRkY2JmZWJlOGMyM2I4YjVmMjdhZDYyODczIiwibmJmIjoxNzAxMTA3MjMzLCJyb2xlcyI6InVzZXIsYWRtaW4iLCJzdWIiOiJhdXRoIiwidGVuYW50IjoiNzJmOTg4YmYtODZmMS00MWFmLTkxYWItMmQ3Y2QwMTFkYjRhIiwidXNlcm5hbWUiOiJmb28ifQ.TBWF2RkQHm4QQz85AYPcwLxSk-VLvQW-mNDHx7SEOSv9LVwcPYPuPajJpuQn9C_gKq1R94QKSQ5F6UgHMILz8OfmPKmX_00wpwwNVGeevJ79ieX2V-__W56iNR5gJ-i9nn6FYk5pwfVREB0l4HSlpTOmu80gbPWAXY5hLW0ZtcE1JTEEmefORHV2ge8e3jp1xGafNy6LdJWabYuKiw8d7Qga__HxtKB-t0kRMNzLRS7rka_SfQg0dSYektuxhLbiDkqhmRffGlQKXGVzUsuvFw7IGM5ZWnZgEMDzCI357obHeM3tRqpn5WRjtB8oM7JgnCymaJi-P3iCd88iu1xnzA
```
where the token payload is:
```json
{
  "aud": "kubernetes",
  "exp": 1703232949,
  "iat": 1701107233,
  "iss": "https://example.com",
  "jti": "7c337942807e73caa2c30c868ac0ce910bce02ddcbfebe8c23b8b5f27ad62873",
  "nbf": 1701107233,
  "roles": "user,admin",
  "sub": "auth",
  "tenant": "72f988bf-86f1-41af-91ab-2d7cd011db4a",
  "username": "foo"
}
```
The token with the above `AuthenticationConfiguration` will produce the following `UserInfo` object and successfully authenticate the user.
```json
{
  "username": "foo:external-user",
  "uid": "auth",
  "groups": [
    "user",
    "admin"
  ],
  "extra": {
    "example.com/tenant": "72f988bf-86f1-41af-91ab-2d7cd011db4a"
  }
}
```
{{% /tab %}}
{{% tab name="Fails claim validation" %}}
```yaml
apiVersion: apiserver.config.k8s.io/v1alpha1
kind: AuthenticationConfiguration
jwt:
- issuer:
    url: https://example.com
    audiences:
    - my-app
  claimValidationRules:
  - expression: 'claims.hd == "example.com"' # the token below does not have this claim, so validation will fail.
    message: the hd claim must be set to example.com
  claimMappings:
    username:
      expression: 'claims.username + ":external-user"'
    groups:
      expression: 'claims.roles.split(",")'
    uid:
      expression: 'claims.sub'
    extra:
    - key: 'example.com/tenant'
      valueExpression: 'claims.tenant'
  userValidationRules:
  - expression: "!user.username.startsWith('system:')" # the expression will evaluate to true, so validation will succeed.
    message: 'username cannot use reserved system: prefix'
```
```bash
TOKEN=eyJhbGciOiJSUzI1NiIsImtpZCI6ImY3dF9tOEROWmFTQk1oWGw5QXZTWGhBUC04Y0JmZ0JVbFVpTG5oQkgxdXMiLCJ0eXAiOiJKV1QifQ.eyJhdWQiOiJrdWJlcm5ldGVzIiwiZXhwIjoxNzAzMjMyOTQ5LCJpYXQiOjE3MDExMDcyMzMsImlzcyI6Imh0dHBzOi8vZXhhbXBsZS5jb20iLCJqdGkiOiI3YzMzNzk0MjgwN2U3M2NhYTJjMzBjODY4YWMwY2U5MTBiY2UwMmRkY2JmZWJlOGMyM2I4YjVmMjdhZDYyODczIiwibmJmIjoxNzAxMTA3MjMzLCJyb2xlcyI6InVzZXIsYWRtaW4iLCJzdWIiOiJhdXRoIiwidGVuYW50IjoiNzJmOTg4YmYtODZmMS00MWFmLTkxYWItMmQ3Y2QwMTFkYjRhIiwidXNlcm5hbWUiOiJmb28ifQ.TBWF2RkQHm4QQz85AYPcwLxSk-VLvQW-mNDHx7SEOSv9LVwcPYPuPajJpuQn9C_gKq1R94QKSQ5F6UgHMILz8OfmPKmX_00wpwwNVGeevJ79ieX2V-__W56iNR5gJ-i9nn6FYk5pwfVREB0l4HSlpTOmu80gbPWAXY5hLW0ZtcE1JTEEmefORHV2ge8e3jp1xGafNy6LdJWabYuKiw8d7Qga__HxtKB-t0kRMNzLRS7rka_SfQg0dSYektuxhLbiDkqhmRffGlQKXGVzUsuvFw7IGM5ZWnZgEMDzCI357obHeM3tRqpn5WRjtB8oM7JgnCymaJi-P3iCd88iu1xnzA
```
where the token payload is:
```json
{
  "aud": "kubernetes",
  "exp": 1703232949,
  "iat": 1701107233,
  "iss": "https://example.com",
  "jti": "7c337942807e73caa2c30c868ac0ce910bce02ddcbfebe8c23b8b5f27ad62873",
  "nbf": 1701107233,
  "roles": "user,admin",
  "sub": "auth",
  "tenant": "72f988bf-86f1-41af-91ab-2d7cd011db4a",
  "username": "foo"
}
```
The token with the above `AuthenticationConfiguration` will fail to authenticate because the `hd` claim is not set to `example.com`. The API server will return a `401 Unauthorized` error.
{{% /tab %}}
{{% tab name="Fails user validation" %}}
```yaml
apiVersion: apiserver.config.k8s.io/v1alpha1
kind: AuthenticationConfiguration
jwt:
- issuer:
url: https://example.com
audiences:
- my-app
claimValidationRules:
- expression: 'claims.hd == "example.com"'
message: the hd claim must be set to example.com
claimMappings:
username:
expression: '"system:" + claims.username' # this will prefix the username with "system:" and will fail user validation.
groups:
expression: 'claims.roles.split(",")'
uid:
expression: 'claims.sub'
extra:
- key: 'example.com/tenant'
valueExpression: 'claims.tenant'
userValidationRules:
- expression: "!user.username.startsWith('system:')" # the username will be system:foo and expression will evaluate to false, so validation will fail.
message: 'username cannot use reserved system: prefix'
```
```bash
TOKEN=eyJhbGciOiJSUzI1NiIsImtpZCI6ImY3dF9tOEROWmFTQk1oWGw5QXZTWGhBUC04Y0JmZ0JVbFVpTG5oQkgxdXMiLCJ0eXAiOiJKV1QifQ.eyJhdWQiOiJrdWJlcm5ldGVzIiwiZXhwIjoxNzAzMjMyOTQ5LCJoZCI6ImV4YW1wbGUuY29tIiwiaWF0IjoxNzAxMTEzMTAxLCJpc3MiOiJodHRwczovL2V4YW1wbGUuY29tIiwianRpIjoiYjViMDY1MjM3MmNkMjBlMzQ1YjZmZGZmY2RjMjE4MWY0YWZkNmYyNTlhYWI0YjdlMzU4ODEyMzdkMjkyMjBiYyIsIm5iZiI6MTcwMTExMzEwMSwicm9sZXMiOiJ1c2VyLGFkbWluIiwic3ViIjoiYXV0aCIsInRlbmFudCI6IjcyZjk4OGJmLTg2ZjEtNDFhZi05MWFiLTJkN2NkMDExZGI0YSIsInVzZXJuYW1lIjoiZm9vIn0.FgPJBYLobo9jnbHreooBlvpgEcSPWnKfX6dc0IvdlRB-F0dCcgy91oCJeK_aBk-8zH5AKUXoFTlInfLCkPivMOJqMECA1YTrMUwt_IVqwb116AqihfByUYIIqzMjvUbthtbpIeHQm2fF0HbrUqa_Q0uaYwgy8mD807h7sBcUMjNd215ff_nFIHss-9zegH8GI1d9fiBf-g6zjkR1j987EP748khpQh9IxPjMJbSgG_uH5x80YFuqgEWwq-aYJPQxXX6FatP96a2EAn7wfPpGlPRt0HcBOvq5pCnudgCgfVgiOJiLr_7robQu4T1bis0W75VPEvwWtgFcLnvcQx0JWg
```
where the token payload is:
```json
{
"aud": "kubernetes",
"exp": 1703232949,
"hd": "example.com",
"iat": 1701113101,
"iss": "https://example.com",
"jti": "b5b0652372cd20e345b6fdffcdc2181f4afd6f259aab4b7e35881237d29220bc",
"nbf": 1701113101,
"roles": "user,admin",
"sub": "auth",
"tenant": "72f988bf-86f1-41af-91ab-2d7cd011db4a",
"username": "foo"
}
```
The token with the above `AuthenticationConfiguration` will produce the following `UserInfo` object:
```json
{
"username": "system:foo",
"uid": "auth",
"groups": [
"user",
"admin"
],
"extra": {
"example.com/tenant": "tenant1"
}
}
```
which will fail user validation because the username starts with `system:`. The API server will return a `401 Unauthorized` error.
{{% /tab %}}
{{< /tabs >}}
Importantly, the API server is not an OAuth2 client; rather, it can only be
configured to trust a single issuer. This allows the use of public providers,
such as Google, without trusting credentials issued to third parties. Admins who
@ -432,7 +720,7 @@ Webhook authentication is a hook for verifying bearer tokens.
* `--authentication-token-webhook-config-file` a configuration file describing how to access the remote webhook service.
* `--authentication-token-webhook-cache-ttl` how long to cache authentication decisions. Defaults to two minutes.
* `--authentication-token-webhook-version` determines whether to use `authentication.k8s.io/v1beta1` or `authentication.k8s.io/v1`
`TokenReview` objects to send/receive information from the webhook. Defaults to `v1beta1`.
The configuration file uses the [kubeconfig](/docs/concepts/configuration/organize-cluster-access-kubeconfig/)
@ -489,9 +777,9 @@ To opt into receiving `authentication.k8s.io/v1` token reviews, the API server m
"spec": {
# Opaque bearer token sent to the API server
"token": "014fbff9a07c...",
# Optional list of the audience identifiers for the server the token was presented to.
# Audience-aware token authenticators (for example, OIDC token authenticators)
# should verify the token was intended for at least one of the audiences in this list,
# and return the intersection of this list and the valid audiences for the token in the response status.
# This ensures the token is valid to authenticate to the server it was presented to.
@ -509,9 +797,9 @@ To opt into receiving `authentication.k8s.io/v1` token reviews, the API server m
"spec": {
# Opaque bearer token sent to the API server
"token": "014fbff9a07c...",
# Optional list of the audience identifiers for the server the token was presented to.
# Audience-aware token authenticators (for example, OIDC token authenticators)
# should verify the token was intended for at least one of the audiences in this list,
# and return the intersection of this list and the valid audiences for the token in the response status.
# This ensures the token is valid to authenticate to the server it was presented to.
@ -870,7 +1158,7 @@ rules:
{{< note >}}
Impersonating a user or group allows you to perform any action as if you were that user or group;
for that reason, impersonation is not namespace scoped.
If you want to allow impersonation using Kubernetes RBAC,
this requires using a `ClusterRole` and a `ClusterRoleBinding`,
not a `Role` and `RoleBinding`.
{{< /note >}}
@ -1378,7 +1666,7 @@ status:
{{% /tab %}}
{{< /tabs >}}
This feature is extremely useful when a complicated authentication flow is used in a Kubernetes cluster,
for example, if you use [webhook token authentication](/docs/reference/access-authn-authz/authentication/#webhook-token-authentication)
or [authenticating proxy](/docs/reference/access-authn-authz/authentication/#authenticating-proxy).
@ -1390,7 +1678,7 @@ you see the user details and properties for the user that was impersonated.
{{< /note >}}
By default, all authenticated users can create `SelfSubjectReview` objects when the `APISelfSubjectReview`
feature is enabled. It is allowed by the `system:basic-user` cluster role.
{{< note >}}
You can only make `SelfSubjectReview` requests if:

View File

@ -209,6 +209,143 @@ The following flags can be used:
You can choose more than one authorization module. Modules are checked in order
so an earlier module has higher priority to allow or deny a request.
## Configuring the API Server using an Authorization Config File
{{< feature-state state="alpha" for_k8s_version="v1.29" >}}
The Kubernetes API server's authorizer chain can be configured using a
configuration file.
You specify the path to that authorization configuration using the
`--authorization-config` command line argument. This feature enables the
creation of authorization chains with multiple webhooks with well-defined
parameters that validate requests in a particular order, and it enables
fine-grained control, such as an explicit Deny on failures. An example
configuration with all possible values is provided below.
To customize the authorizer chain, you need to enable the
`StructuredAuthorizationConfiguration` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/).
Note: When the feature is enabled, you cannot set `--authorization-config` and
also configure an authorization webhook using the `--authorization-mode` and
`--authorization-webhook-*` command line flags. If you set both, the API
server reports an error and exits immediately.
{{< caution >}}
While the feature is in alpha or beta, nothing changes if you keep using the
command line flags. When the feature graduates to beta, the feature gate will
be enabled by default. The feature gate will be removed when the feature reaches GA.
When configuring the authorizer chain using a config file, make sure all the
API server nodes have the file. Also, take note of the API server configuration
when upgrading or downgrading clusters. For example, if you upgrade to a v1.29+
cluster and use the config file, you need to make sure the config file exists
before upgrading the cluster. When downgrading to v1.28, you need to add the
flags back to their bootstrap mechanism.
{{< /caution >}}
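To use the file, pass its path to the API server (a sketch; the path is illustrative):
```bash
kube-apiserver --authorization-config=/etc/kubernetes/authorization-config.yaml
```
The annotated example below shows all the possible values.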
```yaml
#
# DO NOT USE THE CONFIG AS IS. THIS IS AN EXAMPLE.
#
apiVersion: apiserver.config.k8s.io/v1alpha1
kind: AuthorizationConfiguration
# authorizers are defined in order of precedence
authorizers:
- type: Webhook
# Name used to describe the authorizer
# This is explicitly used in monitoring machinery for metrics
# Note:
# - Validation for this field is similar to how K8s labels are validated today.
# Required, with no default
name: webhook
webhook:
# The duration to cache 'authorized' responses from the webhook
# authorizer.
# Same as setting `--authorization-webhook-cache-authorized-ttl` flag
# Default: 5m0s
authorizedTTL: 30s
# The duration to cache 'unauthorized' responses from the webhook
# authorizer.
# Same as setting `--authorization-webhook-cache-unauthorized-ttl` flag
# Default: 30s
unauthorizedTTL: 30s
# Timeout for the webhook request
# Maximum allowed is 30s.
# Required, with no default.
timeout: 3s
# The API version of the authorization.k8s.io SubjectAccessReview to
# send to and expect from the webhook.
# Same as setting `--authorization-webhook-version` flag
# Required, with no default
# Valid values: v1beta1, v1
subjectAccessReviewVersion: v1
# MatchConditionSubjectAccessReviewVersion specifies the SubjectAccessReview
# version the CEL expressions are evaluated against
# Valid values: v1
# Required only if matchConditions are specified, no default value
matchConditionSubjectAccessReviewVersion: v1
# Controls the authorization decision when a webhook request fails to
# complete or returns a malformed response or errors evaluating
# matchConditions.
# Valid values:
# - NoOpinion: continue to subsequent authorizers to see if one of
# them allows the request
# - Deny: reject the request without consulting subsequent authorizers
# Required, with no default.
failurePolicy: Deny
connectionInfo:
# Controls how the webhook should communicate with the server.
# Valid values:
# - KubeConfig: use the file specified in kubeConfigFile to locate the
# server.
# - InClusterConfig: use the in-cluster configuration to call the
# SubjectAccessReview API hosted by kube-apiserver. This mode is not
# allowed for kube-apiserver.
type: KubeConfig
# Path to KubeConfigFile for connection info
# Required, if connectionInfo.Type is KubeConfig
kubeConfigFile: /kube-system-authz-webhook.yaml
# matchConditions is a list of conditions that must be met for a request to be sent to this
# webhook. An empty list of matchConditions matches all requests.
# A maximum of 64 match conditions are allowed.
#
# The exact matching logic is (in order):
# 1. If at least one matchCondition evaluates to FALSE, then the webhook is skipped.
# 2. If ALL matchConditions evaluate to TRUE, then the webhook is called.
# 3. If at least one matchCondition evaluates to an error (but none are FALSE):
# - If failurePolicy=Deny, then the webhook rejects the request
# - If failurePolicy=NoOpinion, then the error is ignored and the webhook is skipped
matchConditions:
# expression represents the expression which will be evaluated by CEL. Must evaluate to bool.
# CEL expressions have access to the contents of the SubjectAccessReview in v1 version.
# If version specified by subjectAccessReviewVersion in the request variable is v1beta1,
# the contents would be converted to the v1 version before evaluating the CEL expression.
#
# Documentation on CEL: https://kubernetes.io/docs/reference/using-api/cel/
#
# only send resource requests to the webhook
- expression: has(request.resourceAttributes)
# only intercept requests to kube-system
- expression: request.resourceAttributes.namespace == 'kube-system'
# don't intercept requests from kube-system service accounts
- expression: !('system:serviceaccounts:kube-system' in request.user.groups)
- type: Node
name: node
- type: RBAC
name: rbac
- type: Webhook
name: in-cluster-authorizer
webhook:
authorizedTTL: 5m
unauthorizedTTL: 30s
timeout: 3s
subjectAccessReviewVersion: v1
failurePolicy: NoOpinion
connectionInfo:
type: InClusterConfig
```
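For comparison, a minimal configuration that reproduces the default `Node,RBAC` chain without any webhooks could look like this (a sketch, not an official recommendation):
```yaml
apiVersion: apiserver.config.k8s.io/v1alpha1
kind: AuthorizationConfiguration
authorizers:
  - type: Node
    name: node
  - type: RBAC
    name: rbac
```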
## Privilege escalation via workload creation or edits {#privilege-escalation-via-pod-creation}
Users who can create/edit pods in a namespace, either directly or through a [controller](/docs/concepts/architecture/controller/)
@ -241,4 +378,3 @@ This should be considered when deciding on your RBAC controls.
* To learn more about Authentication, see **Authentication** in [Controlling Access to the Kubernetes API](/docs/concepts/security/controlling-access/).
* To learn more about Admission Control, see [Using Admission Controllers](/docs/reference/access-authn-authz/admission-controllers/).

View File

@ -371,7 +371,7 @@ you like. If you want to add a note for human consumption, use the
{{< feature-state for_k8s_version="v1.27" state="alpha" >}}
{{< note >}}
In Kubernetes {{< skew currentVersion >}}, you must enable the `ClusterTrustBundles`
In Kubernetes {{< skew currentVersion >}}, you must enable the `ClusterTrustBundle`
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
_and_ the `certificates.k8s.io/v1alpha1`
{{< glossary_tooltip text="API group" term_id="api-group" >}} in order to use
@ -472,6 +472,12 @@ such as role-based access control.
To distinguish them from signer-linked ClusterTrustBundles, the names of
signer-unlinked ClusterTrustBundles **must not** contain a colon (`:`).
### Accessing ClusterTrustBundles from pods {#ctb-projection}
{{< feature-state for_k8s_version="v1.29" state="alpha" >}}
The contents of ClusterTrustBundles can be injected into the container filesystem, similar to ConfigMaps and Secrets. See the [clusterTrustBundle projected volume source](/docs/concepts/storage/projected-volumes#clustertrustbundle) for more details.
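A minimal sketch of such a projection, assuming the relevant feature gates are enabled and a ClusterTrustBundle named `example` exists (all names and paths here are illustrative):
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ctb-demo
spec:
  containers:
    - name: main
      image: busybox:1.36  # illustrative image
      command: ["sleep", "3600"]
      volumeMounts:
        - name: trust-anchors
          mountPath: /etc/ssl/trust
          readOnly: true
  volumes:
    - name: trust-anchors
      projected:
        sources:
          - clusterTrustBundle:
              name: example     # name of an existing ClusterTrustBundle
              path: bundle.pem  # file name within the mount
```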
<!-- TODO this should become a task page -->
## How to issue a certificate for a user {#normal-user}

View File

@ -1,9 +1,7 @@
---
reviewers:
- bprashanth
- davidopp
- lavalamp
- liggitt
- enj
title: Managing Service Accounts
content_type: concept
weight: 50
@ -140,6 +138,62 @@ using [TokenRequest](/docs/reference/kubernetes-api/authentication-resources/tok
to obtain short-lived API access tokens is recommended instead.
{{< /note >}}
## Auto-generated legacy ServiceAccount token clean up {#auto-generated-legacy-serviceaccount-token-clean-up}
Before version 1.24, Kubernetes automatically generated Secret-based tokens for
ServiceAccounts. To distinguish between automatically generated tokens and
manually created ones, Kubernetes checks for a reference from the
ServiceAccount's `secrets` field. If the Secret is referenced in the `secrets`
field, it is considered an auto-generated legacy token. Otherwise, it is
considered a manually created legacy token. For example:
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
name: build-robot
namespace: default
secrets:
- name: build-robot-secret # usually NOT present for a manually generated token
```
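To check whether a given token Secret is referenced this way, you can print the ServiceAccount's `secrets` field (a sketch; the ServiceAccount name and namespace are illustrative):
```bash
kubectl get serviceaccount build-robot -n default \
  -o jsonpath='{.secrets[*].name}'
```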
Beginning with version 1.29, legacy ServiceAccount tokens that were generated
automatically are marked as invalid if they remain unused for a certain
period of time (one year by default). Tokens that then remain unused for a
further defined period (again, one year by default) are subsequently
purged by the control plane.
If users use an invalidated auto-generated token, the token validator will
1. add an audit annotation for the key-value pair
`authentication.k8s.io/legacy-token-invalidated: <secret name>/<namespace>`,
1. increment the `invalid_legacy_auto_token_uses_total` metric count,
1. update the Secret label `kubernetes.io/legacy-token-last-used` with the new
date,
1. return an error indicating that the token has been invalidated.
When receiving this validation error, users can update the Secret to remove the
`kubernetes.io/legacy-token-invalid-since` label to temporarily allow use of
this token.
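For example (the Secret name and namespace are illustrative):
```bash
# The trailing "-" removes the label from the Secret.
kubectl label secret build-robot-secret -n default \
  kubernetes.io/legacy-token-invalid-since-
```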
Here's an example of an auto-generated legacy token that has been marked with the
`kubernetes.io/legacy-token-last-used` and `kubernetes.io/legacy-token-invalid-since`
labels:
```yaml
apiVersion: v1
kind: Secret
metadata:
name: build-robot-secret
namespace: default
labels:
kubernetes.io/legacy-token-last-used: 2022-10-24
kubernetes.io/legacy-token-invalid-since: 2023-10-25
annotations:
kubernetes.io/service-account.name: build-robot
type: kubernetes.io/service-account-token
```
## Control plane details
### ServiceAccount controller
@ -193,6 +247,51 @@ it does the following when a Pod is created:
1. If the spec of the incoming Pod doesn't already contain any `imagePullSecrets`, then the
admission controller adds `imagePullSecrets`, copying them from the `ServiceAccount`.
### Legacy ServiceAccount token tracking controller
{{< feature-state for_k8s_version="v1.28" state="stable" >}}
This controller generates a ConfigMap called
`kube-apiserver-legacy-service-account-token-tracking` in the
`kube-system` namespace. The ConfigMap records the timestamp when legacy service
account tokens began to be monitored by the system.
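You can inspect that ConfigMap to see when tracking began, for example:
```bash
kubectl get configmap kube-apiserver-legacy-service-account-token-tracking \
  -n kube-system -o yaml
```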
### Legacy ServiceAccount token cleaner
{{< feature-state for_k8s_version="v1.29" state="beta" >}}
The legacy ServiceAccount token cleaner runs as part of the
`kube-controller-manager` and checks every 24 hours to see if any auto-generated
legacy ServiceAccount token has not been used in a *specified amount of time*.
If so, the cleaner marks those tokens as invalid.
The cleaner works by first checking the ConfigMap created by the control plane
(provided that `LegacyServiceAccountTokenTracking` is enabled). If the current
time is a *specified amount of time* after the date in the ConfigMap, the
cleaner then loops through the list of Secrets in the cluster and evaluates each
Secret that has the type `kubernetes.io/service-account-token`.
If a Secret meets all of the following conditions, the cleaner marks it as
invalid:
- The Secret is auto-generated, meaning that it is bi-directionally referenced
by a ServiceAccount.
- The Secret is not currently mounted by any pods.
- The Secret has not been used in a *specified amount of time* since it was
created or since it was last used.
The cleaner marks a Secret invalid by adding a label called
`kubernetes.io/legacy-token-invalid-since` to the Secret, with the current date
as the value. If an invalid Secret is not used in a *specified amount of time*,
the cleaner will delete it.
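To audit which Secrets the cleaner has already marked as invalid, you can combine a field selector with an existence label selector, for example:
```bash
kubectl get secrets --all-namespaces \
  --field-selector type=kubernetes.io/service-account-token \
  -l kubernetes.io/legacy-token-invalid-since
```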
{{< note >}}
Each *specified amount of time* above defaults to one year. The cluster
administrator can configure this value through the
`--legacy-service-account-token-clean-up-period` command line argument for the
`kube-controller-manager` component.
{{< /note >}}
### TokenRequest API
{{< feature-state for_k8s_version="v1.22" state="stable" >}}
@ -300,6 +399,12 @@ token: ...
If you launch a new Pod into the `examplens` namespace, it can use the `myserviceaccount`
service-account-token Secret that you just created.
{{< caution >}}
Do not reference manually created Secrets in the `secrets` field of a
ServiceAccount; otherwise, those Secrets will be cleaned up if they remain
unused for a long time. Refer to [auto-generated legacy ServiceAccount token clean up](#auto-generated-legacy-serviceaccount-token-clean-up).
{{< /caution >}}
## Delete/invalidate a ServiceAccount token {#delete-token}
If you know the name of the Secret that contains the token you want to remove:

View File

@ -120,6 +120,9 @@ In the following table:
| `CronJobControllerV2` | `false` | Alpha | 1.20 | 1.20 |
| `CronJobControllerV2` | `true` | Beta | 1.21 | 1.21 |
| `CronJobControllerV2` | `true` | GA | 1.22 | 1.23 |
| `CronJobTimeZone` | `false` | Alpha | 1.24 | 1.24 |
| `CronJobTimeZone` | `true` | Beta | 1.25 | 1.26 |
| `CronJobTimeZone` | `true` | GA | 1.27 | 1.28 |
| `CustomPodDNS` | `false` | Alpha | 1.9 | 1.9 |
| `CustomPodDNS` | `true` | Beta| 1.10 | 1.13 |
| `CustomPodDNS` | `true` | GA | 1.14 | 1.16 |
@ -153,6 +156,10 @@ In the following table:
| `DisableAcceleratorUsageMetrics` | `false` | Alpha | 1.19 | 1.19 |
| `DisableAcceleratorUsageMetrics` | `true` | Beta | 1.20 | 1.24 |
| `DisableAcceleratorUsageMetrics` | `true` | GA | 1.25 | 1.27 |
| `DownwardAPIHugePages` | `false` | Alpha | 1.20 | 1.20 |
| `DownwardAPIHugePages` | `false` | Beta | 1.21 | 1.21 |
| `DownwardAPIHugePages` | `true` | Beta | 1.22 | 1.26 |
| `DownwardAPIHugePages` | `true` | GA | 1.27 | 1.28 |
| `DryRun` | `false` | Alpha | 1.12 | 1.12 |
| `DryRun` | `true` | Beta | 1.13 | 1.18 |
| `DryRun` | `true` | GA | 1.19 | 1.27 |
@ -200,6 +207,9 @@ In the following table:
| `ExternalPolicyForExternalIP` | `true` | GA | 1.18 | 1.22 |
| `GCERegionalPersistentDisk` | `true` | Beta | 1.10 | 1.12 |
| `GCERegionalPersistentDisk` | `true` | GA | 1.13 | 1.16 |
| `GRPCContainerProbe` | `false` | Alpha | 1.23 | 1.23 |
| `GRPCContainerProbe` | `true` | Beta | 1.24 | 1.26 |
| `GRPCContainerProbe` | `true` | GA | 1.27 | 1.28 |
| `GenericEphemeralVolume` | `false` | Alpha | 1.19 | 1.20 |
| `GenericEphemeralVolume` | `true` | Beta | 1.21 | 1.22 |
| `GenericEphemeralVolume` | `true` | GA | 1.23 | 1.24 |
@ -228,6 +238,8 @@ In the following table:
| `IngressClassNamespacedParams` | `true` | GA | 1.23 | 1.24 |
| `Initializers` | `false` | Alpha | 1.7 | 1.13 |
| `Initializers` | - | Deprecated | 1.14 | 1.14 |
| `JobMutableNodeSchedulingDirectives` | `true` | Beta | 1.23 | 1.26 |
| `JobMutableNodeSchedulingDirectives` | `true` | GA | 1.27 | 1.28 |
| `KMSv1` | `true` | Deprecated | 1.28 | |
| `KubeletConfigFile` | `false` | Alpha | 1.8 | 1.9 |
| `KubeletConfigFile` | - | Deprecated | 1.10 | 1.10 |
@ -240,6 +252,8 @@ In the following table:
| `LegacyNodeRoleBehavior` | `false` | Alpha | 1.16 | 1.18 |
| `LegacyNodeRoleBehavior` | `true` | Beta | 1.19 | 1.20 |
| `LegacyNodeRoleBehavior` | `false` | GA | 1.21 | 1.22 |
| `LegacyServiceAccountTokenNoAutoGeneration` | `true` | Beta | 1.24 | 1.25 |
| `LegacyServiceAccountTokenNoAutoGeneration` | `true` | GA | 1.26 | 1.28 |
| `LocalStorageCapacityIsolation` | `false` | Alpha | 1.7 | 1.9 |
| `LocalStorageCapacityIsolation` | `true` | Beta | 1.10 | 1.24 |
| `LocalStorageCapacityIsolation` | `true` | GA | 1.25 | 1.26 |
@ -303,6 +317,9 @@ In the following table:
| `ResourceQuotaScopeSelectors` | `false` | Alpha | 1.11 | 1.11 |
| `ResourceQuotaScopeSelectors` | `true` | Beta | 1.12 | 1.16 |
| `ResourceQuotaScopeSelectors` | `true` | GA | 1.17 | 1.18 |
| `RetroactiveDefaultStorageClass` | `false` | Alpha | 1.25 | 1.25 |
| `RetroactiveDefaultStorageClass` | `true` | Beta | 1.26 | 1.27 |
| `RetroactiveDefaultStorageClass` | `true` | GA | 1.28 | 1.28 |
| `RootCAConfigMap` | `false` | Alpha | 1.13 | 1.19 |
| `RootCAConfigMap` | `true` | Beta | 1.20 | 1.20 |
| `RootCAConfigMap` | `true` | GA | 1.21 | 1.22 |
@ -393,6 +410,9 @@ In the following table:
| `TokenRequestProjection` | `false` | Alpha | 1.11 | 1.11 |
| `TokenRequestProjection` | `true` | Beta | 1.12 | 1.19 |
| `TokenRequestProjection` | `true` | GA | 1.20 | 1.21 |
| `TopologyManager` | `false` | Alpha | 1.16 | 1.17 |
| `TopologyManager` | `true` | Beta | 1.18 | 1.26 |
| `TopologyManager` | `true` | GA | 1.27 | 1.28 |
| `UserNamespacesStatelessPodsSupport` | `false` | Alpha | 1.25 | 1.27 |
| `ValidateProxyRedirects` | `false` | Alpha | 1.12 | 1.13 |
| `ValidateProxyRedirects` | `true` | Beta | 1.14 | 1.21 |
@ -591,10 +611,6 @@ In the following table:
[Configure volume permission and ownership change policy for Pods](/docs/tasks/configure-pod-container/security-context/#configure-volume-permission-and-ownership-change-policy-for-pods)
for more details.
- `CronJobControllerV2`: Use an alternative implementation of the
{{< glossary_tooltip text="CronJob" term_id="cronjob" >}} controller. Otherwise,
version 1 of the same controller is selected.
- `ControllerManagerLeaderMigration`: Enables Leader Migration for
[kube-controller-manager](/docs/tasks/administer-cluster/controller-manager-leader-migration/#initial-leader-migration-configuration) and
[cloud-controller-manager](/docs/tasks/administer-cluster/controller-manager-leader-migration/#deploy-cloud-controller-manager)
@ -602,6 +618,12 @@ In the following table:
controllers from the kube-controller-manager into an external controller-manager
(e.g. the cloud-controller-manager) in an HA cluster without downtime.
- `CronJobControllerV2`: Use an alternative implementation of the
{{< glossary_tooltip text="CronJob" term_id="cronjob" >}} controller. Otherwise,
version 1 of the same controller is selected.
- `CronJobTimeZone`: Allow the use of the `timeZone` optional field in [CronJobs](/docs/concepts/workloads/controllers/cron-jobs/)
- `CustomPodDNS`: Enable customizing the DNS settings for a Pod using its `dnsConfig` property.
Check [Pod's DNS Config](/docs/concepts/services-networking/dns-pod-service/#pods-dns-config)
for more details.
@ -636,6 +658,9 @@ In the following table:
- `DisableAcceleratorUsageMetrics`:
[Disable accelerator metrics collected by the kubelet](/docs/concepts/cluster-administration/system-metrics/#disable-accelerator-metrics).
- `DownwardAPIHugePages`: Enables usage of hugepages in
[downward API](/docs/tasks/inject-data-application/downward-api-volume-expose-pod-information).
- `DryRun`: Enable server-side [dry run](/docs/reference/using-api/api-concepts/#dry-run) requests
so that validation, merging, and mutation can be tested without committing.
@ -695,6 +720,9 @@ In the following table:
- `GCERegionalPersistentDisk`: Enable the regional PD feature on GCE.
- `GRPCContainerProbe`: Enables the gRPC probe method for {Liveness,Readiness,Startup}Probe.
See [Configure Liveness, Readiness and Startup Probes](/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-a-grpc-liveness-probe).
- `GenericEphemeralVolume`: Enables ephemeral, inline volumes that support all features
of normal volumes (can be provided by third-party storage vendors, storage capacity tracking,
restore from snapshot, etc.).
@ -731,6 +759,9 @@ In the following table:
- `Initializers`: Allow asynchronous coordination of object creation using the
Initializers admission plugin.
- `JobMutableNodeSchedulingDirectives`: Allows updating node scheduling directives in
the pod template of [Job](/docs/concepts/workloads/controllers/job).
- `KubeletConfigFile`: Enable loading kubelet configuration from
a file specified using a config file.
See [setting kubelet parameters via a config file](/docs/tasks/administer-cluster/kubelet-config-file/)
@ -746,6 +777,9 @@ In the following table:
node disruption will ignore the `node-role.kubernetes.io/master` label in favor of the
feature-specific labels provided by `NodeDisruptionExclusion` and `ServiceNodeExclusion`.
- `LegacyServiceAccountTokenNoAutoGeneration`: Stop auto-generation of Secret-based
[service account tokens](/docs/concepts/security/service-accounts/#get-a-token).
- `LocalStorageCapacityIsolation`: Enable the consumption of
[local ephemeral storage](/docs/concepts/configuration/manage-resources-containers/)
and also the `sizeLimit` property of an
@ -818,6 +852,8 @@ In the following table:
- `ResourceQuotaScopeSelectors`: Enable resource quota scope selectors.
- `RetroactiveDefaultStorageClass`: Allow assigning StorageClass to unbound PVCs retroactively.
- `RootCAConfigMap`: Configure the `kube-controller-manager` to publish a
{{< glossary_tooltip text="ConfigMap" term_id="configmap" >}} named `kube-root-ca.crt`
to every namespace. This ConfigMap contains a CA bundle used for verifying connections
@ -920,6 +956,10 @@ In the following table:
- `TokenRequestProjection`: Enable the injection of service account tokens into a Pod through a
[`projected` volume](/docs/concepts/storage/volumes/#projected).
- `TopologyManager`: Enable a mechanism to coordinate fine-grained hardware resource
assignments for different components in Kubernetes. See
[Control Topology Management Policies on a node](/docs/tasks/administer-cluster/topology-manager/).
- `UserNamespacesStatelessPodsSupport`: Enable user namespace support for stateless Pods. This flag was renamed on newer releases to `UserNamespacesSupport`.
- `ValidateProxyRedirects`: This flag controls whether the API server should validate that redirects

View File

@ -55,8 +55,6 @@ For a reference to old feature gates that are removed, please refer to
| Feature | Default | Stage | Since | Until |
|---------|---------|-------|-------|-------|
| `APIListChunking` | `false` | Alpha | 1.8 | 1.8 |
| `APIListChunking` | `true` | Beta | 1.9 | |
| `APIPriorityAndFairness` | `false` | Alpha | 1.18 | 1.19 |
| `APIPriorityAndFairness` | `true` | Beta | 1.20 | |
| `APIResponseCompression` | `false` | Alpha | 1.7 | 1.15 |
@ -79,12 +77,12 @@ For a reference to old feature gates that are removed, please refer to
| `CRDValidationRatcheting` | `false` | Alpha | 1.28 | |
| `CSIMigrationPortworx` | `false` | Alpha | 1.23 | 1.24 |
| `CSIMigrationPortworx` | `false` | Beta | 1.25 | |
| `CSINodeExpandSecret` | `false` | Alpha | 1.25 | 1.26 |
| `CSINodeExpandSecret` | `true` | Beta | 1.27 | |
| `CSIVolumeHealth` | `false` | Alpha | 1.21 | |
| `CloudControllerManagerWebhook` | `false` | Alpha | 1.27 | |
| `CloudDualStackNodeIPs` | `false` | Alpha | 1.27 | |
| `ClusterTrustBundle` | `false` | Alpha | 1.27 | |
| `CloudDualStackNodeIPs` | `false` | Alpha | 1.27 | 1.28 |
| `CloudDualStackNodeIPs` | `true` | Beta | 1.29 | |
| `ClusterTrustBundle` | `false` | Alpha | 1.27 | |
| `ClusterTrustBundleProjection` | `false` | Alpha | 1.29 | |
| `ComponentSLIs` | `false` | Alpha | 1.26 | 1.26 |
| `ComponentSLIs` | `true` | Beta | 1.27 | |
| `ConsistentListFromCache` | `false` | Alpha | 1.28 | |
@ -93,11 +91,10 @@ For a reference to old feature gates that are removed, please refer to
| `CronJobsScheduledAnnotation` | `true` | Beta | 1.28 | |
| `CrossNamespaceVolumeDataSource` | `false` | Alpha| 1.26 | |
| `CustomCPUCFSQuotaPeriod` | `false` | Alpha | 1.12 | |
| `CustomResourceValidationExpressions` | `false` | Alpha | 1.23 | 1.24 |
| `CustomResourceValidationExpressions` | `true` | Beta | 1.25 | |
| `DevicePluginCDIDevices` | `false` | Alpha | 1.28 | |
| `DisableCloudProviders` | `false` | Alpha | 1.22 | |
| `DisableKubeletCloudCredentialProviders` | `false` | Alpha | 1.23 | |
| `DisableNodeKubeProxyVersion` | `false` | Alpha | 1.29 | |
| `DynamicResourceAllocation` | `false` | Alpha | 1.26 | |
| `ElasticIndexedJob` | `true` | Beta | 1.27 | |
| `EventedPLEG` | `false` | Alpha | 1.26 | 1.26 |
@ -118,15 +115,12 @@ For a reference to old feature gates that are removed, please refer to
| `InTreePluginOpenStackUnregister` | `false` | Alpha | 1.21 | |
| `InTreePluginPortworxUnregister` | `false` | Alpha | 1.23 | |
| `InTreePluginvSphereUnregister` | `false` | Alpha | 1.21 | |
| `JobBackoffLimitPerIndex` | `false` | Alpha | 1.28 | |
| `JobBackoffLimitPerIndex` | `false` | Alpha | 1.28 | 1.28 |
| `JobBackoffLimitPerIndex` | `true` | Beta | 1.29 | |
| `JobPodFailurePolicy` | `false` | Alpha | 1.25 | 1.25 |
| `JobPodFailurePolicy` | `true` | Beta | 1.26 | |
| `JobPodReplacementPolicy` | `false` | Alpha | 1.28 | |
| `JobReadyPods` | `false` | Alpha | 1.23 | 1.23 |
| `JobReadyPods` | `true` | Beta | 1.24 | |
| `KMSv2` | `false` | Alpha | 1.25 | 1.26 |
| `KMSv2` | `true` | Beta | 1.27 | |
| `KMSv2KDF` | `false` | Beta | 1.28 | |
| `JobPodReplacementPolicy` | `false` | Alpha | 1.28 | 1.28 |
| `JobPodReplacementPolicy` | `true` | Beta | 1.29 | |
| `KubeProxyDrainingTerminatingNodes` | `false` | Alpha | 1.28 | |
| `KubeletCgroupDriverFromCRI` | `false` | Alpha | 1.28 | |
| `KubeletInUserNamespace` | `false` | Alpha | 1.22 | |
@ -134,12 +128,15 @@ For a reference to old feature gates that are removed, please refer to
| `KubeletPodResourcesGet` | `false` | Alpha | 1.27 | |
| `KubeletTracing` | `false` | Alpha | 1.25 | 1.26 |
| `KubeletTracing` | `true` | Beta | 1.27 | |
| `LegacyServiceAccountTokenCleanUp` | `false` | Alpha | 1.28 | |
| `LegacyServiceAccountTokenCleanUp` | `false` | Alpha | 1.28 | 1.28 |
| `LegacyServiceAccountTokenCleanUp` | `true` | Beta | 1.29 | |
| `LoadBalancerIPMode` | `false` | Alpha | 1.29 | |
| `LocalStorageCapacityIsolationFSQuotaMonitoring` | `false` | Alpha | 1.15 | - |
| `LogarithmicScaleDown` | `false` | Alpha | 1.21 | 1.21 |
| `LogarithmicScaleDown` | `true` | Beta | 1.22 | |
| `LoggingAlphaOptions` | `false` | Alpha | 1.24 | - |
| `LoggingBetaOptions` | `true` | Beta | 1.24 | - |
| `MatchLabelKeysInPodAffinity` | `false` | Alpha | 1.29 | - |
| `MatchLabelKeysInPodTopologySpread` | `false` | Alpha | 1.25 | 1.26 |
| `MatchLabelKeysInPodTopologySpread` | `true` | Beta | 1.27 | - |
| `MaxUnavailableStatefulSet` | `false` | Alpha | 1.24 | |
@ -149,7 +146,6 @@ For a reference to old feature gates that are removed, please refer to
| `MinDomainsInPodTopologySpread` | `false` | Alpha | 1.24 | 1.24 |
| `MinDomainsInPodTopologySpread` | `false` | Beta | 1.25 | 1.26 |
| `MinDomainsInPodTopologySpread` | `true` | Beta | 1.27 | |
| `MultiCIDRRangeAllocator` | `false` | Alpha | 1.25 | |
| `MultiCIDRServiceAllocator` | `false` | Alpha | 1.27 | |
| `NewVolumeManagerReconstruction` | `false` | Beta | 1.27 | 1.27 |
| `NewVolumeManagerReconstruction` | `true` | Beta | 1.28 | |
@ -162,37 +158,44 @@ For a reference to old feature gates that are removed, please refer to
| `OpenAPIEnums` | `true` | Beta | 1.24 | |
| `PDBUnhealthyPodEvictionPolicy` | `false` | Alpha | 1.26 | 1.26 |
| `PDBUnhealthyPodEvictionPolicy` | `true` | Beta | 1.27 | |
| `PersistentVolumeLastPhaseTransitionTime` | `false` | Alpha | 1.28 | |
| `PersistentVolumeLastPhaseTransitionTime` | `false` | Alpha | 1.28 | 1.28 |
| `PersistentVolumeLastPhaseTransitionTime` | `true` | Beta | 1.29 | |
| `PodAndContainerStatsFromCRI` | `false` | Alpha | 1.23 | |
| `PodDeletionCost` | `false` | Alpha | 1.21 | 1.21 |
| `PodDeletionCost` | `true` | Beta | 1.22 | |
| `PodDisruptionConditions` | `false` | Alpha | 1.25 | 1.25 |
| `PodDisruptionConditions` | `true` | Beta | 1.26 | |
| `PodHostIPs` | `false` | Alpha | 1.28 | |
| `PodHostIPs` | `false` | Alpha | 1.28 | 1.28 |
| `PodHostIPs` | `true` | Beta | 1.29 | |
| `PodIndexLabel` | `true` | Beta | 1.28 | |
| `PodReadyToStartContainersCondition` | `false` | Alpha | 1.28 | |
| `PodLifecycleSleepAction` | `false` | Alpha | 1.29 | |
| `PodReadyToStartContainersCondition` | `false` | Alpha | 1.28 | 1.28 |
| `PodReadyToStartContainersCondition` | `true` | Beta | 1.29 | |
| `PodSchedulingReadiness` | `false` | Alpha | 1.26 | 1.26 |
| `PodSchedulingReadiness` | `true` | Beta | 1.27 | |
| `ProcMountType` | `false` | Alpha | 1.12 | |
| `QOSReserved` | `false` | Alpha | 1.11 | |
| `ReadWriteOncePod` | `false` | Alpha | 1.22 | 1.26 |
| `ReadWriteOncePod` | `true` | Beta | 1.27 | |
| `RecoverVolumeExpansionFailure` | `false` | Alpha | 1.23 | |
| `RemainingItemCount` | `false` | Alpha | 1.15 | 1.15 |
| `RemainingItemCount` | `true` | Beta | 1.16 | |
| `RotateKubeletServerCertificate` | `false` | Alpha | 1.7 | 1.11 |
| `RotateKubeletServerCertificate` | `true` | Beta | 1.12 | |
| `RuntimeClassInImageCriApi` | `false` | Alpha | 1.29 | |
| `SELinuxMountReadWriteOncePod` | `false` | Alpha | 1.25 | 1.26 |
| `SELinuxMountReadWriteOncePod` | `false` | Beta | 1.27 | 1.27 |
| `SELinuxMountReadWriteOncePod` | `true` | Beta | 1.28 | |
| `SchedulerQueueingHints` | `true` | Beta | 1.28 | |
| `SchedulerQueueingHints` | `true` | Beta | 1.28 | 1.28 |
| `SchedulerQueueingHints` | `false` | Beta | 1.29 | |
| `SecurityContextDeny` | `false` | Alpha | 1.27 | |
| `ServiceNodePortStaticSubrange` | `false` | Alpha | 1.27 | 1.27 |
| `ServiceNodePortStaticSubrange` | `true` | Beta | 1.28 | |
| `SidecarContainers` | `false` | Alpha | 1.28 | |
| `SeparateTaintEvictionController` | `true` | Beta | 1.29 | |
| `ServiceAccountTokenJTI` | `false` | Alpha | 1.29 | |
| `ServiceAccountTokenNodeBinding` | `false` | Alpha | 1.29 | |
| `ServiceAccountTokenNodeBindingValidation` | `false` | Alpha | 1.29 | |
| `ServiceAccountTokenPodNodeInfo` | `false` | Alpha | 1.29 | |
| `SidecarContainers` | `false` | Alpha | 1.28 | 1.28 |
| `SidecarContainers` | `true` | Beta | 1.29 | |
| `SizeMemoryBackedVolumes` | `false` | Alpha | 1.20 | 1.21 |
| `SizeMemoryBackedVolumes` | `true` | Beta | 1.22 | |
| `SkipReadOnlyValidationGCE` | `false` | Alpha | 1.28 | |
| `StableLoadBalancerNodeSet` | `true` | Beta | 1.27 | |
| `StatefulSetAutoDeletePVC` | `false` | Alpha | 1.23 | 1.26 |
| `StatefulSetAutoDeletePVC` | `false` | Beta | 1.27 | |
@ -209,12 +212,16 @@ For a reference to old feature gates that are removed, please refer to
| `TopologyManagerPolicyBetaOptions` | `true` | Beta | 1.28 | |
| `TopologyManagerPolicyOptions` | `false` | Alpha | 1.26 | 1.27 |
| `TopologyManagerPolicyOptions` | `true` | Beta | 1.28 | |
| `TranslateStreamCloseWebsocketRequests` | `false` | Alpha | 1.29 | |
| `UnauthenticatedHTTP2DOSMitigation` | `false` | Beta | 1.28 | |
| `UnauthenticatedHTTP2DOSMitigation` | `true` | Beta | 1.29 | |
| `UnknownVersionInteroperabilityProxy` | `false` | Alpha | 1.28 | |
| `UserNamespacesPodSecurityStandards` | `false` | Alpha | 1.29 | |
| `UserNamespacesSupport` | `false` | Alpha | 1.28 | |
| `ValidatingAdmissionPolicy` | `false` | Alpha | 1.26 | 1.27 |
| `ValidatingAdmissionPolicy` | `false` | Beta | 1.28 | |
| `VolumeCapacityPriority` | `false` | Alpha | 1.21 | |
| `VolumeAttributesClass` | `false` | Alpha | 1.29 | |
| `WatchList` | `false` | Alpha | 1.27 | |
| `WinDSR` | `false` | Alpha | 1.14 | |
| `WinOverlay` | `false` | Alpha | 1.14 | 1.19 |
@ -228,6 +235,9 @@ For a reference to old feature gates that are removed, please refer to
| Feature | Default | Stage | Since | Until |
|---------|---------|-------|-------|-------|
| `APIListChunking` | `false` | Alpha | 1.8 | 1.8 |
| `APIListChunking` | `true` | Beta | 1.9 | 1.28 |
| `APIListChunking` | `true` | GA | 1.29 | - |
| `APISelfSubjectReview` | `false` | Alpha | 1.26 | 1.26 |
| `APISelfSubjectReview` | `true` | Beta | 1.27 | 1.27 |
| `APISelfSubjectReview` | `true` | GA | 1.28 | - |
@ -244,18 +254,20 @@ For a reference to old feature gates that are removed, please refer to
| `CSIMigrationvSphere` | `false` | Beta | 1.19 | 1.24 |
| `CSIMigrationvSphere` | `true` | Beta | 1.25 | 1.25 |
| `CSIMigrationvSphere` | `true` | GA | 1.26 | - |
| `CSINodeExpandSecret` | `false` | Alpha | 1.25 | 1.26 |
| `CSINodeExpandSecret` | `true` | Beta | 1.27 | 1.28 |
| `CSINodeExpandSecret` | `true` | GA | 1.29 | |
| `ComponentSLIs` | `false` | Alpha | 1.26 | 1.26 |
| `ComponentSLIs` | `true` | Beta | 1.27 | 1.28 |
| `ComponentSLIs` | `true` | GA | 1.29 | - |
| `ConsistentHTTPGetHandlers` | `true` | GA | 1.25 | - |
| `CronJobTimeZone` | `false` | Alpha | 1.24 | 1.24 |
| `CronJobTimeZone` | `true` | Beta | 1.25 | 1.26 |
| `CronJobTimeZone` | `true` | GA | 1.27 | - |
| `CustomResourceValidationExpressions` | `false` | Alpha | 1.23 | 1.24 |
| `CustomResourceValidationExpressions` | `true` | Beta | 1.25 | 1.28 |
| `CustomResourceValidationExpressions` | `true` | GA | 1.29 | - |
| `DaemonSetUpdateSurge` | `false` | Alpha | 1.21 | 1.21 |
| `DaemonSetUpdateSurge` | `true` | Beta | 1.22 | 1.24 |
| `DaemonSetUpdateSurge` | `true` | GA | 1.25 | |
| `DefaultHostNetworkHostPortsInPodTemplates` | `false` | Deprecated | 1.28 | |
| `DownwardAPIHugePages` | `false` | Alpha | 1.20 | 1.20 |
| `DownwardAPIHugePages` | `false` | Beta | 1.21 | 1.21 |
| `DownwardAPIHugePages` | `true` | Beta | 1.22 | 1.26 |
| `DownwardAPIHugePages` | `true` | GA | 1.27 | |
| `EfficientWatchResumption` | `false` | Alpha | 1.20 | 1.20 |
| `EfficientWatchResumption` | `true` | Beta | 1.21 | 1.23 |
| `EfficientWatchResumption` | `true` | GA | 1.24 | |
@ -265,29 +277,31 @@ For a reference to old feature gates that are removed, please refer to
| `ExpandedDNSConfig` | `true` | GA | 1.28 | |
| `ExperimentalHostUserNamespaceDefaulting` | `false` | Beta | 1.5 | 1.27 |
| `ExperimentalHostUserNamespaceDefaulting` | `false` | Deprecated | 1.28 | |
| `GRPCContainerProbe` | `false` | Alpha | 1.23 | 1.23 |
| `GRPCContainerProbe` | `true` | Beta | 1.24 | 1.26 |
| `GRPCContainerProbe` | `true` | GA | 1.27 | |
| `IPTablesOwnershipCleanup` | `false` | Alpha | 1.25 | 1.26 |
| `IPTablesOwnershipCleanup` | `true` | Beta | 1.27 | 1.27 |
| `IPTablesOwnershipCleanup` | `true` | GA | 1.28 | |
| `InTreePluginRBDUnregister` | `false` | Alpha | 1.23 | 1.27 |
| `InTreePluginRBDUnregister` | `false` | Deprecated | 1.28 | |
| `JobMutableNodeSchedulingDirectives` | `true` | Beta | 1.23 | 1.26 |
| `JobMutableNodeSchedulingDirectives` | `true` | GA | 1.27 | |
| `JobReadyPods` | `false` | Alpha | 1.23 | 1.23 |
| `JobReadyPods` | `true` | Beta | 1.24 | 1.28 |
| `JobReadyPods` | `true` | GA | 1.29 | |
| `JobTrackingWithFinalizers` | `false` | Alpha | 1.22 | 1.22 |
| `JobTrackingWithFinalizers` | `false` | Beta | 1.23 | 1.24 |
| `JobTrackingWithFinalizers` | `true` | Beta | 1.25 | 1.25 |
| `JobTrackingWithFinalizers` | `true` | GA | 1.26 | |
| `KMSv1` | `true` | Deprecated | 1.28 | |
| `KMSv1` | `true` | Deprecated | 1.28 | 1.28 |
| `KMSv1` | `false` | Deprecated | 1.29 | |
| `KMSv2` | `false` | Alpha | 1.25 | 1.26 |
| `KMSv2` | `true` | Beta | 1.27 | 1.28 |
| `KMSv2` | `true` | GA | 1.29 | |
| `KMSv2KDF` | `false` | Beta | 1.28 | 1.28 |
| `KMSv2KDF` | `true` | GA | 1.29 | |
| `KubeletPodResources` | `false` | Alpha | 1.13 | 1.14 |
| `KubeletPodResources` | `true` | Beta | 1.15 | 1.27 |
| `KubeletPodResources` | `true` | GA | 1.28 | |
| `KubeletPodResourcesGetAllocatable` | `false` | Alpha | 1.21 | 1.22 |
| `KubeletPodResourcesGetAllocatable` | `true` | Beta | 1.23 | 1.27 |
| `KubeletPodResourcesGetAllocatable` | `true` | GA | 1.28 | |
| `LegacyServiceAccountTokenNoAutoGeneration` | `true` | Beta | 1.24 | 1.25 |
| `LegacyServiceAccountTokenNoAutoGeneration` | `true` | GA | 1.26 | |
| `LegacyServiceAccountTokenTracking` | `false` | Alpha | 1.26 | 1.26 |
| `LegacyServiceAccountTokenTracking` | `true` | Beta | 1.27 | 1.27 |
| `LegacyServiceAccountTokenTracking` | `true` | GA | 1.28 | |
@ -307,12 +321,12 @@ For a reference to old feature gates that are removed, please refer to
| `ProxyTerminatingEndpoints` | `false` | Alpha | 1.22 | 1.25 |
| `ProxyTerminatingEndpoints` | `true` | Beta | 1.26 | 1.27 |
| `ProxyTerminatingEndpoints` | `true` | GA | 1.28 | |
| `ReadWriteOncePod` | `false` | Alpha | 1.22 | 1.26 |
| `ReadWriteOncePod` | `true` | Beta | 1.27 | 1.28 |
| `ReadWriteOncePod` | `true` | GA | 1.29 | |
| `RemoveSelfLink` | `false` | Alpha | 1.16 | 1.19 |
| `RemoveSelfLink` | `true` | Beta | 1.20 | 1.23 |
| `RemoveSelfLink` | `true` | GA | 1.24 | |
| `RetroactiveDefaultStorageClass` | `false` | Alpha | 1.25 | 1.25 |
| `RetroactiveDefaultStorageClass` | `true` | Beta | 1.26 | 1.27 |
| `RetroactiveDefaultStorageClass` | `true` | GA | 1.28 | |
| `SeccompDefault` | `false` | Alpha | 1.22 | 1.24 |
| `SeccompDefault` | `true` | Beta | 1.25 | 1.26 |
| `SeccompDefault` | `true` | GA | 1.27 | - |
@ -322,9 +336,17 @@ For a reference to old feature gates that are removed, please refer to
| `ServerSideFieldValidation` | `false` | Alpha | 1.23 | 1.24 |
| `ServerSideFieldValidation` | `true` | Beta | 1.25 | 1.26 |
| `ServerSideFieldValidation` | `true` | GA | 1.27 | - |
| `TopologyManager` | `false` | Alpha | 1.16 | 1.17 |
| `TopologyManager` | `true` | Beta | 1.18 | 1.26 |
| `TopologyManager` | `true` | GA | 1.27 | - |
| `ServiceIPStaticSubrange` | `false` | Alpha | 1.24 | 1.24 |
| `ServiceIPStaticSubrange` | `true` | Beta | 1.25 | 1.25 |
| `ServiceIPStaticSubrange` | `true` | GA | 1.26 | - |
| `ServiceInternalTrafficPolicy` | `false` | Alpha | 1.21 | 1.21 |
| `ServiceInternalTrafficPolicy` | `true` | Beta | 1.22 | 1.25 |
| `ServiceInternalTrafficPolicy` | `true` | GA | 1.26 | - |
| `ServiceNodePortStaticSubrange` | `false` | Alpha | 1.27 | 1.27 |
| `ServiceNodePortStaticSubrange` | `true` | Beta | 1.28 | 1.28 |
| `ServiceNodePortStaticSubrange` | `true` | GA | 1.29 | - |
| `SkipReadOnlyValidationGCE` | `false` | Alpha | 1.28 | 1.28 |
| `SkipReadOnlyValidationGCE` | `true` | Deprecated | 1.29 | |
| `WatchBookmark` | `false` | Alpha | 1.15 | 1.15 |
| `WatchBookmark` | `true` | Beta | 1.16 | 1.16 |
| `WatchBookmark` | `true` | GA | 1.17 | - |
@ -435,7 +457,8 @@ Each feature gate is designed for enabling/disabling a specific feature:
- `CloudDualStackNodeIPs`: Enables dual-stack `kubelet --node-ip` with external cloud providers.
See [Configure IPv4/IPv6 dual-stack](/docs/concepts/services-networking/dual-stack/#configure-ipv4-ipv6-dual-stack)
for more details.
- `ClusterTrustBundle`: Enable ClusterTrustBundle objects and kubelet integration.
- `ClusterTrustBundle`: Enable ClusterTrustBundle objects.
- `ClusterTrustBundleProjection`: [`clusterTrustBundle` projected volume sources](/docs/concepts/storage/projected-volumes#clustertrustbundle).
- `ComponentSLIs`: Enable the `/metrics/slis` endpoint on Kubernetes components like
kubelet, kube-scheduler, kube-proxy, kube-controller-manager, cloud-controller-manager
allowing you to scrape health check metrics.
@ -477,8 +500,7 @@ Each feature gate is designed for enabling/disabling a specific feature:
component flag.
- `DisableKubeletCloudCredentialProviders`: Disable the in-tree functionality in kubelet
to authenticate to a cloud provider container registry for image pull credentials.
- `DownwardAPIHugePages`: Enables usage of hugepages in
[downward API](/docs/tasks/inject-data-application/downward-api-volume-expose-pod-information).
- `DisableNodeKubeProxyVersion`: Disable setting the `kubeProxyVersion` field of the Node.
- `DynamicResourceAllocation`: Enables support for resources with custom parameters and a lifecycle
that is independent of a Pod.
- `ElasticIndexedJob`: Enables Indexed Jobs to be scaled up or down by mutating both
@ -602,15 +624,19 @@ Each feature gate is designed for enabling/disabling a specific feature:
- `KubeletTracing`: Add support for distributed tracing in the kubelet.
When enabled, kubelet CRI interface and authenticated http servers are instrumented to generate
OpenTelemetry trace spans.
See [Traces for Kubernetes System Components](/docs/concepts/cluster-administration/system-traces/)
for more details.
See [Traces for Kubernetes System Components](/docs/concepts/cluster-administration/system-traces) for more details.
- `LegacyServiceAccountTokenNoAutoGeneration`: Stop auto-generation of Secret-based
[service account tokens](/docs/concepts/security/service-accounts/#get-a-token).
- `LegacyServiceAccountTokenCleanUp`: Enable cleaning up Secret-based
- `LegacyServiceAccountTokenCleanUp`: Enable invalidating auto-generated Secret-based
[service account tokens](/docs/concepts/security/service-accounts/#get-a-token)
when they are not used in a specified time (default to be one year).
when they have not been used for a specified time (defaults to one year). It also
cleans up the auto-generated Secret-based tokens if they have been invalidated
for a specified time (defaults to one year).
- `LegacyServiceAccountTokenTracking`: Track usage of Secret-based
[service account tokens](/docs/concepts/security/service-accounts/#get-a-token).
- `LoadBalancerIPMode`: Allows setting `ipMode` for Services where `type` is set to `LoadBalancer`.
See [Specifying IPMode of load balancer status](/docs/concepts/services-networking/service/#load-balancer-ip-mode)
for more information.
- `LocalStorageCapacityIsolationFSQuotaMonitoring`: When `LocalStorageCapacityIsolation`
is enabled for
[local ephemeral storage](/docs/concepts/configuration/manage-resources-containers/)
@ -622,6 +648,8 @@ Each feature gate is designed for enabling/disabling a specific feature:
based on logarithmic bucketing of pod timestamps.
- `LoggingAlphaOptions`: Allow fine-tuning of experimental, alpha-quality logging options.
- `LoggingBetaOptions`: Allow fine-tuning of experimental, beta-quality logging options.
- `MatchLabelKeysInPodAffinity`: Enable the `matchLabelKeys` and `mismatchLabelKeys` field for
[pod (anti)affinity](/docs/concepts/scheduling-eviction/assign-pod-node/).
- `MatchLabelKeysInPodTopologySpread`: Enable the `matchLabelKeys` field for
[Pod topology spread constraints](/docs/concepts/scheduling-eviction/topology-spread-constraints/).
- `MaxUnavailableStatefulSet`: Enables setting the `maxUnavailable` field for the
@ -636,8 +664,8 @@ Each feature gate is designed for enabling/disabling a specific feature:
[Pod topology spread constraints](/docs/concepts/scheduling-eviction/topology-spread-constraints/).
- `MinimizeIPTablesRestore`: Enables new performance improvement logics
in the kube-proxy iptables mode.
- `MultiCIDRRangeAllocator`: Enables the MultiCIDR range allocator.
- `MultiCIDRServiceAllocator`: Track IP address allocations for Service cluster IPs using IPAddress objects.
- `MultiCIDRServiceAllocator`: Allows dynamically configuring the cluster Service IP ranges using
ServiceCIDR objects and tracking IP address allocations for Service cluster IPs using IPAddress objects.
- `NewVolumeManagerReconstruction`: Enables improved discovery of mounted volumes during kubelet
startup. Since this code has been significantly refactored, we allow to opt-out in case kubelet
gets stuck at the startup or is not unmounting volumes from terminated Pods. Note that this
@ -678,10 +706,8 @@ Each feature gate is designed for enabling/disabling a specific feature:
the pod is being deleted due to a disruption.
- `PodHostIPs`: Enable the `status.hostIPs` field for pods and the {{< glossary_tooltip term_id="downward-api" text="downward API" >}}.
The field lets you expose host IP addresses to workloads.
- `PodIndexLabel`: Enables the Job controller and StatefulSet controller to add the pod index as a label
when creating new pods. See [Job completion mode docs](/docs/concepts/workloads/controllers/job/#completion-mode)
and [StatefulSet pod index label docs](/docs/concepts/workloads/controllers/statefulset/#pod-index-label)
for more details.
- `PodIndexLabel`: Enables the Job controller and StatefulSet controller to add the pod index as a label when creating new pods. See [Job completion mode docs](/docs/concepts/workloads/controllers/job#completion-mode) and [StatefulSet pod index label docs](/docs/concepts/workloads/controllers/statefulset/#pod-index-label) for more details.
- `PodLifecycleSleepAction`: Enables the `sleep` action in Container lifecycle hooks.
- `PodReadyToStartContainersCondition`: Enable the kubelet to mark the [PodReadyToStartContainers](/docs/concepts/workloads/pods/pod-lifecycle/#pod-has-network)
condition on pods. This was previously (1.25-1.27) known as `PodHasNetworkCondition`.
- `PodSchedulingReadiness`: Enable setting `schedulingGates` field to control a Pod's
@ -710,10 +736,11 @@ Each feature gate is designed for enabling/disabling a specific feature:
objects and collections. This field has been deprecated since the Kubernetes v1.16
release. When this feature is enabled, the `.metadata.selfLink` field remains part of
the Kubernetes API, but is always unset.
- `RetroactiveDefaultStorageClass`: Allow assigning StorageClass to unbound PVCs retroactively.
- `RotateKubeletServerCertificate`: Enable the rotation of the server TLS certificate on the kubelet.
See [kubelet configuration](/docs/reference/access-authn-authz/kubelet-tls-bootstrapping/#kubelet-configuration)
for more details.
- `RuntimeClassInImageCriApi`: Enables images to be pulled based on the
[runtime class](/docs/concepts/containers/runtime-class/) of the pods that reference them.
- `SELinuxMountReadWriteOncePod`: Speeds up container startup by allowing kubelet to mount volumes
for a Pod directly with the correct SELinux label instead of changing each file on the volumes
recursively. The initial implementation focused on ReadWriteOncePod volumes.
@ -726,11 +753,22 @@ Each feature gate is designed for enabling/disabling a specific feature:
for all workloads.
The seccomp profile is specified in the `securityContext` of a Pod and/or a Container.
- `SecurityContextDeny`: This gate signals that the `SecurityContextDeny` admission controller is deprecated.
- `SeparateTaintEvictionController`: Enables running `TaintEvictionController`,
which performs [Taint-based Evictions](/docs/concepts/scheduling-eviction/taint-and-toleration/#taint-based-evictions),
in a controller separate from `NodeLifecycleController`. When this feature is
enabled, users can optionally disable Taint-based Eviction by setting the
`--controllers=-taint-eviction-controller` flag on the `kube-controller-manager`.
- `ServerSideApply`: Enables the [Server Side Apply (SSA)](/docs/reference/using-api/server-side-apply/)
feature on the API Server.
- `ServerSideFieldValidation`: Enables server-side field validation. This means the validation
of resource schema is performed at the API server side rather than the client side
(for example, the `kubectl create` or `kubectl apply` command line).
- `ServiceAccountTokenJTI`: Controls whether JTIs (UUIDs) are embedded into generated service account tokens,
and whether these JTIs are recorded into the Kubernetes audit log for future requests made by these tokens.
- `ServiceAccountTokenNodeBinding`: Controls whether the apiserver allows binding service account tokens to Node objects.
- `ServiceAccountTokenNodeBindingValidation`: Controls whether the apiserver will validate a Node reference in service account tokens.
- `ServiceAccountTokenPodNodeInfo`: Controls whether the apiserver embeds the node name and uid
for the associated node when issuing service account tokens bound to Pod objects.
- `ServiceNodePortStaticSubrange`: Enables the use of different port allocation
strategies for NodePort Services. For more details, see
[reserve NodePort ranges to avoid collisions](/docs/concepts/services-networking/service/#avoid-nodeport-collisions).
@ -767,18 +805,28 @@ Each feature gate is designed for enabling/disabling a specific feature:
This feature gate guards *a group* of topology manager options whose quality level is beta.
This feature gate will never graduate to stable.
- `TopologyManagerPolicyOptions`: Allow fine-tuning of topology manager policies.
- `TranslateStreamCloseWebsocketRequests`: Allow WebSocket streaming of the
remote command sub-protocol (`exec`, `cp`, `attach`) from clients requesting
version 5 (v5) of the sub-protocol.
- `UnauthenticatedHTTP2DOSMitigation`: Enables HTTP/2 Denial of Service (DoS)
mitigations for unauthenticated clients.
Kubernetes v1.28.0 through v1.28.2 do not include this feature gate.
- `UnknownVersionInteroperabilityProxy`: Proxy resource requests to the correct peer kube-apiserver when
multiple kube-apiservers exist at varied versions.
See [Mixed version proxy](/docs/concepts/architecture/mixed-version-proxy/) for more information.
- `UserNamespacesPodSecurityStandards`: Enable Pod Security Standards policies relaxation for pods
that run with user namespaces. You must set the value of this feature gate consistently across all nodes in
your cluster, and you must also enable `UserNamespacesSupport` to use this feature.
See [User Namespaces](/docs/concepts/workloads/pods/user-namespaces/#integration-with-pod-security-admission-checks) for more details.
- `UserNamespacesSupport`: Enable user namespace support for Pods.
Before Kubernetes v1.28, this feature gate was named `UserNamespacesStatelessPodsSupport`.
- `ValidatingAdmissionPolicy`: Enable [ValidatingAdmissionPolicy](/docs/reference/access-authn-authz/validating-admission-policy/)
support for CEL validations to be used in Admission Control.
- `VolumeCapacityPriority`: Enable support for prioritizing nodes in different
topologies based on available PV capacity.
- `VolumeAttributesClass`: Enable support for VolumeAttributesClasses.
See [Volume Attributes Classes](/docs/concepts/storage/volume-attributes-classes/)
for more information.
- `WatchBookmark`: Enable support for watch bookmark events.
- `WatchList` : Enable support for [streaming initial state of objects in watch requests](/docs/reference/using-api/api-concepts/#streaming-lists).
- `WinDSR`: Allows kube-proxy to create DSR loadbalancers for Windows.

File diff suppressed because it is too large Load Diff

View File

@ -9,7 +9,7 @@ weight: 20
<!-- overview -->
{{< feature-state for_k8s_version="v1.27" state="beta" >}}
{{< feature-state for_k8s_version="v1.29" state="stable" >}}
By default, Kubernetes {{< skew currentVersion >}} publishes Service Level Indicator (SLI) metrics
for each Kubernetes component binary. This metric endpoint is exposed on the serving

View File

@ -370,10 +370,10 @@ kubectl [flags]
</tr>
<tr>
<td colspan="2">KUBECTL_INTERACTIVE_DELETE</td>
<td colspan="2">KUBECTL_REMOTE_COMMAND_WEBSOCKETS</td>
</tr>
<tr>
<td></td><td style="line-height: 130%; word-wrap: break-word;">When set to true, the --interactive flag in the kubectl delete command will be activated, allowing users to preview and confirm resources before proceeding to delete by passing this flag.
<td></td><td style="line-height: 130%; word-wrap: break-word;">When set to true, the kubectl exec, cp, and attach commands will attempt to stream using the websockets protocol. If the upgrade to websockets fails, the commands will fall back to using the current SPDY protocol.
</td>
</tr>

View File

@ -1043,6 +1043,23 @@ last saw a request where the client authenticated using the service account toke
If a legacy token was last used before the cluster gained the feature (added in Kubernetes v1.26),
then the label isn't set.
### kubernetes.io/legacy-token-invalid-since
Type: Label
Example: `kubernetes.io/legacy-token-invalid-since: 2023-10-27`
Used on: Secret
The control plane automatically adds this label to auto-generated Secrets that
have the type `kubernetes.io/service-account-token`, provided that you have the
`LegacyServiceAccountTokenCleanUp` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
enabled. Kubernetes {{< skew currentVersion >}} enables that behavior by default.
This label marks the Secret-based token as invalid for authentication. The value
of this label records the date (ISO 8601 format, UTC time zone) when the control
plane detects that the auto-generated Secret has not been used for a specified
duration (defaults to one year).
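As a sketch, you can list the Secrets that the control plane has marked invalid
this way by selecting on the label key (standard label selector syntax; the
output depends on your cluster):

```shell
# List auto-generated service account token Secrets marked invalid for authentication.
kubectl get secrets --all-namespaces -l kubernetes.io/legacy-token-invalid-since
```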
### endpointslice.kubernetes.io/managed-by {#endpointslicekubernetesiomanaged-by}
Type: Label

View File

@ -14,6 +14,18 @@ The `kube-proxy` component is responsible for implementing a _virtual IP_
mechanism for {{< glossary_tooltip term_id="service" text="Services">}}
of `type` other than
[`ExternalName`](/docs/concepts/services-networking/service/#externalname).
Each instance of kube-proxy watches the Kubernetes {{< glossary_tooltip
term_id="control-plane" text="control plane" >}} for the addition and
removal of Service and EndpointSlice {{< glossary_tooltip
term_id="object" text="objects" >}}. For each Service, kube-proxy
calls appropriate APIs (depending on the kube-proxy mode) to configure
the node to capture traffic to the Service's `clusterIP` and `port`,
and redirect that traffic to one of the Service's endpoints
(usually a Pod, but possibly an arbitrary user-provided IP address). A control
loop ensures that the rules on each node are reliably synchronized with
the Service and EndpointSlice state as indicated by the API server.
{{< figure src="/images/docs/services-iptables-overview.svg" title="Virtual IP mechanism for Services, using iptables mode" class="diagram-medium" >}}
A question that pops up every now and then is why Kubernetes relies on
proxying to forward inbound traffic to backends. What about other
@ -57,11 +69,14 @@ The kube-proxy starts up in different modes, which are determined by its configu
On Linux nodes, the available modes for kube-proxy are:
[`iptables`](#proxy-mode-iptables)
: A mode where the kube-proxy configures packet forwarding rules using iptables, on Linux.
: A mode where the kube-proxy configures packet forwarding rules using iptables.
[`ipvs`](#proxy-mode-ipvs)
: A mode where the kube-proxy configures packet forwarding rules using ipvs.
[`nftables`](#proxy-mode-nftables)
: A mode where the kube-proxy configures packet forwarding rules using nftables.
There is only one mode available for kube-proxy on Windows:
[`kernelspace`](#proxy-mode-kernelspace)
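Whichever mode you pick, it is normally selected through the kube-proxy
configuration file; a minimal sketch:

```yaml
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
# One of the modes listed above; leaving this empty selects the platform
# default (iptables on Linux, kernelspace on Windows).
mode: "ipvs"
```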
@ -71,32 +86,10 @@ There is only one mode available for kube-proxy on Windows:
_This proxy mode is only available on Linux nodes._
In this mode, kube-proxy watches the Kubernetes
{{< glossary_tooltip term_id="control-plane" text="control plane" >}} for the addition and
removal of Service and EndpointSlice {{< glossary_tooltip term_id="object" text="objects." >}}
For each Service, it installs
iptables rules, which capture traffic to the Service's `clusterIP` and `port`,
and redirect that traffic to one of the Service's
backend sets. For each endpoint, it installs iptables rules which
select a backend Pod.
By default, kube-proxy in iptables mode chooses a backend at random.
Using iptables to handle traffic has a lower system overhead, because traffic
is handled by Linux netfilter without the need to switch between userspace and the
kernel space. This approach is also likely to be more reliable.
If kube-proxy is running in iptables mode and the first Pod that's selected
does not respond, the connection fails. This is different from the old `userspace`
mode: in that scenario, kube-proxy would detect that the connection to the first
Pod had failed and would automatically retry with a different backend Pod.
You can use Pod [readiness probes](/docs/concepts/workloads/pods/pod-lifecycle/#container-probes)
to verify that backend Pods are working OK, so that kube-proxy in iptables mode
only sees backends that test out as healthy. Doing this means you avoid
having traffic sent via kube-proxy to a Pod that's known to have failed.
{{< figure src="/images/docs/services-iptables-overview.svg" title="Virtual IP mechanism for Services, using iptables mode" class="diagram-medium" >}}
In this mode, kube-proxy configures packet forwarding rules using the
iptables API of the kernel netfilter subsystem. For each endpoint, it
installs iptables rules which, by default, select a backend Pod at
random.
#### Example {#packet-processing-iptables}
@ -122,8 +115,10 @@ through a load-balancer, though in those cases the client IP address does get al
#### Optimizing iptables mode performance
In large clusters (with tens of thousands of Pods and Services), the
iptables mode of kube-proxy may take a long time to update the rules
In iptables mode, kube-proxy creates a few iptables rules for every
Service, and a few iptables rules for each endpoint IP address. In
clusters with tens of thousands of Pods and Services, this means tens
of thousands of iptables rules, and kube-proxy may take a long time to update the rules
in the kernel when Services (or their EndpointSlices) change. You can adjust the syncing
behavior of kube-proxy via options in the [`iptables` section](/docs/reference/config-api/kube-proxy-config.v1alpha1/#kubeproxy-config-k8s-io-v1alpha1-KubeProxyIPTablesConfiguration)
of the
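As a sketch, the relevant fields live in the `iptables` section of the
kube-proxy configuration referenced above; the values shown are illustrative
assumptions, not recommendations:

```yaml
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "iptables"
iptables:
  # Minimum interval between rule re-syncs when Services or
  # EndpointSlices are changing frequently.
  minSyncPeriod: 15s
  # Interval for periodic full re-syncs, regardless of changes.
  syncPeriod: 5m
```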
@ -204,18 +199,15 @@ and is likely to hurt functionality more than it improves performance.
_This proxy mode is only available on Linux nodes._
In `ipvs` mode, kube-proxy watches Kubernetes Services and EndpointSlices,
calls `netlink` interface to create IPVS rules accordingly and synchronizes
IPVS rules with Kubernetes Services and EndpointSlices periodically.
This control loop ensures that IPVS status matches the desired state.
When accessing a Service, IPVS directs traffic to one of the backend Pods.
In `ipvs` mode, kube-proxy uses the kernel IPVS and iptables APIs to
create rules to redirect traffic from Service IPs to endpoint IPs.
The IPVS proxy mode is based on a netfilter hook function that is similar to
the iptables mode, but uses a hash table as the underlying data structure and works
in the kernel space.
That means kube-proxy in IPVS mode redirects traffic with lower latency than
kube-proxy in iptables mode, with much better performance when synchronizing
proxy rules. Compared to the other proxy modes, IPVS mode also supports a
proxy rules. Compared to the iptables proxy mode, IPVS mode also supports a
higher throughput of network traffic.
IPVS provides more options for balancing traffic to backend Pods;
@ -263,11 +255,28 @@ the node before starting kube-proxy.
When kube-proxy starts in IPVS proxy mode, it verifies whether IPVS
kernel modules are available. If the IPVS kernel modules are not detected, then kube-proxy
falls back to running in iptables proxy mode.
exits with an error.
{{< /note >}}
{{< figure src="/images/docs/services-ipvs-overview.svg" title="Virtual IP address mechanism for Services, using IPVS mode" class="diagram-medium" >}}
### `nftables` proxy mode {#proxy-mode-nftables}
{{< feature-state for_k8s_version="v1.29" state="alpha" >}}
_This proxy mode is only available on Linux nodes._
In this mode, kube-proxy configures packet forwarding rules using the
nftables API of the kernel netfilter subsystem. For each endpoint, it
installs nftables rules which, by default, select a backend Pod at
random.
The nftables API is the successor to the iptables API, and although it
is designed to provide better performance and scalability than
iptables, the kube-proxy nftables mode is still under heavy
development as of {{< skew currentVersion >}} and is not necessarily
expected to outperform the other Linux modes at this time.
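If you want to experiment with it anyway, a minimal sketch of opting in (the
feature gate name matches the v1.29 alpha gate for this mode):

```yaml
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "nftables"
featureGates:
  # Alpha in v1.29; required for the nftables mode to be accepted.
  NFTablesProxyMode: true
```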
### `kernelspace` proxy mode {#proxy-mode-kernelspace}
_This proxy mode is only available on Windows nodes._
@ -344,9 +353,9 @@ ensure that no two Services can collide. Kubernetes does that by allocating each
Service its own IP address from within the `service-cluster-ip-range`
CIDR range that is configured for the {{< glossary_tooltip term_id="kube-apiserver" text="API Server" >}}.
#### IP address allocation tracking
### IP address allocation tracking
To ensure each Service receives a unique IP, an internal allocator atomically
To ensure each Service receives a unique IP address, an internal allocator atomically
updates a global allocation map in {{< glossary_tooltip term_id="etcd" >}}
prior to creating each Service. The map object must exist in the registry for
Services to get IP address assignments, otherwise creations will
@ -355,28 +364,37 @@ fail with a message indicating an IP address could not be allocated.
In the control plane, a background controller is responsible for creating that
map (needed to support migrating from older versions of Kubernetes that used
in-memory locking). Kubernetes also uses controllers to check for invalid
assignments (e.g. due to administrator intervention) and for cleaning up allocated
assignments (for example: due to administrator intervention) and for cleaning up allocated
IP addresses that are no longer used by any Services.
#### IP address allocation tracking using the Kubernetes API {#ip-address-objects}
{{< feature-state for_k8s_version="v1.27" state="alpha" >}}
If you enable the `MultiCIDRServiceAllocator`
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/) and the
[`networking.k8s.io/v1alpha1` API group](/docs/tasks/administer-cluster/enable-disable-api/),
the control plane replaces the existing etcd allocator with a new one, using IPAddress
objects instead of an internal global allocation map. The ClusterIP address
associated to each Service will have a referenced IPAddress object.
the control plane replaces the existing etcd allocator with a revised implementation
that uses IPAddress and ServiceCIDR objects instead of an internal global allocation map.
Each cluster IP address associated to a Service then references an IPAddress object.
The background controller is also replaced by a new one to handle the new IPAddress
objects and the migration from the old allocator model.
Enabling the feature gate also replaces a background controller with an alternative
that handles the IPAddress objects and supports migration from the old allocator model.
Kubernetes {{< skew currentVersion >}} does not support migrating from IPAddress
objects to the internal allocation map.
One of the main benefits of the new allocator is that it removes the size limitations
for the `service-cluster-ip-range`, there is no limitations for IPv4 and for IPv6
users can use masks equal or larger than /64 (previously it was /108).
One of the main benefits of the revised allocator is that it removes the size limitations
for the IP address range that can be used for the cluster IP address of Services.
With `MultiCIDRServiceAllocator` enabled, there are no limitations for IPv4, and for IPv6
you can use IP address netmasks that are a /64 or smaller (as opposed to /108 with the
legacy implementation).
Users now will be able to inspect the IP addresses assigned to their Services, and
Kubernetes extensions such as the [Gateway](https://gateway-api.sigs.k8s.io/) API, can use this new
IPAddress object kind to enhance the Kubernetes networking capabilities, going beyond the limitations of
the built-in Service API.
Making IP address allocations available via the API means that you as a cluster administrator
can allow users to inspect the IP addresses assigned to their Services.
Kubernetes extensions, such as the [Gateway API](/docs/concepts/services-networking/gateway/),
can use the IPAddress API to extend Kubernetes' inherent networking capabilities.
Here is a brief example of a user querying for IP addresses:
```shell
kubectl get services
@ -394,7 +412,45 @@ NAME PARENTREF
2001:db8:1:2::a services/kube-system/kube-dns
```
#### IP address ranges for Service virtual IP addresses {#service-ip-static-sub-range}
Kubernetes also allows users to dynamically define the available IP ranges for Services using
ServiceCIDR objects. During bootstrap, a default ServiceCIDR object named `kubernetes` is created
from the value of the `--service-cluster-ip-range` command line argument to kube-apiserver:
```shell
kubectl get servicecidrs
```
```
NAME CIDRS AGE
kubernetes 10.96.0.0/28 17m
```
Users can create or delete new ServiceCIDR objects to manage the available IP ranges for Services:
```shell
cat <<'EOF' | kubectl apply -f -
apiVersion: networking.k8s.io/v1alpha1
kind: ServiceCIDR
metadata:
name: newservicecidr
spec:
cidrs:
- 10.96.0.0/24
EOF
```
```
servicecidr.networking.k8s.io/newservicecidr created
```
```shell
kubectl get servicecidrs
```
```
NAME CIDRS AGE
kubernetes 10.96.0.0/28 17m
newservicecidr 10.96.0.0/24 7m
```
### IP address ranges for Service virtual IP addresses {#service-ip-static-sub-range}
{{< feature-state for_k8s_version="v1.26" state="stable" >}}

View File

@ -0,0 +1,92 @@
<!--
The file is auto-generated from the Go source code of the component using a generic
[generator](https://github.com/kubernetes-sigs/reference-docs/). To learn how
to generate the reference documentation, please read
[Contributing to the reference documentation](/docs/contribute/generate-ref-docs/).
To update the reference content, please follow the
[Contributing upstream](/docs/contribute/generate-ref-docs/contribute-upstream/)
guide. You can file document formatting bugs against the
[reference-docs](https://github.com/kubernetes-sigs/reference-docs/) project.
-->
Renew the certificate embedded in the kubeconfig file for the super-admin
### Synopsis
Renew the certificate embedded in the kubeconfig file for the super-admin.
Renewals run unconditionally, regardless of certificate expiration date; extra attributes such as SANs will be based on the existing file/certificates, so there is no need to resupply them.
Renewal by default tries to use the certificate authority in the local PKI managed by kubeadm; as an alternative it is possible to use the Kubernetes certificate API for certificate renewal, or, as a last option, to generate a CSR request.
After renewal, in order to make the changes effective, you must restart control-plane components and, if the file is used elsewhere, redistribute the renewed certificate.
```
kubeadm certs renew super-admin.conf [flags]
```
### Options
<table style="width: 100%; table-layout: fixed;">
<colgroup>
<col span="1" style="width: 10px;" />
<col span="1" />
</colgroup>
<tbody>
<tr>
<td colspan="2">--cert-dir string&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Default: "/etc/kubernetes/pki"</td>
</tr>
<tr>
<td></td><td style="line-height: 130%; word-wrap: break-word;"><p>The path where to save the certificates</p></td>
</tr>
<tr>
<td colspan="2">--config string</td>
</tr>
<tr>
<td></td><td style="line-height: 130%; word-wrap: break-word;"><p>Path to a kubeadm configuration file.</p></td>
</tr>
<tr>
<td colspan="2">-h, --help</td>
</tr>
<tr>
<td></td><td style="line-height: 130%; word-wrap: break-word;"><p>help for super-admin.conf</p></td>
</tr>
<tr>
<td colspan="2">--kubeconfig string&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Default: "/etc/kubernetes/admin.conf"</td>
</tr>
<tr>
<td></td><td style="line-height: 130%; word-wrap: break-word;"><p>The kubeconfig file to use when talking to the cluster. If the flag is not set, a set of standard locations can be searched for an existing kubeconfig file.</p></td>
</tr>
</tbody>
</table>
### Options inherited from parent commands
<table style="width: 100%; table-layout: fixed;">
<colgroup>
<col span="1" style="width: 10px;" />
<col span="1" />
</colgroup>
<tbody>
<tr>
<td colspan="2">--rootfs string</td>
</tr>
<tr>
<td></td><td style="line-height: 130%; word-wrap: break-word;"><p>[EXPERIMENTAL] The path to the 'real' host root filesystem.</p></td>
</tr>
</tbody>
</table>

View File

@ -0,0 +1,121 @@
<!--
The file is auto-generated from the Go source code of the component using a generic
[generator](https://github.com/kubernetes-sigs/reference-docs/). To learn how
to generate the reference documentation, please read
[Contributing to the reference documentation](/docs/contribute/generate-ref-docs/).
To update the reference content, please follow the
[Contributing upstream](/docs/contribute/generate-ref-docs/contribute-upstream/)
guide. You can file document formatting bugs against the
[reference-docs](https://github.com/kubernetes-sigs/reference-docs/) project.
-->
Generate a kubeconfig file for the super-admin
### Synopsis
Generate a kubeconfig file for the super-admin.
```
kubeadm init phase kubeconfig super-admin [flags]
```
### Options
<table style="width: 100%; table-layout: fixed;">
<colgroup>
<col span="1" style="width: 10px;" />
<col span="1" />
</colgroup>
<tbody>
<tr>
<td colspan="2">--apiserver-advertise-address string</td>
</tr>
<tr>
<td></td><td style="line-height: 130%; word-wrap: break-word;"><p>The IP address the API Server will advertise it is listening on. If not set, the default network interface will be used.</p></td>
</tr>
<tr>
<td colspan="2">--apiserver-bind-port int32&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Default: 6443</td>
</tr>
<tr>
<td></td><td style="line-height: 130%; word-wrap: break-word;"><p>Port for the API Server to bind to.</p></td>
</tr>
<tr>
<td colspan="2">--cert-dir string&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Default: "/etc/kubernetes/pki"</td>
</tr>
<tr>
<td></td><td style="line-height: 130%; word-wrap: break-word;"><p>The path where to save and store the certificates.</p></td>
</tr>
<tr>
<td colspan="2">--config string</td>
</tr>
<tr>
<td></td><td style="line-height: 130%; word-wrap: break-word;"><p>Path to a kubeadm configuration file.</p></td>
</tr>
<tr>
<td colspan="2">--control-plane-endpoint string</td>
</tr>
<tr>
<td></td><td style="line-height: 130%; word-wrap: break-word;"><p>Specify a stable IP address or DNS name for the control plane.</p></td>
</tr>
<tr>
<td colspan="2">--dry-run</td>
</tr>
<tr>
<td></td><td style="line-height: 130%; word-wrap: break-word;"><p>Don't apply any changes; just output what would be done.</p></td>
</tr>
<tr>
<td colspan="2">-h, --help</td>
</tr>
<tr>
<td></td><td style="line-height: 130%; word-wrap: break-word;"><p>help for super-admin</p></td>
</tr>
<tr>
<td colspan="2">--kubeconfig-dir string&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Default: "/etc/kubernetes"</td>
</tr>
<tr>
<td></td><td style="line-height: 130%; word-wrap: break-word;"><p>The path where to save the kubeconfig file.</p></td>
</tr>
<tr>
<td colspan="2">--kubernetes-version string&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Default: "stable-1"</td>
</tr>
<tr>
<td></td><td style="line-height: 130%; word-wrap: break-word;"><p>Choose a specific Kubernetes version for the control plane.</p></td>
</tr>
</tbody>
</table>
### Options inherited from parent commands
<table style="width: 100%; table-layout: fixed;">
<colgroup>
<col span="1" style="width: 10px;" />
<col span="1" />
</colgroup>
<tbody>
<tr>
<td colspan="2">--rootfs string</td>
</tr>
<tr>
<td></td><td style="line-height: 130%; word-wrap: break-word;"><p>[EXPERIMENTAL] The path to the 'real' host root filesystem.</p></td>
</tr>
</tbody>
</table>

View File

@ -64,6 +64,7 @@ in a majority of cases, and the most intuitive location; other constants paths a
- `controller-manager.conf`
- `scheduler.conf`
- `admin.conf` for the cluster admin and kubeadm itself
- `super-admin.conf` for the cluster super-admin that can bypass RBAC
- Names of certificates and key files :
@ -209,12 +210,21 @@ Kubeadm generates kubeconfig files with identities for control plane components:
This client cert should have the CN `system:kube-scheduler`, as defined by default
[RBAC core components roles](/docs/reference/access-authn-authz/rbac/#core-component-roles)
Additionally, a kubeconfig file for kubeadm itself and the admin is generated and saved into the
`/etc/kubernetes/admin.conf` file. The "admin" here is defined as the actual person(s) that is
administering the cluster and wants to have full control (**root**) over the cluster. The
embedded client certificate for admin should be in the `system:masters` organization, as defined
by default [RBAC user facing role bindings](/docs/reference/access-authn-authz/rbac/#user-facing-roles).
It should also include a CN. Kubeadm uses the `kubernetes-admin` CN.
Additionally, a kubeconfig file for kubeadm as an administrative entity is generated and stored
in `/etc/kubernetes/admin.conf`. This file includes a certificate with
`Subject: O = kubeadm:cluster-admins, CN = kubernetes-admin`. `kubeadm:cluster-admins`
is a group managed by kubeadm. It is bound to the `cluster-admin` ClusterRole during `kubeadm init`,
by using the `super-admin.conf` file, which does not require RBAC.
This `admin.conf` file must remain on control plane nodes and not be shared with additional users.
During `kubeadm init` another kubeconfig file is generated and stored in `/etc/kubernetes/super-admin.conf`.
This file includes a certificate with `Subject: O = system:masters, CN = kubernetes-super-admin`.
`system:masters` is a super user group that bypasses RBAC and makes `super-admin.conf` useful in case
of an emergency where a cluster is locked due to RBAC misconfiguration.
The `super-admin.conf` file should be stored in a safe location and not shared with additional users.
See [RBAC user facing role bindings](/docs/reference/access-authn-authz/rbac/#user-facing-roles)
for additional information about RBAC and built-in ClusterRoles and groups.
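One hedged way to confirm which identity a given kubeconfig file carries is to
decode its embedded client certificate and print the Subject (paths assume a
default kubeadm control plane node):

```shell
kubectl config view --raw --kubeconfig /etc/kubernetes/admin.conf \
  -o jsonpath='{.users[0].user.client-certificate-data}' \
  | base64 -d | openssl x509 -noout -subject
```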
Please note that:

View File

@ -34,6 +34,7 @@ For more details see [Manual certificate renewal](/docs/tasks/administer-cluster
{{< tab name="etcd-server" include="generated/kubeadm_certs_renew_etcd-server.md" />}}
{{< tab name="front-proxy-client" include="generated/kubeadm_certs_renew_front-proxy-client.md" />}}
{{< tab name="scheduler.conf" include="generated/kubeadm_certs_renew_scheduler.conf.md" />}}
{{< tab name="super-admin.conf" include="generated/kubeadm_certs_renew_super-admin.conf.md" />}}
{{< /tabs >}}
## kubeadm certs certificate-key {#cmd-certs-certificate-key}

View File

@ -58,6 +58,7 @@ You can create all required kubeconfig files by calling the `all` subcommand or
{{< tab name="kubelet" include="generated/kubeadm_init_phase_kubeconfig_kubelet.md" />}}
{{< tab name="controller-manager" include="generated/kubeadm_init_phase_kubeconfig_controller-manager.md" />}}
{{< tab name="scheduler" include="generated/kubeadm_init_phase_kubeconfig_scheduler.md" />}}
{{< tab name="super-admin" include="generated/kubeadm_init_phase_kubeconfig_super-admin.md" />}}
{{< /tabs >}}
## kubeadm init phase control-plane {#cmd-phase-control-plane}

View File

@ -32,8 +32,9 @@ following steps:
arguments, lowercased if necessary.
1. Writes kubeconfig files in `/etc/kubernetes/` for the kubelet, the controller-manager and the
scheduler to use to connect to the API server, each with its own identity, as well as an
additional kubeconfig file for administration named `admin.conf`.
scheduler to use to connect to the API server, each with its own identity. Additional
kubeconfig files are also written, for kubeadm as an administrative entity (`admin.conf`)
and for a super admin user that can bypass RBAC (`super-admin.conf`).
1. Generates static Pod manifests for the API server,
controller-manager and scheduler. In case an external etcd is not provided,
@ -157,9 +158,9 @@ List of feature gates:
{{< table caption="kubeadm feature gates" >}}
Feature | Default | Alpha | Beta | GA
:-------|:--------|:------|:-----|:----
`EtcdLearnerMode` | `true` | 1.27 | 1.29 | -
`PublicKeysECDSA` | `false` | 1.19 | - | -
`RootlessControlPlane` | `false` | 1.22 | - | -
`EtcdLearnerMode` | `false` | 1.27 | - | -
{{< /table >}}
{{< note >}}
@ -168,6 +169,10 @@ Once a feature gate goes GA its value becomes locked to `true` by default.
Feature gate descriptions:
`EtcdLearnerMode`
: With this feature gate enabled, when joining a new control plane node, a new etcd member will be created
as a learner and promoted to a voting member only after the etcd data are fully aligned.
`PublicKeysECDSA`
: Can be used to create a cluster that uses ECDSA certificates instead of the default RSA algorithm.
Renewal of existing ECDSA certificates is also supported using `kubeadm certs renew`, but you cannot
@ -179,14 +184,10 @@ for `kube-apiserver`, `kube-controller-manager`, `kube-scheduler` and `etcd` to
If the flag is not set, those components run as root. You can change the value of this feature gate before
you upgrade to a newer version of Kubernetes.
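As a sketch, kubeadm feature gates are set through the kubeadm configuration
API rather than component flags; the gate shown is illustrative:

```yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
featureGates:
  # Use ECDSA certificates instead of the default RSA algorithm.
  PublicKeysECDSA: true
```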
List of deprecated feature gates:
{{< table caption="kubeadm deprecated feature gates" >}}
Feature | Default
:-------|:--------
`UpgradeAddonsBeforeControlPlane` | `false`
{{< /table >}}
@ -212,12 +213,16 @@ List of removed feature gates:
{{< table caption="kubeadm removed feature gates" >}}
Feature | Alpha | Beta | GA | Removed
:-------|:------|:-----|:---|:-------
`IPv6DualStack` | 1.16 | 1.21 | 1.23 | 1.24
`UnversionedKubeletConfigMap` | 1.22 | 1.23 | 1.25 | 1.26
{{< /table >}}
Feature gate descriptions:
`IPv6DualStack`
: This flag helps to configure components dual stack when the feature is in progress. For more details on Kubernetes
dual-stack support see [Dual-stack support with kubeadm](/docs/setup/production-environment/tools/kubeadm/dual-stack-support/).
`UnversionedKubeletConfigMap`
: This flag controls the name of the {{< glossary_tooltip text="ConfigMap" term_id="configmap" >}} where kubeadm stores
kubelet configuration data. With this flag not specified or set to `true`, the ConfigMap is named `kubelet-config`.
@ -228,10 +233,6 @@ or `kubeadm upgrade apply`), kubeadm respects the value of `UnversionedKubeletCo
(during `kubeadm join`, `kubeadm reset`, `kubeadm upgrade ...`), kubeadm attempts to use unversioned ConfigMap name first;
if that does not succeed, kubeadm falls back to using the legacy (versioned) name for that ConfigMap.
### Adding kube-proxy parameters {#kube-proxy}
For information about kube-proxy parameters in the kubeadm configuration see:
@ -291,7 +292,7 @@ for etcd and CoreDNS.
#### Custom sandbox (pause) images {#custom-pause-image}
To set a custom image for these you need to configure this in your
{{< glossary_tooltip text="container runtime" term_id="container-runtime" >}}
to use the image.
Consult the documentation for your container runtime to find out how to change this setting;
@ -386,8 +387,9 @@ DNS name or an address of a load balancer.
kubeadm certs certificate-key
```
Once the cluster is up, you can grab the admin credentials from the control-plane node
at `/etc/kubernetes/admin.conf` and use that to talk to the cluster.
Once the cluster is up, you can use the `/etc/kubernetes/admin.conf` file from
a control-plane node to talk to the cluster with administrator credentials, or follow
[Generating kubeconfig files for additional users](/docs/tasks/administer-cluster/kubeadm/kubeadm-certs#kubeconfig-additional-users)
to create credentials for other users.
Note that this style of bootstrap has some relaxed security guarantees because
it does not allow the root CA hash to be validated with

View File

@ -317,7 +317,7 @@ The `content-encoding` header indicates that the response is compressed with `gz
## Retrieving large results sets in chunks
{{< feature-state for_k8s_version="v1.9" state="beta" >}}
{{< feature-state for_k8s_version="v1.29" state="stable" >}}
On large clusters, retrieving the collection of some resource types may result in
very large responses that can impact the server and client. For instance, a cluster
@ -325,9 +325,7 @@ may have tens of thousands of Pods, each of which is equivalent to roughly 2 KiB
encoded JSON. Retrieving all pods across all namespaces may result in a very large
response (10-20MB) and consume a large amount of server resources.
Provided that you don't explicitly disable the `APIListChunking`
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/), the
Kubernetes API server supports the ability to break a single large collection request
The Kubernetes API server supports the ability to break a single large collection request
into many smaller chunks while preserving the consistency of the total request. Each
chunk can be returned sequentially which reduces both the total size of the request and
allows user-oriented clients to display results incrementally to improve responsiveness.
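A brief example of using this from kubectl, which drives the list API's `limit`
and `continue` parameters for you:

```shell
# Retrieve Pods in chunks of 500 objects per request instead of one huge list.
kubectl get pods --all-namespaces --chunk-size=500
```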

View File

@ -20,6 +20,19 @@ deprecated API versions to newer and more stable API versions.
## Removed APIs by release
### v1.32
The **v1.32** release will stop serving the following deprecated API versions:
#### Flow control resources {#flowcontrol-resources-v132}
The **flowcontrol.apiserver.k8s.io/v1beta3** API version of FlowSchema and PriorityLevelConfiguration will no longer be served in v1.32.
* Migrate manifests and API clients to use the **flowcontrol.apiserver.k8s.io/v1** API version, available since v1.29.
* All existing persisted objects are accessible via the new API
* Notable changes in **flowcontrol.apiserver.k8s.io/v1**:
* The PriorityLevelConfiguration `spec.limited.nominalConcurrencyShares` field only defaults to 30 when unspecified, and an explicit value of 0 is not changed to 30.
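A hedged way to check that your clients and manifests can switch over is to
read the objects through the replacement API version explicitly:

```shell
# Read the flow control objects via the v1 API that replaces v1beta3.
kubectl get flowschemas.v1.flowcontrol.apiserver.k8s.io
kubectl get prioritylevelconfigurations.v1.flowcontrol.apiserver.k8s.io
```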
### v1.29
The **v1.29** release will stop serving the following deprecated API versions:
@ -28,8 +41,10 @@ The **v1.29** release will stop serving the following deprecated API versions:
The **flowcontrol.apiserver.k8s.io/v1beta2** API version of FlowSchema and PriorityLevelConfiguration will no longer be served in v1.29.
* Migrate manifests and API clients to use the **flowcontrol.apiserver.k8s.io/v1beta3** API version, available since v1.26.
* Migrate manifests and API clients to use the **flowcontrol.apiserver.k8s.io/v1** API version, available since v1.29, or the **flowcontrol.apiserver.k8s.io/v1beta3** API version, available since v1.26.
* All existing persisted objects are accessible via the new API
* Notable changes in **flowcontrol.apiserver.k8s.io/v1**:
* The PriorityLevelConfiguration `spec.limited.assuredConcurrencyShares` field is renamed to `spec.limited.nominalConcurrencyShares` and only defaults to 30 when unspecified, and an explicit value of 0 is not changed to 30.
* Notable changes in **flowcontrol.apiserver.k8s.io/v1beta3**:
* The PriorityLevelConfiguration `spec.limited.assuredConcurrencyShares` field is renamed to `spec.limited.nominalConcurrencyShares`

View File

@ -95,6 +95,12 @@ Required certificates:
| kube-apiserver-kubelet-client | kubernetes-ca | system:masters | client | |
| front-proxy-client | kubernetes-front-proxy-ca | | client | |
{{< note >}}
Instead of using the super-user group `system:masters` for `kube-apiserver-kubelet-client`,
a less privileged group can be used. kubeadm uses the `kubeadm:cluster-admins` group for
that purpose.
{{< /note >}}
[1]: any other IP or DNS name you contact your cluster on (as used by [kubeadm](/docs/reference/setup-tools/kubeadm/)
the load balancer stable IP and/or DNS name, `kubernetes`, `kubernetes.default`, `kubernetes.default.svc`,
`kubernetes.default.svc.cluster`, `kubernetes.default.svc.cluster.local`)
@ -184,12 +190,13 @@ you need to provide if you are generating all of your own keys and certificates:
You must manually configure these administrator account and service accounts:
| filename | credential name | Default CN | O (in Subject) |
|-------------------------|----------------------------|-------------------------------------|----------------|
| admin.conf | default-admin | kubernetes-admin | system:masters |
| kubelet.conf | default-auth | system:node:`<nodeName>` (see note) | system:nodes |
| controller-manager.conf | default-controller-manager | system:kube-controller-manager | |
| scheduler.conf | default-scheduler | system:kube-scheduler | |
| filename | credential name | Default CN | O (in Subject) |
|-------------------------|----------------------------|-------------------------------------|------------------------|
| admin.conf | default-admin | kubernetes-admin | `<admin-group>` |
| super-admin.conf | default-super-admin | kubernetes-super-admin | system:masters |
| kubelet.conf | default-auth | system:node:`<nodeName>` (see note) | system:nodes |
| controller-manager.conf | default-controller-manager | system:kube-controller-manager | |
| scheduler.conf | default-scheduler | system:kube-scheduler | |
{{< note >}}
The value of `<nodeName>` for `kubelet.conf` **must** match precisely the value of the node name
@ -197,6 +204,22 @@ provided by the kubelet as it registers with the apiserver. For further details,
[Node Authorization](/docs/reference/access-authn-authz/node/).
{{< /note >}}
{{< note >}}
In the above example `<admin-group>` is implementation specific. Some tools sign the
certificate in the default `admin.conf` to be part of the `system:masters` group.
`system:masters` is a break-glass, super user group that can bypass the authorization
layer of Kubernetes, such as RBAC. Also, some tools do not generate a separate
`super-admin.conf` with a certificate bound to this super user group.
kubeadm generates two separate administrator certificates in kubeconfig files.
One is in `admin.conf` and has `Subject: O = kubeadm:cluster-admins, CN = kubernetes-admin`.
`kubeadm:cluster-admins` is a custom group bound to the `cluster-admin` ClusterRole.
This file is generated on all kubeadm managed control plane machines.
Another is in `super-admin.conf` that has `Subject: O = system:masters, CN = kubernetes-super-admin`.
This file is generated only on the node where `kubeadm init` was called.
{{< /note >}}
1. For each config, generate an x509 cert/key pair with the given CN and O.
1. Run `kubectl` as follows for each config:
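   As a sketch, a typical sequence (with placeholder paths and names) is:

   ```shell
   KUBECONFIG=<filename> kubectl config set-cluster default-cluster \
     --server=https://<host>:6443 \
     --certificate-authority <path-to-kubernetes-ca> --embed-certs
   KUBECONFIG=<filename> kubectl config set-credentials <credential-name> \
     --client-key <path-to-key>.pem \
     --client-certificate <path-to-cert>.pem --embed-certs
   KUBECONFIG=<filename> kubectl config set-context default-system \
     --cluster default-cluster --user <credential-name>
   KUBECONFIG=<filename> kubectl config use-context default-system
   ```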
@ -213,6 +236,7 @@ These files are used as follows:
| filename | command | comment |
|-------------------------|-------------------------|-----------------------------------------------------------------------|
| admin.conf | kubectl | Configures administrator user for the cluster |
| super-admin.conf | kubectl | Configures super administrator user for the cluster |
| kubelet.conf | kubelet | One required for each node in the cluster. |
| controller-manager.conf | kube-controller-manager | Must be added to manifest in `manifests/kube-controller-manager.yaml` |
| scheduler.conf | kube-scheduler | Must be added to manifest in `manifests/kube-scheduler.yaml` |
@ -221,6 +245,7 @@ The following files illustrate full paths to the files listed in the previous ta
```
/etc/kubernetes/admin.conf
/etc/kubernetes/super-admin.conf
/etc/kubernetes/kubelet.conf
/etc/kubernetes/controller-manager.conf
/etc/kubernetes/scheduler.conf

View File

@ -265,11 +265,19 @@ export KUBECONFIG=/etc/kubernetes/admin.conf
```
{{< warning >}}
Kubeadm signs the certificate in the `admin.conf` to have `Subject: O = system:masters, CN = kubernetes-admin`.
`system:masters` is a break-glass, super user group that bypasses the authorization layer (e.g. RBAC).
Do not share the `admin.conf` file with anyone and instead grant users custom permissions by generating
them a kubeconfig file using the `kubeadm kubeconfig user` command. For more details see
[Generating kubeconfig files for additional users](/docs/tasks/administer-cluster/kubeadm/kubeadm-certs#kubeconfig-additional-users).
The kubeconfig file `admin.conf` that `kubeadm init` generates contains a certificate with
`Subject: O = kubeadm:cluster-admins, CN = kubernetes-admin`. The group `kubeadm:cluster-admins`
is bound to the built-in `cluster-admin` ClusterRole.
Do not share the `admin.conf` file with anyone.
`kubeadm init` generates another kubeconfig file `super-admin.conf` that contains a certificate with
`Subject: O = system:masters, CN = kubernetes-super-admin`.
`system:masters` is a break-glass, super user group that bypasses the authorization layer (for example RBAC).
Do not share the `super-admin.conf` file with anyone. It is recommended to move the file to a safe location.
See
[Generating kubeconfig files for additional users](/docs/tasks/administer-cluster/kubeadm/kubeadm-certs#kubeconfig-additional-users)
on how to use `kubeadm kubeconfig user` to generate kubeconfig files for additional users.
{{< /warning >}}
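As a sketch, generating a kubeconfig for an additional user instead of sharing
`admin.conf` can look like this (the user, organization, and file names are
placeholders):

```shell
# Issue a kubeconfig whose client certificate has CN=alice, O=developers.
kubeadm kubeconfig user --client-name=alice --org=developers \
  --config=kubeadm-config.yaml > alice.conf
```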
Make a record of the `kubeadm join` command that `kubeadm init` outputs. You
@ -605,7 +613,7 @@ version as kubeadm or one version older.
Example:
* kubeadm is at {{< skew currentVersion >}}
* kubelet on the host must be at {{< skew currentVersion >}} or {{< skew currentVersionAddMinor -1 >}}
* kubelet on the host must be at {{< skew currentVersion >}}, {{< skew currentVersionAddMinor -1 >}}, {{< skew currentVersionAddMinor -2 >}} or {{< skew currentVersionAddMinor -3 >}}
### kubeadm's skew against kubeadm

View File

@ -0,0 +1,187 @@
---
title: Change the Access Mode of a PersistentVolume to ReadWriteOncePod
content_type: task
weight: 90
min-kubernetes-server-version: v1.22
---
<!-- overview -->
This page shows how to change the access mode on an existing PersistentVolume to
use `ReadWriteOncePod`.
## {{% heading "prerequisites" %}}
{{< include "task-tutorial-prereqs.md" >}} {{< version-check >}}
{{< note >}}
The `ReadWriteOncePod` access mode graduated to stable in the Kubernetes v1.29
release. If you are running a version of Kubernetes older than v1.29, you might
need to enable a feature gate. Check the documentation for your version of
Kubernetes.
{{< /note >}}
{{< note >}}
The `ReadWriteOncePod` access mode is only supported for
{{< glossary_tooltip text="CSI" term_id="csi" >}} volumes.
To use this volume access mode you will need to update the following
[CSI sidecars](https://kubernetes-csi.github.io/docs/sidecar-containers.html)
to these versions or greater:
* [csi-provisioner:v3.0.0+](https://github.com/kubernetes-csi/external-provisioner/releases/tag/v3.0.0)
* [csi-attacher:v3.3.0+](https://github.com/kubernetes-csi/external-attacher/releases/tag/v3.3.0)
* [csi-resizer:v1.3.0+](https://github.com/kubernetes-csi/external-resizer/releases/tag/v1.3.0)
{{< /note >}}
## Why should I use `ReadWriteOncePod`?
Prior to Kubernetes v1.22, the `ReadWriteOnce` access mode was commonly used to
restrict PersistentVolume access for workloads that required single-writer
access to storage. However, this access mode had a limitation: it restricted
volume access to a single *node*, allowing multiple pods on the same node to
read from and write to the same volume simultaneously. This could pose a risk
for applications that demand strict single-writer access for data safety.
If ensuring single-writer access is critical for your workloads, consider
migrating your volumes to `ReadWriteOncePod`.
<!-- steps -->
## Migrating existing PersistentVolumes
If you have existing PersistentVolumes, they can be migrated to use
`ReadWriteOncePod`. Only migrations from `ReadWriteOnce` to `ReadWriteOncePod`
are supported.
In this example, there is already a `ReadWriteOnce` "cat-pictures-pvc"
PersistentVolumeClaim that is bound to a "cat-pictures-pv" PersistentVolume,
and a "cat-pictures-writer" Deployment that uses this PersistentVolumeClaim.
{{< note >}}
If your storage plugin supports
[Dynamic provisioning](/docs/concepts/storage/dynamic-provisioning/),
the "cat-picutres-pv" will be created for you, but its name may differ. To get
your PersistentVolume's name run:
```shell
kubectl get pvc cat-pictures-pvc -o jsonpath='{.spec.volumeName}'
```
{{< /note >}}
You can view the PVC before you make changes. Either view the manifest
locally, or run `kubectl get pvc <name-of-pvc> -o yaml`. The output is similar
to:
```yaml
# cat-pictures-pvc.yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
name: cat-pictures-pvc
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 1Gi
```
Here's an example Deployment that relies on that PersistentVolumeClaim:
```yaml
# cat-pictures-writer-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: cat-pictures-writer
spec:
replicas: 3
selector:
matchLabels:
app: cat-pictures-writer
template:
metadata:
labels:
app: cat-pictures-writer
spec:
containers:
- name: nginx
image: nginx:1.14.2
ports:
- containerPort: 80
volumeMounts:
- name: cat-pictures
mountPath: /mnt
volumes:
- name: cat-pictures
persistentVolumeClaim:
claimName: cat-pictures-pvc
readOnly: false
```
As a first step, you need to edit your PersistentVolume's
`spec.persistentVolumeReclaimPolicy` and set it to `Retain`. This ensures your
PersistentVolume will not be deleted when you delete the corresponding
PersistentVolumeClaim:
```shell
kubectl patch pv cat-pictures-pv -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
```
Next you need to stop any workloads that are using the PersistentVolumeClaim
bound to the PersistentVolume you want to migrate, and then delete the
PersistentVolumeClaim. Avoid making any other changes to the
PersistentVolumeClaim, such as volume resizes, until after the migration is
complete.
Once that is done, you need to clear your PersistentVolume's `spec.claimRef.uid`
to ensure PersistentVolumeClaims can bind to it upon recreation:
```shell
kubectl scale --replicas=0 deployment cat-pictures-writer
kubectl delete pvc cat-pictures-pvc
kubectl patch pv cat-pictures-pv -p '{"spec":{"claimRef":{"uid":""}}}'
```
After that, replace the PersistentVolume's list of valid access modes with
(only) `ReadWriteOncePod`:
```shell
kubectl patch pv cat-pictures-pv -p '{"spec":{"accessModes":["ReadWriteOncePod"]}}'
```
{{< note >}}
The `ReadWriteOncePod` access mode cannot be combined with other access modes.
Make sure `ReadWriteOncePod` is the only access mode on the PersistentVolume
when updating, otherwise the request will fail.
{{< /note >}}
Next you need to modify your PersistentVolumeClaim to set `ReadWriteOncePod` as
the only access mode. You should also set the PersistentVolumeClaim's
`spec.volumeName` to the name of your PersistentVolume to ensure it binds to
this specific PersistentVolume.
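Assuming the PVC shown earlier, the edited manifest would look roughly like:

```yaml
# cat-pictures-pvc.yaml (after the edits described above)
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: cat-pictures-pvc
spec:
  accessModes:
    - ReadWriteOncePod        # now the only access mode
  volumeName: cat-pictures-pv # bind to this specific PersistentVolume
  resources:
    requests:
      storage: 1Gi
```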
Once this is done, you can recreate your PersistentVolumeClaim and start up your
workloads:
```shell
# IMPORTANT: Make sure to edit your PVC in cat-pictures-pvc.yaml before applying. You need to:
# - Set ReadWriteOncePod as the only access mode
# - Set spec.volumeName to "cat-pictures-pv"
kubectl apply -f cat-pictures-pvc.yaml
kubectl apply -f cat-pictures-writer-deployment.yaml
```
Lastly, you may edit your PersistentVolume's `spec.persistentVolumeReclaimPolicy`
and set it back to `Delete` if you previously changed it.
```shell
kubectl patch pv cat-pictures-pv -p '{"spec":{"persistentVolumeReclaimPolicy":"Delete"}}'
```
## {{% heading "whatsnext" %}}
* Learn more about [PersistentVolumes](/docs/concepts/storage/persistent-volumes/).
* Learn more about [PersistentVolumeClaims](/docs/concepts/storage/persistent-volumes/#persistentvolumeclaims).
* Learn more about [Configuring a Pod to Use a PersistentVolume for Storage](/docs/tasks/configure-pod-container/configure-persistent-volume-storage/)

View File

@ -248,7 +248,7 @@ The following table describes each available provider.
</td>
</tr>
<tr>
<th rowspan="2" scope="row"><tt>kms</tt> v2 <em>(beta)</em></th>
<th rowspan="2" scope="row"><tt>kms</tt> v2 </th>
<td>Uses envelope encryption scheme with DEK per API server.</td>
<td>Strongest</td>
<td>Fast</td>
@ -259,14 +259,10 @@ The following table describes each available provider.
Data is encrypted by data encryption keys (DEKs) using AES-GCM; DEKs
are encrypted by key encryption keys (KEKs) according to configuration
in Key Management Service (KMS).
Kubernetes defaults to generating a new DEK at API server startup, which is then
reused for object encryption.
If you enable the <tt>KMSv2KDF</tt>
<a href="/docs/reference/command-line-tools-reference/feature-gates/">feature gate</a>,
Kubernetes instead generates a new DEK per encryption from a secret seed.
Whichever approach you configure, the DEK or seed is also rotated whenever the KEK is rotated.<br/>
Kubernetes generates a new DEK per encryption from a secret seed.
The seed is rotated whenever the KEK is rotated.<br/>
A good choice if using a third party tool for key management.
Available in beta from Kubernetes v1.27.
Available as stable from Kubernetes v1.29.
<br />
Read how to <a href="/docs/tasks/administer-cluster/kms-provider#configuring-the-kms-provider-kms-v2">configure the KMS V2 provider</a>.
</td>
@ -538,4 +534,3 @@ To allow automatic reloading, configure the API server to run with:
* Read about [decrypting data that are already stored at rest](/docs/tasks/administer-cluster/decrypt-data/)
* Learn more about the [EncryptionConfiguration configuration API (v1)](/docs/reference/config-api/apiserver-encryption.v1/).

View File

@ -9,9 +9,17 @@ weight: 370
<!-- overview -->
This page shows how to configure a Key Management Service (KMS) provider and plugin to enable secret data encryption.
In Kubernetes {{< skew currentVersion >}} there are two versions of KMS at-rest encryption.
You should use KMS v2 if feasible because KMS v1 is deprecated (since Kubernetes v1.28).
However, you should also read and observe the **Caution** notices in this page that highlight specific
cases when you must not use KMS v2. KMS v2 offers significantly better performance characteristics than KMS v1.
You should use KMS v2 if feasible because KMS v1 is deprecated (since Kubernetes v1.28) and disabled by default (since Kubernetes v1.29).
KMS v2 offers significantly better performance characteristics than KMS v1.
{{< caution >}}
This documentation is for the generally available implementation of KMS v2 (and for the
deprecated version 1 implementation).
If you are using any control plane components older than Kubernetes v1.29, please check
the equivalent page in the documentation for the version of Kubernetes that your cluster
is running. Earlier releases of Kubernetes had different behavior that may be relevant
for information security.
{{< /caution >}}
## {{% heading "prerequisites" %}}
@ -24,7 +32,7 @@ you have selected. Kubernetes recommends using KMS v2.
(if you are running a different version of Kubernetes that also supports the v2 KMS
API, switch to the documentation for that version of Kubernetes).
- If you selected KMS API v1 to support clusters prior to version v1.27
or if you have a legacy KMS plugin that only supports KMS v1,
any supported Kubernetes version will work. This API is deprecated as of Kubernetes v1.28.
Kubernetes does not recommend the use of this API.
@ -35,80 +43,36 @@ you have selected. Kubernetes recommends using KMS v2.
* Kubernetes version 1.10.0 or later is required
* For version 1.29 and later, the v1 implementation of KMS is disabled by default.
To enable the feature, set `--feature-gates=KMSv1=true` to configure a KMS v1 provider.
* Your cluster must use etcd v3 or later
### KMS v2
{{< feature-state for_k8s_version="v1.27" state="beta" >}}
{{< feature-state for_k8s_version="v1.29" state="stable" >}}
* For version 1.25 and 1.26, enabling the feature via kube-apiserver feature gate is required.
Set `--feature-gates=KMSv2=true` to configure a KMS v2 provider.
For environments where all API servers are running version 1.28 or later, and you do not require the ability
to downgrade to Kubernetes v1.27, you can enable the `KMSv2KDF` feature gate (a beta feature) for more
robust data encryption key generation. The Kubernetes project recommends enabling KMS v2 KDF if those
preconditions are met.
* Your cluster must use etcd v3 or later
{{< caution >}}
The KMS v2 API and implementation changed in incompatible ways in-between the alpha release in v1.25
and the beta release in v1.27. Attempting to upgrade from old versions with the alpha feature
enabled will result in data loss.
---
Running mixed API server versions with some servers at v1.27, and others at v1.28 _with the
`KMSv2KDF` feature gate enabled_ is **not supported** - and is likely to result in data loss.
{{< /caution >}}
<!-- steps -->
## KMS encryption and per-object encryption keys
The KMS encryption provider uses an envelope encryption scheme to encrypt data in etcd.
The data is encrypted using a data encryption key (DEK).
The DEKs are encrypted with a key encryption key (KEK) that is stored and managed in a remote KMS.
With KMS v1, a new DEK is generated for each encryption.
If you use the (deprecated) v1 implementation of KMS, a new DEK is generated for each encryption.
With KMS v2, there are two ways for the API server to generate a DEK.
Kubernetes defaults to generating a new DEK at API server startup, which is then reused
for resource encryption. However, if you use KMS v2 _and_ enable the `KMSv2KDF`
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/), then
Kubernetes instead generates a new DEK **per encryption**: the API server uses a
With KMS v2, a new DEK is generated **per encryption**: the API server uses a
_key derivation function_ to generate single use data encryption keys from a secret seed
combined with some random data.
Whichever approach you configure, the DEK or seed is also rotated whenever the KEK is rotated
(see `Understanding key_id and Key Rotation` section below for more details).
The seed is rotated whenever the KEK is rotated
(see the _Understanding key_id and Key Rotation_ section below for more details).
The KMS provider uses gRPC to communicate with a specific KMS plugin over a UNIX domain socket.
The KMS plugin, which is implemented as a gRPC server and deployed on the same host(s)
as the Kubernetes control plane, is responsible for all communication with the remote KMS.
{{< caution >}}
If you are running virtual machine (VM) based nodes that leverage VM state store with this feature,
using KMS v2 is **insecure** and an information security risk unless you also explicitly enable
the `KMSv2KDF`
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/).
With KMS v2, the API server uses AES-GCM with a 12 byte nonce (8 byte atomic counter and 4 bytes random data) for encryption.
The following issues could occur if the VM is saved and restored:
1. The counter value may be lost or corrupted if the VM is saved in an inconsistent state or restored improperly.
This can lead to a situation where the same counter value is used twice, resulting in the same nonce being used
for two different messages.
2. If the VM is restored to a previous state, the counter value may be set back to its previous value,
resulting in the same nonce being used again.
Although both of these cases are partially mitigated by the 4 byte random nonce, this can compromise
the security of the encryption.
If you have enabled the `KMSv2KDF`
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/) _and_ are using KMS v2
(not KMS v1), the API server generates single use data encryption keys from a secret seed.
This eliminates the need for a counter based nonce while avoiding nonce collision concerns.
It also removes any specific concerns with using KMS v2 and VM state store.
{{< /caution >}}
## Configuring the KMS provider
To configure a KMS provider on the API server, include a provider of type `kms` in the
@ -197,10 +161,14 @@ Then use the functions and data structures in the stub file to develop the serve
##### KMS v2 {#developing-a-kms-plugin-gRPC-server-notes-kms-v2}
* KMS plugin version: `v2beta1`
* KMS plugin version: `v2`
In response to procedure call `Status`, a compatible KMS plugin should return `v2beta1` as `StatusResponse.version`,
In response to the `Status` remote procedure call, a compatible KMS plugin should return its KMS compatibility
version as `StatusResponse.version`. That status response should also include
"ok" as `StatusResponse.healthz` and a `key_id` (remote KMS KEK ID) as `StatusResponse.key_id`.
The Kubernetes project recommends you make your plugin
compatible with the stable `v2` KMS API. Kubernetes {{< skew currentVersion >}} also supports the
`v2beta1` API for KMS; future Kubernetes releases are likely to continue supporting that beta version.
The API server polls the `Status` procedure call approximately every minute when everything is healthy,
and every 10 seconds when the plugin is not healthy. Plugins must take care to optimize this call as it will be
@ -258,20 +226,20 @@ Then use the functions and data structures in the stub file to develop the serve
API server restart is required to perform KEK rotation.
{{< caution >}}
Because you don't control the number of writes performed with the DEK,
the Kubernetes project recommends rotating the KEK at least every 90 days.
{{< /caution >}}
* protocol: UNIX domain socket (`unix`)
The plugin is implemented as a gRPC server that listens at a UNIX domain socket.
The plugin deployment should create a file on the file system to run the gRPC unix domain socket connection.
The API server (gRPC client) is configured with the KMS provider (gRPC server) unix
domain socket endpoint in order to communicate with it.
An abstract Linux socket may be used by starting the endpoint with `/@`, i.e. `unix:///@foo`.
Care must be taken when using this type of socket as they do not have a concept of ACLs
(unlike traditional file based sockets).
However, they are subject to the Linux network namespace, so they will only be accessible to
containers within the same pod unless host networking is used.
### Integrating a KMS plugin with the remote KMS
@ -363,10 +331,6 @@ The following table summarizes the health check endpoints for each KMS version:
These healthcheck endpoint paths are hard coded and generated/controlled by the server. The indices for individual healthchecks correspond to the order in which the KMS encryption config is processed.
At a high level, restarting an API server when a KMS plugin is unhealthy is unlikely to make the situation better.
It can make the situation significantly worse by throwing away the API server's DEK cache. Thus the general
recommendation is to ignore the API server KMS healthz checks for liveness purposes, i.e. `/livez?exclude=kms-providers`.
Until the steps defined in [Ensuring all secrets are encrypted](#ensuring-all-secrets-are-encrypted) are performed, the `providers` list should end with the `identity: {}` provider to allow unencrypted data to be read. Once all resources are encrypted, the `identity` provider should be removed to prevent the API server from honoring unencrypted data.
For details about the `EncryptionConfiguration` format, please check the

View File

@ -98,6 +98,12 @@ read-write by a single Node. It defines the [StorageClass name](/docs/concepts/s
`manual` for the PersistentVolume, which will be used to bind
PersistentVolumeClaim requests to this PersistentVolume.
{{< note >}}
This example uses the `ReadWriteOnce` access mode, for simplicity. For
production use, the Kubernetes project recommends using the `ReadWriteOncePod`
access mode instead.
{{< /note >}}
Create the PersistentVolume:
```shell

View File

@ -1,6 +1,6 @@
---
reviewers:
- bprashanth
- enj
- liggitt
- thockin
title: Configure Service Accounts for Pods
@ -184,6 +184,16 @@ ServiceAccount. You can request a specific token duration using the `--duration`
command line argument to `kubectl create token` (the actual duration of the issued
token might be shorter, or could even be longer).
When the `ServiceAccountTokenNodeBinding` and `ServiceAccountTokenNodeBindingValidation`
features are enabled and the `KUBECTL_NODE_BOUND_TOKENS` environment variable is set to `true`,
it is possible to create a service account token that is directly bound to a `Node`:
```shell
KUBECTL_NODE_BOUND_TOKENS=true kubectl create token build-robot --bound-object-kind Node --bound-object-name node-001 --bound-object-uid 123...456
```
The token will be valid until it expires or either the associated `Node` or service account is deleted.
{{< note >}}
Versions of Kubernetes before v1.22 automatically created long term credentials for
accessing the Kubernetes API. This older mechanism was based on creating token Secrets
@ -408,6 +418,39 @@ You can configure this behavior for the `spec` of a Pod using a
[projected volume](/docs/concepts/storage/volumes/#projected) type called
`ServiceAccountToken`.
The token from this projected volume is a {{<glossary_tooltip term_id="jwt" text="JSON Web Token">}} (JWT).
The JSON payload of this token follows a well-defined schema. Here is an example payload for a pod-bound token:
```yaml
{
  "aud": [  # matches the requested audiences, or the API server's default audiences when none are explicitly requested
    "https://kubernetes.default.svc"
  ],
  "exp": 1731613413,
  "iat": 1700077413,
  "iss": "https://kubernetes.default.svc",  # matches the first value passed to the --service-account-issuer flag
  "jti": "ea28ed49-2e11-4280-9ec5-bc3d1d84661a",  # ServiceAccountTokenJTI feature must be enabled for the claim to be present
  "kubernetes.io": {
    "namespace": "kube-system",
    "node": {  # ServiceAccountTokenPodNodeInfo feature must be enabled for the API server to add this node reference claim
      "name": "127.0.0.1",
      "uid": "58456cb0-dd00-45ed-b797-5578fdceaced"
    },
    "pod": {
      "name": "coredns-69cbfb9798-jv9gn",
      "uid": "778a530c-b3f4-47c0-9cd5-ab018fb64f33"
    },
    "serviceaccount": {
      "name": "coredns",
      "uid": "a087d5a0-e1dd-43ec-93ac-f13d89cd13af"
    },
    "warnafter": 1700081020
  },
  "nbf": 1700077413,
  "sub": "system:serviceaccount:kube-system:coredns"
}
```
### Launch a Pod using service account token projection
To provide a Pod with a token with an audience of `vault` and a validity duration
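A minimal sketch of such a projected volume might look like the following (the volume name and values are illustrative):

```yaml
volumes:
  - name: vault-token
    projected:
      sources:
        - serviceAccountToken:
            path: vault-token
            expirationSeconds: 7200   # two hours
            audience: vault
```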
View File
@ -749,8 +749,12 @@ validations are not supported by ratcheting under the implementation in Kubernet
- `not`
- any validations in a descendent of one of these fields
- `x-kubernetes-validations`
For Kubernetes {{< skew currentVersion >}}, CRD [validation rules](#validation-rules) are ignored by
ratcheting. This may change in later Kubernetes releases.
For Kubernetes 1.28, CRD [validation rules](#validation-rules) are ignored by
ratcheting. Starting with Alpha 2 in Kubernetes 1.29, `x-kubernetes-validations`
are ratcheted.
Transition Rules are never ratcheted: only errors raised by rules that do not
use `oldSelf` will be automatically ratcheted if their values are unchanged.
- `x-kubernetes-list-type`
Errors arising from changing the list type of a subschema will not be
ratcheted. For example, adding `set` onto a list with duplicates will always
@ -767,19 +771,13 @@ validations are not supported by ratcheting under the implementation in Kubernet
- `additionalProperties`
To remove a previously specified `additionalProperties` validation will not be
ratcheted.
- `metadata`
Errors arising from changes to fields within an object's `metadata` are not
ratcheted.
### Validation rules
{{< feature-state state="beta" for_k8s_version="v1.25" >}}
Validation rules are in beta since 1.25 and the `CustomResourceValidationExpressions`
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/) is enabled by default to
validate custom resource based on _validation rules_. You can disable this feature by explicitly
setting the `CustomResourceValidationExpressions` feature gate to `false`, for the
[kube-apiserver](/docs/reference/command-line-tools-reference/kube-apiserver/) component. This
feature is only available if the schema is a [structural schema](#specifying-a-structural-schema).
{{< feature-state state="stable" for_k8s_version="v1.29" >}}
Validation rules use the [Common Expression Language (CEL)](https://github.com/google/cel-spec)
to validate custom resource values. Validation rules are included in
@ -1177,6 +1175,34 @@ The `fieldPath` field does not support indexing arrays numerically.
Setting `fieldPath` is optional.
#### The `optionalOldSelf` field {#field-optional-oldself}
{{< feature-state state="alpha" for_k8s_version="v1.29" >}}
The feature [CRDValidationRatcheting](#validation-ratcheting) must be enabled in order to
make use of this field.
The `optionalOldSelf` field is a boolean field that alters the behavior of [Transition Rules](#transition-rules) described
below. Normally, a transition rule will not evaluate if `oldSelf` cannot be determined:
during object creation or when a new value is introduced in an update.
If `optionalOldSelf` is set to true, then transition rules will always be
evaluated and the type of `oldSelf` will be changed to a CEL [`Optional`](https://pkg.go.dev/github.com/google/cel-go/cel#OptionalTypes) type.
`optionalOldSelf` is useful in cases where schema authors want more control
than the default equality-based behavior of [validation ratcheting](#validation-ratcheting)
provides: to introduce newer, usually stricter constraints on new values, while still
allowing old values to be "grandfathered" or ratcheted using the older validation.
Example Usage:

| CEL                                     | Description |
|-----------------------------------------|-------------|
| `self.foo == "foo" || (oldSelf.hasValue() && oldSelf.value().foo != "foo")` | Ratcheted rule. Once a value is set to "foo", it must stay "foo". But if it existed before the "foo" constraint was introduced, it may use any value |
| `[oldSelf.orValue(""), self].all(x, ["OldCase1", "OldCase2"].exists(case, x == case)) || ["NewCase1", "NewCase2"].exists(case, self == case) || ["NewCase"].has(self)` | Ratcheted validation for removed enum cases if oldSelf used them |
| `oldSelf.optMap(o, o.size()).orValue(0) < 4 || self.size() >= 4` | Ratcheted validation of a newly increased minimum map or list size |
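To make this concrete, here is a minimal sketch of a schema fragment that sets `optionalOldSelf` on a rule (the field `foo` and the message are illustrative):

```yaml
x-kubernetes-validations:
  - rule: 'self.foo == "foo" || (oldSelf.hasValue() && oldSelf.value().foo != "foo")'
    optionalOldSelf: true
    message: once foo is set to "foo" it cannot be changed
```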
#### Validation functions {#available-validation-functions}
Functions available include:
View File
@ -0,0 +1,184 @@
---
reviewers:
- thockin
- dwinship
min-kubernetes-server-version: v1.29
title: Extend Service IP Ranges
content_type: task
---
<!-- overview -->
{{< feature-state state="alpha" for_k8s_version="v1.29" >}}
This document explains how to extend the existing Service IP range assigned to a cluster.
## {{% heading "prerequisites" %}}
{{< include "task-tutorial-prereqs.md" >}}
{{< version-check >}}
<!-- steps -->
## API
Kubernetes clusters with kube-apiservers that have enabled the `MultiCIDRServiceAllocator`
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/) and the `networking.k8s.io/v1alpha1` API
will create a new ServiceCIDR object that takes the well-known name `kubernetes`, and that uses an IP address range
based on the value of the `--service-cluster-ip-range` command line argument to kube-apiserver.
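For example, on a control plane that you manage directly, the kube-apiserver might be started with flags like the following (a sketch; other required flags are omitted, and how you set them depends on your deployment method):

```sh
# Enable the alpha feature gate and the v1alpha1 networking API
kube-apiserver --service-cluster-ip-range=10.96.0.0/28 \
  --feature-gates=MultiCIDRServiceAllocator=true \
  --runtime-config=networking.k8s.io/v1alpha1=true
```

With that in place, you can list the resulting ServiceCIDR objects: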
```sh
kubectl get servicecidr
```
```
NAME         CIDRS          AGE
kubernetes   10.96.0.0/28   17d
```
The well-known `kubernetes` Service, which exposes the kube-apiserver endpoint to the Pods, calculates
the first IP address from the default ServiceCIDR range and uses that IP address as its
cluster IP address.
```sh
kubectl get service kubernetes
```
```
NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   17d
```
The default Service, in this case, uses the ClusterIP 10.96.0.1, which has a corresponding IPAddress object.
```sh
kubectl get ipaddress 10.96.0.1
```
```
NAME        PARENTREF
10.96.0.1   services/default/kubernetes
```
The ServiceCIDRs are protected with {{<glossary_tooltip text="finalizers" term_id="finalizer">}}, to avoid leaving Service cluster IPs orphaned;
the finalizer is only removed if there is another subnet that contains the existing IPAddresses, or
if there are no IPAddresses belonging to the subnet.
## Extend the number of available IPs for Services
There are cases where users need to increase the number of addresses available to Services. Previously, increasing the Service range was a disruptive operation that could also cause data loss. With this new feature, users only need to add a new ServiceCIDR to increase the number of available addresses.
### Adding a new ServiceCIDR
On a cluster with a 10.96.0.0/28 range for Services, there are only 2^(32-28) - 2 = 14 IP addresses available. The `kubernetes.default` Service is always created; for this example, that leaves you with only 13 possible Services.
```sh
for i in $(seq 1 13); do kubectl create service clusterip "test-$i" --tcp 80 -o json | jq -r .spec.clusterIP; done
```
```
10.96.0.11
10.96.0.5
10.96.0.12
10.96.0.13
10.96.0.14
10.96.0.2
10.96.0.3
10.96.0.4
10.96.0.6
10.96.0.7
10.96.0.8
10.96.0.9
error: failed to create ClusterIP service: Internal error occurred: failed to allocate a serviceIP: range is full
```
You can increase the number of IP addresses available for Services, by creating a new ServiceCIDR
that extends or adds new IP address ranges.
```sh
cat <<EOF | kubectl apply -f -
apiVersion: networking.k8s.io/v1alpha1
kind: ServiceCIDR
metadata:
  name: newcidr1
spec:
  cidrs:
  - 10.96.0.0/24
EOF
```
```
servicecidr.networking.k8s.io/newcidr1 created
```
This allows you to create new Services with ClusterIPs that will be picked from this new range.
```sh
for i in $(seq 13 16); do kubectl create service clusterip "test-$i" --tcp 80 -o json | jq -r .spec.clusterIP; done
```
```
10.96.0.48
10.96.0.200
10.96.0.121
10.96.0.144
```
### Deleting a ServiceCIDR
You cannot delete a ServiceCIDR if there are IPAddresses that depend on the ServiceCIDR.
```sh
kubectl delete servicecidr newcidr1
```
```
servicecidr.networking.k8s.io "newcidr1" deleted
```
Kubernetes uses a finalizer on the ServiceCIDR to track this dependent relationship.
```sh
kubectl get servicecidr newcidr1 -o yaml
```
```
apiVersion: networking.k8s.io/v1alpha1
kind: ServiceCIDR
metadata:
  creationTimestamp: "2023-10-12T15:11:07Z"
  deletionGracePeriodSeconds: 0
  deletionTimestamp: "2023-10-12T15:12:45Z"
  finalizers:
  - networking.k8s.io/service-cidr-finalizer
  name: newcidr1
  resourceVersion: "1133"
  uid: 5ffd8afe-c78f-4e60-ae76-cec448a8af40
spec:
  cidrs:
  - 10.96.0.0/24
status:
  conditions:
  - lastTransitionTime: "2023-10-12T15:12:45Z"
    message: There are still IPAddresses referencing the ServiceCIDR, please remove
      them or create a new ServiceCIDR
    reason: OrphanIPAddress
    status: "False"
    type: Ready
```
By removing the Services containing the IP addresses that are blocking the deletion of the ServiceCIDR
```sh
for i in $(seq 13 16); do kubectl delete service "test-$i" ; done
```
```
service "test-13" deleted
service "test-14" deleted
service "test-15" deleted
service "test-16" deleted
```
the control plane notices the removal. The control plane then removes its finalizer,
so that the ServiceCIDR that was pending deletion will actually be removed.
```sh
kubectl get servicecidr newcidr1
```
```
Error from server (NotFound): servicecidrs.networking.k8s.io "newcidr1" not found
```
View File
@ -0,0 +1,28 @@
apiVersion: v1
kind: Pod
metadata:
  name: sa-ctb-name-test
spec:
  containers:
  - name: container-test
    image: busybox
    command: ["sleep", "3600"]
    volumeMounts:
    - name: root-certificates-vol
      mountPath: "/root-certificates"
      readOnly: true
  serviceAccountName: default
  volumes:
  - name: root-certificates-vol
    projected:
      sources:
      - clusterTrustBundle:
          name: example
          path: example-roots.pem
      - clusterTrustBundle:
          signerName: "example.com/mysigner"
          labelSelector:
            matchLabels:
              version: live
          path: mysigner-roots.pem
          optional: true
View File
@ -1,4 +1,4 @@
apiVersion: flowcontrol.apiserver.k8s.io/v1beta3
apiVersion: flowcontrol.apiserver.k8s.io/v1
kind: FlowSchema
metadata:
name: health-for-strangers
View File
@ -1,4 +1,4 @@
apiVersion: flowcontrol.apiserver.k8s.io/v1beta3
apiVersion: flowcontrol.apiserver.k8s.io/v1
kind: FlowSchema
metadata:
name: list-events-default-service-account
View File
@ -2,6 +2,14 @@
# This file helps to populate the /releases page, and is also parsed to find out the
# latest patch version for a minor release.
schedules:
- release: 1.29
  releaseDate: 2023-12-13
  next:
    release: 1.29.1
    cherryPickDeadline: 2024-01-12
    targetDate: 2024-01-17
  maintenanceModeStartDate: 2024-12-28
  endOfLifeDate: 2025-02-28
- release: 1.28
  releaseDate: 2023-08-15
  next:
View File
@ -142,9 +142,9 @@ time_format_default = "January 02, 2006 at 3:04 PM PST"
description = "Production-Grade Container Orchestration"
showedit = true
latest = "v1.28"
latest = "v1.29"
version = "v1.28"
version = "v1.29"
githubbranch = "main"
docsbranch = "main"
deprecated = false
@ -184,35 +184,35 @@ js = [
]
[[params.versions]]
version = "v1.28"
githubbranch = "v1.28.0"
version = "v1.29"
githubbranch = "v1.29.0"
docsbranch = "main"
url = "https://kubernetes.io"
[[params.versions]]
version = "v1.28"
githubbranch = "v1.28.4"
docsbranch = "release-1.28"
url = "https://v1-28.docs.kubernetes.io"
[[params.versions]]
version = "v1.27"
githubbranch = "v1.27.4"
githubbranch = "v1.27.8"
docsbranch = "release-1.27"
url = "https://v1-27.docs.kubernetes.io"
[[params.versions]]
version = "v1.26"
githubbranch = "v1.26.7"
githubbranch = "v1.26.11"
docsbranch = "release-1.26"
url = "https://v1-26.docs.kubernetes.io"
[[params.versions]]
version = "v1.25"
githubbranch = "v1.25.12"
githubbranch = "v1.25.16"
docsbranch = "release-1.25"
url = "https://v1-25.docs.kubernetes.io"
[[params.versions]]
version = "v1.24"
githubbranch = "v1.24.16"
docsbranch = "release-1.24"
url = "https://v1-24.docs.kubernetes.io"
# User interface configuration
[params.ui]
# Enable to show the side bar menu in its compact state.