Merge branch 'main' into update-page-weights

pull/38920/head
Abigail McCarthy 2023-02-22 10:38:34 -05:00 committed by GitHub
commit 9efe14f6a7
623 changed files with 26378 additions and 8935 deletions


@ -44,6 +44,7 @@ aliases:
- divya-mohan0209
- kbhawkey
- mehabhalodiya
- mengjiao-liu
- natalisucks
- nate-double-u
- onlydole
@ -104,11 +105,9 @@ aliases:
- atoato88
- bells17
- kakts
- makocchi-git
- ptux
- t-inu
sig-docs-ko-owners: # Admins for Korean content
- ClaudiaJKang
- gochist
- ianychoi
- jihoon-seo
@ -116,7 +115,6 @@ aliases:
- yoonian
- ysyukr
sig-docs-ko-reviews: # PR reviews for Korean content
- ClaudiaJKang
- gochist
- ianychoi
- jihoon-seo
@ -146,6 +144,7 @@ aliases:
- chenxuc
- howieyuen
# idealhack
- kinzhi
- mengjiao-liu
- my-git9
# pigletfly
@ -160,9 +159,7 @@ aliases:
- devlware
- edsoncelio
- femrtnz
- jailton
- jcjesus
- jhonmike
- rikatz
- stormqueen1990
- yagonobre
@ -170,9 +167,8 @@ aliases:
- devlware
- edsoncelio
- femrtnz
- jailton
- jcjesus
- jhonmike
- mrerlison
- rikatz
- stormqueen1990
- yagonobre
@ -196,9 +192,7 @@ aliases:
- mfilocha
- nvtkaszpir
sig-docs-uk-owners: # Admins for Ukrainian content
- anastyakulyk
- Arhell
- butuzov
- MaxymVlasov
sig-docs-uk-reviews: # PR reviews for Ukrainian content
- Arhell


@ -16,7 +16,7 @@ You can run the website locally using Hugo (Extended version), or
To use this repository, you need to install the following locally:
- [npm](https://www.npmjs.com/)
- [Go](https://golang.org/)
- [Go](https://go.dev/)
- [Hugo(Extended version)](https://gohugo.io/)
- A container runtime such as [Docker](https://www.docker.com/)


@ -13,7 +13,7 @@ You can run the website locally using Hugo (Extended version), or
To use this repository, you need to install:
- [npm](https://www.npmjs.com/)
- [Go](https://golang.org/)
- [Go](https://go.dev/)
- [Hugo (Extended version)](https://gohugo.io/)
- A container runtime, for example [Docker](https://www.docker.com/).


@ -878,3 +878,17 @@ div.alert > em.javascript-required {
color: #fff;
background: #326de6;
}
// Adjust Bing search result page
#bing-results-container {
padding: 1em;
}
#bing-pagination-container {
padding: 1em;
margin-bottom: 1em;
a.bing-page-anchor {
padding: 0.5em;
margin: 0.25em;
}
}


@ -30,7 +30,9 @@ Whether testing locally or running a global enterprise, Kubernetes flexibility g
{{% blocks/feature image="suitcase" %}}
#### Run K8s Anywhere
Kubernetes is open source giving you the freedom to take advantage of on-premises, hybrid, or public cloud infrastructure, letting you effortlessly move workloads to where it matters to you.
Kubernetes is open source giving you the freedom to take advantage of on-premises, hybrid, or public cloud infrastructure, letting you effortlessly move workloads to where it matters to you.
To download Kubernetes, visit the [download](/releases/download/) section.
{{% /blocks/feature %}}
@ -43,12 +45,12 @@ Kubernetes is open source giving you the freedom to take advantage of on-premise
<button id="desktopShowVideoButton" onclick="kub.showVideo()">Watch Video</button>
<br>
<br>
<a href="https://events.linuxfoundation.org/kubecon-cloudnativecon-north-america" button id="desktopKCButton">Attend KubeCon North America on October 24-28, 2022</a>
<a href="https://events.linuxfoundation.org/kubecon-cloudnativecon-europe/" button id="desktopKCButton">Attend KubeCon + CloudNativeCon Europe on April 18-21, 2023</a>
<br>
<br>
<br>
<br>
<a href="https://events.linuxfoundation.org/kubecon-cloudnativecon-europe/" button id="desktopKCButton">Attend KubeCon Europe on April 17-21, 2023</a>
<a href="https://events.linuxfoundation.org/kubecon-cloudnativecon-north-america/" button id="desktopKCButton">Attend KubeCon + CloudNativeCon North America on November 6-9, 2023</a>
</div>
<div id="videoPlayer">
<iframe data-url="https://www.youtube.com/embed/H06qrNmGqyE?autoplay=1" frameborder="0" allowfullscreen></iframe>


@ -16,7 +16,7 @@ To give you a flavor, here are four Kubernetes features that came from our exper
1) [Pods](https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/pods.md). A pod is the unit of scheduling in Kubernetes. It is a resource envelope in which one or more containers run. Containers that are part of the same pod are guaranteed to be scheduled together onto the same machine, and can share state via local volumes.
1) [Pods](/docs/concepts/workloads/pods/). A pod is the unit of scheduling in Kubernetes. It is a resource envelope in which one or more containers run. Containers that are part of the same pod are guaranteed to be scheduled together onto the same machine, and can share state via local volumes.
@ -24,15 +24,15 @@ Borg has a similar abstraction, called an alloc (short for “resource allocatio
2) [Services](https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/services.md). Although Borg's primary role is to manage the lifecycles of tasks and machines, the applications that run on Borg benefit from many other cluster services, including naming and load balancing. Kubernetes supports naming and load balancing using the service abstraction: a service has a name and maps to a dynamic set of pods defined by a label selector (see next section). Any container in the cluster can connect to the service using the service name. Under the covers, Kubernetes automatically load-balances connections to the service among the pods that match the label selector, and keeps track of where the pods are running as they get rescheduled over time due to failures.
2) [Services](/docs/concepts/services-networking/service/). Although Borg's primary role is to manage the lifecycles of tasks and machines, the applications that run on Borg benefit from many other cluster services, including naming and load balancing. Kubernetes supports naming and load balancing using the service abstraction: a service has a name and maps to a dynamic set of pods defined by a label selector (see next section). Any container in the cluster can connect to the service using the service name. Under the covers, Kubernetes automatically load-balances connections to the service among the pods that match the label selector, and keeps track of where the pods are running as they get rescheduled over time due to failures.
3) [Labels](https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/labels.md). A container in Borg is usually one replica in a collection of identical or nearly identical containers that correspond to one tier of an Internet service (e.g. the front-ends for Google Maps) or to the workers of a batch job (e.g. a MapReduce). The collection is called a Job, and each replica is called a Task. While the Job is a very useful abstraction, it can be limiting. For example, users often want to manage their entire service (composed of many Jobs) as a single entity, or to uniformly manage several related instances of their service, for example separate canary and stable release tracks. At the other end of the spectrum, users frequently want to reason about and control subsets of tasks within a Job -- the most common example is during rolling updates, when different subsets of the Job need to have different configurations.
3) [Labels](/docs/concepts/overview/working-with-objects/labels/). A container in Borg is usually one replica in a collection of identical or nearly identical containers that correspond to one tier of an Internet service (e.g. the front-ends for Google Maps) or to the workers of a batch job (e.g. a MapReduce). The collection is called a Job, and each replica is called a Task. While the Job is a very useful abstraction, it can be limiting. For example, users often want to manage their entire service (composed of many Jobs) as a single entity, or to uniformly manage several related instances of their service, for example separate canary and stable release tracks. At the other end of the spectrum, users frequently want to reason about and control subsets of tasks within a Job -- the most common example is during rolling updates, when different subsets of the Job need to have different configurations.
Kubernetes supports more flexible collections than Borg by organizing pods using labels, which are arbitrary key/value pairs that users attach to pods (and in fact to any object in the system). Users can create groupings equivalent to Borg Jobs by using a “job:\<jobname\>” label on their pods, but they can also use additional labels to tag the service name, service instance (production, staging, test), and in general, any subset of their pods. A label query (called a “label selector”) is used to select which set of pods an operation should be applied to. Taken together, labels and [replication controllers](https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/replication-controller.md) allow for very flexible update semantics, as well as for operations that span the equivalent of Borg Jobs.
Kubernetes supports more flexible collections than Borg by organizing pods using labels, which are arbitrary key/value pairs that users attach to pods (and in fact to any object in the system). Users can create groupings equivalent to Borg Jobs by using a “job:\<jobname\>” label on their pods, but they can also use additional labels to tag the service name, service instance (production, staging, test), and in general, any subset of their pods. A label query (called a “label selector”) is used to select which set of pods an operation should be applied to. Taken together, labels and [replication controllers](/docs/concepts/workloads/controllers/replicationcontroller/) allow for very flexible update semantics, as well as for operations that span the equivalent of Borg Jobs.


@ -143,7 +143,7 @@ When a default StorageClass exists and a user creates a PersistentVolumeClaim wi
Kubernetes 1.4 maintains backwards compatibility with the alpha version of the dynamic provisioning feature to allow for a smoother transition to the beta version. The alpha behavior is triggered by the existance of the alpha dynamic provisioning annotation (volume. **alpha**.kubernetes.io/storage-class). Keep in mind that if the beta annotation (volume. **beta**.kubernetes.io/storage-class) is present, it takes precedence, and triggers the beta behavior.
Kubernetes 1.4 maintains backwards compatibility with the alpha version of the dynamic provisioning feature to allow for a smoother transition to the beta version. The alpha behavior is triggered by the existence of the alpha dynamic provisioning annotation (volume. **alpha**.kubernetes.io/storage-class). Keep in mind that if the beta annotation (volume. **beta**.kubernetes.io/storage-class) is present, it takes precedence, and triggers the beta behavior.
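For readers who have not seen the annotation-based flow, here is a minimal, illustrative sketch (not taken from the original post; the claim and StorageClass names are hypothetical) of a PersistentVolumeClaim requesting dynamic provisioning via the beta annotation:
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: claim1                                        # hypothetical claim name
  annotations:
    # the beta annotation described above; it takes precedence over the alpha one
    volume.beta.kubernetes.io/storage-class: "slow"   # hypothetical StorageClass name
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 3Gi
```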


@ -192,7 +192,7 @@ To modify/add your own DAGs, you can use `kubectl cp` to upload local files into
# Get Involved
This feature is just the beginning of multiple major efforts to improves Apache Airflow integration into Kubernetes. The Kubernetes Operator has been merged into the [1.10 release branch of Airflow](https://github.com/apache/incubator-airflow/tree/v1-10-test) (the executor in experimental mode), along with a fully k8s native scheduler called the Kubernetes Executor (article to come). These features are still in a stage where early adopters/contributers can have a huge influence on the future of these features.
This feature is just the beginning of multiple major efforts to improve Apache Airflow integration into Kubernetes. The Kubernetes Operator has been merged into the [1.10 release branch of Airflow](https://github.com/apache/incubator-airflow/tree/v1-10-test) (the executor in experimental mode), along with a fully k8s native scheduler called the Kubernetes Executor (article to come). These features are still in a stage where early adopters/contributors can have a huge influence on the future of these features.
For those interested in joining these efforts, I'd recommend checking out these steps:


@ -460,7 +460,7 @@ Now you can configure your DHCP. Basically you should set the `next-server` and
I use ISC-DHCP server, and here is an example `dhcpd.conf`:
```
shared-network ltsp-netowrk {
shared-network ltsp-network {
subnet 10.9.0.0 netmask 255.255.0.0 {
authoritative;
default-lease-time -1;


@ -8,7 +8,7 @@ date: 2018-12-12
Kubernetes provides great primitives for deploying applications to a cluster: it can be as simple as `kubectl create -f app.yaml`. Deploying apps across multiple clusters has never been that simple. How should app workloads be distributed? Should the app resources be replicated into all clusters, replicated into selected clusters, or partitioned into clusters? How is access to the clusters managed? What happens if some of the resources that a user wants to distribute pre-exist, in some or all of the clusters, in some form?
In SIG Multicluster, our journey has revealed that there are multiple possible models to solve these problems and there probably is no single best-fit, all-scenario solution. [Federation](/docs/concepts/cluster-administration/federation/), however, is the single biggest Kubernetes open source sub-project, and has seen the maximum interest and contribution from the community in this problem space. The project initially reused the Kubernetes API to do away with any added usage complexity for an existing Kubernetes user. This approach was not viable, because of the problems summarised below:
In SIG Multicluster, our journey has revealed that there are multiple possible models to solve these problems and there probably is no single best-fit, all-scenario solution. [Kubernetes Cluster Federation (KubeFed for short)](https://github.com/kubernetes-sigs/kubefed), however, is the single biggest Kubernetes open source sub-project, and has seen the maximum interest and contribution from the community in this problem space. The project initially reused the Kubernetes API to do away with any added usage complexity for an existing Kubernetes user. This approach was not viable, because of the problems summarised below:
* Difficulties in re-implementing the Kubernetes API at the cluster level, as federation-specific extensions were stored in annotations.
* Limited flexibility in federated types, placement and reconciliation, due to 1:1 emulation of the Kubernetes API.


@ -129,7 +129,7 @@ spec:
spec:
containers:
- name: test-container
image: k8s.gcr.io/busybox
image: registry.k8s.io/busybox # updated after publication (previously used k8s.gcr.io/busybox)
command:
- "/bin/sh"
args:


@ -27,7 +27,7 @@ Our goal is for Kubernetes docs to be a trustworthy guide to Kubernetes features
### Re-homing content
Some content will be removed that readers may find helpful. To make sure readers have continous access to information, we're giving stakeholders until the [1.19 release deadline for docs](https://github.com/kubernetes/sig-release/tree/master/releases/release-1.19), **July 9th, 2020** to re-home any content slated for removal.
Some content will be removed that readers may find helpful. To make sure readers have continuous access to information, we're giving stakeholders until the [1.19 release deadline for docs](https://github.com/kubernetes/sig-release/tree/master/releases/release-1.19), **July 9th, 2020** to re-home any content slated for removal.
Over the next few months you'll see less third party content in the docs as contributors open PRs to remove content.


@ -520,7 +520,7 @@ And the real strength of WSL2 integration, the port `8443` once open on WSL2 dis
Working on the command line is always good and very insightful. However, when dealing with Kubernetes we might want, at some point, to have a visual overview.
For that, Minikube embeded the [Kubernetes Dashboard](https://github.com/kubernetes/dashboard). Thanks to it, running and accessing the Dashboard is very simple:
For that, Minikube embedded the [Kubernetes Dashboard](https://github.com/kubernetes/dashboard). Thanks to it, running and accessing the Dashboard is very simple:
```bash
# Enable the Dashboard service
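# --- Editorial sketch: the rest of this excerpt is assumed, not from the original post ---
minikube addons enable dashboard
# Access the Dashboard and print its URL instead of opening a browser
minikube dashboard --url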


@ -55,7 +55,7 @@ The team has made progress in the last few months that is well worth celebrating
- The K8s-Infrastructure Working Group released an automated billing report that they start every meeting off by reviewing as a group.
- DNS for k8s.io and kubernetes.io are also fully [community-owned](https://groups.google.com/g/kubernetes-dev/c/LZTYJorGh7c/m/u-ydk-yNEgAJ), with community members able to [file issues](https://github.com/kubernetes/k8s.io/issues/new?assignees=&labels=wg%2Fk8s-infra&template=dns-request.md&title=DNS+REQUEST%3A+%3Cyour-dns-record%3E) to manage records.
- The container registry [k8s.gcr.io](https://github.com/kubernetes/k8s.io/tree/main/k8s.gcr.io) is also fully community-owned and available for all Kubernetes subprojects to use.
- The container registry [registry.k8s.io](https://github.com/kubernetes/k8s.io/tree/main/registry.k8s.io) is also fully community-owned and available for all Kubernetes subprojects to use.
_Note:_ The container registry has changed to registry.k8s.io. Updated on August 25, 2022.
- The Kubernetes [publishing-bot](https://github.com/kubernetes/publishing-bot) responsible for keeping k8s.io/kubernetes/staging repositories published to their own top-level repos (For example: [kubernetes/api](https://github.com/kubernetes/api)) runs on a community-owned cluster.
- The gcsweb.k8s.io service used to provide anonymous access to GCS buckets for kubernetes artifacts runs on a community-owned cluster.


@ -198,7 +198,7 @@ GUINEVERE SAENGER: I would want Jorge to be really on top of making sure that ev
Greater communication of timelines and just giving people more time and space to be able to get in their changes, or at least, seemingly give them more time and space by sending early warnings, is going to be helpful. Of course, he's going to have a slightly longer release, too, than I did. This might be related to a unique Q4 challenge. Overall, I would encourage him to take more breaks, to rely more on his release shadows, and split out the work in a fashion that allows everyone to have a turn and everyone to have a break as well.
**ADAM GLICK: What would your advice be to someone who is hearing your experience and is inspired to get involved with the Kubernetes release or contributer process?**
**ADAM GLICK: What would your advice be to someone who is hearing your experience and is inspired to get involved with the Kubernetes release or contributor process?**
GUINEVERE SAENGER: Those are two separate questions. So let me tackle the Kubernetes release question first. Kubernetes [SIG Release](https://github.com/kubernetes/sig-release/#readme) has, in my opinion, a really excellent onboarding program for new members. We have what is called the [Release Team Shadow Program](https://github.com/kubernetes/sig-release/blob/master/release-team/shadows.md). We also have the Release Engineering Shadow Program, or the Release Management Shadow Program. Those are two separate subprojects within SIG Release. And each subproject has a team of roles, and each role can have two to four shadows that are basically people who are part of that role team, and they are learning that role as they are doing it.


@ -81,7 +81,7 @@ If the `ServerSideFieldValidation` feature gate is enabled starting 1.23, users
With the feature gate enabled, we also introduce the `fieldValidation` query parameter so that users can specify the desired behavior of the server on a per request basis. Valid values for the `fieldValidation` query parameter are:
- Ignore (default when feature gate is disabled, same as pre-1.23 behavior of dropping/ignoring unkonwn fields)
- Ignore (default when feature gate is disabled, same as pre-1.23 behavior of dropping/ignoring unknown fields)
- Warn (default when feature gate is enabled).
- Strict (this will fail the request with an Invalid Request error)
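As an illustration of the per-request behavior (the ConfigMap name and payload below are hypothetical, not from the original post), the query parameter can be exercised directly against the API server, for example through `kubectl proxy`:
```shell
# Start a local proxy to the API server
kubectl proxy --port=8001 &

# Send a patch containing an unknown field and ask for Strict validation;
# with Strict the request is rejected, with Warn it succeeds and returns a warning
curl -X PATCH \
  'http://127.0.0.1:8001/api/v1/namespaces/default/configmaps/demo-config?fieldValidation=Strict' \
  -H 'Content-Type: application/merge-patch+json' \
  -d '{"data":{"key":"value"},"unknownField":"oops"}'
```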


@ -5,6 +5,7 @@ linkTitle: "Dockershim Removal FAQ"
date: 2022-02-17
slug: dockershim-faq
aliases: [ '/dockershim' ]
evergreen: true
---
**This supersedes the original


@ -32,7 +32,7 @@ Caleb is also a co-organizer of the [CloudNative NZ](https://www.meetup.com/clou
## [Dylan Graham](https://github.com/DylanGraham)
Dylan Graham is a cloud engineer from Adeliade, Australia. He has been contributing to the upstream Kubernetes project since 2018.
Dylan Graham is a cloud engineer from Adelaide, Australia. He has been contributing to the upstream Kubernetes project since 2018.
He stated that being a part of such a large-scale project was initially overwhelming, but that the community's friendliness and openness assisted him in getting through it.


@ -115,7 +115,8 @@ metadata:
spec:
containers:
- name: agnhost
image: k8s.gcr.io/e2e-test-images/agnhost:2.35
# image changed since publication (previously used registry "k8s.gcr.io")
image: registry.k8s.io/e2e-test-images/agnhost:2.35
command: ["/agnhost", "grpc-health-checking"]
ports:
- containerPort: 5000


@ -18,7 +18,7 @@ case where you're using the `OrderedReady` Pod management policy for a StatefulS
Here are some examples:
- I am using a StatefulSet to orchestrate a multi-instance, cache based application where the size of the cache is large. The cache
starts cold and requires some siginificant amount of time before the container can start. There could be more initial startup tasks
starts cold and requires some significant amount of time before the container can start. There could be more initial startup tasks
that are required. A RollingUpdate on this StatefulSet would take a lot of time before the application is fully updated. If the
StatefulSet supported updating more than one pod at a time, it would result in a much faster update.
@ -50,7 +50,8 @@ spec:
app: nginx
spec:
containers:
- image: k8s.gcr.io/nginx-slim:0.8
# image changed since publication (previously used registry "k8s.gcr.io")
- image: registry.k8s.io/nginx-slim:0.8
imagePullPolicy: IfNotPresent
name: nginx
updateStrategy:
@ -66,7 +67,7 @@ If you enable the new feature and you don't specify a value for `maxUnavailable`
I'll run through a scenario based on that example manifest to demonstrate how this feature works. I will deploy a StatefulSet that
has 5 replicas, with `maxUnavailable` set to 2 and `partition` set to 0.
I can trigger a rolling update by changing the image to `k8s.gcr.io/nginx-slim:0.9`. Once I initiate the rolling update, I can
I can trigger a rolling update by changing the image to `registry.k8s.io/nginx-slim:0.9`. Once I initiate the rolling update, I can
watch the pods update 2 at a time as the current value of maxUnavailable is 2. The below output shows a span of time and is not
complete. The maxUnavailable can be an absolute number (for example, 2) or a percentage of desired Pods (for example, 10%). The
absolute number is calculated from percentage by rounding up to the nearest integer.
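A rough sketch of how that scenario could be driven from the command line (the StatefulSet name `web` and the label and container names are assumptions for illustration, not taken verbatim from the manifest above):
```shell
# Trigger the rolling update by switching the image tag
kubectl set image statefulset/web nginx=registry.k8s.io/nginx-slim:0.9

# Watch Pods being replaced; with maxUnavailable set to 2, up to two Pods
# above the partition are taken down and updated at a time
kubectl get pods -l app=nginx --watch
```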


@ -145,7 +145,7 @@ workstream within the Gateway API subproject focused on Gateway API for Mesh
Management and Administration.
This group will deliver [enhancement
proposals](https://gateway-api.sigs.k8s.io/v1beta1/contributing/gep/) consisting
proposals](https://gateway-api.sigs.k8s.io/geps/overview/) consisting
of resources, additions, and modifications to the Gateway API specification for
mesh and mesh-adjacent use-cases.


@ -89,7 +89,7 @@ To use cgroup v2 with Kubernetes, you must meet the following requirements:
* The kubelet and the container runtime are configured to use the [systemd cgroup driver](/docs/setup/production-environment/container-runtimes#systemd-cgroup-driver)
The kubelet and container runtime use a [cgroup driver](/docs/setup/production-environment/container-runtimes#cgroup-drivers)
to set cgroup paramaters. When using cgroup v2, it's strongly recommended that both
to set cgroup parameters. When using cgroup v2, it's strongly recommended that both
the kubelet and your container runtime use the
[systemd cgroup driver](/docs/setup/production-environment/container-runtimes#systemd-cgroup-driver),
so that there's a single cgroup manager on the system. To configure the kubelet
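As a minimal sketch (an editorial assumption, not part of the original page excerpt), the relevant kubelet setting looks like this when the kubelet is driven by a `KubeletConfiguration` file:
```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# use the systemd cgroup driver so systemd stays the single cgroup manager
cgroupDriver: systemd
```
The container runtime needs the matching setting; for containerd, that is `SystemdCgroup = true` in the runc runtime options.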


@ -438,7 +438,7 @@ kubectl apply -f crds/stable.example.com_appendonlylists.yaml
customresourcedefinition.apiextensions.k8s.io/appendonlylists.stable.example.com created
```
Creating an inital list with one element inside should succeed without problem:
Creating an initial list with one element inside should succeed without problem:
```shell
kubectl apply -f - <<EOF
---


@ -30,7 +30,7 @@ cloud computing resources.
In this release we want to recognise the importance of all these building blocks on which Kubernetes
is developed and used, while at the same time raising awareness on the importance of taking the
energy consumption footprint into account: environmental sustainability is an inescapable concern of
creators and users of any software solution, and the environmental footprint of sofware, like
creators and users of any software solution, and the environmental footprint of software, like
Kubernetes, an area which we believe will play a significant role in future releases.
As a community, we always work to make each new release process better than before (in this release,


@ -54,7 +54,7 @@ closed and the storage will be unmounted.
HostProcess and Linux privileged containers enable similar scenarios but differ
greatly in their implementation (hence the naming difference). HostProcess containers
have their own [PodSecurityContext](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.25/#windowssecuritycontextoptions-v1-core) fields.
have their own [PodSecurityContext](/docs/reference/generated/kubernetes-api/v1.25/#windowssecuritycontextoptions-v1-core) fields.
Those used to configure Linux privileged containers **do not** apply. Enabling privileged access to a Windows host is a
fundamentally different process than with Linux so the configuration and
capabilities of each differ significantly. Below is a diagram detailing the
@ -110,7 +110,7 @@ Please note that within a Pod, you can't mix HostProcess containers with normal
- Work through [Create a Windows HostProcess Pod](/docs/tasks/configure-pod-container/create-hostprocess-pod/)
- Read about Kubernetes [Pod Security Standards](/docs/concepts/security/pod-security-standards/) and [Pod Security Admission](docs/concepts/security/pod-security-admission/)
- Read about Kubernetes [Pod Security Standards](/docs/concepts/security/pod-security-standards/) and [Pod Security Admission](/docs/concepts/security/pod-security-admission/)
- Read the enhancement proposal [Windows Privileged Containers and Host Networking Mode](https://github.com/kubernetes/enhancements/tree/master/keps/sig-windows/1981-windows-privileged-container-support) (KEP-1981)
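For orientation, here is a sketch of a HostProcess Pod spec using the Windows-specific `securityContext` fields discussed above; the name, image, user account, and command are illustrative assumptions, not taken from this post:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hostprocess-demo                      # hypothetical name
spec:
  securityContext:
    windowsOptions:
      hostProcess: true                       # run the containers as host processes
      runAsUserName: "NT AUTHORITY\\SYSTEM"   # one of the permitted host accounts
  hostNetwork: true                           # HostProcess Pods must use the host network
  containers:
  - name: demo
    image: mcr.microsoft.com/windows/nanoserver:ltsc2022   # illustrative image
    command: ["cmd.exe", "/c", "ping -t 127.0.0.1"]
  nodeSelector:
    kubernetes.io/os: windows
```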


@ -60,12 +60,14 @@ kind: ValidatingAdmissionPolicyBinding
metadata:
name: "demo-binding-test.example.com"
spec:
policy: "demo-policy.example.com"
policyName: "demo-policy.example.com"
matchResources:
namespaceSelector:
- key: environment,
operator: In,
values: ["test"]
matchExpressions:
- key: environment
operator: In
values:
- test
```
This `ValidatingAdmissionPolicyBinding` resource binds the above policy only to
@ -115,14 +117,16 @@ kind: ValidatingAdmissionPolicyBinding
metadata:
name: "demo-binding-production.example.com"
spec:
policy: "demo-policy.example.com"
paramsRef:
policyName: "demo-policy.example.com"
paramRef:
name: "demo-params-production.example.com"
matchResources:
namespaceSelector:
- key: environment,
operator: In,
values: ["production"]
matchExpressions:
- key: environment
operator: In
values:
- production
```
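For context, both bindings reference a policy by `policyName`; a sketch of what such a `ValidatingAdmissionPolicy` could look like follows (the parameter kind, match rules, and CEL expression are illustrative assumptions, not copied from this post):
```yaml
apiVersion: admissionregistration.k8s.io/v1alpha1
kind: ValidatingAdmissionPolicy
metadata:
  name: "demo-policy.example.com"
spec:
  failurePolicy: Fail
  paramKind:                    # assumed parameter resource
    apiVersion: rules.example.com/v1
    kind: ReplicaLimit
  matchConstraints:
    resourceRules:
    - apiGroups:   ["apps"]
      apiVersions: ["v1"]
      operations:  ["CREATE", "UPDATE"]
      resources:   ["deployments"]
  validations:
  - expression: "object.spec.replicas <= params.maxReplicas"   # illustrative CEL rule
```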
```yaml


@ -90,7 +90,7 @@ This dependency made the tracking of Job status unreliable, because Pods can be
deleted from the API for a number of reasons, including:
- The garbage collector removing orphan Pods when a Node goes down.
- The garbage collector removing terminated Pods when they reach a threshold.
- The Kubernetes scheduler preempting a Pod to accomodate higher priority Pods.
- The Kubernetes scheduler preempting a Pod to accommodate higher priority Pods.
- The taint manager evicting a Pod that doesn't tolerate a `NoExecute` taint.
- External controllers, not included as part of Kubernetes, or humans deleting
Pods.
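Finalizer-based tracking removes that dependency: while a Pod still counts toward its Job's status, it carries a `batch.kubernetes.io/job-tracking` finalizer. One way to observe this on a running cluster (the `job-name` value is a placeholder):
```shell
# List a Job's Pods together with their finalizers
kubectl get pods -l job-name=my-job \
  -o custom-columns='NAME:.metadata.name,FINALIZERS:.metadata.finalizers'
```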


@ -107,38 +107,42 @@ If you want to test the feature whilst it's alpha, you need to enable the releva
If you would like to see the feature in action and verify it works fine in your cluster, here's what you can try:
1. Define a basic PersistentVolumeClaim:
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: pvc-1
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 1Gi
```
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: pvc-1
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 1Gi
```
2. Create the PersistentVolumeClaim when there is no default StorageClass. The PVC won't provision or bind (unless there is an existing, suitable PV already present) and will remain in <code>Pending</code> state.
```
$ kc get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
pvc-1 Pending
```
```
$ kc get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
pvc-1 Pending
```
3. Configure one StorageClass as default.
```
$ kc patch sc -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
storageclass.storage.k8s.io/my-storageclass patched
```
```
$ kc patch sc -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
storageclass.storage.k8s.io/my-storageclass patched
```
4. Verify that the PersistentVolumeClaim is now provisioned correctly and was updated retroactively with the new default StorageClass.
```
$ kc get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
pvc-1 Bound pvc-06a964ca-f997-4780-8627-b5c3bf5a87d8 1Gi RWO my-storageclass 87m
```
```
$ kc get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
pvc-1 Bound pvc-06a964ca-f997-4780-8627-b5c3bf5a87d8 1Gi RWO my-storageclass 87m
```
### New metrics


@ -103,7 +103,7 @@ spec:
resources:
requests:
memory: "256Mi"
cpu: "0.2"
cpu: "0.2"
limits:
memory: ".5Gi"
cpu: "0.5"


@ -0,0 +1,84 @@
---
layout: blog
title: Consider All Microservices Vulnerable — And Monitor Their Behavior
date: 2023-01-20
slug: security-behavior-analysis
---
**Author:**
David Hadas (IBM Research Labs)
_This post warns DevOps against a false sense of security. Following security best practices when developing and configuring microservices does not result in non-vulnerable microservices. The post shows that although all deployed microservices are vulnerable, there is much that can be done to ensure microservices are not exploited. It explains how analyzing the behavior of clients and services from a security standpoint, named here **"Security-Behavior Analytics"**, can protect the deployed vulnerable microservices. It points to [Guard](http://knative.dev/security-guard), an open source project offering security-behavior monitoring and control of Kubernetes microservices presumed vulnerable._
As cyber attacks continue to intensify in sophistication, organizations deploying cloud services continue to grow their cyber investments aiming to produce safe and non-vulnerable services. However, the year-by-year growth in cyber investments does not result in a parallel reduction in cyber incidents. Instead, the number of cyber incidents continues to grow annually. Evidently, organizations are doomed to fail in this struggle - no matter how much effort is made to detect and remove cyber weaknesses from deployed services, it seems offenders always have the upper hand.
Considering the current spread of offensive tools, sophistication of offensive players, and ever-growing cyber financial gains to offenders, any cyber strategy that relies on constructing a non-vulnerable, weakness-free service in 2023 is clearly too naïve. It seems the only viable strategy is to:
&#x27A5; **Admit that your services are vulnerable!**
In other words, consciously accept that you will never create completely invulnerable services. If your opponents find even a single weakness as an entry-point, you lose! Admitting that in spite of your best efforts, all your services are still vulnerable is an important first step. Next, this post discusses what you can do about it...
## How to protect microservices from being exploited
Being vulnerable does not necessarily mean that your service will be exploited. Though your services are vulnerable in some ways unknown to you, offenders still need to identify these vulnerabilities and then exploit them. If offenders fail to exploit your service vulnerabilities, you win! In other words, having a vulnerability that can't be exploited represents a risk that can't be realized.
{{< figure src="security_behavior_figure_1.svg" alt="Image of an example of offender gaining foothold in a service" class="diagram-large" caption="Figure 1. An Offender gaining foothold in a vulnerable service" >}}
The above diagram shows an example in which the offender does not yet have a foothold in the service; that is, it is assumed that your service does not run code controlled by the offender on day 1. In our example the service has vulnerabilities in the API exposed to clients. To gain an initial foothold the offender uses a malicious client to try and exploit one of the service API vulnerabilities. The malicious client sends an exploit that triggers some unplanned behavior of the service.
More specifically, let's assume the service is vulnerable to an SQL injection. The developer failed to sanitize the user input properly, thereby allowing clients to send values that would change the intended behavior. In our example, if a client sends a query string with key “username” and value of _“tom or 1=1”_, the client will receive the data of all users. Exploiting this vulnerability requires the client to send an irregular string as the value. Note that benign users will not be sending a string with spaces or with the equal sign character as a username; instead, they will normally send legal usernames, which for example may be defined as a short sequence of characters a-z. No legal username can trigger unplanned service behavior.
In this simple example, one can already identify several opportunities to detect and block an attempt to exploit the vulnerability (un)intentionally left behind by the developer, making the vulnerability unexploitable. First, the malicious client behavior differs from the behavior of benign clients, as it sends irregular requests. If such a change in behavior is detected and blocked, the exploit will never reach the service. Second, the service behavior in response to the exploit differs from the service behavior in response to a regular request. Such behavior may include making subsequent irregular calls to other services such as a data store, taking irregular time to respond, and/or responding to the malicious client with an irregular response (for example, containing much more data than normally sent in case of benign clients making regular requests). Service behavioral changes, if detected, will also allow blocking the exploit in different stages of the exploitation attempt.
More generally:
- Monitoring the behavior of clients can help detect and block exploits against service API vulnerabilities. In fact, deploying efficient client behavior monitoring makes many vulnerabilities unexploitable and others very hard to achieve. To succeed, the offender needs to create an exploit undetectable from regular requests.
- Monitoring the behavior of services can help detect services as they are being exploited regardless of the attack vector used. Efficient service behavior monitoring limits what an attacker may be able to achieve as the offender needs to ensure the service behavior is undetectable from regular service behavior.
Combining both approaches may add a protection layer to the deployed vulnerable services, drastically decreasing the probability for anyone to successfully exploit any of the deployed vulnerable services. Next, let us identify four use cases where you need to use security-behavior monitoring.
## Use cases
One can identify the following four different stages in the life of any service from a security standpoint. In each stage, security-behavior monitoring is required to meet different challenges:
Service State | Use case | What do you need in order to cope with this use case?
------------- | ------------- | -----------------------------------------
**Normal** | **No known vulnerabilities:** The service owner is normally not aware of any known vulnerabilities in the service image or configuration. Yet, it is reasonable to assume that the service has weaknesses. | **Provide generic protection against any unknown, zero-day, service vulnerabilities** - Detect/block irregular patterns sent as part of incoming client requests that may be used as exploits.
**Vulnerable** | **An applicable CVE is published:** The service owner is required to release a new non-vulnerable revision of the service. Research shows that in practice this process of removing a known vulnerability may take many weeks to accomplish (2 months on average). | **Add protection based on the CVE analysis** - Detect/block incoming requests that include specific patterns that may be used to exploit the discovered vulnerability. Continue to offer services, although the service has a known vulnerability.
**Exploitable** | **A known exploit is published:** The service owner needs a way to filter incoming requests that contain the known exploit. | **Add protection based on a known exploit signature** - Detect/block incoming client requests that carry signatures identifying the exploit. Continue to offer services despite the presence of an exploit.
**Misused** | **An offender misuses pods backing the service:** The offender can follow an attack pattern enabling him/her to misuse pods. The service owner needs to restart any compromised pods while using non compromised pods to continue offering the service. Note that once a pod is restarted, the offender needs to repeat the attack pattern before he/she may again misuse it. | **Identify and restart instances of the component that is being misused** - At any given time, some backing pods may be compromised and misused, while others behave as designed. Detect/remove the misused pods while allowing other pods to continue servicing client requests.
Fortunately, microservice architecture is well suited to security-behavior monitoring as discussed next.
## Security-Behavior of microservices versus monoliths {#microservices-vs-monoliths}
Kubernetes is often used to support workloads designed with microservice architecture. By design, microservices aim to follow the UNIX philosophy of "Do One Thing And Do It Well". Each microservice has a bounded context and a clear interface. In other words, you can expect the microservice clients to send relatively regular requests and the microservice to present a relatively regular behavior as a response to these requests. Consequently, a microservice architecture is an excellent candidate for security-behavior monitoring.
{{< figure src="security_behavior_figure_2.svg" alt="Image showing why microservices are well suited for security-behavior monitoring" class="diagram-large" caption="Figure 2. Microservices are well suited for security-behavior monitoring" >}}
The diagram above clarifies how dividing a monolithic service into a set of microservices improves our ability to perform security-behavior monitoring and control. In a monolithic service approach, different client requests are intertwined, resulting in a diminished ability to identify irregular client behaviors. Without prior knowledge, an observer of the intertwined client requests will find it hard to distinguish between types of requests and their related characteristics. Further, internal client requests are not exposed to the observer. Lastly, the aggregated behavior of the monolithic service is a compound of the many different internal behaviors of its components, making it hard to identify irregular service behavior.
In a microservice environment, each microservice is expected by design to offer a more well-defined service and serve better defined type of requests. This makes it easier for an observer to identify irregular client behavior and irregular service behavior. Further, a microservice design exposes the internal requests and internal services which offer more security-behavior data to identify irregularities by an observer. Overall, this makes the microservice design pattern better suited for security-behavior monitoring and control.
## Security-Behavior monitoring on Kubernetes
Kubernetes deployments seeking to add Security-Behavior may use [Guard](http://knative.dev/security-guard), developed under the CNCF project Knative. Guard is integrated into the full Knative automation suite that runs on top of Kubernetes. Alternatively, **you can deploy Guard as a standalone tool** to protect any HTTP-based workload on Kubernetes.
See:
- [Guard](https://github.com/knative-sandbox/security-guard) on Github, for using Guard as a standalone tool.
- The Knative automation suite - Read about Knative, in the blog post [Opinionated Kubernetes](https://davidhadas.wordpress.com/2022/08/29/knative-an-opinionated-kubernetes) which describes how Knative simplifies and unifies the way web services are deployed on Kubernetes.
- You may contact Guard maintainers on the [SIG Security](https://kubernetes.slack.com/archives/C019LFTGNQ3) Slack channel or on the Knative community [security](https://knative.slack.com/archives/CBYV1E0TG) Slack channel. The Knative community channel will move soon to the [CNCF Slack](https://communityinviter.com/apps/cloud-native/cncf) under the name `#knative-security`.
The goal of this post is to invite the Kubernetes community to action and introduce Security-Behavior monitoring and control to help secure Kubernetes-based deployments. Hopefully, the community as a follow-up will:
1. Analyze the cyber challenges presented for different Kubernetes use cases
1. Add appropriate security documentation for users on how to introduce Security-Behavior monitoring and control.
1. Consider how to integrate with tools that can help users monitor and control their vulnerable services.
## Getting involved
You are welcome to get involved and join the effort to develop security behavior monitoring
and control for Kubernetes; to share feedback and contribute to code or documentation;
and to make or suggest improvements of any kind.

(Two SVG figure files were added in this commit, 379 KiB and 421 KiB; their diffs are suppressed because one or more lines are too long.)


@ -0,0 +1,141 @@
---
layout: blog
title: "Spotlight on SIG Instrumentation"
slug: sig-instrumentation-spotlight-2023
date: 2023-02-03
canonicalUrl: https://www.kubernetes.dev/blog/2023/02/03/sig-instrumentation-spotlight-2023/
---
**Author:** Imran Noor Mohamed (Delivery Hero)
Observability requires the right data at the right time for the right consumer
(human or piece of software) to make the right decision. In the context of Kubernetes,
having best practices for cluster observability across all Kubernetes components is crucial.
SIG Instrumentation helps to address this issue by providing best practices and tools
that all other SIGs use to instrument Kubernetes components, like the *API server*,
*scheduler*, *kubelet* and *kube-controller-manager*.
In this SIG Instrumentation spotlight, [Imran Noor Mohamed](https://www.linkedin.com/in/imrannoormohamed/),
SIG ContribEx-Comms tech lead, talked with [Elana Hashman](https://twitter.com/ehashdn)
and [Han Kang](https://www.linkedin.com/in/hankang), chairs of SIG Instrumentation,
on how the SIG is organized, what the current challenges are, and how anyone can get involved and contribute.
## About SIG Instrumentation
**Imran (INM)**: Hello, thank you for the opportunity of learning more about SIG Instrumentation.
Could you tell us a bit about yourself, your role, and how you got involved in SIG Instrumentation?
**Han (HK)**: I started in SIG Instrumentation in 2018, and became a chair in 2020.
I primarily got involved with SIG instrumentation due to a number of upstream issues
with metrics which ended up affecting GKE in bad ways. As a result, we ended up
launching an initiative to stabilize our metrics and make metrics a proper API.
**Elana (EH)**: I also joined SIG Instrumentation in 2018 and became a chair at the
same time as Han. I was working as a site reliability engineer (SRE) on bare metal
Kubernetes clusters and was working to build out our observability stack.
I encountered some issues with label joins where Kubernetes metrics didn't match
kube-state-metrics ([KSM](https://github.com/kubernetes/kube-state-metrics)) and
started participating in SIG meetings to improve things. I helped test performance
improvements to kube-state-metrics and ultimately coauthored a KEP for overhauling
metrics in the 1.14 release to improve usability.
**Imran (INM)**: Interesting! Does that mean SIG Instrumentation involves a lot of plumbing?
**Han (HK)**: I wouldn't say it involves a ton of plumbing, though it does touch
basically every code base. We have our own dedicated directories for our metrics,
logs, and tracing frameworks which we tend to work out of primarily. We do have to
interact with other SIGs in order to propagate our changes which makes us more of
a horizontal SIG.
**Imran (INM)**: Speaking about interaction and coordination with other SIGs, could
you describe how the SIG is organized?
**Elana (EH)**: In SIG Instrumentation, we have two chairs, Han and myself, as well
as two tech leads, David Ashpole and Damien Grisonnet. We all work together as the
SIG's leads in order to run meetings, triage issues and PRs, review and approve KEPs,
plan for each release, present at KubeCon and community meetings, and write our annual
report. Within the SIG we also have a number of important subprojects, each of which is
stewarded by its subproject owners. For example, Marek Siarkowicz is a subproject owner
of [metrics-server](https://github.com/kubernetes-sigs/metrics-server).
Because we're a horizontal SIG, some of our projects have a wide scope and require
coordination from a dedicated group of contributors. For example, in order to guide
the Kubernetes migration to structured logging, we chartered the
[Structured Logging](https://github.com/kubernetes/community/blob/master/wg-structured-logging/README.md)
Working Group (WG), organized by Marek and Patrick Ohly. The WG doesn't own any code,
but helps with various components such as the *kubelet*, *scheduler*, etc. in migrating
their code to use structured logs.
**Imran (INM)**: Walking through the
[charter](https://github.com/kubernetes/community/blob/master/sig-instrumentation/charter.md)
alone, it's clear that SIG Instrumentation has a lot of sub-projects.
Could you highlight some important ones?
**Han (HK)**: We have many different sub-projects and we are in dire need of
people who can come and help shepherd them. Our most important projects in-tree
(that is, within the kubernetes/kubernetes repo) are metrics, tracing, and
structured logging. Our most important projects out-of-tree are
(a) KSM (kube-state-metrics) and (b) metrics-server.
**Elana (EH)**: Echoing this, we would love to bring on more maintainers for
kube-state-metrics and metrics-server. Our friends at WG Structured Logging are
also looking for contributors. Other subprojects include klog, prometheus-adapter,
and a new subproject that we just launched for collecting high-fidelity, scalable
utilization metrics called [usage-metrics-collector](https://github.com/kubernetes-sigs/usage-metrics-collector).
All are seeking new contributors!
## Current status and ongoing challenges
**Imran (INM)**: For release [1.26](https://github.com/kubernetes/sig-release/tree/master/releases/release-1.26)
we can see that there are a relevant number of metrics, logs, and tracing
[KEPs](https://www.k8s.dev/resources/keps/) in the pipeline. Would you like to
point out important things for last release (maybe alpha & stable milestone candidates?)
**Han (HK)**: We can now generate [documentation](https://kubernetes.io/docs/reference/instrumentation/metrics/)
for every single metric in the main Kubernetes code base! We have a pretty fancy
static analysis pipeline that enables this functionality. We've also added feature
metrics so that you can look at your metrics to determine which features are enabled
in your cluster at a given time. Lastly, we added a component-sli endpoint, which
should make it easy for people to create availability SLOs for *control-plane* components.
**Elana (EH)**: We've also been working on tracing KEPs for both the *API server*
and *kubelet*, though neither graduated in 1.26. I'm also really excited about the
work Han is doing with WG Reliability to extend and improve our metrics stability framework.
**Imran (INM)**: What do you think are the Kubernetes-specific challenges tackled by
the SIG Instrumentation? What are the future efforts to solve them?
**Han (HK)**: SIG instrumentation suffered a bit in the past from being a horizontal SIG.
We did not have an obvious location to put our code and did not have a good mechanism to
audit metrics that people would randomly add. We've fixed this over the years and now we
have dedicated spots for our code and a reliable mechanism for auditing new metrics.
We also now offer stability guarantees for metrics. We hope to have full-blown tracing
up and down the kubernetes stack, and metric support via exemplars.
**Elana (EH)**: I think SIG Instrumentation is a really interesting SIG because it
poses different kinds of opportunities to get involved than in other SIGs. You don't
have to be a software developer to contribute to our SIG! All of our components and
subprojects are focused on better understanding Kubernetes and its performance in
production, which allowed me to get involved as one of the few SIG Chairs working as
an SRE at that time. I like that we provide opportunities for newcomers to contribute
through using, testing, and providing feedback on our subprojects, which is a lower
barrier to entry. Because many of these projects are out-of-tree, I think one of our
challenges is to figure out what's in scope for core Kubernetes SIG Instrumentation
subprojects, what's missing, and then fill in the gaps.
## Community and contribution
**Imran (INM)**: Kubernetes values community over products. Any recommendation
for anyone looking into getting involved in SIG Instrumentation work? Where
should they start (new contributor-friendly areas within SIG?)
**Han (HK) and Elana (EH)**: Come to our bi-weekly triage
[meetings](https://github.com/kubernetes/community/tree/master/sig-instrumentation#meetings)!
They aren't recorded and are a great place to ask questions and learn about our ongoing work.
We strive to be a friendly community and one of the easiest SIGs to get started with.
You can check out our latest KubeCon NA 2022 [SIG Instrumentation Deep Dive](https://youtu.be/JIzrlWtAA8Y)
to get more insight into our work. We also invite you to join our Slack channel #sig-instrumentation
and feel free to reach out to any of our SIG leads or subproject owners directly.
Thank you so much for your time and insights into the workings of SIG Instrumentation!


@ -0,0 +1,50 @@
---
layout: blog
title: "k8s.gcr.io Image Registry Will Be Frozen From the 3rd of April 2023"
date: 2023-02-06
slug: k8s-gcr-io-freeze-announcement
---
**Authors**: Mahamed Ali (Rackspace Technology)
The Kubernetes project runs a community-owned image registry called `registry.k8s.io` to host its container images. On the 3rd of April 2023, the old registry `k8s.gcr.io` will be frozen and no further images for Kubernetes and related subprojects will be pushed to the old registry.
This registry `registry.k8s.io` replaced the old one and has been generally available for several months. We have published a [blog post](/blog/2022/11/28/registry-k8s-io-faster-cheaper-ga/) about its benefits to the community and the Kubernetes project. This post also announced that future versions of Kubernetes will not be available in the old registry. Now that time has come.
What does this change mean for contributors:
- If you are a maintainer of a subproject, you will need to update your manifests and Helm charts to use the new registry.
What does this change mean for end users:
- 1.27 Kubernetes release will not be published to the old registry.
- Patch releases for 1.24, 1.25, and 1.26 will no longer be published to the old registry from April. Please read the timelines below for details of the final patch releases in the old registry.
- Starting in 1.25, the default image registry has been set to `registry.k8s.io`. This value is overridable in `kubeadm` and `kubelet` but setting it to `k8s.gcr.io` will fail for new releases after April as they won't be present in the old registry (a brief kubeadm override sketch follows this list).
- If you want to increase the reliability of your cluster and remove dependency on the community-owned registry or you are running Kubernetes in networks where external traffic is restricted, you should consider hosting local image registry mirrors. Some cloud vendors may offer hosted solutions for this.
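As a sketch of what overriding the image repository can look like with kubeadm (assuming a kubeadm-managed cluster; other fields omitted):
```yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
# pull control-plane images from the community-owned registry
imageRepository: registry.k8s.io
```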
## Timeline of the changes
- `k8s.gcr.io` will be frozen on the 3rd of April 2023
- 1.27 is expected to be released on the 12th of April 2023
- The last 1.23 release on `k8s.gcr.io` will be 1.23.18 (1.23 goes end-of-life before the freeze)
- The last 1.24 release on `k8s.gcr.io` will be 1.24.12
- The last 1.25 release on `k8s.gcr.io` will be 1.25.8
- The last 1.26 release on `k8s.gcr.io` will be 1.26.3
## What's next
Please make sure your cluster does not have dependencies on the old image registry. For example, you can run this command to list the images used by pods:
```shell
kubectl get pods --all-namespaces -o jsonpath="{.items[*].spec.containers[*].image}" |\
tr -s '[[:space:]]' '\n' |\
sort |\
uniq -c
```
There may be other dependencies on the old image registry. Make sure you review any potential dependencies to keep your cluster healthy and up to date.
## Acknowledgments
__Change is hard__, and evolving our image-serving platform is needed to ensure a sustainable future for the project. We strive to make things better for everyone using Kubernetes. Many contributors from all corners of our community have been working long and hard to ensure we are making the best decisions possible, executing plans, and doing our best to communicate those plans.
Thanks to Aaron Crickenberger, Arnaud Meukam, Benjamin Elder, Caleb Woodbine, Davanum Srinivas, Mahamed Ali, and Tim Hockin from SIG K8s Infra, Brian McQueen, and Sergey Kanzhelev from SIG Node, Lubomir Ivanov from SIG Cluster Lifecycle, Adolfo García Veytia, Jeremy Rickard, Sascha Grunert, and Stephen Augustus from SIG Release, Bob Killen and Kaslin Fields from SIG Contribex, Tim Allclair from the Security Response Committee. Also a big thank you to our friends acting as liaisons with our cloud provider partners: Jay Pipes from Amazon and Jon Johnson Jr. from Google.


@ -0,0 +1,36 @@
---
layout: blog
title: "Free Katacoda Kubernetes Tutorials Are Shutting Down"
date: 2023-02-14
slug: kubernetes-katacoda-tutorials-stop-from-2023-03-31
evergreen: true
---
**Author**: Natali Vlatko, SIG Docs Co-Chair for Kubernetes
[Katacoda](https://katacoda.com/kubernetes), the popular learning platform from O'Reilly that has been helping people learn all about
Java, Docker, Kubernetes, Python, Go, C++, and more, [shut down for public use in June 2022](https://www.oreilly.com/online-learning/leveraging-katacoda-technology.html).
However, tutorials specifically for Kubernetes, linked from the Kubernetes website for our project's
users and contributors, remained available and active after this change. Unfortunately, this will no
longer be the case, and Katacoda tutorials for learning Kubernetes will cease working after March 31st, 2023.
The Kubernetes Project wishes to thank O'Reilly Media for the many years it has supported the community
via the Katacoda learning platform. You can read more about [the decision to shutter katacoda.com](https://www.oreilly.com/online-learning/leveraging-katacoda-technology.html)
on O'Reilly's own site. With this change, we'll be focusing on the work needed to remove links to
their various tutorials. We have a general issue tracking this topic at [#33936](https://github.com/kubernetes/website/issues/33936) and a [GitHub discussion](https://github.com/kubernetes/website/discussions/38878). We're also
interested in researching what other learning platforms could be beneficial for the Kubernetes community,
replacing Katacoda with a link to a platform or service that has a similar user experience. However,
this research will take time, so we're actively looking for volunteers to help with this work.
If a replacement is found, it will need to be supported by Kubernetes leadership, specifically,
SIG Contributor Experience, SIG Docs, and the Kubernetes Steering Committee.
The Katacoda shutdown affects 25 tutorial pages, their localizations, as well as the Katacoda
Scenario repository: [github.com/katacoda-scenarios/kubernetes-bootcamp-scenarios](https://github.com/katacoda-scenarios/kubernetes-bootcamp-scenarios). We recommend
that any links, guides, or documentation you have that points to the Katacoda learning platform be
updated immediately to reflect this change. While we have yet to find a replacement learning solution,
the Kubernetes website contains a lot of helpful documentation to support your continued learning and growth.
You can find all of our available documentation tutorials for Kubernetes at https://k8s.io/docs/tutorials/.
If you have any questions regarding the Katacoda shutdown, or subsequent link removal from Kubernetes
tutorial pages, please feel free to comment on the [general issue tracking the shutdown](https://github.com/kubernetes/website/issues/33936),
or visit the #sig-docs channel on the Kubernetes Slack.

View File

@ -103,6 +103,8 @@ updated to newer versions that support cgroup v2. For example:
* If you run [cAdvisor](https://github.com/google/cadvisor) as a stand-alone
DaemonSet for monitoring pods and containers, update it to v0.43.0 or later.
* If you use JDK, prefer to use JDK 11.0.16 and later or JDK 15 and later, which [fully support cgroup v2](https://bugs.openjdk.org/browse/JDK-8230305).
* If you are using the [uber-go/automaxprocs](https://github.com/uber-go/automaxprocs) package, make sure
the version you use is v1.5.1 or higher.
## Identify the cgroup version on Linux Nodes {#check-cgroup-version}
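One common way to check which cgroup version a Linux node uses (an illustrative command; your distribution's tooling may differ) is to inspect the filesystem type mounted at `/sys/fs/cgroup/`:
```shell
stat -fc %T /sys/fs/cgroup/
```
If the output is `cgroup2fs`, the node is using cgroup v2; `tmpfs` indicates cgroup v1.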

View File

@ -144,7 +144,7 @@ which you can define:
* `MinAge`: the minimum age at which the kubelet can garbage collect a
container. Disable by setting to `0`.
* `MaxPerPodContainer`: the maximum number of dead containers each Pod pair
* `MaxPerPodContainer`: the maximum number of dead containers each Pod
can have. Disable by setting to less than `0`.
* `MaxContainers`: the maximum number of dead containers the cluster can have.
Disable by setting to less than `0`.

View File

@ -6,30 +6,32 @@ weight: 30
<!-- overview -->
Distributed systems often have a need for "leases", which provides a mechanism to lock shared resources and coordinate activity between nodes.
In Kubernetes, the "lease" concept is represented by `Lease` objects in the `coordination.k8s.io` API group, which are used for system-critical
capabilities like node heart beats and component-level leader election.
Distributed systems often have a need for _leases_, which provide a mechanism to lock shared resources
and coordinate activity between members of a set.
In Kubernetes, the lease concept is represented by [Lease](/docs/reference/kubernetes-api/cluster-resources/lease-v1/)
objects in the `coordination.k8s.io` {{< glossary_tooltip text="API Group" term_id="api-group" >}},
which are used for system-critical capabilities such as node heartbeats and component-level leader election.
<!-- body -->
## Node Heart Beats
## Node heartbeats {#node-heart-beats}
Kubernetes uses the Lease API to communicate kubelet node heart beats to the Kubernetes API server.
Kubernetes uses the Lease API to communicate kubelet node heartbeats to the Kubernetes API server.
For every `Node`, there is a `Lease` object with a matching name in the `kube-node-lease`
namespace. Under the hood, every kubelet heart beat is an UPDATE request to this `Lease` object, updating
namespace. Under the hood, every kubelet heartbeat is an **update** request to this `Lease` object, updating
the `spec.renewTime` field for the Lease. The Kubernetes control plane uses the time stamp of this field
to determine the availability of this `Node`.
See [Node Lease objects](/docs/concepts/architecture/nodes/#heartbeats) for more details.
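To see this in action, you can inspect the Lease for one of your nodes (`<node-name>` is a placeholder for an actual node name) and watch `spec.renewTime` advance with each heartbeat:
```shell
kubectl -n kube-node-lease get lease <node-name> -o yaml
```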
## Leader Election
## Leader election
Leases are also used in Kubernetes to ensure only one instance of a component is running at any given time.
Kubernetes also uses Leases to ensure only one instance of a component is running at any given time.
This is used by control plane components like `kube-controller-manager` and `kube-scheduler` in
HA configurations, where only one instance of the component should be actively running while the other
instances are on stand-by.
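Assuming your control plane uses the default Lease-based leader election, you can see which instance currently holds each lock by listing the corresponding Lease objects; the `HOLDER` column identifies the active instance:
```shell
kubectl -n kube-system get lease kube-controller-manager kube-scheduler
```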
## API Server Identity
## API server identity
{{< feature-state for_k8s_version="v1.26" state="beta" >}}
@ -43,22 +45,23 @@ You can inspect Leases owned by each kube-apiserver by checking for lease object
with the name `kube-apiserver-<sha256-hash>`. Alternatively you can use the label selector `k8s.io/component=kube-apiserver`:
```shell
$ kubectl -n kube-system get lease -l k8s.io/component=kube-apiserver
kubectl -n kube-system get lease -l k8s.io/component=kube-apiserver
```
```
NAME HOLDER AGE
kube-apiserver-c4vwjftbvpc5os2vvzle4qg27a kube-apiserver-c4vwjftbvpc5os2vvzle4qg27a_9cbf54e5-1136-44bd-8f9a-1dcd15c346b4 5m33s
kube-apiserver-dz2dqprdpsgnm756t5rnov7yka kube-apiserver-dz2dqprdpsgnm756t5rnov7yka_84f2a85d-37c1-4b14-b6b9-603e62e4896f 4m23s
kube-apiserver-fyloo45sdenffw2ugwaz3likua kube-apiserver-fyloo45sdenffw2ugwaz3likua_c5ffa286-8a9a-45d4-91e7-61118ed58d2e 4m43s
```
The SHA256 hash used in the lease name is based on the OS hostname as seen by kube-apiserver. Each kube-apiserver should be
The SHA256 hash used in the lease name is based on the OS hostname as seen by that API server. Each kube-apiserver should be
configured to use a hostname that is unique within the cluster. New instances of kube-apiserver that use the same hostname
will take over existing Leases using a new holder identity, as opposed to instantiating new lease objects. You can check the
will take over existing Leases using a new holder identity, as opposed to instantiating new Lease objects. You can check the
hostname used by kube-apiserver by checking the value of the `kubernetes.io/hostname` label:
```shell
$ kubectl -n kube-system get lease kube-apiserver-c4vwjftbvpc5os2vvzle4qg27a -o yaml
kubectl -n kube-system get lease kube-apiserver-c4vwjftbvpc5os2vvzle4qg27a -o yaml
```
```yaml
apiVersion: coordination.k8s.io/v1
kind: Lease
@ -78,3 +81,23 @@ spec:
```
Expired leases from kube-apiservers that no longer exist are garbage collected by new kube-apiservers after 1 hour.
You can disable API server identity leases by disabling the `APIServerIdentity`
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/).
## Workloads {#custom-workload}
Your own workload can define its own use of Leases. For example, you might run a custom
{{< glossary_tooltip term_id="controller" text="controller" >}} where a primary or leader member
performs operations that its peers do not. You define a Lease so that the controller replicas can select
or elect a leader, using the Kubernetes API for coordination.
If you do use a Lease, it's a good practice to define a name for the Lease that is obviously linked to
the product or component. For example, if you have a component named Example Foo, use a Lease named
`example-foo`.
If a cluster operator or another end user could deploy multiple instances of a component, select a name
prefix and pick a mechanism (such as a hash of the name of the Deployment) to avoid name collisions
for the Leases.
You can use another approach so long as it achieves the same outcome: different software products do
not conflict with one another.
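For illustration, the Lease that replicas of a hypothetical Example Foo controller contend for might look like the following sketch. The namespace and holder identity are placeholders, and in practice a client library (for example, the leader-election helpers in client-go) creates and renews the Lease for you:
```yaml
apiVersion: coordination.k8s.io/v1
kind: Lease
metadata:
  name: example-foo              # named after the component, as recommended above
  namespace: example-namespace   # hypothetical namespace
spec:
  holderIdentity: example-foo-6d9f7c8b4d-abcde  # typically the Pod name of the current leader
  leaseDurationSeconds: 15
```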

View File

@ -9,7 +9,7 @@ weight: 10
<!-- overview -->
Kubernetes runs your workload by placing containers into Pods to run on _Nodes_.
Kubernetes runs your {{< glossary_tooltip text="workload" term_id="workload" >}} by placing containers into Pods to run on _Nodes_.
A node may be a virtual or physical machine, depending on the cluster. Each node
is managed by the
{{< glossary_tooltip text="control plane" term_id="control-plane" >}}
@ -274,7 +274,7 @@ availability of each node, and to take action when failures are detected.
For nodes there are two forms of heartbeats:
* updates to the `.status` of a Node
* [Lease](/docs/reference/kubernetes-api/cluster-resources/lease-v1/) objects
* [Lease](/docs/concepts/architecture/leases/) objects
within the `kube-node-lease`
{{< glossary_tooltip term_id="namespace" text="namespace">}}.
Each Node has an associated Lease object.
@ -563,7 +563,7 @@ ShutdownGracePeriodCriticalPods are not configured properly. Please refer to abo
section [Graceful Node Shutdown](#graceful-node-shutdown) for more details.
When a node is shutdown but not detected by kubelet's Node Shutdown Manager, the pods
that are part of a StatefulSet will be stuck in terminating status on
that are part of a {{< glossary_tooltip text="StatefulSet" term_id="statefulset" >}} will be stuck in terminating status on
the shutdown node and cannot move to a new running node. This is because kubelet on
the shutdown node is not available to delete the pods so the StatefulSet cannot
create a new pod with the same name. If there are volumes used by the pods, the
@ -577,7 +577,7 @@ these pods will be stuck in terminating status on the shutdown node forever.
To mitigate the above situation, a user can manually add the taint `node.kubernetes.io/out-of-service` with either `NoExecute`
or `NoSchedule` effect to a Node marking it out-of-service.
If the `NodeOutOfServiceVolumeDetach` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
is enabled on `kube-controller-manager`, and a Node is marked out-of-service with this taint, the
is enabled on {{< glossary_tooltip text="kube-controller-manager" term_id="kube-controller-manager" >}}, and a Node is marked out-of-service with this taint, the
pods on the node will be forcefully deleted if there are no matching tolerations on it and volume
detach operations for the pods terminating on the node will happen immediately. This allows the
Pods on the out-of-service node to recover quickly on a different node.
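A sketch of adding that taint with kubectl (`<node-name>` is a placeholder, and the taint value shown here is arbitrary; only the key and the effect matter):
```shell
kubectl taint nodes <node-name> node.kubernetes.io/out-of-service=nodeshutdown:NoExecute
```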
@ -646,9 +646,11 @@ see [KEP-2400](https://github.com/kubernetes/enhancements/issues/2400) and its
## {{% heading "whatsnext" %}}
* Learn about the [components](/docs/concepts/overview/components/#node-components) that make up a node.
* Read the [API definition for Node](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#node-v1-core).
* Read the [Node](https://git.k8s.io/design-proposals-archive/architecture/architecture.md#the-kubernetes-node)
section of the architecture design document.
* Read about [taints and tolerations](/docs/concepts/scheduling-eviction/taint-and-toleration/).
Learn more about the following:
* [Components](/docs/concepts/overview/components/#node-components) that make up a node.
* [API definition for Node](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#node-v1-core).
* [Node](https://git.k8s.io/design-proposals-archive/architecture/architecture.md#the-kubernetes-node) section of the architecture design document.
* [Taints and Tolerations](/docs/concepts/scheduling-eviction/taint-and-toleration/).
* [Node Resource Managers](/docs/concepts/policy/node-resource-managers/).
* [Resource Management for Windows nodes](/docs/concepts/configuration/windows-resource-management/).

View File

@ -17,11 +17,11 @@ This page lists some of the available add-ons and links to their respective inst
## Networking and Network Policy
* [ACI](https://www.github.com/noironetworks/aci-containers) provides integrated container networking and network security with Cisco ACI.
* [Antrea](https://antrea.io/) operates at Layer 3/4 to provide networking and security services for Kubernetes, leveraging Open vSwitch as the networking data plane.
* [Antrea](https://antrea.io/) operates at Layer 3/4 to provide networking and security services for Kubernetes, leveraging Open vSwitch as the networking data plane. Antrea is a [CNCF project at the Sandbox level](https://www.cncf.io/projects/antrea/).
* [Calico](https://docs.projectcalico.org/latest/introduction/) is a networking and network policy provider. Calico supports a flexible set of networking options so you can choose the most efficient option for your situation, including non-overlay and overlay networks, with or without BGP. Calico uses the same engine to enforce network policy for hosts, pods, and (if using Istio & Envoy) applications at the service mesh layer.
* [Canal](https://projectcalico.docs.tigera.io/getting-started/kubernetes/flannel/flannel) unites Flannel and Calico, providing networking and network policy.
* [Cilium](https://github.com/cilium/cilium) is a networking, observability, and security solution with an eBPF-based data plane. Cilium provides a simple flat Layer 3 network with the ability to span multiple clusters in either a native routing or overlay/encapsulation mode, and can enforce network policies on L3-L7 using an identity-based security model that is decoupled from network addressing. Cilium can act as a replacement for kube-proxy; it also offers additional, opt-in observability and security features.
* [CNI-Genie](https://github.com/cni-genie/CNI-Genie) enables Kubernetes to seamlessly connect to a choice of CNI plugins, such as Calico, Canal, Flannel, or Weave.
* [Cilium](https://github.com/cilium/cilium) is a networking, observability, and security solution with an eBPF-based data plane. Cilium provides a simple flat Layer 3 network with the ability to span multiple clusters in either a native routing or overlay/encapsulation mode, and can enforce network policies on L3-L7 using an identity-based security model that is decoupled from network addressing. Cilium can act as a replacement for kube-proxy; it also offers additional, opt-in observability and security features. Cilium is a [CNCF project at the Incubation level](https://www.cncf.io/projects/cilium/).
* [CNI-Genie](https://github.com/cni-genie/CNI-Genie) enables Kubernetes to seamlessly connect to a choice of CNI plugins, such as Calico, Canal, Flannel, or Weave. CNI-Genie is a [CNCF project at the Sandbox level](https://www.cncf.io/projects/cni-genie/).
* [Contiv](https://contivpp.io/) provides configurable networking (native L3 using BGP, overlay using vxlan, classic L2, and Cisco-SDN/ACI) for various use cases and a rich policy framework. Contiv project is fully [open sourced](https://github.com/contiv). The [installer](https://github.com/contiv/install) provides both kubeadm and non-kubeadm based installation options.
* [Contrail](https://www.juniper.net/us/en/products-services/sdn/contrail/contrail-networking/), based on [Tungsten Fabric](https://tungsten.io), is an open source, multi-cloud network virtualization and policy management platform. Contrail and Tungsten Fabric are integrated with orchestration systems such as Kubernetes, OpenShift, OpenStack and Mesos, and provide isolation modes for virtual machines, containers/pods and bare metal workloads.
* [Flannel](https://github.com/flannel-io/flannel#deploying-flannel-manually) is an overlay network provider that can be used with Kubernetes.

View File

@ -638,6 +638,10 @@ poorly-behaved workloads that may be harming system health.
standard deviation of seat demand seen during the last concurrency
borrowing adjustment period.
* `apiserver_flowcontrol_demand_seats_smoothed` is a gauge vector
holding, for each priority level, the smoothed enveloped seat demand
determined at the last concurrency adjustment.
* `apiserver_flowcontrol_target_seats` is a gauge vector holding, for
each priority level, the concurrency target going into the borrowing
allocation problem.
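If you have permission to read the API server's metrics endpoint, you can scrape these gauges directly; a sketch using `kubectl get --raw`:
```shell
kubectl get --raw /metrics | grep -E 'apiserver_flowcontrol_(demand|target)_seats'
```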
@ -701,14 +705,15 @@ serves the following additional paths at its HTTP[S] ports.
The output is similar to this:
```none
PriorityLevelName, ActiveQueues, IsIdle, IsQuiescing, WaitingRequests, ExecutingRequests,
workload-low, 0, true, false, 0, 0,
global-default, 0, true, false, 0, 0,
exempt, <none>, <none>, <none>, <none>, <none>,
catch-all, 0, true, false, 0, 0,
system, 0, true, false, 0, 0,
leader-election, 0, true, false, 0, 0,
workload-high, 0, true, false, 0, 0,
PriorityLevelName, ActiveQueues, IsIdle, IsQuiescing, WaitingRequests, ExecutingRequests, DispatchedRequests, RejectedRequests, TimedoutRequests, CancelledRequests
catch-all, 0, true, false, 0, 0, 1, 0, 0, 0
exempt, <none>, <none>, <none>, <none>, <none>, <none>, <none>, <none>, <none>
global-default, 0, true, false, 0, 0, 46, 0, 0, 0
leader-election, 0, true, false, 0, 0, 4, 0, 0, 0
node-high, 0, true, false, 0, 0, 34, 0, 0, 0
system, 0, true, false, 0, 0, 48, 0, 0, 0
workload-high, 0, true, false, 0, 0, 500, 0, 0, 0
workload-low, 0, true, false, 0, 0, 0, 0, 0, 0
```
- `/debug/api_priority_and_fairness/dump_queues` - a listing of all the
@ -761,7 +766,34 @@ serves the following additional paths at its HTTP[S] ports.
system, system-nodes, 12, 0, system:node:127.0.0.1, 2020-07-23T15:31:03.583823404Z, system:node:127.0.0.1, create, /api/v1/namespaces/scaletest/configmaps,
system, system-nodes, 12, 1, system:node:127.0.0.1, 2020-07-23T15:31:03.594555947Z, system:node:127.0.0.1, create, /api/v1/namespaces/scaletest/configmaps,
```
### Debug logging
At verbosity `-v=3` or higher, the server outputs an httplog line for every
request, and it includes the following attributes.
- `apf_fs`: the name of the flow schema to which the request was classified.
- `apf_pl`: the name of the priority level for that flow schema.
- `apf_iseats`: the number of seats determined for the initial
(normal) stage of execution of the request.
- `apf_fseats`: the number of seats determined for the final stage of
execution (accounting for the associated WATCH notifications) of the
request.
- `apf_additionalLatency`: the duration of the final stage of
execution of the request.
At higher levels of verbosity there will be log lines exposing details
of how APF handled the request, primarily for debug purposes.
### Response headers
APF adds the following two headers to each HTTP response message.
- `X-Kubernetes-PF-FlowSchema-UID` holds the UID of the FlowSchema
object to which the corresponding request was classified.
- `X-Kubernetes-PF-PriorityLevel-UID` holds the UID of the
PriorityLevelConfiguration object associated with that FlowSchema.
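One way to observe these headers is to run any kubectl request at a high client verbosity, where the HTTP response headers are logged (the exact verbosity level at which headers appear can vary by client version):
```shell
kubectl get namespaces -v=8 2>&1 | grep -i 'X-Kubernetes-PF'
```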
## {{% heading "whatsnext" %}}

View File

@ -1,20 +1,26 @@
---
reviewers:
- janetkuo
title: Managing Resources
content_type: concept
reviewers:
- janetkuo
weight: 40
---
<!-- overview -->
You've deployed your application and exposed it via a service. Now what? Kubernetes provides a number of tools to help you manage your application deployment, including scaling and updating. Among the features that we will discuss in more depth are [configuration files](/docs/concepts/configuration/overview/) and [labels](/docs/concepts/overview/working-with-objects/labels/).
You've deployed your application and exposed it via a service. Now what? Kubernetes provides a
number of tools to help you manage your application deployment, including scaling and updating.
Among the features that we will discuss in more depth are
[configuration files](/docs/concepts/configuration/overview/) and
[labels](/docs/concepts/overview/working-with-objects/labels/).
<!-- body -->
## Organizing resource configurations
Many applications require multiple resources to be created, such as a Deployment and a Service. Management of multiple resources can be simplified by grouping them together in the same file (separated by `---` in YAML). For example:
Many applications require multiple resources to be created, such as a Deployment and a Service.
Management of multiple resources can be simplified by grouping them together in the same file
(separated by `---` in YAML). For example:
{{< codenew file="application/nginx-app.yaml" >}}
@ -24,81 +30,99 @@ Multiple resources can be created the same way as a single resource:
kubectl apply -f https://k8s.io/examples/application/nginx-app.yaml
```
```shell
```none
service/my-nginx-svc created
deployment.apps/my-nginx created
```
The resources will be created in the order they appear in the file. Therefore, it's best to specify the service first, since that will ensure the scheduler can spread the pods associated with the service as they are created by the controller(s), such as Deployment.
The resources will be created in the order they appear in the file. Therefore, it's best to
specify the service first, since that will ensure the scheduler can spread the pods associated
with the service as they are created by the controller(s), such as Deployment.
`kubectl apply` also accepts multiple `-f` arguments:
```shell
kubectl apply -f https://k8s.io/examples/application/nginx/nginx-svc.yaml -f https://k8s.io/examples/application/nginx/nginx-deployment.yaml
kubectl apply -f https://k8s.io/examples/application/nginx/nginx-svc.yaml \
-f https://k8s.io/examples/application/nginx/nginx-deployment.yaml
```
It is a recommended practice to put resources related to the same microservice or application tier into the same file, and to group all of the files associated with your application in the same directory. If the tiers of your application bind to each other using DNS, you can deploy all of the components of your stack together.
A URL can also be specified as a configuration source, which is handy for deploying directly from configuration files checked into GitHub:
It is a recommended practice to put resources related to the same microservice or application tier
into the same file, and to group all of the files associated with your application in the same
directory. If the tiers of your application bind to each other using DNS, you can deploy all of
the components of your stack together.
A URL can also be specified as a configuration source, which is handy for deploying directly from
configuration files checked into GitHub:
```shell
kubectl apply -f https://raw.githubusercontent.com/kubernetes/website/main/content/en/examples/application/nginx/nginx-deployment.yaml
kubectl apply -f https://k8s.io/examples/application/nginx/nginx-deployment.yaml
```
```shell
```none
deployment.apps/my-nginx created
```
## Bulk operations in kubectl
Resource creation isn't the only operation that `kubectl` can perform in bulk. It can also extract resource names from configuration files in order to perform other operations, in particular to delete the same resources you created:
Resource creation isn't the only operation that `kubectl` can perform in bulk. It can also extract
resource names from configuration files in order to perform other operations, in particular to
delete the same resources you created:
```shell
kubectl delete -f https://k8s.io/examples/application/nginx-app.yaml
```
```shell
```none
deployment.apps "my-nginx" deleted
service "my-nginx-svc" deleted
```
In the case of two resources, you can specify both resources on the command line using the resource/name syntax:
In the case of two resources, you can specify both resources on the command line using the
resource/name syntax:
```shell
kubectl delete deployments/my-nginx services/my-nginx-svc
```
For larger numbers of resources, you'll find it easier to specify the selector (label query) specified using `-l` or `--selector`, to filter resources by their labels:
For larger numbers of resources, you'll find it easier to specify the selector (label query)
specified using `-l` or `--selector`, to filter resources by their labels:
```shell
kubectl delete deployment,services -l app=nginx
```
```shell
```none
deployment.apps "my-nginx" deleted
service "my-nginx-svc" deleted
```
Because `kubectl` outputs resource names in the same syntax it accepts, you can chain operations using `$()` or `xargs`:
Because `kubectl` outputs resource names in the same syntax it accepts, you can chain operations
using `$()` or `xargs`:
```shell
kubectl get $(kubectl create -f docs/concepts/cluster-administration/nginx/ -o name | grep service)
kubectl create -f docs/concepts/cluster-administration/nginx/ -o name | grep service | xargs -i kubectl get {}
```
```shell
```none
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
my-nginx-svc LoadBalancer 10.0.0.208 <pending> 80/TCP 0s
```
With the above commands, we first create resources under `examples/application/nginx/` and print the resources created with `-o name` output format
(print each resource as resource/name). Then we `grep` only the "service", and then print it with `kubectl get`.
With the above commands, we first create resources under `examples/application/nginx/` and print
the resources created with `-o name` output format (print each resource as resource/name).
Then we `grep` only the "service", and then print it with `kubectl get`.
If you happen to organize your resources across several subdirectories within a particular directory, you can recursively perform the operations on the subdirectories also, by specifying `--recursive` or `-R` alongside the `--filename,-f` flag.
If you happen to organize your resources across several subdirectories within a particular
directory, you can recursively perform the operations on the subdirectories also, by specifying
`--recursive` or `-R` alongside the `--filename,-f` flag.
For instance, assume there is a directory `project/k8s/development` that holds all of the {{< glossary_tooltip text="manifests" term_id="manifest" >}} needed for the development environment, organized by resource type:
For instance, assume there is a directory `project/k8s/development` that holds all of the
{{< glossary_tooltip text="manifests" term_id="manifest" >}} needed for the development environment,
organized by resource type:
```
```none
project/k8s/development
├── configmap
│   └── my-configmap.yaml
@ -108,13 +132,15 @@ project/k8s/development
└── my-pvc.yaml
```
By default, performing a bulk operation on `project/k8s/development` will stop at the first level of the directory, not processing any subdirectories. If we had tried to create the resources in this directory using the following command, we would have encountered an error:
By default, performing a bulk operation on `project/k8s/development` will stop at the first level
of the directory, not processing any subdirectories. If we had tried to create the resources in
this directory using the following command, we would have encountered an error:
```shell
kubectl apply -f project/k8s/development
```
```shell
```none
error: you must provide one or more resources by argument or filename (.json|.yaml|.yml|stdin)
```
@ -124,13 +150,14 @@ Instead, specify the `--recursive` or `-R` flag with the `--filename,-f` flag as
kubectl apply -f project/k8s/development --recursive
```
```shell
```none
configmap/my-config created
deployment.apps/my-deployment created
persistentvolumeclaim/my-pvc created
```
The `--recursive` flag works with any operation that accepts the `--filename,-f` flag such as: `kubectl {create,get,delete,describe,rollout}` etc.
The `--recursive` flag works with any operation that accepts the `--filename,-f` flag such as:
`kubectl {create,get,delete,describe,rollout}` etc.
The `--recursive` flag also works when multiple `-f` arguments are provided:
@ -138,7 +165,7 @@ The `--recursive` flag also works when multiple `-f` arguments are provided:
kubectl apply -f project/k8s/namespaces -f project/k8s/development --recursive
```
```shell
```none
namespace/development created
namespace/staging created
configmap/my-config created
@ -146,36 +173,41 @@ deployment.apps/my-deployment created
persistentvolumeclaim/my-pvc created
```
If you're interested in learning more about `kubectl`, go ahead and read [Command line tool (kubectl)](/docs/reference/kubectl/).
If you're interested in learning more about `kubectl`, go ahead and read
[Command line tool (kubectl)](/docs/reference/kubectl/).
## Using labels effectively
The examples we've used so far apply at most a single label to any resource. There are many scenarios where multiple labels should be used to distinguish sets from one another.
The examples we've used so far apply at most a single label to any resource. There are many
scenarios where multiple labels should be used to distinguish sets from one another.
For instance, different applications would use different values for the `app` label, but a multi-tier application, such as the [guestbook example](https://github.com/kubernetes/examples/tree/master/guestbook/), would additionally need to distinguish each tier. The frontend could carry the following labels:
For instance, different applications would use different values for the `app` label, but a
multi-tier application, such as the [guestbook example](https://github.com/kubernetes/examples/tree/master/guestbook/),
would additionally need to distinguish each tier. The frontend could carry the following labels:
```yaml
labels:
app: guestbook
tier: frontend
labels:
app: guestbook
tier: frontend
```
while the Redis master and slave would have different `tier` labels, and perhaps even an additional `role` label:
while the Redis master and slave would have different `tier` labels, and perhaps even an
additional `role` label:
```yaml
labels:
app: guestbook
tier: backend
role: master
labels:
app: guestbook
tier: backend
role: master
```
and
```yaml
labels:
app: guestbook
tier: backend
role: slave
labels:
app: guestbook
tier: backend
role: slave
```
The labels allow us to slice and dice our resources along any dimension specified by a label:
@ -185,7 +217,7 @@ kubectl apply -f examples/guestbook/all-in-one/guestbook-all-in-one.yaml
kubectl get pods -Lapp -Ltier -Lrole
```
```shell
```none
NAME READY STATUS RESTARTS AGE APP TIER ROLE
guestbook-fe-4nlpb 1/1 Running 0 1m guestbook frontend <none>
guestbook-fe-ght6d 1/1 Running 0 1m guestbook frontend <none>
@ -200,7 +232,8 @@ my-nginx-o0ef1 1/1 Running 0 29m nginx
```shell
kubectl get pods -lapp=guestbook,role=slave
```
```shell
```none
NAME READY STATUS RESTARTS AGE
guestbook-redis-slave-2q2yf 1/1 Running 0 3m
guestbook-redis-slave-qgazl 1/1 Running 0 3m
@ -208,62 +241,72 @@ guestbook-redis-slave-qgazl 1/1 Running 0 3m
## Canary deployments
Another scenario where multiple labels are needed is to distinguish deployments of different releases or configurations of the same component. It is common practice to deploy a *canary* of a new application release (specified via image tag in the pod template) side by side with the previous release so that the new release can receive live production traffic before fully rolling it out.
Another scenario where multiple labels are needed is to distinguish deployments of different
releases or configurations of the same component. It is common practice to deploy a *canary* of a
new application release (specified via image tag in the pod template) side by side with the
previous release so that the new release can receive live production traffic before fully rolling
it out.
For instance, you can use a `track` label to differentiate different releases.
The primary, stable release would have a `track` label with value as `stable`:
```yaml
name: frontend
replicas: 3
...
labels:
app: guestbook
tier: frontend
track: stable
...
image: gb-frontend:v3
```none
name: frontend
replicas: 3
...
labels:
app: guestbook
tier: frontend
track: stable
...
image: gb-frontend:v3
```
and then you can create a new release of the guestbook frontend that carries the `track` label with different value (i.e. `canary`), so that two sets of pods would not overlap:
and then you can create a new release of the guestbook frontend that carries the `track` label
with a different value (for example, `canary`), so that the two sets of pods do not overlap:
```yaml
name: frontend-canary
replicas: 1
...
labels:
app: guestbook
tier: frontend
track: canary
...
image: gb-frontend:v4
```none
name: frontend-canary
replicas: 1
...
labels:
app: guestbook
tier: frontend
track: canary
...
image: gb-frontend:v4
```
The frontend service would span both sets of replicas by selecting the common subset of their labels (i.e. omitting the `track` label), so that the traffic will be redirected to both applications:
The frontend service would span both sets of replicas by selecting the common subset of their
labels (i.e. omitting the `track` label), so that the traffic will be redirected to both
applications:
```yaml
selector:
app: guestbook
tier: frontend
selector:
app: guestbook
tier: frontend
```
You can tweak the number of replicas of the stable and canary releases to determine the ratio of each release that will receive live production traffic (in this case, 3:1).
Once you're confident, you can update the stable track to the new application release and remove the canary one.
You can tweak the number of replicas of the stable and canary releases to determine the ratio of
each release that will receive live production traffic (in this case, 3:1).
Once you're confident, you can update the stable track to the new application release and remove
the canary one.
For a more concrete example, check the [tutorial of deploying Ghost](https://github.com/kelseyhightower/talks/tree/master/kubecon-eu-2016/demo#deploy-a-canary).
For a more concrete example, check the
[tutorial of deploying Ghost](https://github.com/kelseyhightower/talks/tree/master/kubecon-eu-2016/demo#deploy-a-canary).
## Updating labels
Sometimes existing pods and other resources need to be relabeled before creating new resources. This can be done with `kubectl label`.
Sometimes existing pods and other resources need to be relabeled before creating new resources.
This can be done with `kubectl label`.
For example, if you want to label all your nginx pods as frontend tier, run:
```shell
kubectl label pods -l app=nginx tier=fe
```
```shell
```none
pod/my-nginx-2035384211-j5fhi labeled
pod/my-nginx-2035384211-u2c7e labeled
pod/my-nginx-2035384211-u3t6x labeled
@ -275,20 +318,25 @@ To see the pods you labeled, run:
```shell
kubectl get pods -l app=nginx -L tier
```
```shell
```none
NAME READY STATUS RESTARTS AGE TIER
my-nginx-2035384211-j5fhi 1/1 Running 0 23m fe
my-nginx-2035384211-u2c7e 1/1 Running 0 23m fe
my-nginx-2035384211-u3t6x 1/1 Running 0 23m fe
```
This outputs all "app=nginx" pods, with an additional label column of pods' tier (specified with `-L` or `--label-columns`).
This outputs all "app=nginx" pods, with an additional label column of pods' tier (specified with
`-L` or `--label-columns`).
For more information, please see [labels](/docs/concepts/overview/working-with-objects/labels/) and [kubectl label](/docs/reference/generated/kubectl/kubectl-commands/#label).
For more information, please see [labels](/docs/concepts/overview/working-with-objects/labels/)
and [kubectl label](/docs/reference/generated/kubectl/kubectl-commands/#label).
## Updating annotations
Sometimes you would want to attach annotations to resources. Annotations are arbitrary non-identifying metadata for retrieval by API clients such as tools, libraries, etc. This can be done with `kubectl annotate`. For example:
Sometimes you would want to attach annotations to resources. Annotations are arbitrary
non-identifying metadata for retrieval by API clients such as tools, libraries, etc.
This can be done with `kubectl annotate`. For example:
```shell
kubectl annotate pods my-nginx-v4-9gw19 description='my frontend running nginx'
@ -304,17 +352,19 @@ metadata:
...
```
For more information, please see [annotations](/docs/concepts/overview/working-with-objects/annotations/) and [kubectl annotate](/docs/reference/generated/kubectl/kubectl-commands/#annotate) document.
For more information, see [annotations](/docs/concepts/overview/working-with-objects/annotations/)
and the [kubectl annotate](/docs/reference/generated/kubectl/kubectl-commands/#annotate) document.
## Scaling your application
When load on your application grows or shrinks, use `kubectl` to scale your application. For instance, to decrease the number of nginx replicas from 3 to 1, do:
When load on your application grows or shrinks, use `kubectl` to scale your application.
For instance, to decrease the number of nginx replicas from 3 to 1, do:
```shell
kubectl scale deployment/my-nginx --replicas=1
```
```shell
```none
deployment.apps/my-nginx scaled
```
@ -324,25 +374,27 @@ Now you only have one pod managed by the deployment.
kubectl get pods -l app=nginx
```
```shell
```none
NAME READY STATUS RESTARTS AGE
my-nginx-2035384211-j5fhi 1/1 Running 0 30m
```
To have the system automatically choose the number of nginx replicas as needed, ranging from 1 to 3, do:
To have the system automatically choose the number of nginx replicas as needed,
ranging from 1 to 3, do:
```shell
kubectl autoscale deployment/my-nginx --min=1 --max=3
```
```shell
```none
horizontalpodautoscaler.autoscaling/my-nginx autoscaled
```
Now your nginx replicas will be scaled up and down as needed, automatically.
For more information, please see [kubectl scale](/docs/reference/generated/kubectl/kubectl-commands/#scale), [kubectl autoscale](/docs/reference/generated/kubectl/kubectl-commands/#autoscale) and [horizontal pod autoscaler](/docs/tasks/run-application/horizontal-pod-autoscale/) document.
For more information, please see [kubectl scale](/docs/reference/generated/kubectl/kubectl-commands/#scale),
[kubectl autoscale](/docs/reference/generated/kubectl/kubectl-commands/#autoscale) and
the [horizontal pod autoscaler](/docs/tasks/run-application/horizontal-pod-autoscale/) documentation.
## In-place updates of resources
@ -353,20 +405,34 @@ Sometimes it's necessary to make narrow, non-disruptive updates to resources you
It is suggested to maintain a set of configuration files in source control
(see [configuration as code](https://martinfowler.com/bliki/InfrastructureAsCode.html)),
so that they can be maintained and versioned along with the code for the resources they configure.
Then, you can use [`kubectl apply`](/docs/reference/generated/kubectl/kubectl-commands/#apply) to push your configuration changes to the cluster.
Then, you can use [`kubectl apply`](/docs/reference/generated/kubectl/kubectl-commands/#apply)
to push your configuration changes to the cluster.
This command will compare the version of the configuration that you're pushing with the previous version and apply the changes you've made, without overwriting any automated changes to properties you haven't specified.
This command will compare the version of the configuration that you're pushing with the previous
version and apply the changes you've made, without overwriting any automated changes to properties
you haven't specified.
```shell
kubectl apply -f https://k8s.io/examples/application/nginx/nginx-deployment.yaml
```
```none
deployment.apps/my-nginx configured
```
Note that `kubectl apply` attaches an annotation to the resource in order to determine the changes to the configuration since the previous invocation. When it's invoked, `kubectl apply` does a three-way diff between the previous configuration, the provided input and the current configuration of the resource, in order to determine how to modify the resource.
Note that `kubectl apply` attaches an annotation to the resource in order to determine the changes
to the configuration since the previous invocation. When it's invoked, `kubectl apply` does a
three-way diff between the previous configuration, the provided input and the current
configuration of the resource, in order to determine how to modify the resource.
Currently, resources are created without this annotation, so the first invocation of `kubectl apply` will fall back to a two-way diff between the provided input and the current configuration of the resource. During this first invocation, it cannot detect the deletion of properties set when the resource was created. For this reason, it will not remove them.
Currently, resources are created without this annotation, so the first invocation of `kubectl
apply` will fall back to a two-way diff between the provided input and the current configuration
of the resource. During this first invocation, it cannot detect the deletion of properties set
when the resource was created. For this reason, it will not remove them.
All subsequent calls to `kubectl apply`, and other commands that modify the configuration, such as `kubectl replace` and `kubectl edit`, will update the annotation, allowing subsequent calls to `kubectl apply` to detect and perform deletions using a three-way diff.
All subsequent calls to `kubectl apply`, and other commands that modify the configuration, such as
`kubectl replace` and `kubectl edit`, will update the annotation, allowing subsequent calls to
`kubectl apply` to detect and perform deletions using a three-way diff.
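You can inspect the configuration recorded by that annotation with `kubectl apply view-last-applied`, using the Deployment from this page as an example:
```shell
kubectl apply view-last-applied deployment/my-nginx -o yaml
```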
### kubectl edit
@ -376,7 +442,8 @@ Alternatively, you may also update resources with `kubectl edit`:
kubectl edit deployment/my-nginx
```
This is equivalent to first `get` the resource, edit it in text editor, and then `apply` the resource with the updated version:
This is equivalent to first `get` the resource, edit it in a text editor, and then `apply` the
resource with the updated version:
```shell
kubectl get deployment my-nginx -o yaml > /tmp/nginx.yaml
@ -389,7 +456,8 @@ deployment.apps/my-nginx configured
rm /tmp/nginx.yaml
```
This allows you to do more significant changes more easily. Note that you can specify the editor with your `EDITOR` or `KUBE_EDITOR` environment variables.
This allows you to make more significant changes more easily. Note that you can specify the editor
with your `EDITOR` or `KUBE_EDITOR` environment variables.
For more information, please see [kubectl edit](/docs/reference/generated/kubectl/kubectl-commands/#edit) document.
@ -403,20 +471,25 @@ and
## Disruptive updates
In some cases, you may need to update resource fields that cannot be updated once initialized, or you may want to make a recursive change immediately, such as to fix broken pods created by a Deployment. To change such fields, use `replace --force`, which deletes and re-creates the resource. In this case, you can modify your original configuration file:
In some cases, you may need to update resource fields that cannot be updated once initialized, or
you may want to make a recursive change immediately, such as to fix broken pods created by a
Deployment. To change such fields, use `replace --force`, which deletes and re-creates the
resource. In this case, you can modify your original configuration file:
```shell
kubectl replace -f https://k8s.io/examples/application/nginx/nginx-deployment.yaml --force
```
```shell
```none
deployment.apps/my-nginx deleted
deployment.apps/my-nginx replaced
```
## Updating your application without a service outage
At some point, you'll eventually need to update your deployed application, typically by specifying a new image or image tag, as in the canary deployment scenario above. `kubectl` supports several update operations, each of which is applicable to different scenarios.
At some point, you'll eventually need to update your deployed application, typically by specifying
a new image or image tag, as in the canary deployment scenario above. `kubectl` supports several
update operations, each of which is applicable to different scenarios.
We'll guide you through how to create and update applications with Deployments.
@ -426,7 +499,7 @@ Let's say you were running version 1.14.2 of nginx:
kubectl create deployment my-nginx --image=nginx:1.14.2
```
```shell
```none
deployment.apps/my-nginx created
```
@ -436,24 +509,24 @@ with 3 replicas (so the old and new revisions can coexist):
kubectl scale deployment my-nginx --current-replicas=1 --replicas=3
```
```
```none
deployment.apps/my-nginx scaled
```
To update to version 1.16.1, change `.spec.template.spec.containers[0].image` from `nginx:1.14.2` to `nginx:1.16.1` using the previous kubectl commands.
To update to version 1.16.1, change `.spec.template.spec.containers[0].image` from `nginx:1.14.2`
to `nginx:1.16.1` using the previous kubectl commands.
```shell
kubectl edit deployment/my-nginx
```
That's it! The Deployment will declaratively update the deployed nginx application progressively behind the scene. It ensures that only a certain number of old replicas may be down while they are being updated, and only a certain number of new replicas may be created above the desired number of pods. To learn more details about it, visit [Deployment page](/docs/concepts/workloads/controllers/deployment/).
That's it! The Deployment will declaratively update the deployed nginx application progressively
behind the scenes. It ensures that only a certain number of old replicas may be down while they are
being updated, and only a certain number of new replicas may be created above the desired number
of pods. To learn more details about it, visit [Deployment page](/docs/concepts/workloads/controllers/deployment/).
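To watch the update as it progresses (reusing the Deployment name from this page), you can follow the rollout:
```shell
kubectl rollout status deployment/my-nginx
```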
## {{% heading "whatsnext" %}}
- Learn about [how to use `kubectl` for application introspection and debugging](/docs/tasks/debug/debug-application/debug-running-pod/).
- See [Configuration Best Practices and Tips](/docs/concepts/configuration/overview/).

View File

@ -807,4 +807,5 @@ memory limit (and possibly request) for that container.
and its [resource requirements](/docs/reference/kubernetes-api/workload-resources/pod-v1/#resources)
* Read about [project quotas](https://xfs.org/index.php/XFS_FAQ#Q:_Quota:_Do_quotas_work_on_XFS.3F) in XFS
* Read more about the [kube-scheduler configuration reference (v1beta3)](/docs/reference/config-api/kube-scheduler-config.v1beta3/)
* Read more about [Quality of Service classes for Pods](/docs/concepts/workloads/pods/pod-qos/)

View File

@ -165,15 +165,35 @@ for that Pod, including details of the problem fetching the Secret.
#### Optional Secrets {#restriction-secret-must-exist}
When you define a container environment variable based on a Secret,
you can mark it as _optional_. The default is for the Secret to be
required.
When you reference a Secret in a Pod, you can mark the Secret as _optional_,
such as in the following example. If an optional Secret doesn't exist,
Kubernetes ignores it.
None of a Pod's containers will start until all non-optional Secrets are
available.
```yaml
apiVersion: v1
kind: Pod
metadata:
name: mypod
spec:
containers:
- name: mypod
image: redis
volumeMounts:
- name: foo
mountPath: "/etc/foo"
readOnly: true
volumes:
- name: foo
secret:
secretName: mysecret
optional: true
```
If a Pod references a specific key in a Secret and that Secret does exist, but
is missing the named key, the Pod fails during startup.
By default, Secrets are required. None of a Pod's containers will start until
all non-optional Secrets are available.
If a Pod references a specific key in a non-optional Secret and that Secret
does exist, but is missing the named key, the Pod fails during startup.
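The same `optional` marker is available when you consume a Secret through an environment variable; a minimal sketch (the Pod and variable names here are illustrative):
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: optional-secret-env-pod   # hypothetical name
spec:
  restartPolicy: Never
  containers:
    - name: app
      image: registry.k8s.io/busybox
      command: ["/bin/sh", "-c", "env"]
      env:
        - name: SECRET_USERNAME
          valueFrom:
            secretKeyRef:
              name: mysecret
              key: username
              optional: true  # if mysecret or its username key is missing, the variable is omitted and the container still starts
```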
### Using Secrets as files from a Pod {#using-secrets-as-files-from-a-pod}
@ -181,181 +201,8 @@ If you want to access data from a Secret in a Pod, one way to do that is to
have Kubernetes make the value of that Secret be available as a file inside
the filesystem of one or more of the Pod's containers.
To configure that, you:
1. Create a secret or use an existing one. Multiple Pods can reference the same secret.
1. Modify your Pod definition to add a volume under `.spec.volumes[]`. Name the volume anything,
and have a `.spec.volumes[].secret.secretName` field equal to the name of the Secret object.
1. Add a `.spec.containers[].volumeMounts[]` to each container that needs the secret. Specify
`.spec.containers[].volumeMounts[].readOnly = true` and
`.spec.containers[].volumeMounts[].mountPath` to an unused directory name where you would like the
secrets to appear.
1. Modify your image or command line so that the program looks for files in that directory. Each
key in the secret `data` map becomes the filename under `mountPath`.
This is an example of a Pod that mounts a Secret named `mysecret` in a volume:
```yaml
apiVersion: v1
kind: Pod
metadata:
name: mypod
spec:
containers:
- name: mypod
image: redis
volumeMounts:
- name: foo
mountPath: "/etc/foo"
readOnly: true
volumes:
- name: foo
secret:
secretName: mysecret
optional: false # default setting; "mysecret" must exist
```
Each Secret you want to use needs to be referred to in `.spec.volumes`.
If there are multiple containers in the Pod, then each container needs its
own `volumeMounts` block, but only one `.spec.volumes` is needed per Secret.
{{< note >}}
Versions of Kubernetes before v1.22 automatically created credentials for accessing
the Kubernetes API. This older mechanism was based on creating token Secrets that
could then be mounted into running Pods.
In more recent versions, including Kubernetes v{{< skew currentVersion >}}, API credentials
are obtained directly by using the [TokenRequest](/docs/reference/kubernetes-api/authentication-resources/token-request-v1/) API,
and are mounted into Pods using a [projected volume](/docs/reference/access-authn-authz/service-accounts-admin/#bound-service-account-token-volume).
The tokens obtained using this method have bounded lifetimes, and are automatically
invalidated when the Pod they are mounted into is deleted.
You can still [manually create](/docs/tasks/configure-pod-container/configure-service-account/#manually-create-a-service-account-api-token)
a service account token Secret; for example, if you need a token that never expires.
However, using the [TokenRequest](/docs/reference/kubernetes-api/authentication-resources/token-request-v1/)
subresource to obtain a token to access the API is recommended instead.
You can use the [`kubectl create token`](/docs/reference/generated/kubectl/kubectl-commands#-em-token-em-)
command to obtain a token from the `TokenRequest` API.
{{< /note >}}
#### Projection of Secret keys to specific paths
You can also control the paths within the volume where Secret keys are projected.
You can use the `.spec.volumes[].secret.items` field to change the target path of each key:
```yaml
apiVersion: v1
kind: Pod
metadata:
name: mypod
spec:
containers:
- name: mypod
image: redis
volumeMounts:
- name: foo
mountPath: "/etc/foo"
readOnly: true
volumes:
- name: foo
secret:
secretName: mysecret
items:
- key: username
path: my-group/my-username
```
What will happen:
* the `username` key from `mysecret` is available to the container at the path
`/etc/foo/my-group/my-username` instead of at `/etc/foo/username`.
* the `password` key from that Secret object is not projected.
If `.spec.volumes[].secret.items` is used, only keys specified in `items` are projected.
To consume all keys from the Secret, all of them must be listed in the `items` field.
If you list keys explicitly, then all listed keys must exist in the corresponding Secret.
Otherwise, the volume is not created.
#### Secret files permissions
You can set the POSIX file access permission bits for a single Secret key.
If you don't specify any permissions, `0644` is used by default.
You can also set a default mode for the entire Secret volume and override per key if needed.
For example, you can specify a default mode like this:
```yaml
apiVersion: v1
kind: Pod
metadata:
name: mypod
spec:
containers:
- name: mypod
image: redis
volumeMounts:
- name: foo
mountPath: "/etc/foo"
volumes:
- name: foo
secret:
secretName: mysecret
defaultMode: 0400
```
The secret is mounted on `/etc/foo`; all the files created by the
secret volume mount have permission `0400`.
{{< note >}}
If you're defining a Pod or a Pod template using JSON, beware that the JSON
specification doesn't support octal notation. You can use the decimal value
for the `defaultMode` (for example, 0400 in octal is 256 in decimal) instead.
If you're writing YAML, you can write the `defaultMode` in octal.
{{< /note >}}
#### Consuming Secret values from volumes
Inside the container that mounts a secret volume, the secret keys appear as
files. The secret values are base64 decoded and stored inside these files.
This is the result of commands executed inside the container from the example above:
```shell
ls /etc/foo/
```
The output is similar to:
```
username
password
```
```shell
cat /etc/foo/username
```
The output is similar to:
```
admin
```
```shell
cat /etc/foo/password
```
The output is similar to:
```
1f2d1e2e67df
```
The program in a container is responsible for reading the secret data from these
files, as needed.
#### Mounted Secrets are updated automatically
For instructions, refer to
[Distribute credentials securely using Secrets](/docs/tasks/inject-data-application/distribute-credentials-secure/#create-a-pod-that-has-access-to-the-secret-data-through-a-volume).
When a volume contains data from a Secret, and that Secret is updated, Kubernetes tracks
this and updates the data in the volume, using an eventually-consistent approach.
@ -388,53 +235,23 @@ watch propagation delay, the configured cache TTL, or zero for direct polling).
To use a Secret in an {{< glossary_tooltip text="environment variable" term_id="container-env-variables" >}}
in a Pod:
1. Create a Secret (or use an existing one). Multiple Pods can reference the same Secret.
1. Modify your Pod definition in each container that you wish to consume the value of a secret
key to add an environment variable for each secret key you wish to consume. The environment
variable that consumes the secret key should populate the secret's name and key in `env[].valueFrom.secretKeyRef`.
1. Modify your image and/or command line so that the program looks for values in the specified
environment variables.
This is an example of a Pod that uses a Secret via environment variables:
```yaml
apiVersion: v1
kind: Pod
metadata:
name: secret-env-pod
spec:
containers:
- name: mycontainer
image: redis
env:
- name: SECRET_USERNAME
valueFrom:
secretKeyRef:
name: mysecret
key: username
optional: false # same as default; "mysecret" must exist
# and include a key named "username"
- name: SECRET_PASSWORD
valueFrom:
secretKeyRef:
name: mysecret
key: password
optional: false # same as default; "mysecret" must exist
# and include a key named "password"
restartPolicy: Never
```
1. For each container in your Pod specification, add an environment variable
for each Secret key that you want to use to the
`env[].valueFrom.secretKeyRef` field.
1. Modify your image and/or command line so that the program looks for values
in the specified environment variables.
For instructions, refer to
[Define container environment variables using Secret data](/docs/tasks/inject-data-application/distribute-credentials-secure/#define-container-environment-variables-using-secret-data).
#### Invalid environment variables {#restriction-env-from-invalid}
Secrets used to populate environment variables by the `envFrom` field that have keys
that are considered invalid environment variable names will have those keys
skipped. The Pod is allowed to start.
If your environment variable definitions in your Pod specification are
considered to be invalid environment variable names, those keys aren't made
available to your container. The Pod is allowed to start.
If you define a Pod with an invalid variable name, the failed Pod startup includes
an event with the reason set to `InvalidVariableNames` and a message that lists the
skipped invalid keys. The following example shows a Pod that refers to a Secret
named `mysecret`, where `mysecret` contains 2 invalid keys: `1badkey` and `2alsobad`.
Kubernetes adds an Event with the reason set to `InvalidVariableNames` and a
message that lists the skipped invalid keys. The following example shows a Pod that refers to a Secret named `mysecret`, where `mysecret` contains 2 invalid keys: `1badkey` and `2alsobad`.
```shell
kubectl get events
@ -447,42 +264,6 @@ LASTSEEN FIRSTSEEN COUNT NAME KIND SUBOBJECT
0s 0s 1 dapi-test-pod Pod Warning InvalidEnvironmentVariableNames kubelet, 127.0.0.1 Keys [1badkey, 2alsobad] from the EnvFrom secret default/mysecret were skipped since they are considered invalid environment variable names.
```
#### Consuming Secret values from environment variables
Inside a container that consumes a Secret using environment variables, the secret keys appear
as normal environment variables. The values of those variables are the base64 decoded values
of the secret data.
This is the result of commands executed inside the container from the example above:
```shell
echo "$SECRET_USERNAME"
```
The output is similar to:
```
admin
```
```shell
echo "$SECRET_PASSWORD"
```
The output is similar to:
```
1f2d1e2e67df
```
{{< note >}}
If a container already consumes a Secret in an environment variable,
a Secret update will not be seen by the container unless it is
restarted. There are third party solutions for triggering restarts when
secrets change.
{{< /note >}}
### Container image pull secrets {#using-imagepullsecrets}
If you want to fetch container images from a private repository, you need a way for
@ -518,43 +299,10 @@ You cannot use ConfigMaps or Secrets with {{< glossary_tooltip text="static Pods
## Use cases
### Use case: As container environment variables
### Use case: As container environment variables {#use-case-as-container-environment-variables}
Create a secret
```yaml
apiVersion: v1
kind: Secret
metadata:
name: mysecret
type: Opaque
data:
USER_NAME: YWRtaW4=
PASSWORD: MWYyZDFlMmU2N2Rm
```
Create the Secret:
```shell
kubectl apply -f mysecret.yaml
```
Use `envFrom` to define all of the Secret's data as container environment variables. The key from
the Secret becomes the environment variable name in the Pod.
```yaml
apiVersion: v1
kind: Pod
metadata:
name: secret-test-pod
spec:
containers:
- name: test-container
image: registry.k8s.io/busybox
command: [ "/bin/sh", "-c", "env" ]
envFrom:
- secretRef:
name: mysecret
restartPolicy: Never
```
You can create a Secret and use it to
[set environment variables for a container](/docs/tasks/inject-data-application/distribute-credentials-secure/#define-container-environment-variables-using-secret-data).
### Use case: Pod with SSH keys
@ -873,13 +621,28 @@ A `kubernetes.io/service-account-token` type of Secret is used to store a
token credential that identifies a
{{< glossary_tooltip text="service account" term_id="service-account" >}}.
Since 1.22, this type of Secret is no longer used to mount credentials into Pods,
and obtaining tokens via the [TokenRequest](/docs/reference/kubernetes-api/authentication-resources/token-request-v1/)
API is recommended instead of using service account token Secret objects.
Tokens obtained from the `TokenRequest` API are more secure than ones stored in Secret objects,
because they have a bounded lifetime and are not readable by other API clients.
You can use the [`kubectl create token`](/docs/reference/generated/kubectl/kubectl-commands#-em-token-em-)
{{< note >}}
Versions of Kubernetes before v1.22 automatically created credentials for
accessing the Kubernetes API. This older mechanism was based on creating token
Secrets that could then be mounted into running Pods.
In more recent versions, including Kubernetes v{{< skew currentVersion >}}, API
credentials are obtained directly by using the
[TokenRequest](/docs/reference/kubernetes-api/authentication-resources/token-request-v1/)
API, and are mounted into Pods using a
[projected volume](/docs/reference/access-authn-authz/service-accounts-admin/#bound-service-account-token-volume).
The tokens obtained using this method have bounded lifetimes, and are
automatically invalidated when the Pod they are mounted into is deleted.
You can still
[manually create](/docs/tasks/configure-pod-container/configure-service-account/#manually-create-a-service-account-api-token)
a service account token Secret; for example, if you need a token that never
expires. However, using the
[TokenRequest](/docs/reference/kubernetes-api/authentication-resources/token-request-v1/)
subresource to obtain a token to access the API is recommended instead.
You can use the
[`kubectl create token`](/docs/reference/generated/kubectl/kubectl-commands#-em-token-em-)
command to obtain a token from the `TokenRequest` API.
{{< /note >}}
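For reference, a manually created service account token Secret is a short manifest along these lines (a sketch; the Secret and ServiceAccount names are illustrative, and the referenced ServiceAccount must already exist):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: build-robot-secret          # illustrative name
  annotations:
    kubernetes.io/service-account.name: build-robot   # must reference an existing ServiceAccount
type: kubernetes.io/service-account-token
```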
You should only create a service account token Secret object
if you can't use the `TokenRequest` API to obtain a token,
View File
@ -87,60 +87,65 @@ spec:
The general workflow of a device plugin includes the following steps:
* Initialization. During this phase, the device plugin performs vendor specific
initialization and setup to make sure the devices are in a ready state.
1. Initialization. During this phase, the device plugin performs vendor-specific
initialization and setup to make sure the devices are in a ready state.
* The plugin starts a gRPC service, with a Unix socket under host path
`/var/lib/kubelet/device-plugins/`, that implements the following interfaces:
1. The plugin starts a gRPC service, with a Unix socket under the host path
`/var/lib/kubelet/device-plugins/`, that implements the following interfaces:
```gRPC
service DevicePlugin {
// GetDevicePluginOptions returns options to be communicated with Device Manager.
rpc GetDevicePluginOptions(Empty) returns (DevicePluginOptions) {}
```gRPC
service DevicePlugin {
// GetDevicePluginOptions returns options to be communicated with Device Manager.
rpc GetDevicePluginOptions(Empty) returns (DevicePluginOptions) {}
// ListAndWatch returns a stream of List of Devices
// Whenever a Device state change or a Device disappears, ListAndWatch
// returns the new list
rpc ListAndWatch(Empty) returns (stream ListAndWatchResponse) {}
// ListAndWatch returns a stream of List of Devices
// Whenever a Device state change or a Device disappears, ListAndWatch
// returns the new list
rpc ListAndWatch(Empty) returns (stream ListAndWatchResponse) {}
// Allocate is called during container creation so that the Device
// Plugin can run device specific operations and instruct Kubelet
// of the steps to make the Device available in the container
rpc Allocate(AllocateRequest) returns (AllocateResponse) {}
// Allocate is called during container creation so that the Device
// Plugin can run device specific operations and instruct Kubelet
// of the steps to make the Device available in the container
rpc Allocate(AllocateRequest) returns (AllocateResponse) {}
// GetPreferredAllocation returns a preferred set of devices to allocate
// from a list of available ones. The resulting preferred allocation is not
// guaranteed to be the allocation ultimately performed by the
// devicemanager. It is only designed to help the devicemanager make a more
// informed allocation decision when possible.
rpc GetPreferredAllocation(PreferredAllocationRequest) returns (PreferredAllocationResponse) {}
// GetPreferredAllocation returns a preferred set of devices to allocate
// from a list of available ones. The resulting preferred allocation is not
// guaranteed to be the allocation ultimately performed by the
// devicemanager. It is only designed to help the devicemanager make a more
// informed allocation decision when possible.
rpc GetPreferredAllocation(PreferredAllocationRequest) returns (PreferredAllocationResponse) {}
      // PreStartContainer is called, if indicated by Device Plugin during registration phase,
// before each container start. Device plugin can run device specific operations
// such as resetting the device before making devices available to the container.
rpc PreStartContainer(PreStartContainerRequest) returns (PreStartContainerResponse) {}
}
```
     // PreStartContainer is called, if indicated by Device Plugin during registration phase,
// before each container start. Device plugin can run device specific operations
// such as resetting the device before making devices available to the container.
rpc PreStartContainer(PreStartContainerRequest) returns (PreStartContainerResponse) {}
}
```
{{< note >}}
Plugins are not required to provide useful implementations for
`GetPreferredAllocation()` or `PreStartContainer()`. Flags indicating which
(if any) of these calls are available should be set in the `DevicePluginOptions`
message sent back by a call to `GetDevicePluginOptions()`. The `kubelet` will
always call `GetDevicePluginOptions()` to see which optional functions are
available, before calling any of them directly.
{{< /note >}}
{{< note >}}
Plugins are not required to provide useful implementations for
`GetPreferredAllocation()` or `PreStartContainer()`. Flags indicating
the availability of these calls, if any, should be set in the `DevicePluginOptions`
message sent back by a call to `GetDevicePluginOptions()`. The `kubelet` will
always call `GetDevicePluginOptions()` to see which optional functions are
available, before calling any of them directly.
{{< /note >}}
* The plugin registers itself with the kubelet through the Unix socket at host
path `/var/lib/kubelet/device-plugins/kubelet.sock`.
1. The plugin registers itself with the kubelet through the Unix socket at host
path `/var/lib/kubelet/device-plugins/kubelet.sock`.
* After successfully registering itself, the device plugin runs in serving mode, during which it keeps
monitoring device health and reports back to the kubelet upon any device state changes.
It is also responsible for serving `Allocate` gRPC requests. During `Allocate`, the device plugin may
do device-specific preparation; for example, GPU cleanup or QRNG initialization.
If the operations succeed, the device plugin returns an `AllocateResponse` that contains container
runtime configurations for accessing the allocated devices. The kubelet passes this information
to the container runtime.
{{< note >}}
The ordering of the workflow is important. A plugin MUST start serving gRPC
service before registering itself with kubelet for successful registration.
{{< /note >}}
1. After successfully registering itself, the device plugin runs in serving mode, during which it keeps
monitoring device health and reports back to the kubelet upon any device state changes.
It is also responsible for serving `Allocate` gRPC requests. During `Allocate`, the device plugin may
do device-specific preparation; for example, GPU cleanup or QRNG initialization.
If the operations succeed, the device plugin returns an `AllocateResponse` that contains container
runtime configurations for accessing the allocated devices. The kubelet passes this information
to the container runtime.
### Handling kubelet restarts
@ -292,7 +297,6 @@ However, calling `GetAllocatableResources` endpoint is not sufficient in case of
update and Kubelet needs to be restarted to reflect the correct resource capacity and allocatable.
{{< /note >}}
```gRPC
// AllocatableResourcesResponse contains information about all the devices known by the kubelet
message AllocatableResourcesResponse {
@ -313,14 +317,14 @@ Preceding Kubernetes v1.23, to enable this feature `kubelet` must be started wit
```
`ContainerDevices` expose the topology information, declaring to which NUMA cells the device is
affine. The NUMA cells are identified using an opaque integer ID, whose value is consistent with
what device plugins report
[when they register themselves to the kubelet](/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/#device-plugin-integration-with-the-topology-manager).
The gRPC service is served over a unix socket at `/var/lib/kubelet/pod-resources/kubelet.sock`.
Monitoring agents for device plugin resources can be deployed as a daemon, or as a DaemonSet.
The canonical directory `/var/lib/kubelet/pod-resources` requires privileged access, so monitoring
agents must run in a privileged security context. If a device monitoring agent is running as a
agents must run in a privileged security context. If a device monitoring agent is running as a
DaemonSet, `/var/lib/kubelet/pod-resources` must be mounted as a
{{< glossary_tooltip term_id="volume" >}} in the device monitoring agent's
[PodSpec](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#podspec-v1-core).
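As a rough sketch, a monitoring agent deployed as a DaemonSet along the lines described above might look like this (the agent image and all names are hypothetical):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: pod-resources-agent            # hypothetical name
spec:
  selector:
    matchLabels:
      app: pod-resources-agent
  template:
    metadata:
      labels:
        app: pod-resources-agent
    spec:
      containers:
      - name: agent
        image: example.com/pod-resources-agent:latest   # hypothetical image
        securityContext:
          privileged: true             # required to access the pod-resources socket
        volumeMounts:
        - name: pod-resources
          mountPath: /var/lib/kubelet/pod-resources
      volumes:
      - name: pod-resources
        hostPath:
          path: /var/lib/kubelet/pod-resources
```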
@ -355,7 +359,7 @@ resource assignment decisions.
`TopologyInfo` supports setting a `nodes` field to either `nil` or a list of NUMA nodes. This
allows the Device Plugin to advertise a device that spans multiple NUMA nodes.
Setting `TopologyInfo` to `nil` or providing an empty list of NUMA nodes for a given device
Setting `TopologyInfo` to `nil` or providing an empty list of NUMA nodes for a given device
indicates that the Device Plugin does not have a NUMA affinity preference for that device.
An example `TopologyInfo` struct populated for a device by a Device Plugin:
@ -391,4 +395,3 @@ Here are some examples of device plugin implementations:
* Learn about the [Topology Manager](/docs/tasks/administer-cluster/topology-manager/)
* Read about using [hardware acceleration for TLS ingress](/blog/2019/04/24/hardware-accelerated-ssl/tls-termination-in-ingress-controllers-using-kubernetes-device-plugins-and-runtimeclass/)
with Kubernetes
View File
@ -119,6 +119,7 @@ operator.
* [kubebuilder](https://book.kubebuilder.io/)
* [KubeOps](https://buehler.github.io/dotnet-operator-sdk/) (.NET operator SDK)
* [KUDO](https://kudo.dev/) (Kubernetes Universal Declarative Operator)
* [Mast](https://docs.ansi.services/mast/user_guide/operator/)
* [Metacontroller](https://metacontroller.github.io/metacontroller/intro.html) along with WebHooks that
you implement yourself
* [Operator Framework](https://operatorframework.io)
View File
@ -8,7 +8,6 @@ weight: 60
You can use Kubernetes annotations to attach arbitrary non-identifying metadata
to objects. Clients such as tools and libraries can retrieve this metadata.
<!-- body -->
## Attaching metadata to objects
@ -74,10 +73,9 @@ If the prefix is omitted, the annotation Key is presumed to be private to the us
The `kubernetes.io/` and `k8s.io/` prefixes are reserved for Kubernetes core components.
For example, here's the configuration file for a Pod that has the annotation `imageregistry: https://hub.docker.com/` :
For example, here's a manifest for a Pod that has the annotation `imageregistry: https://hub.docker.com/` :
```yaml
apiVersion: v1
kind: Pod
metadata:
@ -90,14 +88,8 @@ spec:
image: nginx:1.14.2
ports:
- containerPort: 80
```
## {{% heading "whatsnext" %}}
Learn more about [Labels and Selectors](/docs/concepts/overview/working-with-objects/labels/).
View File
@ -9,9 +9,12 @@ weight: 40
<!-- overview -->
_Labels_ are key/value pairs that are attached to objects, such as pods.
Labels are intended to be used to specify identifying attributes of objects that are meaningful and relevant to users, but do not directly imply semantics to the core system.
Labels can be used to organize and to select subsets of objects. Labels can be attached to objects at creation time and subsequently added and modified at any time.
Each object can have a set of key/value labels defined. Each Key must be unique for a given object.
Labels are intended to be used to specify identifying attributes of objects
that are meaningful and relevant to users, but do not directly imply semantics
to the core system. Labels can be used to organize and to select subsets of
objects. Labels can be attached to objects at creation time and subsequently
added and modified at any time. Each object can have a set of key/value labels
defined. Each Key must be unique for a given object.
```json
"metadata": {
@ -30,37 +33,56 @@ and CLIs. Non-identifying information should be recorded using
## Motivation
Labels enable users to map their own organizational structures onto system objects in a loosely coupled fashion, without requiring clients to store these mappings.
Labels enable users to map their own organizational structures onto system objects
in a loosely coupled fashion, without requiring clients to store these mappings.
Service deployments and batch processing pipelines are often multi-dimensional entities (e.g., multiple partitions or deployments, multiple release tracks, multiple tiers, multiple micro-services per tier). Management often requires cross-cutting operations, which breaks encapsulation of strictly hierarchical representations, especially rigid hierarchies determined by the infrastructure rather than by users.
Service deployments and batch processing pipelines are often multi-dimensional entities
(e.g., multiple partitions or deployments, multiple release tracks, multiple tiers,
multiple micro-services per tier). Management often requires cross-cutting operations,
which breaks encapsulation of strictly hierarchical representations, especially rigid
hierarchies determined by the infrastructure rather than by users.
Example labels:
* `"release" : "stable"`, `"release" : "canary"`
* `"environment" : "dev"`, `"environment" : "qa"`, `"environment" : "production"`
* `"tier" : "frontend"`, `"tier" : "backend"`, `"tier" : "cache"`
* `"partition" : "customerA"`, `"partition" : "customerB"`
* `"track" : "daily"`, `"track" : "weekly"`
* `"release" : "stable"`, `"release" : "canary"`
* `"environment" : "dev"`, `"environment" : "qa"`, `"environment" : "production"`
* `"tier" : "frontend"`, `"tier" : "backend"`, `"tier" : "cache"`
* `"partition" : "customerA"`, `"partition" : "customerB"`
* `"track" : "daily"`, `"track" : "weekly"`
These are examples of [commonly used labels](/docs/concepts/overview/working-with-objects/common-labels/); you are free to develop your own conventions. Keep in mind that label Key must be unique for a given object.
These are examples of
[commonly used labels](/docs/concepts/overview/working-with-objects/common-labels/);
you are free to develop your own conventions.
Keep in mind that label Key must be unique for a given object.
## Syntax and character set
_Labels_ are key/value pairs. Valid label keys have two segments: an optional prefix and name, separated by a slash (`/`). The name segment is required and must be 63 characters or less, beginning and ending with an alphanumeric character (`[a-z0-9A-Z]`) with dashes (`-`), underscores (`_`), dots (`.`), and alphanumerics between. The prefix is optional. If specified, the prefix must be a DNS subdomain: a series of DNS labels separated by dots (`.`), not longer than 253 characters in total, followed by a slash (`/`).
_Labels_ are key/value pairs. Valid label keys have two segments: an optional
prefix and name, separated by a slash (`/`). The name segment is required and
must be 63 characters or less, beginning and ending with an alphanumeric
character (`[a-z0-9A-Z]`) with dashes (`-`), underscores (`_`), dots (`.`),
and alphanumerics between. The prefix is optional. If specified, the prefix
must be a DNS subdomain: a series of DNS labels separated by dots (`.`),
not longer than 253 characters in total, followed by a slash (`/`).
If the prefix is omitted, the label Key is presumed to be private to the user. Automated system components (e.g. `kube-scheduler`, `kube-controller-manager`, `kube-apiserver`, `kubectl`, or other third-party automation) which add labels to end-user objects must specify a prefix.
If the prefix is omitted, the label Key is presumed to be private to the user.
Automated system components (e.g. `kube-scheduler`, `kube-controller-manager`,
`kube-apiserver`, `kubectl`, or other third-party automation) which add labels
to end-user objects must specify a prefix.
The `kubernetes.io/` and `k8s.io/` prefixes are [reserved](/docs/reference/labels-annotations-taints/) for Kubernetes core components.
The `kubernetes.io/` and `k8s.io/` prefixes are
[reserved](/docs/reference/labels-annotations-taints/) for Kubernetes core components.
Valid label value:
* must be 63 characters or less (can be empty),
* unless empty, must begin and end with an alphanumeric character (`[a-z0-9A-Z]`),
* could contain dashes (`-`), underscores (`_`), dots (`.`), and alphanumerics between.
For example, here's the configuration file for a Pod that has two labels `environment: production` and `app: nginx` :
For example, here's a manifest for a Pod that has two labels
`environment: production` and `app: nginx`:
```yaml
apiVersion: v1
kind: Pod
metadata:
@ -74,34 +96,43 @@ spec:
image: nginx:1.14.2
ports:
- containerPort: 80
```
## Label selectors
Unlike [names and UIDs](/docs/concepts/overview/working-with-objects/names/), labels do not provide uniqueness. In general, we expect many objects to carry the same label(s).
Unlike [names and UIDs](/docs/concepts/overview/working-with-objects/names/), labels
do not provide uniqueness. In general, we expect many objects to carry the same label(s).
Via a _label selector_, the client/user can identify a set of objects. The label selector is the core grouping primitive in Kubernetes.
Via a _label selector_, the client/user can identify a set of objects.
The label selector is the core grouping primitive in Kubernetes.
The API currently supports two types of selectors: _equality-based_ and _set-based_.
A label selector can be made of multiple _requirements_ which are comma-separated. In the case of multiple requirements, all must be satisfied so the comma separator acts as a logical _AND_ (`&&`) operator.
A label selector can be made of multiple _requirements_ which are comma-separated.
In the case of multiple requirements, all must be satisfied so the comma separator
acts as a logical _AND_ (`&&`) operator.
The semantics of empty or non-specified selectors are dependent on the context,
and API types that use selectors should document the validity and meaning of
them.
{{< note >}}
For some API types, such as ReplicaSets, the label selectors of two instances must not overlap within a namespace, or the controller can see that as conflicting instructions and fail to determine how many replicas should be present.
For some API types, such as ReplicaSets, the label selectors of two instances must
not overlap within a namespace, or the controller can see that as conflicting
instructions and fail to determine how many replicas should be present.
{{< /note >}}
{{< caution >}}
For both equality-based and set-based conditions there is no logical _OR_ (`||`) operator. Ensure your filter statements are structured accordingly.
For both equality-based and set-based conditions there is no logical _OR_ (`||`) operator.
Ensure your filter statements are structured accordingly.
{{< /caution >}}
### _Equality-based_ requirement
_Equality-_ or _inequality-based_ requirements allow filtering by label keys and values. Matching objects must satisfy all of the specified label constraints, though they may have additional labels as well.
Three kinds of operators are admitted `=`,`==`,`!=`. The first two represent _equality_ (and are synonyms), while the latter represents _inequality_. For example:
_Equality-_ or _inequality-based_ requirements allow filtering by label keys and values.
Matching objects must satisfy all of the specified label constraints, though they may
have additional labels as well. Three kinds of operators are admitted `=`,`==`,`!=`.
The first two represent _equality_ (and are synonyms), while the latter represents _inequality_.
For example:
```
environment = production
@ -109,8 +140,9 @@ tier != frontend
```
The former selects all resources with key equal to `environment` and value equal to `production`.
The latter selects all resources with key equal to `tier` and value distinct from `frontend`, and all resources with no labels with the `tier` key.
One could filter for resources in `production` excluding `frontend` using the comma operator: `environment=production,tier!=frontend`
The latter selects all resources with key equal to `tier` and value distinct from `frontend`,
and all resources with no labels with the `tier` key. One could filter for resources in `production`
excluding `frontend` using the comma operator: `environment=production,tier!=frontend`
One usage scenario for equality-based label requirement is for Pods to specify
node selection criteria. For example, the sample Pod below selects nodes with
@ -134,7 +166,9 @@ spec:
### _Set-based_ requirement
_Set-based_ label requirements allow filtering keys according to a set of values. Three kinds of operators are supported: `in`,`notin` and `exists` (only the key identifier). For example:
_Set-based_ label requirements allow filtering keys according to a set of values.
Three kinds of operators are supported: `in`,`notin` and `exists` (only the key identifier).
For example:
```
environment in (production, qa)
@ -143,27 +177,38 @@ partition
!partition
```
* The first example selects all resources with key equal to `environment` and value equal to `production` or `qa`.
* The second example selects all resources with key equal to `tier` and values other than `frontend` and `backend`, and all resources with no labels with the `tier` key.
* The third example selects all resources including a label with key `partition`; no values are checked.
* The fourth example selects all resources without a label with key `partition`; no values are checked.
- The first example selects all resources with key equal to `environment` and value
equal to `production` or `qa`.
- The second example selects all resources with key equal to `tier` and values other
than `frontend` and `backend`, and all resources with no labels with the `tier` key.
- The third example selects all resources including a label with key `partition`;
no values are checked.
- The fourth example selects all resources without a label with key `partition`;
no values are checked.
Similarly the comma separator acts as an _AND_ operator. So filtering resources with a `partition` key (no matter the value) and with `environment` different than `qa` can be achieved using `partition,environment notin (qa)`.
The _set-based_ label selector is a general form of equality since `environment=production` is equivalent to `environment in (production)`; similarly for `!=` and `notin`.
_Set-based_ requirements can be mixed with _equality-based_ requirements. For example: `partition in (customerA, customerB),environment!=qa`.
Similarly the comma separator acts as an _AND_ operator. So filtering resources
with a `partition` key (no matter the value) and with `environment` different
than `qa` can be achieved using `partition,environment notin (qa)`.
The _set-based_ label selector is a general form of equality since
`environment=production` is equivalent to `environment in (production)`;
similarly for `!=` and `notin`.
_Set-based_ requirements can be mixed with _equality-based_ requirements.
For example: `partition in (customerA, customerB),environment!=qa`.
## API
### LIST and WATCH filtering
LIST and WATCH operations may specify label selectors to filter the sets of objects returned using a query parameter. Both requirements are permitted (presented here as they would appear in a URL query string):
LIST and WATCH operations may specify label selectors to filter the sets of objects
returned using a query parameter. Both requirements are permitted
(presented here as they would appear in a URL query string):
* _equality-based_ requirements: `?labelSelector=environment%3Dproduction,tier%3Dfrontend`
* _set-based_ requirements: `?labelSelector=environment+in+%28production%2Cqa%29%2Ctier+in+%28frontend%29`
* _equality-based_ requirements: `?labelSelector=environment%3Dproduction,tier%3Dfrontend`
* _set-based_ requirements: `?labelSelector=environment+in+%28production%2Cqa%29%2Ctier+in+%28frontend%29`
Both label selector styles can be used to list or watch resources via a REST client. For example, targeting `apiserver` with `kubectl` and using _equality-based_ one may write:
Both label selector styles can be used to list or watch resources via a REST client.
For example, targeting `apiserver` with `kubectl` and using _equality-based_ one may write:
```shell
kubectl get pods -l environment=production,tier=frontend
@ -175,13 +220,14 @@ or using _set-based_ requirements:
kubectl get pods -l 'environment in (production),tier in (frontend)'
```
As already mentioned _set-based_ requirements are more expressive. For instance, they can implement the _OR_ operator on values:
As already mentioned _set-based_ requirements are more expressive.
For instance, they can implement the _OR_ operator on values:
```shell
kubectl get pods -l 'environment in (production, qa)'
```
or restricting negative matching via _exists_ operator:
or restricting negative matching via _notin_ operator:
```shell
kubectl get pods -l 'environment,environment notin (frontend)'
@ -196,23 +242,28 @@ also use label selectors to specify sets of other resources, such as
#### Service and ReplicationController
The set of pods that a `service` targets is defined with a label selector. Similarly, the population of pods that a `replicationcontroller` should manage is also defined with a label selector.
The set of pods that a `service` targets is defined with a label selector.
Similarly, the population of pods that a `replicationcontroller` should
manage is also defined with a label selector.
Labels selectors for both objects are defined in `json` or `yaml` files using maps, and only _equality-based_ requirement selectors are supported:
Labels selectors for both objects are defined in `json` or `yaml` files using maps,
and only _equality-based_ requirement selectors are supported:
```json
"selector": {
    "component" : "redis"
}
```
or
```yaml
selector:
component: redis
component: redis
```
this selector (respectively in `json` or `yaml` format) is equivalent to `component=redis` or `component in (redis)`.
This selector (respectively in `json` or `yaml` format) is equivalent to
`component=redis` or `component in (redis)`.
#### Resources that support set-based requirements
@ -227,16 +278,23 @@ selector:
matchLabels:
component: redis
matchExpressions:
- {key: tier, operator: In, values: [cache]}
- {key: environment, operator: NotIn, values: [dev]}
- { key: tier, operator: In, values: [cache] }
- { key: environment, operator: NotIn, values: [dev] }
```
`matchLabels` is a map of `{key,value}` pairs. A single `{key,value}` in the `matchLabels` map is equivalent to an element of `matchExpressions`, whose `key` field is "key", the `operator` is "In", and the `values` array contains only "value". `matchExpressions` is a list of pod selector requirements. Valid operators include In, NotIn, Exists, and DoesNotExist. The values set must be non-empty in the case of In and NotIn. All of the requirements, from both `matchLabels` and `matchExpressions` are ANDed together -- they must all be satisfied in order to match.
`matchLabels` is a map of `{key,value}` pairs. A single `{key,value}` in the
`matchLabels` map is equivalent to an element of `matchExpressions`, whose `key`
field is "key", the `operator` is "In", and the `values` array contains only "value".
`matchExpressions` is a list of pod selector requirements. Valid operators include
In, NotIn, Exists, and DoesNotExist. The values set must be non-empty in the case of
In and NotIn. All of the requirements, from both `matchLabels` and `matchExpressions`
are ANDed together -- they must all be satisfied in order to match.
#### Selecting sets of nodes
One use case for selecting over labels is to constrain the set of nodes onto which a pod can schedule.
See the documentation on [node selection](/docs/concepts/scheduling-eviction/assign-pod-node/) for more information.
One use case for selecting over labels is to constrain the set of nodes onto which
a pod can schedule. See the documentation on
[node selection](/docs/concepts/scheduling-eviction/assign-pod-node/) for more information.
## {{% heading "whatsnext" %}}
View File
@ -24,6 +24,10 @@ For non-unique user-provided attributes, Kubernetes provides [labels](/docs/conc
{{< glossary_definition term_id="name" length="all" >}}
**Names must be unique across all [API versions](/docs/concepts/overview/kubernetes-api/#api-groups-and-versioning)
of the same resource. API resources are distinguished by their API group, resource type, namespace
(for namespaced resources), and name. In other words, API version is irrelevant in this context.**
{{< note >}}
When objects represent a physical entity, like a Node representing a physical host, re-creating the host under the same name without deleting and re-creating the Node causes Kubernetes to treat the new host as the old one, which may lead to inconsistencies.
{{< /note >}}
View File
@ -44,7 +44,7 @@ Kubernetes starts with four initial namespaces:
: Kubernetes includes this namespace so that you can start using your new cluster without first creating a namespace.
`kube-node-lease`
: This namespace holds [Lease](/docs/reference/kubernetes-api/cluster-resources/lease-v1/) objects associated with each node. Node leases allow the kubelet to send [heartbeats](/docs/concepts/architecture/nodes/#heartbeats) so that the control plane can detect node failure.
: This namespace holds [Lease](/docs/concepts/architecture/leases/) objects associated with each node. Node leases allow the kubelet to send [heartbeats](/docs/concepts/architecture/nodes/#heartbeats) so that the control plane can detect node failure.
`kube-public`
: This namespace is readable by *all* clients (including those not authenticated). This namespace is mostly reserved for cluster usage, in case that some resources should be visible and readable publicly throughout the whole cluster. The public aspect of this namespace is only a convention, not a requirement.
@ -147,7 +147,7 @@ kubectl api-resources --namespaced=false
## Automatic labelling
{{< feature-state state="beta" for_k8s_version="1.21" >}}
{{< feature-state for_k8s_version="1.22" state="stable" >}}
The Kubernetes control plane sets an immutable {{< glossary_tooltip text="label" term_id="label" >}}
`kubernetes.io/metadata.name` on all namespaces, provided that the `NamespaceDefaultLabelName`
View File
@ -8,16 +8,15 @@ content_type: concept
weight: 20
---
<!-- overview -->
You can constrain a {{< glossary_tooltip text="Pod" term_id="pod" >}} so that it is
You can constrain a {{< glossary_tooltip text="Pod" term_id="pod" >}} so that it is
_restricted_ to run on particular {{< glossary_tooltip text="node(s)" term_id="node" >}},
or to _prefer_ to run on particular nodes.
There are several ways to do this and the recommended approaches all use
[label selectors](/docs/concepts/overview/working-with-objects/labels/) to facilitate the selection.
Often, you do not need to set any such constraints; the
{{< glossary_tooltip text="scheduler" term_id="kube-scheduler" >}} will automatically do a reasonable placement
{{< glossary_tooltip text="scheduler" term_id="kube-scheduler" >}} will automatically do a reasonable placement
(for example, spreading your Pods across nodes so as not to place Pods on a node with insufficient free resources).
However, there are some circumstances where you may want to control which node
the Pod deploys to, for example, to ensure that a Pod ends up on a node with an SSD attached to it,
@ -28,10 +27,10 @@ or to co-locate Pods from two different services that communicate a lot into the
You can use any of the following methods to choose where Kubernetes schedules
specific Pods:
* [nodeSelector](#nodeselector) field matching against [node labels](#built-in-node-labels)
* [Affinity and anti-affinity](#affinity-and-anti-affinity)
* [nodeName](#nodename) field
* [Pod topology spread constraints](#pod-topology-spread-constraints)
- [nodeSelector](#nodeselector) field matching against [node labels](#built-in-node-labels)
- [Affinity and anti-affinity](#affinity-and-anti-affinity)
- [nodeName](#nodename) field
- [Pod topology spread constraints](#pod-topology-spread-constraints)
## Node labels {#built-in-node-labels}
@ -51,7 +50,7 @@ and a different value in other environments.
Adding labels to nodes allows you to target Pods for scheduling on specific
nodes or groups of nodes. You can use this functionality to ensure that specific
Pods only run on nodes with certain isolation, security, or regulatory
properties.
properties.
If you use labels for node isolation, choose label keys that the {{<glossary_tooltip text="kubelet" term_id="kubelet">}}
cannot modify. This prevents a compromised node from setting those labels on
@ -59,7 +58,7 @@ itself so that the scheduler schedules workloads onto the compromised node.
The [`NodeRestriction` admission plugin](/docs/reference/access-authn-authz/admission-controllers/#noderestriction)
prevents the kubelet from setting or modifying labels with a
`node-restriction.kubernetes.io/` prefix.
`node-restriction.kubernetes.io/` prefix.
To make use of that label prefix for node isolation:
@ -73,7 +72,7 @@ To make use of that label prefix for node isolation:
You can add the `nodeSelector` field to your Pod specification and specify the
[node labels](#built-in-node-labels) you want the target node to have.
Kubernetes only schedules the Pod onto nodes that have each of the labels you
specify.
specify.
See [Assign Pods to Nodes](/docs/tasks/configure-pod-container/assign-pods-nodes) for more
information.
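For illustration, assuming your target nodes carry a label `disktype: ssd`, a Pod using `nodeSelector` might look like:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-on-ssd          # illustrative name
spec:
  containers:
  - name: nginx
    image: nginx:1.14.2
  nodeSelector:
    disktype: ssd             # the Pod is only scheduled onto nodes with this label
```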
@ -84,20 +83,20 @@ information.
labels. Affinity and anti-affinity expands the types of constraints you can
define. Some of the benefits of affinity and anti-affinity include:
* The affinity/anti-affinity language is more expressive. `nodeSelector` only
- The affinity/anti-affinity language is more expressive. `nodeSelector` only
selects nodes with all the specified labels. Affinity/anti-affinity gives you
more control over the selection logic.
* You can indicate that a rule is *soft* or *preferred*, so that the scheduler
- You can indicate that a rule is *soft* or *preferred*, so that the scheduler
still schedules the Pod even if it can't find a matching node.
* You can constrain a Pod using labels on other Pods running on the node (or other topological domain),
- You can constrain a Pod using labels on other Pods running on the node (or other topological domain),
instead of just node labels, which allows you to define rules for which Pods
can be co-located on a node.
The affinity feature consists of two types of affinity:
* *Node affinity* functions like the `nodeSelector` field but is more expressive and
- *Node affinity* functions like the `nodeSelector` field but is more expressive and
allows you to specify soft rules.
* *Inter-pod affinity/anti-affinity* allows you to constrain Pods against labels
- *Inter-pod affinity/anti-affinity* allows you to constrain Pods against labels
on other Pods.
### Node affinity
@ -106,12 +105,12 @@ Node affinity is conceptually similar to `nodeSelector`, allowing you to constra
Pod can be scheduled on based on node labels. There are two types of node
affinity:
* `requiredDuringSchedulingIgnoredDuringExecution`: The scheduler can't
schedule the Pod unless the rule is met. This functions like `nodeSelector`,
but with a more expressive syntax.
* `preferredDuringSchedulingIgnoredDuringExecution`: The scheduler tries to
find a node that meets the rule. If a matching node is not available, the
scheduler still schedules the Pod.
- `requiredDuringSchedulingIgnoredDuringExecution`: The scheduler can't
schedule the Pod unless the rule is met. This functions like `nodeSelector`,
but with a more expressive syntax.
- `preferredDuringSchedulingIgnoredDuringExecution`: The scheduler tries to
find a node that meets the rule. If a matching node is not available, the
scheduler still schedules the Pod.
{{<note>}}
In the preceding types, `IgnoredDuringExecution` means that if the node labels
@ -127,17 +126,17 @@ For example, consider the following Pod spec:
In this example, the following rules apply:
* The node *must* have a label with the key `topology.kubernetes.io/zone` and
the value of that label *must* be either `antarctica-east1` or `antarctica-west1`.
* The node *preferably* has a label with the key `another-node-label-key` and
the value `another-node-label-value`.
- The node *must* have a label with the key `topology.kubernetes.io/zone` and
the value of that label *must* be either `antarctica-east1` or `antarctica-west1`.
- The node *preferably* has a label with the key `another-node-label-key` and
the value `another-node-label-value`.
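A sketch of a Pod spec expressing those two rules could look like the following (the Pod name, container image, and preference weight are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: with-node-affinity        # illustrative name
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: topology.kubernetes.io/zone
            operator: In
            values:
            - antarctica-east1
            - antarctica-west1
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1                  # illustrative weight between 1 and 100
        preference:
          matchExpressions:
          - key: another-node-label-key
            operator: In
            values:
            - another-node-label-value
  containers:
  - name: with-node-affinity
    image: registry.k8s.io/pause:3.8   # illustrative container image
```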
You can use the `operator` field to specify a logical operator for Kubernetes to use when
interpreting the rules. You can use `In`, `NotIn`, `Exists`, `DoesNotExist`,
`Gt` and `Lt`.
`NotIn` and `DoesNotExist` allow you to define node anti-affinity behavior.
Alternatively, you can use [node taints](/docs/concepts/scheduling-eviction/taint-and-toleration/)
Alternatively, you can use [node taints](/docs/concepts/scheduling-eviction/taint-and-toleration/)
to repel Pods from specific nodes.
{{<note>}}
@ -168,7 +167,7 @@ The final sum is added to the score of other priority functions for the node.
Nodes with the highest total score are prioritized when the scheduler makes a
scheduling decision for the Pod.
For example, consider the following Pod spec:
For example, consider the following Pod spec:
{{< codenew file="pods/pod-with-affinity-anti-affinity.yaml" >}}
@ -268,8 +267,8 @@ to unintended behavior.
Similar to [node affinity](#node-affinity) are two types of Pod affinity and
anti-affinity as follows:
* `requiredDuringSchedulingIgnoredDuringExecution`
* `preferredDuringSchedulingIgnoredDuringExecution`
- `requiredDuringSchedulingIgnoredDuringExecution`
- `preferredDuringSchedulingIgnoredDuringExecution`
For example, you could use
`requiredDuringSchedulingIgnoredDuringExecution` affinity to tell the scheduler to
@ -297,7 +296,7 @@ The affinity rule says that the scheduler can only schedule a Pod onto a node if
the node is in the same zone as one or more existing Pods with the label
`security=S1`. More precisely, the scheduler must place the Pod on a node that has the
`topology.kubernetes.io/zone=V` label, as long as there is at least one node in
that zone that currently has one or more Pods with the Pod label `security=S1`.
that zone that currently has one or more Pods with the Pod label `security=S1`.
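Sketched as YAML, the affinity part of such a Pod spec might look like this (illustrative only; the anti-affinity rule described next would be expressed under `podAntiAffinity`):

```yaml
affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: security
          operator: In
          values:
          - S1
      topologyKey: topology.kubernetes.io/zone
```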
The anti-affinity rule says that the scheduler should try to avoid scheduling
the Pod onto a node that is in the same zone as one or more Pods with the label
@ -314,9 +313,9 @@ You can use the `In`, `NotIn`, `Exists` and `DoesNotExist` values in the
In principle, the `topologyKey` can be any allowed label key with the following
exceptions for performance and security reasons:
* For Pod affinity and anti-affinity, an empty `topologyKey` field is not allowed in both `requiredDuringSchedulingIgnoredDuringExecution`
- For Pod affinity and anti-affinity, an empty `topologyKey` field is not allowed in both `requiredDuringSchedulingIgnoredDuringExecution`
and `preferredDuringSchedulingIgnoredDuringExecution`.
* For `requiredDuringSchedulingIgnoredDuringExecution` Pod anti-affinity rules,
- For `requiredDuringSchedulingIgnoredDuringExecution` Pod anti-affinity rules,
the admission controller `LimitPodHardAntiAffinityTopology` limits
`topologyKey` to `kubernetes.io/hostname`. You can modify or disable the
admission controller if you want to allow custom topologies.
@ -328,17 +327,18 @@ If omitted or empty, `namespaces` defaults to the namespace of the Pod where the
affinity/anti-affinity definition appears.
#### Namespace selector
{{< feature-state for_k8s_version="v1.24" state="stable" >}}
You can also select matching namespaces using `namespaceSelector`, which is a label query over the set of namespaces.
The affinity term is applied to namespaces selected by both `namespaceSelector` and the `namespaces` field.
Note that an empty `namespaceSelector` ({}) matches all namespaces, while a null or empty `namespaces` list and
Note that an empty `namespaceSelector` ({}) matches all namespaces, while a null or empty `namespaces` list and
null `namespaceSelector` matches the namespace of the Pod where the rule is defined.
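As a sketch, an affinity term that uses `namespaceSelector` might look like this (the Pod label `app: web` and namespace label `team: blue` are illustrative):

```yaml
podAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
  - labelSelector:
      matchLabels:
        app: web               # illustrative label on the target Pods
    namespaceSelector:
      matchLabels:
        team: blue             # only namespaces carrying this label are considered
    topologyKey: kubernetes.io/hostname
```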
#### More practical use-cases
Inter-pod affinity and anti-affinity can be even more useful when they are used with higher
level collections such as ReplicaSets, StatefulSets, Deployments, etc. These
level collections such as ReplicaSets, StatefulSets, Deployments, etc. These
rules allow you to configure that a set of workloads should
be co-located in the same defined topology; for example, preferring to place two related
Pods onto the same node.
@ -430,10 +430,10 @@ spec:
Creating the two preceding Deployments results in the following cluster layout,
where each web server is co-located with a cache, on three separate nodes.
| node-1 | node-2 | node-3 |
|:--------------------:|:-------------------:|:------------------:|
| *webserver-1* | *webserver-2* | *webserver-3* |
| *cache-1* | *cache-2* | *cache-3* |
| node-1 | node-2 | node-3 |
| :-----------: | :-----------: | :-----------: |
| *webserver-1* | *webserver-2* | *webserver-3* |
| *cache-1* | *cache-2* | *cache-3* |
The overall effect is that each cache instance is likely to be accessed by a single client, that
is running on the same node. This approach aims to minimize both skew (imbalanced load) and latency.
@ -453,13 +453,18 @@ tries to place the Pod on that node. Using `nodeName` overrules using
Some of the limitations of using `nodeName` to select nodes are:
- If the named node does not exist, the Pod will not run, and in
some cases may be automatically deleted.
- If the named node does not have the resources to accommodate the
Pod, the Pod will fail and its reason will indicate why,
for example OutOfmemory or OutOfcpu.
- Node names in cloud environments are not always predictable or
stable.
- If the named node does not exist, the Pod will not run, and in
some cases may be automatically deleted.
- If the named node does not have the resources to accommodate the
Pod, the Pod will fail and its reason will indicate why,
for example OutOfmemory or OutOfcpu.
- Node names in cloud environments are not always predictable or stable.
{{< note >}}
`nodeName` is intended for use by custom schedulers or advanced use cases where
you need to bypass any configured schedulers. Bypassing the schedulers might lead to
failed Pods if the assigned Nodes get oversubscribed. You can use [node affinity](#node-affinity) or the [`nodeSelector` field](#nodeselector) to assign a Pod to a specific Node without bypassing the schedulers.
{{</ note >}}
Here is an example of a Pod spec using the `nodeName` field:
@ -489,12 +494,10 @@ to learn more about how these work.
## {{% heading "whatsnext" %}}
* Read more about [taints and tolerations](/docs/concepts/scheduling-eviction/taint-and-toleration/) .
* Read the design docs for [node affinity](https://git.k8s.io/design-proposals-archive/scheduling/nodeaffinity.md)
- Read more about [taints and tolerations](/docs/concepts/scheduling-eviction/taint-and-toleration/) .
- Read the design docs for [node affinity](https://git.k8s.io/design-proposals-archive/scheduling/nodeaffinity.md)
and for [inter-pod affinity/anti-affinity](https://git.k8s.io/design-proposals-archive/scheduling/podaffinity.md).
* Learn about how the [topology manager](/docs/tasks/administer-cluster/topology-manager/) takes part in node-level
resource allocation decisions.
* Learn how to use [nodeSelector](/docs/tasks/configure-pod-container/assign-pods-nodes/).
* Learn how to use [affinity and anti-affinity](/docs/tasks/configure-pod-container/assign-pods-nodes-using-node-affinity/).
- Learn about how the [topology manager](/docs/tasks/administer-cluster/topology-manager/) takes part in node-level
resource allocation decisions.
- Learn how to use [nodeSelector](/docs/tasks/configure-pod-container/assign-pods-nodes/).
- Learn how to use [affinity and anti-affinity](/docs/tasks/configure-pod-container/assign-pods-nodes-using-node-affinity/).
View File
@ -6,16 +6,16 @@ weight: 100
{{<glossary_definition term_id="node-pressure-eviction" length="short">}}</br>
The {{<glossary_tooltip term_id="kubelet" text="kubelet">}} monitors resources
like memory, disk space, and filesystem inodes on your cluster's nodes.
When one or more of these resources reach specific consumption levels, the
The {{<glossary_tooltip term_id="kubelet" text="kubelet">}} monitors resources
like memory, disk space, and filesystem inodes on your cluster's nodes.
When one or more of these resources reach specific consumption levels, the
kubelet can proactively fail one or more pods on the node to reclaim resources
and prevent starvation.
and prevent starvation.
During a node-pressure eviction, the kubelet sets the `PodPhase` for the
selected pods to `Failed`. This terminates the pods.
selected pods to `Failed`. This terminates the pods.
Node-pressure eviction is not the same as
Node-pressure eviction is not the same as
[API-initiated eviction](/docs/concepts/scheduling-eviction/api-eviction/).
The kubelet does not respect your configured `PodDisruptionBudget` or the pod's
@ -26,7 +26,7 @@ the kubelet respects your configured `eviction-max-pod-grace-period`. If you use
If the pods are managed by a {{< glossary_tooltip text="workload" term_id="workload" >}}
resource (such as {{< glossary_tooltip text="StatefulSet" term_id="statefulset" >}}
or {{< glossary_tooltip text="Deployment" term_id="deployment" >}}) that
replaces failed pods, the control plane or `kube-controller-manager` creates new
replaces failed pods, the control plane or `kube-controller-manager` creates new
pods in place of the evicted pods.
{{<note>}}
@ -37,16 +37,16 @@ images when disk resources are starved.
The kubelet uses various parameters to make eviction decisions, like the following:
* Eviction signals
* Eviction thresholds
* Monitoring intervals
- Eviction signals
- Eviction thresholds
- Monitoring intervals
### Eviction signals {#eviction-signals}
Eviction signals are the current state of a particular resource at a specific
point in time. Kubelet uses eviction signals to make eviction decisions by
comparing the signals to eviction thresholds, which are the minimum amount of
the resource that should be available on the node.
comparing the signals to eviction thresholds, which are the minimum amount of
the resource that should be available on the node.
Kubelet uses the following eviction signals:
@ -60,9 +60,9 @@ Kubelet uses the following eviction signals:
| `pid.available` | `pid.available` := `node.stats.rlimit.maxpid` - `node.stats.rlimit.curproc` |
In this table, the `Description` column shows how kubelet gets the value of the
signal. Each signal supports either a percentage or a literal value. Kubelet
signal. Each signal supports either a percentage or a literal value. Kubelet
calculates the percentage value relative to the total capacity associated with
the signal.
the signal.
The value for `memory.available` is derived from the cgroupfs instead of tools
like `free -m`. This is important because `free -m` does not work in a
@ -78,7 +78,7 @@ memory is reclaimable under pressure.
The kubelet supports the following filesystem partitions:
1. `nodefs`: The node's main filesystem, used for local disk volumes, emptyDir,
log storage, and more. For example, `nodefs` contains `/var/lib/kubelet/`.
log storage, and more. For example, `nodefs` contains `/var/lib/kubelet/`.
1. `imagefs`: An optional filesystem that container runtimes use to store container
images and container writable layers.
@ -102,10 +102,10 @@ eviction decisions.
Eviction thresholds have the form `[eviction-signal][operator][quantity]`, where:
* `eviction-signal` is the [eviction signal](#eviction-signals) to use.
* `operator` is the [relational operator](https://en.wikipedia.org/wiki/Relational_operator#Standard_relational_operators)
- `eviction-signal` is the [eviction signal](#eviction-signals) to use.
- `operator` is the [relational operator](https://en.wikipedia.org/wiki/Relational_operator#Standard_relational_operators)
you want, such as `<` (less than).
* `quantity` is the eviction threshold amount, such as `1Gi`. The value of `quantity`
- `quantity` is the eviction threshold amount, such as `1Gi`. The value of `quantity`
must match the quantity representation used by Kubernetes. You can use either
literal values or percentages (`%`).
@ -120,22 +120,22 @@ You can configure soft and hard eviction thresholds.
A soft eviction threshold pairs an eviction threshold with a required
administrator-specified grace period. The kubelet does not evict pods until the
grace period is exceeded. The kubelet returns an error on startup if there is no
specified grace period.
specified grace period.
You can specify both a soft eviction threshold grace period and a maximum
allowed pod termination grace period for kubelet to use during evictions. If you
specify a maximum allowed grace period and the soft eviction threshold is met,
specify a maximum allowed grace period and the soft eviction threshold is met,
the kubelet uses the lesser of the two grace periods. If you do not specify a
maximum allowed grace period, the kubelet kills evicted pods immediately without
graceful termination.
You can use the following flags to configure soft eviction thresholds:
* `eviction-soft`: A set of eviction thresholds like `memory.available<1.5Gi`
- `eviction-soft`: A set of eviction thresholds like `memory.available<1.5Gi`
that can trigger pod eviction if held over the specified grace period.
* `eviction-soft-grace-period`: A set of eviction grace periods like `memory.available=1m30s`
- `eviction-soft-grace-period`: A set of eviction grace periods like `memory.available=1m30s`
that define how long a soft eviction threshold must hold before triggering a Pod eviction.
* `eviction-max-pod-grace-period`: The maximum allowed grace period (in seconds)
- `eviction-max-pod-grace-period`: The maximum allowed grace period (in seconds)
to use when terminating pods in response to a soft eviction threshold being met.
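If you configure the kubelet through a configuration file rather than command-line flags, a sketch of the corresponding fields could look like this (the threshold and grace period values are examples only):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionSoft:
  memory.available: "1.5Gi"       # soft threshold; example value
evictionSoftGracePeriod:
  memory.available: "1m30s"       # how long the threshold must hold before eviction
evictionMaxPodGracePeriod: 60     # maximum pod termination grace period (seconds) for soft evictions
```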
#### Hard eviction thresholds {#hard-eviction-thresholds}
@ -144,20 +144,20 @@ A hard eviction threshold has no grace period. When a hard eviction threshold is
met, the kubelet kills pods immediately without graceful termination to reclaim
the starved resource.
You can use the `eviction-hard` flag to configure a set of hard eviction
thresholds like `memory.available<1Gi`.
You can use the `eviction-hard` flag to configure a set of hard eviction
thresholds like `memory.available<1Gi`.
The kubelet has the following default hard eviction thresholds:
* `memory.available<100Mi`
* `nodefs.available<10%`
* `imagefs.available<15%`
* `nodefs.inodesFree<5%` (Linux nodes)
- `memory.available<100Mi`
- `nodefs.available<10%`
- `imagefs.available<15%`
- `nodefs.inodesFree<5%` (Linux nodes)
These default values of hard eviction thresholds are used only if none of the
parameters is changed. If you change the value of any one parameter, the values
of the other parameters are not inherited as their default values; instead, they
are set to zero. To provide custom values, you should provide all of the
thresholds.
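As with soft thresholds, you can set custom hard thresholds in the kubelet configuration file instead of using the `eviction-hard` flag; a sketch that overrides all of the defaults with example values might be:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  memory.available: "500Mi"    # example value
  nodefs.available: "10%"
  imagefs.available: "15%"
  nodefs.inodesFree: "5%"
```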
### Eviction monitoring interval
@ -169,9 +169,9 @@ which defaults to `10s`.
The kubelet reports node conditions to reflect that the node is under pressure
because hard or soft eviction threshold is met, independent of configured grace
periods.
periods.
The kubelet maps eviction signals to node conditions as follows:
The kubelet maps eviction signals to node conditions as follows:
| Node Condition | Eviction Signal | Description |
|-------------------|---------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------|
@ -179,7 +179,7 @@ The kubelet maps eviction signals to node conditions as follows:
| `DiskPressure` | `nodefs.available`, `nodefs.inodesFree`, `imagefs.available`, or `imagefs.inodesFree` | Available disk space and inodes on either the node's root filesystem or image filesystem has satisfied an eviction threshold |
| `PIDPressure` | `pid.available` | Available processes identifiers on the (Linux) node has fallen below an eviction threshold |
The kubelet updates the node conditions based on the configured
The kubelet updates the node conditions based on the configured
`--node-status-update-frequency`, which defaults to `10s`.
#### Node condition oscillation
@ -197,17 +197,17 @@ condition to a different state. The transition period has a default value of `5m
The kubelet tries to reclaim node-level resources before it evicts end-user pods.
When a `DiskPressure` node condition is reported, the kubelet reclaims node-level
resources based on the filesystems on the node.
resources based on the filesystems on the node.
#### With `imagefs`
If the node has a dedicated `imagefs` filesystem for container runtimes to use,
the kubelet does the following:
* If the `nodefs` filesystem meets the eviction thresholds, the kubelet garbage collects
dead pods and containers.
* If the `imagefs` filesystem meets the eviction thresholds, the kubelet
deletes all unused images.
- If the `nodefs` filesystem meets the eviction thresholds, the kubelet garbage collects
dead pods and containers.
- If the `imagefs` filesystem meets the eviction thresholds, the kubelet
deletes all unused images.
#### Without `imagefs`
@ -220,7 +220,7 @@ the kubelet frees up disk space in the following order:
### Pod selection for kubelet eviction
If the kubelet's attempts to reclaim node-level resources don't bring the eviction
signal below the threshold, the kubelet begins to evict end-user pods.
signal below the threshold, the kubelet begins to evict end-user pods.
The kubelet uses the following parameters to determine the pod eviction order:
@ -238,7 +238,7 @@ As a result, kubelet ranks and evicts pods in the following order:
{{<note>}}
The kubelet does not use the pod's QoS class to determine the eviction order.
You can use the QoS class to estimate the most likely pod eviction order when
reclaiming resources like memory. QoS does not apply to EphemeralStorage requests,
so the above scenario will not apply if the node is, for example, under `DiskPressure`.
{{</note>}}
@ -246,7 +246,7 @@ so the above scenario will not apply if the node is, for example, under `DiskPre
`Guaranteed` pods are guaranteed only when requests and limits are specified for
all the containers and they are equal. These pods will never be evicted because
of another pod's resource consumption. If a system daemon (such as `kubelet`
and `journald`) is consuming more resources than were reserved via
`system-reserved` or `kube-reserved` allocations, and the node only has
`Guaranteed` or `Burstable` pods using less resources than requests left on it,
then the kubelet must choose to evict one of these pods to preserve node stability
@ -277,14 +277,14 @@ disk usage (`local volumes + logs & writable layer of all containers`)
In some cases, pod eviction only reclaims a small amount of the starved resource.
This can lead to the kubelet repeatedly hitting the configured eviction thresholds
and triggering multiple evictions.
You can use the `--eviction-minimum-reclaim` flag or a [kubelet config file](/docs/tasks/administer-cluster/kubelet-config-file/)
to configure a minimum reclaim amount for each resource. When the kubelet notices
that a resource is starved, it continues to reclaim that resource until it
reclaims the quantity you specify.
For example, the following configuration sets minimum reclaim amounts:
```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
@ -302,10 +302,10 @@ evictionMinimumReclaim:
In this example, if the `nodefs.available` signal meets the eviction threshold,
the kubelet reclaims the resource until the signal reaches the threshold of `1Gi`,
and then continues to reclaim the minimum amount of `500Mi` until the signal
reaches `1.5Gi`.
Similarly, the kubelet reclaims the `imagefs` resource until the `imagefs.available`
signal reaches `102Gi`.
The default `eviction-minimum-reclaim` is `0` for all resources.
@ -336,7 +336,7 @@ for each container. It then kills the container with the highest score.
This means that containers in low QoS pods that consume a large amount of memory
relative to their scheduling requests are killed first.
Unlike pod eviction, if a container is OOM killed, the `kubelet` can restart it
based on its `RestartPolicy`.
### Best practices {#node-pressure-eviction-good-practices}
@ -351,9 +351,9 @@ immediately induce memory pressure.
Consider the following scenario:
- Node memory capacity: `10Gi`
- Operator wants to reserve 10% of memory capacity for system daemons (kernel, `kubelet`, etc.)
- Operator wants to evict Pods at 95% memory utilization to reduce incidence of system OOM.
For this to work, the kubelet is launched as follows:
@ -363,18 +363,18 @@ For this to work, the kubelet is launched as follows:
```
In this configuration, the `--system-reserved` flag reserves `1.5Gi` of memory
for the system, which is `10% of the total memory + the eviction threshold amount`.
The node can reach the eviction threshold if a pod is using more than its request,
or if the system is using more than `1Gi` of memory, which makes the `memory.available`
signal fall below `500Mi` and triggers the threshold.
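Expressed as a kubelet configuration file rather than command-line flags, the same reservation could look like this sketch (values taken from the scenario above):
```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
systemReserved:
  memory: "1.5Gi"   # 10% of capacity plus the eviction threshold amount
evictionHard:
  memory.available: "500Mi"   # evict when less than 500Mi of memory is available
```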
#### DaemonSet
Pod Priority is a major factor in making eviction decisions. If you do not want
the kubelet to evict pods that belong to a `DaemonSet`, give those pods a high
enough `priorityClass` in the pod spec. You can also use a lower `priorityClass`
or the default to only allow `DaemonSet` pods to run when there are enough
resources.
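As an illustrative sketch (the class name and value are placeholders, not recommendations), you could define a high-priority PriorityClass and reference it from the DaemonSet's Pod template through `priorityClassName`:
```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: daemonset-high   # placeholder name
value: 1000000           # choose a value that ranks above your regular workloads
globalDefault: false
description: "Priority for DaemonSet pods that should not be evicted under node pressure."
```
You would then set `priorityClassName: daemonset-high` in the DaemonSet's Pod template spec.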
### Known issues
@ -386,7 +386,7 @@ The following sections describe known issues related to out of resource handling
By default, the kubelet polls `cAdvisor` to collect memory usage stats at a
regular interval. If memory usage increases within that window rapidly, the
kubelet may not observe `MemoryPressure` fast enough, and the `OOMKiller`
will still be invoked.
You can use the `--kernel-memcg-notification` flag to enable the `memcg`
notification API on the kubelet to get notified immediately when a threshold
@ -394,29 +394,29 @@ is crossed.
If you are not trying to achieve extreme utilization, but a sensible measure of
overcommit, a viable workaround for this issue is to use the `--kube-reserved`
and `--system-reserved` flags to allocate memory for the system.
#### active_file memory is not considered as available memory
On Linux, the kernel tracks the number of bytes of file-backed memory on active
LRU list as the `active_file` statistic. The kubelet treats `active_file` memory
areas as not reclaimable. For workloads that make intensive use of block-backed
local storage, including ephemeral local storage, kernel-level caches of file
and block data means that many recently accessed cache pages are likely to be
counted as `active_file`. If enough of these kernel block buffers are on the
active LRU list, the kubelet is liable to observe this as high resource use and
taint the node as experiencing memory pressure - triggering pod eviction.
For more details, see [https://github.com/kubernetes/kubernetes/issues/43916](https://github.com/kubernetes/kubernetes/issues/43916)
You can work around that behavior by setting the memory limit and memory request
the same for containers likely to perform intensive I/O activity. You will need
to estimate or measure an optimal memory limit value for that container.
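For example, a hypothetical container spec fragment with the memory request set equal to the limit might look like this:
```yaml
containers:
- name: io-heavy-app               # placeholder name
  image: registry.example/app:1.0  # placeholder image
  resources:
    requests:
      memory: "2Gi"   # illustrative value; measure your workload to pick it
    limits:
      memory: "2Gi"   # set equal to the request
```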
## {{% heading "whatsnext" %}}
- Learn about [API-initiated Eviction](/docs/concepts/scheduling-eviction/api-eviction/)
- Learn about [Pod Priority and Preemption](/docs/concepts/scheduling-eviction/pod-priority-preemption/)
- Learn about [PodDisruptionBudgets](/docs/tasks/run-application/configure-pdb/)
- Learn about [Quality of Service](/docs/tasks/configure-pod-container/quality-service-pod/) (QoS)
- Check out the [Eviction API](/docs/reference/generated/kubernetes-api/{{<param "version">}}/#create-eviction-pod-v1-core)

View File

@ -26,22 +26,7 @@ criteria that Pod should be satisfied before considered schedulable. This field
only when a Pod is created (either by the client, or mutated during admission). After creation,
each schedulingGate can be removed in arbitrary order, but addition of a new scheduling gate is disallowed.
{{<mermaid>}}
stateDiagram-v2
s1: pod created
s2: pod scheduling gated
s3: pod scheduling ready
s4: pod running
if: empty scheduling gates?
[*] --> s1
s1 --> if
s2 --> if: scheduling gate removed
if --> s2: no
if --> s3: yes
s3 --> s4
s4 --> [*]
{{< /mermaid >}}
{{< figure src="/docs/images/podSchedulingGates.svg" alt="pod-scheduling-gates-diagram" caption="Figure. Pod SchedulingGates" class="diagram-large" link="https://mermaid.live/edit#pako:eNplkktTwyAUhf8KgzuHWpukaYszutGlK3caFxQuCVMCGSDVTKf_XfKyPlhxz4HDB9wT5lYAptgHFuBRsdKxenFMClMYFIdfUdRYgbiD6ItJTEbR8wpEq5UpUfnDTf-5cbPoJjcbXdcaE61RVJIiqJvQ_Y30D-OCt-t3tFjcR5wZayiVnIGmkv4NiEfX9jijKTmmRH5jf0sRugOP0HyHUc1m6KGMFP27cM28fwSJDluPpNKaXqVJzmFNfHD2APRKSjnNFx9KhIpmzSfhVls3eHdTRrwG8QnxKfEZUUNeYTDBNbiaKRF_5dSfX-BQQQ0FpnEqQLJWhwIX5hyXsjbYl85wTINrgeC2EZd_xFQy7b_VJ6GCdd-itkxALE84dE3fAqXyIUZya6Qqe711OspVCI2ny2Vv35QqVO3-htt66ZWomAvVcZcv8yTfsiSFfJOydZoKvl_ttjLJVlJsblcJw-czwQ0zr9ZeqGDgeR77b2jD8xdtjtDn" >}}
## Usage example
To mark a Pod not-ready for scheduling, you can create it with one or more scheduling gates like this:
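For illustration, a minimal sketch of such a manifest (the gate name `example.com/foo` is a placeholder chosen for this example):
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  schedulingGates:
  - name: example.com/foo   # placeholder gate name
  containers:
  - name: pause
    image: registry.k8s.io/pause:3.6
```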

View File

@ -10,7 +10,7 @@ weight: 80
<!-- overview -->
In the [scheduling-plugin](/docs/reference/scheduling/config/#scheduling-plugins) `NodeResourcesFit` of kube-scheduler, there are two
scoring strategies that support the bin packing of resources: `MostAllocated` and `RequestedToCapacityRatio`.
<!-- body -->
@ -42,7 +42,7 @@ profiles:
name: NodeResourcesFit
```
To learn more about other parameters and their default configuration, see the API documentation for
[`NodeResourcesFitArgs`](/docs/reference/config-api/kube-scheduler-config.v1beta3/#kubescheduler-config-k8s-io-v1beta3-NodeResourcesFitArgs).
## Enabling bin packing using RequestedToCapacityRatio
@ -55,10 +55,10 @@ configured function of the allocated resources. The behavior of the `RequestedTo
the `NodeResourcesFit` score function can be controlled by the
[scoringStrategy](/docs/reference/config-api/kube-scheduler-config.v1beta3/#kubescheduler-config-k8s-io-v1beta3-ScoringStrategy) field.
Within the `scoringStrategy` field, you can configure two parameters: `requestedToCapacityRatio` and
`resources`. The `shape` in the `requestedToCapacityRatio`
parameter allows the user to tune the function as least requested or most
requested based on `utilization` and `score` values. The `resources` parameter
consists of `name` of the resource to be considered during scoring and `weight`
specify the weight of each resource.
Below is an example configuration that sets
@ -87,11 +87,11 @@ profiles:
name: NodeResourcesFit
```
Referencing the `KubeSchedulerConfiguration` file with the kube-scheduler
flag `--config=/path/to/config/file` will pass the configuration to the
scheduler.
To learn more about other parameters and their default configuration, see the API documentation for
[`NodeResourcesFitArgs`](/docs/reference/config-api/kube-scheduler-config.v1beta3/#kubescheduler-config-k8s-io-v1beta3-NodeResourcesFitArgs).
### Tuning the score function
@ -100,10 +100,10 @@ To learn more about other parameters and their default configuration, see the AP
```yaml
shape:
- utilization: 0
score: 0
- utilization: 100
score: 10
```
The above arguments give the node a `score` of 0 if `utilization` is 0% and 10 for
@ -120,7 +120,7 @@ shape:
`resources` is an optional parameter which defaults to:
```yaml
resources:
- name: cpu
weight: 1
@ -128,7 +128,7 @@ resources:
weight: 1
```
It can be used to add extended resources as follows:
```yaml
resources:
@ -188,8 +188,8 @@ intel.com/foo = resourceScoringFunction((2+1),4)
= (100 - ((4-3)*100/4))
= (100 - 25)
= 75 # requested + used = 75% * available
= rawScoringFunction(75)
= 7 # floor(75/10)
memory = resourceScoringFunction((256+256),1024)
= (100 -((1024-512)*100/1024))
@ -251,4 +251,3 @@ NodeScore = (5 * 5) + (7 * 1) + (10 * 3) / (5 + 1 + 3)
- Read more about the [scheduling framework](/docs/concepts/scheduling-eviction/scheduling-framework/)
- Read more about [scheduler configuration](/docs/reference/scheduling/config/)

View File

@ -8,8 +8,8 @@ weight: 90
<!-- overview -->
The Kubernetes API server is the main point of entry to a cluster for external parties
(users and services) interacting with it.
As part of this role, the API server has several key built-in security controls, such as
audit logging and {{< glossary_tooltip text="admission controllers" term_id="admission-controller" >}}.
@ -48,13 +48,13 @@ API server. However, the Pod still runs on the node. For more information, refer
### Mitigations {#static-pods-mitigations}
- Only [enable the kubelet static Pod manifest functionality](/docs/tasks/configure-pod-container/static-pod/#static-pod-creation)
if required by the node.
- If a node uses the static Pod functionality, restrict filesystem access to the static Pod manifest directory
or URL to users who need the access.
- Restrict access to kubelet configuration parameters and files to prevent an attacker setting
a static Pod path or URL.
- Regularly audit and centrally report all access to directories or web storage locations that host
static Pod manifests and kubelet configuration files.
## The kubelet API {#kubelet-api}
@ -73,7 +73,7 @@ Direct access to the kubelet API is not subject to admission control and is not
by Kubernetes audit logging. An attacker with direct access to this API may be able to
bypass controls that detect or prevent certain actions.
The kubelet API can be configured to authenticate requests in a number of ways.
By default, the kubelet configuration allows anonymous access. Most Kubernetes providers
change the default to use webhook and certificate authentication. This lets the control plane
ensure that the caller is authorized to access the `nodes` API resource or sub-resources.
@ -86,7 +86,7 @@ The default anonymous access doesn't make this assertion with the control plane.
such as by monitoring services.
- Restrict access to the kubelet port. Only allow specified and trusted IP address
ranges to access the port.
- Ensure that [kubelet authentication](/docs/reference/access-authn-authz/kubelet-authn-authz/#kubelet-authentication)
is set to webhook or certificate mode (see the configuration sketch after this list).
- Ensure that the unauthenticated "read-only" Kubelet port is not enabled on the cluster.
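The following is a hedged sketch of kubelet configuration settings that implement the authentication-related mitigations above; adapt it to your environment:
```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
authentication:
  anonymous:
    enabled: false    # disable anonymous requests
  webhook:
    enabled: true     # delegate authentication to the API server
authorization:
  mode: Webhook       # delegate authorization to the API server
readOnlyPort: 0       # keep the unauthenticated read-only port disabled
```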
@ -108,7 +108,7 @@ cluster admin rights by accessing cluster secrets or modifying access rules. Eve
elevating their Kubernetes RBAC privileges, an attacker who can modify etcd can retrieve any API object
or create new workloads inside the cluster.
Many Kubernetes providers configure
etcd to use mutual TLS (both client and server verify each other's certificate for authentication).
There is no widely accepted implementation of authorization for the etcd API, although
the feature exists. Since there is no authorization model, any certificate
@ -124,10 +124,9 @@ that are only used for health checking can also grant full read and write access
- Consider restricting access to the etcd port at a network level, to only allow access
from specified and trusted IP address ranges.
## Container runtime socket {#runtime-socket}
On each node in a Kubernetes cluster, access to interact with containers is controlled
by the container runtime (or runtimes, if you have configured more than one). Typically,
the container runtime exposes a Unix socket that the kubelet can access. An attacker with
access to this socket can launch new containers or interact with running containers.
@ -139,12 +138,12 @@ control plane components.
### Mitigations {#runtime-socket-mitigations}
- Ensure that you tightly control filesystem access to container runtime sockets.
When possible, restrict this access to the `root` user.
- Isolate the kubelet from other components running on the node, using
mechanisms such as Linux kernel namespaces.
- Ensure that you restrict or forbid the use of [`hostPath` mounts](/docs/concepts/storage/volumes/#hostpath)
that include the container runtime socket, either directly or by mounting a parent
directory. Also `hostPath` mounts must be set as read-only to mitigate risks
of attackers bypassing directory restrictions.
- Restrict user access to nodes, and especially restrict superuser access to nodes.

View File

@ -131,4 +131,7 @@ current policy level:
- [Enforcing Pod Security Standards](/docs/setup/best-practices/enforcing-pod-security-standards)
- [Enforce Pod Security Standards by Configuring the Built-in Admission Controller](/docs/tasks/configure-pod-container/enforce-standards-admission-controller)
- [Enforce Pod Security Standards with Namespace Labels](/docs/tasks/configure-pod-container/enforce-standards-namespace-labels)
- [Migrate from PodSecurityPolicy to the Built-In PodSecurity Admission Controller](/docs/tasks/configure-pod-container/migrate-from-psp)
If you are running an older version of Kubernetes and want to upgrade
to a version of Kubernetes that does not include PodSecurityPolicies,
read [migrate from PodSecurityPolicy to the Built-In PodSecurity Admission Controller](/docs/tasks/configure-pod-container/migrate-from-psp).

View File

@ -152,7 +152,7 @@ fail validation.
<tr>
<td style="white-space: nowrap">Host Ports</td>
<td>
<p>HostPorts should be disallowed, or at minimum restricted to a known list.</p>
<p>HostPorts should be disallowed entirely (recommended) or restricted to a known list</p>
<p><strong>Restricted Fields</strong></p>
<ul>
<li><code>spec.containers[*].ports[*].hostPort</code></li>
@ -162,7 +162,7 @@ fail validation.
<p><strong>Allowed Values</strong></p>
<ul>
<li>Undefined/nil</li>
<li>Known list</li>
<li>Known list (not supported by the built-in <a href="/docs/concepts/security/pod-security-admission/">Pod Security Admission controller</a>)</li>
<li><code>0</code></li>
</ul>
</td>

View File

@ -121,8 +121,20 @@ considered weak.
### Persistent volume creation
As noted in the [PodSecurityPolicy](/docs/concepts/security/pod-security-policy/#volumes-and-file-systems)
documentation, access to create PersistentVolumes can allow for escalation of access to the underlying host.
If someone - or some application - is allowed to create arbitrary PersistentVolumes, that access
includes the creation of `hostPath` volumes, which then means that a Pod would get access
to the underlying host filesystem(s) on the associated node. Granting that ability is a security risk.
There are many ways a container with unrestricted access to the host filesystem can escalate privileges, including
reading data from other containers and abusing the credentials of system services, such as the kubelet.
You should only allow access to create PersistentVolume objects for:
- users (cluster operators) that need this access for their work, and who you trust,
- the Kubernetes control plane components which create PersistentVolumes based on PersistentVolumeClaims
that are configured for automatic provisioning.
This is usually set up by the Kubernetes provider or by the operator when installing a CSI driver.
Where access to persistent storage is required, trusted administrators should create
PersistentVolumes, and constrained users should use PersistentVolumeClaims to access that storage.

View File

@ -0,0 +1,266 @@
---
title: Service Accounts
description: >
Learn about ServiceAccount objects in Kubernetes.
content_type: concept
weight: 10
---
<!-- overview -->
This page introduces the ServiceAccount object in Kubernetes, providing
information about how service accounts work, use cases, limitations,
alternatives, and links to resources for additional guidance.
<!-- body -->
## What are service accounts? {#what-are-service-accounts}
A service account is a type of non-human account that, in Kubernetes, provides
a distinct identity in a Kubernetes cluster. Application Pods, system
components, and entities inside and outside the cluster can use a specific
ServiceAccount's credentials to identify as that ServiceAccount. This identity
is useful in various situations, including authenticating to the API server or
implementing identity-based security policies.
Service accounts exist as ServiceAccount objects in the API server. Service
accounts have the following properties:
* **Namespaced:** Each service account is bound to a Kubernetes
{{<glossary_tooltip text="namespace" term_id="namespace">}}. Every namespace
gets a [`default` ServiceAccount](#default-service-accounts) upon creation.
* **Lightweight:** Service accounts exist in the cluster and are
defined in the Kubernetes API. You can quickly create service accounts to
enable specific tasks.
* **Portable:** A configuration bundle for a complex containerized workload
might include service account definitions for the system's components. The
lightweight nature of service accounts and the namespaced identities make
the configurations portable.
Service accounts are different from user accounts, which are authenticated
human users in the cluster. By default, user accounts don't exist in the Kubernetes
API server; instead, the API server treats user identities as opaque
data. You can authenticate as a user account using multiple methods. Some
Kubernetes distributions might add custom extension APIs to represent user
accounts in the API server.
{{< table caption="Comparison between service accounts and users" >}}
| Description | ServiceAccount | User or group |
| --- | --- | --- |
| Location | Kubernetes API (ServiceAccount object) | External |
| Access control | Kubernetes RBAC or other [authorization mechanisms](/docs/reference/access-authn-authz/authorization/#authorization-modules) | Kubernetes RBAC or other identity and access management mechanisms |
| Intended use | Workloads, automation | People |
{{< /table >}}
### Default service accounts {#default-service-accounts}
When you create a cluster, Kubernetes automatically creates a ServiceAccount
object named `default` for every namespace in your cluster. The `default`
service accounts in each namespace get no permissions by default other than the
[default API discovery permissions](/docs/reference/access-authn-authz/rbac/#default-roles-and-role-bindings)
that Kubernetes grants to all authenticated principals if role-based access control (RBAC) is enabled.
If you delete the `default` ServiceAccount object in a namespace, the
{{< glossary_tooltip text="control plane" term_id="control-plane" >}}
replaces it with a new one.
If you deploy a Pod in a namespace, and you don't
[manually assign a ServiceAccount to the Pod](#assign-to-pod), Kubernetes
assigns the `default` ServiceAccount for that namespace to the Pod.
## Use cases for Kubernetes service accounts {#use-cases}
As a general guideline, you can use service accounts to provide identities in
the following scenarios:
* Your Pods need to communicate with the Kubernetes API server, for example in
situations such as the following:
* Providing read-only access to sensitive information stored in Secrets.
* Granting [cross-namespace access](#cross-namespace), such as allowing a
Pod in namespace `example` to read, list, and watch for Lease objects in
the `kube-node-lease` namespace.
* Your Pods need to communicate with an external service. For example, a
workload Pod requires an identity for a commercially available cloud API,
and the commercial provider allows configuring a suitable trust relationship.
* [Authenticating to a private image registry using an `imagePullSecret`](/docs/tasks/configure-pod-container/configure-service-account/#add-imagepullsecrets-to-a-service-account).
* An external service needs to communicate with the Kubernetes API server. For
example, authenticating to the cluster as part of a CI/CD pipeline.
* You use third-party security software in your cluster that relies on the
ServiceAccount identity of different Pods to group those Pods into different
contexts.
## How to use service accounts {#how-to-use}
To use a Kubernetes service account, you do the following:
1. Create a ServiceAccount object using a Kubernetes
client like `kubectl` or a manifest that defines the object.
1. Grant permissions to the ServiceAccount object using an authorization
mechanism such as
[RBAC](/docs/reference/access-authn-authz/rbac/).
1. Assign the ServiceAccount object to Pods during Pod creation.
If you're using the identity from an external service,
[retrieve the ServiceAccount token](#get-a-token) and use it from that
service instead.
For instructions, refer to
[Configure Service Accounts for Pods](/docs/tasks/configure-pod-container/configure-service-account/).
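For illustration, step 1 can be as simple as applying a manifest like the following sketch (the name `build-robot` is a placeholder):
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: build-robot   # placeholder name
  namespace: default
```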
### Grant permissions to a ServiceAccount {#grant-permissions}
You can use the built-in Kubernetes
[role-based access control (RBAC)](/docs/reference/access-authn-authz/rbac/)
mechanism to grant the minimum permissions required by each service account.
You create a *role*, which grants access, and then *bind* the role to your
ServiceAccount. RBAC lets you define a minimum set of permissions so that the
service account permissions follow the principle of least privilege. Pods that
use that service account don't get more permissions than are required to
function correctly.
For instructions, refer to
[ServiceAccount permissions](/docs/reference/access-authn-authz/rbac/#service-account-permissions).
#### Cross-namespace access using a ServiceAccount {#cross-namespace}
You can use RBAC to allow service accounts in one namespace to perform actions
on resources in a different namespace in the cluster. For example, consider a
scenario where you have a service account and Pod in the `dev` namespace and
you want your Pod to see Jobs running in the `maintenance` namespace. You could
create a Role object that grants permissions to list Job objects. Then,
you'd create a RoleBinding object in the `maintenance` namespace to bind the
Role to the ServiceAccount object. Now, Pods in the `dev` namespace can list
Job objects in the `maintenance` namespace using that service account.
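A hedged sketch of that Role and RoleBinding follows; the names `job-reader`, `job-reader-binding`, and `dev-sa` are placeholders:
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: job-reader          # placeholder name
  namespace: maintenance    # the namespace that contains the Jobs
rules:
- apiGroups: ["batch"]
  resources: ["jobs"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: job-reader-binding  # placeholder name
  namespace: maintenance
subjects:
- kind: ServiceAccount
  name: dev-sa              # placeholder ServiceAccount in the dev namespace
  namespace: dev
roleRef:
  kind: Role
  name: job-reader
  apiGroup: rbac.authorization.k8s.io
```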
### Assign a ServiceAccount to a Pod {#assign-to-pod}
To assign a ServiceAccount to a Pod, you set the `spec.serviceAccountName`
field in the Pod specification. Kubernetes then automatically provides the
credentials for that ServiceAccount to the Pod. In v1.22 and later, Kubernetes
gets a short-lived, **automatically rotating** token using the `TokenRequest`
API and mounts the token as a
[projected volume](/docs/concepts/storage/projected-volumes/#serviceaccounttoken).
By default, Kubernetes provides the Pod
with the credentials for an assigned ServiceAccount, whether that is the
`default` ServiceAccount or a custom ServiceAccount that you specify.
To prevent Kubernetes from automatically injecting
credentials for a specified ServiceAccount or the `default` ServiceAccount, set the
`automountServiceAccountToken` field in your Pod specification to `false`.
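As a sketch, assigning a hypothetical `build-robot` ServiceAccount to a Pod looks like this:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  serviceAccountName: build-robot      # placeholder ServiceAccount name
  automountServiceAccountToken: false  # optional: opt out of automatic credential injection
  containers:
  - name: app
    image: registry.example/app:1.0    # placeholder image
```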
<!-- OK to remove this historical detail after Kubernetes 1.31 is released -->
In versions earlier than 1.22, Kubernetes provides a long-lived, static token
to the Pod as a Secret.
#### Manually retrieve ServiceAccount credentials {#get-a-token}
If you need the credentials for a ServiceAccount to mount in a non-standard
location, or for an audience that isn't the API server, use one of the
following methods:
* [TokenRequest API](/docs/reference/kubernetes-api/authentication-resources/token-request-v1/)
(recommended): Request a short-lived service account token from within
your own *application code*. The token expires automatically and can rotate
upon expiration.
If you have a legacy application that is not aware of Kubernetes, you
could use a sidecar container within the same pod to fetch these tokens
and make them available to the application workload.
* [Token Volume Projection](/docs/tasks/configure-pod-container/configure-service-account/#service-account-token-volume-projection)
(also recommended): In Kubernetes v1.20 and later, use the Pod specification to
tell the kubelet to add the service account token to the Pod as a
*projected volume*. Projected tokens expire automatically, and the kubelet
rotates the token before it expires (see the sketch after this list).
* [Service Account Token Secrets](/docs/tasks/configure-pod-container/configure-service-account/#manually-create-a-service-account-api-token)
(not recommended): You can mount service account tokens as Kubernetes
Secrets in Pods. These tokens don't expire and don't rotate. This method
is not recommended, especially at scale, because of the risks associated
with static, long-lived credentials. In Kubernetes v1.24 and later, the
[LegacyServiceAccountTokenNoAutoGeneration feature gate](/docs/reference/command-line-tools-reference/feature-gates/#feature-gates-for-graduated-or-deprecated-features)
prevents Kubernetes from automatically creating these tokens for
ServiceAccounts. `LegacyServiceAccountTokenNoAutoGeneration` is enabled
by default; in other words, Kubernetes does not create these tokens.
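To illustrate the token volume projection option above, here is a hedged sketch of a Pod that requests a projected ServiceAccount token; the audience, path, and names are placeholders:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: token-client
spec:
  serviceAccountName: build-robot      # placeholder ServiceAccount name
  containers:
  - name: app
    image: registry.example/app:1.0    # placeholder image
    volumeMounts:
    - name: projected-token
      mountPath: /var/run/secrets/tokens
  volumes:
  - name: projected-token
    projected:
      sources:
      - serviceAccountToken:
          path: app-token          # file name the token is written to
          expirationSeconds: 3600  # the kubelet refreshes the token before it expires
          audience: vault          # placeholder audience
```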
## Authenticating service account credentials {#authenticating-credentials}
ServiceAccounts use signed
{{<glossary_tooltip term_id="jwt" text="JSON Web Tokens">}} (JWTs)
to authenticate to the Kubernetes API server, and to any other system where a
trust relationship exists. Depending on how the token was issued
(either time-limited using a `TokenRequest` or using a legacy mechanism with
a Secret), a ServiceAccount token might also have an expiry time, an audience,
and a time after which the token *starts* being valid. When a client that is
acting as a ServiceAccount tries to communicate with the Kubernetes API server,
the client includes an `Authorization: Bearer <token>` header with the HTTP
request. The API server checks the validity of that bearer token as follows:
1. Check the token signature.
1. Check whether the token has expired.
1. Check whether object references in the token claims are currently valid.
1. Check whether the token is currently valid.
1. Check the audience claims.
The TokenRequest API produces _bound tokens_ for a ServiceAccount. This
binding is linked to the lifetime of the client, such as a Pod, that is acting
as that ServiceAccount.
For tokens issued using the `TokenRequest` API, the API server also checks that
the specific object reference that is using the ServiceAccount still exists,
matching by the {{< glossary_tooltip term_id="uid" text="unique ID" >}} of that
object. For legacy tokens that are mounted as Secrets in Pods, the API server
checks the token against the Secret.
For more information about the authentication process, refer to
[Authentication](/docs/reference/access-authn-authz/authentication/#service-account-tokens).
### Authenticating service account credentials in your own code {#authenticating-in-code}
If you have services of your own that need to validate Kubernetes service
account credentials, you can use the following methods:
* [TokenReview API](/docs/reference/kubernetes-api/authentication-resources/token-review-v1/)
(recommended)
* OIDC discovery
The Kubernetes project recommends that you use the TokenReview API, because
this method invalidates tokens that are bound to API objects such as Secrets,
ServiceAccounts, and Pods when those objects are deleted. For example, if you
delete the Pod that contains a projected ServiceAccount token, the cluster
invalidates that token immediately and a TokenReview immediately fails.
If you use OIDC validation instead, your clients continue to treat the token
as valid until the token reaches its expiration timestamp.
Your application should always define the audience that it accepts, and should
check that the token's audiences match the audiences that the application
expects. This helps to minimize the scope of the token so that it can only be
used in your application and nowhere else.
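As a hedged illustration, a TokenReview request submitted to the API server could look like this (the audience value is a placeholder):
```yaml
apiVersion: authentication.k8s.io/v1
kind: TokenReview
spec:
  token: "<bearer token to validate>"
  audiences:
  - "my-audience"   # placeholder; the audience your application expects
```
The API server reports the outcome in the returned object's `status.authenticated` and `status.user` fields.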
## Alternatives
* Issue your own tokens using another mechanism, and then use
[Webhook Token Authentication](/docs/reference/access-authn-authz/authentication/#webhook-token-authentication)
to validate bearer tokens using your own validation service.
* Provide your own identities to Pods.
* [Use the SPIFFE CSI driver plugin to provide SPIFFE SVIDs as X.509 certificate pairs to Pods](https://cert-manager.io/docs/projects/csi-driver-spiffe/).
{{% thirdparty-content single="true" %}}
* [Use a service mesh such as Istio to provide certificates to Pods](https://istio.io/latest/docs/tasks/security/cert-management/plugin-ca-cert/).
* Authenticate from outside the cluster to the API server without using service account tokens:
* [Configure the API server to accept OpenID Connect (OIDC) tokens from your identity provider](/docs/reference/access-authn-authz/authentication/#openid-connect-tokens).
* Use service accounts or user accounts created using an external Identity
and Access Management (IAM) service, such as from a cloud provider, to
authenticate to your cluster.
* [Use the CertificateSigningRequest API with client certificates](/docs/tasks/tls/managing-tls-in-a-cluster/).
* [Configure the kubelet to retrieve credentials from an image registry](/docs/tasks/administer-cluster/kubelet-credential-provider/).
* Use a Device Plugin to access a virtual Trusted Platform Module (TPM), which
then allows authentication using a private key.
## {{% heading "whatsnext" %}}
* Learn how to [manage your ServiceAccounts as a cluster administrator](/docs/reference/access-authn-authz/service-accounts-admin/).
* Learn how to [assign a ServiceAccount to a Pod](/docs/tasks/configure-pod-container/configure-service-account/).
* Read the [ServiceAccount API reference](/docs/reference/kubernetes-api/authentication-resources/service-account-v1/).

View File

@ -306,7 +306,7 @@ When the Pod above is created, the container `test` gets the following contents
in its `/etc/resolv.conf` file:
```
nameserver 1.2.3.4
nameserver 192.0.2.1
search ns1.svc.cluster-domain.example my.dns.search.suffix
options ndots:2 edns0
```

View File

@ -104,7 +104,7 @@ the pod is also terminating.
{{< note >}}
Although `serving` is almost identical to `ready`, it was added to prevent break the existing meaning
Although `serving` is almost identical to `ready`, it was added to prevent breaking the existing meaning
of `ready`. It may be unexpected for existing clients if `ready` could be `true` for terminating
endpoints, since historically terminating endpoints were never included in the Endpoints or
EndpointSlice API to begin with. For this reason, `ready` is _always_ `false` for terminating

View File

@ -69,7 +69,7 @@ The name of an Ingress object must be a valid
[DNS subdomain name](/docs/concepts/overview/working-with-objects/names#dns-subdomain-names).
For general information about working with config files, see [deploying applications](/docs/tasks/run-application/run-stateless-application-deployment/), [configuring containers](/docs/tasks/configure-pod-container/configure-pod-configmap/), [managing resources](/docs/concepts/cluster-administration/manage-deployment/).
Ingress frequently uses annotations to configure some options depending on the Ingress controller, an example of which
is the [rewrite-target annotation](https://github.com/kubernetes/ingress-nginx/blob/master/docs/examples/rewrite/README.md).
is the [rewrite-target annotation](https://github.com/kubernetes/ingress-nginx/blob/main/docs/examples/rewrite/README.md).
Different [Ingress controllers](/docs/concepts/services-networking/ingress-controllers) support different annotations. Review the documentation for
your choice of Ingress controller to learn which annotations are supported.

View File

@ -18,22 +18,23 @@ weight: 10
{{< glossary_definition term_id="service" length="short" >}}
With Kubernetes you don't need to modify your application to use an unfamiliar service discovery mechanism.
Kubernetes gives Pods their own IP addresses and a single DNS name for a set of Pods,
and can load-balance across them.
A key aim of Services in Kubernetes is that you don't need to modify your existing
application to use an unfamiliar service discovery mechanism.
You can run code in Pods, whether this is code designed for a cloud-native world, or
an older app you've containerized. You use a Service to make that set of Pods available
on the network so that clients can interact with it.
<!-- body -->
## Motivation
Kubernetes {{< glossary_tooltip term_id="pod" text="Pods" >}} are created and destroyed
to match the desired state of your cluster. Pods are nonpermanent resources.
If you use a {{< glossary_tooltip term_id="deployment" >}} to run your app,
it can create and destroy Pods dynamically.
that Deployment can create and destroy Pods dynamically. From one moment to the next,
you don't know how many of those Pods are working and healthy; you might not even know
what those healthy Pods are named.
Kubernetes {{< glossary_tooltip term_id="pod" text="Pods" >}} are created and destroyed
to match the desired state of your cluster. Pods are ephemeral resources (you should not
expect that an individual Pod is reliable and durable).
Each Pod gets its own IP address, however in a Deployment, the set of Pods
running in one moment in time could be different from
the set of Pods running that application a moment later.
Each Pod gets its own IP address (Kubernetes expects network plugins to ensure this).
For a given Deployment in your cluster, the set of Pods running in one moment in
time could be different from the set of Pods running that application a moment later.
This leads to a problem: if some set of Pods (call them "backends") provides
functionality to other Pods (call them "frontends") inside your cluster,
@ -42,14 +43,13 @@ to, so that the frontend can use the backend part of the workload?
Enter _Services_.
## Service resources {#service-resource}
<!-- body -->
In Kubernetes, a Service is an abstraction which defines a logical set of Pods
and a policy by which to access them (sometimes this pattern is called
a micro-service). The set of Pods targeted by a Service is usually determined
by a {{< glossary_tooltip text="selector" term_id="selector" >}}.
To learn about other ways to define Service endpoints,
see [Services _without_ selectors](#services-without-selectors).
## Services in Kubernetes
The Service API, part of Kubernetes, is an abstraction to help you expose groups of
Pods over a network. Each Service object defines a logical set of endpoints (usually
these endpoints are Pods) along with a policy about how to make those pods accessible.
For example, consider a stateless image-processing backend which is running with
3 replicas. Those replicas are fungible&mdash;frontends do not care which backend
@ -59,6 +59,26 @@ track of the set of backends themselves.
The Service abstraction enables this decoupling.
The set of Pods targeted by a Service is usually determined
by a {{< glossary_tooltip text="selector" term_id="selector" >}} that you
define.
To learn about other ways to define Service endpoints,
see [Services _without_ selectors](#services-without-selectors).
If your workload speaks HTTP, you might choose to use an
[Ingress](/docs/concepts/services-networking/ingress/) to control how web traffic
reaches that workload.
Ingress is not a Service type, but it acts as the entry point for your
cluster. An Ingress lets you consolidate your routing rules into a single resource, so
that you can expose multiple components of your workload, running separately in your
cluster, behind a single listener.
The [Gateway](https://gateway-api.sigs.k8s.io/#what-is-the-gateway-api) API for Kubernetes
provides extra capabilities beyond Ingress and Service. You can add Gateway to your cluster -
it is a family of extension APIs, implemented using
{{< glossary_tooltip term_id="CustomResourceDefinition" text="CustomResourceDefinitions" >}} -
and then use these to configure access to network services that are running in your cluster.
### Cloud-native service discovery
If you're able to use Kubernetes APIs for service discovery in your application,
@ -69,16 +89,20 @@ whenever the set of Pods in a Service changes.
For non-native applications, Kubernetes offers ways to place a network port or load
balancer in between your application and the backend Pods.
Either way, your workload can use these [service discovery](#discovering-services)
mechanisms to find the target it wants to connect to.
## Defining a Service
A Service in Kubernetes is a REST object, similar to a Pod. Like all of the
REST objects, you can `POST` a Service definition to the API server to create
a new instance.
The name of a Service object must be a valid
[RFC 1035 label name](/docs/concepts/overview/working-with-objects/names#rfc-1035-label-names).
A Service in Kubernetes is an
{{< glossary_tooltip text="object" term_id="object" >}}
(the same way that a Pod or a ConfigMap is an object). You can create,
view or modify Service definitions using the Kubernetes API. Usually
you use a tool such as `kubectl` to make those API calls for you.
For example, suppose you have a set of Pods where each listens on TCP port 9376
and contains a label `app.kubernetes.io/name=MyApp`:
For example, suppose you have a set of Pods that each listen on TCP port 9376
and are labelled as `app.kubernetes.io/name=MyApp`. You can define a Service to
publish that TCP listener:
```yaml
apiVersion: v1
@ -94,16 +118,20 @@ spec:
targetPort: 9376
```
This specification creates a new Service object named "my-service", which
targets TCP port 9376 on any Pod with the `app.kubernetes.io/name=MyApp` label.
Applying this manifest creates a new Service named "my-service", which
targets TCP port 9376 on any Pod with the `app.kubernetes.io/name: MyApp` label.
Kubernetes assigns this Service an IP address (sometimes called the "cluster IP"),
which is used by the Service proxies
(see [Virtual IP addressing mechanism](#virtual-ip-addressing-mechanism) below).
Kubernetes assigns this Service an IP address (the _cluster IP_),
that is used by the virtual IP address mechanism. For more details on that mechanism,
read [Virtual IPs and Service Proxies](/docs/reference/networking/virtual-ips/).
The controller for that Service continuously scans for Pods that
match its selector, and then makes any necessary updates to the set of
EndpointSlices for the Service.
The name of a Service object must be a valid
[RFC 1035 label name](/docs/concepts/overview/working-with-objects/names#rfc-1035-label-names).
The controller for the Service selector continuously scans for Pods that
match its selector, and then POSTs any updates to an Endpoint object
also named "my-service".
{{< note >}}
A Service can map _any_ incoming `port` to a `targetPort`. By default and
@ -177,8 +205,8 @@ For example:
* You are migrating a workload to Kubernetes. While evaluating the approach,
you run only a portion of your backends in Kubernetes.
In any of these scenarios you can define a Service _without_ a Pod selector.
For example:
In any of these scenarios you can define a Service _without_ specifying a
selector to match Pods. For example:
```yaml
apiVersion: v1
@ -262,9 +290,9 @@ selector will fail due to this constraint. This prevents the Kubernetes API serv
from being used as a proxy to endpoints the caller may not be authorized to access.
{{< /note >}}
An ExternalName Service is a special case of Service that does not have
An `ExternalName` Service is a special case of Service that does not have
selectors and uses DNS names instead. For more information, see the
[ExternalName](#externalname) section later in this document.
[ExternalName](#externalname) section.
### EndpointSlices
@ -436,7 +464,7 @@ the port number for `http`, as well as the IP address.
The Kubernetes DNS server is the only way to access `ExternalName` Services.
You can find more information about `ExternalName` resolution in
[DNS Pods and Services](/docs/concepts/services-networking/dns-pod-service/).
[DNS for Services and Pods](/docs/concepts/services-networking/dns-pod-service/).
## Headless Services
@ -483,6 +511,8 @@ Kubernetes `ServiceTypes` allow you to specify what kind of Service you want.
* `ClusterIP`: Exposes the Service on a cluster-internal IP. Choosing this value
makes the Service only reachable from within the cluster. This is the
default that is used if you don't explicitly specify a `type` for a Service.
You can expose the service to the public with an [Ingress](/docs/concepts/services-networking/ingress/) or the
[Gateway API](https://gateway-api.sigs.k8s.io/).
* [`NodePort`](#type-nodeport): Exposes the Service on each Node's IP at a static port
(the `NodePort`).
To make the node port available, Kubernetes sets up a cluster IP address,
@ -702,7 +732,7 @@ In a split-horizon DNS environment you would need two Services to be able to rou
and internal traffic to your endpoints.
To set an internal load balancer, add one of the following annotations to your Service
depending on the cloud Service provider you're using.
depending on the cloud service provider you're using:
{{< tabs name="service_tabs" >}}
{{% tab name="Default" %}}
@ -1149,9 +1179,9 @@ spec:
- name: http
protocol: TCP
port: 80
targetPort: 9376
targetPort: 49152
externalIPs:
- 80.11.12.10
- 198.51.100.32
```
## Session stickiness
@ -1176,12 +1206,17 @@ mechanism Kubernetes provides to expose a Service with a virtual IP address.
## {{% heading "whatsnext" %}}
* Follow the [Connecting Applications with Services](/docs/tutorials/services/connect-applications-service/) tutorial
* Read about [Ingress](/docs/concepts/services-networking/ingress/)
* Read about [EndpointSlices](/docs/concepts/services-networking/endpoint-slices/)
Learn more about Services and how they fit into Kubernetes:
* Follow the [Connecting Applications with Services](/docs/tutorials/services/connect-applications-service/) tutorial.
* Read about [Ingress](/docs/concepts/services-networking/ingress/), which
exposes HTTP and HTTPS routes from outside the cluster to Services within
your cluster.
* Read about [Gateway](https://gateway-api.sigs.k8s.io/), an extension to
Kubernetes that provides more flexibility than Ingress.
For more context:
* Read [Virtual IPs and Service Proxies](/docs/reference/networking/virtual-ips/)
* Read the [API reference](/docs/reference/kubernetes-api/service-resources/service-v1/) for the Service API
* Read the [API reference](/docs/reference/kubernetes-api/service-resources/endpoints-v1/) for the Endpoints API
* Read the [API reference](/docs/reference/kubernetes-api/service-resources/endpoint-slice-v1/) for the EndpointSlice API
For more context, read the following:
* [Virtual IPs and Service Proxies](/docs/reference/networking/virtual-ips/)
* [EndpointSlices](/docs/concepts/services-networking/endpoint-slices/)
* [Service API reference](/docs/reference/kubernetes-api/service-resources/service-v1/)
* [EndpointSlice API reference](/docs/reference/kubernetes-api/service-resources/endpoint-slice-v1/)
* [Endpoint API reference (legacy)](/docs/reference/kubernetes-api/service-resources/endpoints-v1/)

View File

@ -126,7 +126,7 @@ zone.
5. **A zone is not represented in hints:** If the kube-proxy is unable to find
at least one endpoint with a hint targeting the zone it is running in, it falls
to using endpoints from all zones. This is most likely to happen as you add
back to using endpoints from all zones. This is most likely to happen as you add
a new zone into your existing cluster.
## Constraints

View File

@ -16,25 +16,45 @@ weight: 20
<!-- overview -->
This document describes _persistent volumes_ in Kubernetes. Familiarity with
[volumes](/docs/concepts/storage/volumes/) is suggested.
<!-- body -->
## Introduction
Managing storage is a distinct problem from managing compute instances.
The PersistentVolume subsystem provides an API for users and administrators
that abstracts details of how storage is provided from how it is consumed.
To do this, we introduce two new API resources: PersistentVolume and PersistentVolumeClaim.
A _PersistentVolume_ (PV) is a piece of storage in the cluster that has been
provisioned by an administrator or dynamically provisioned using
[Storage Classes](/docs/concepts/storage/storage-classes/). It is a resource in
the cluster just like a node is a cluster resource. PVs are volume plugins like
Volumes, but have a lifecycle independent of any individual Pod that uses the PV.
This API object captures the details of the implementation of the storage, be that
NFS, iSCSI, or a cloud-provider-specific storage system.
A _PersistentVolumeClaim_ (PVC) is a request for storage by a user. It is similar
to a Pod. Pods consume node resources and PVCs consume PV resources. Pods can
request specific levels of resources (CPU and Memory). Claims can request specific
size and access modes (e.g., they can be mounted ReadWriteOnce, ReadOnlyMany or
ReadWriteMany, see [AccessModes](#access-modes)).
While PersistentVolumeClaims allow a user to consume abstract storage resources,
it is common that users need PersistentVolumes with varying properties, such as
performance, for different problems. Cluster administrators need to be able to
offer a variety of PersistentVolumes that differ in more ways than size and access
modes, without exposing users to the details of how those volumes are implemented.
For these needs, there is the _StorageClass_ resource.
See the [detailed walkthrough with working examples](/docs/tasks/configure-pod-container/configure-persistent-volume-storage/).
## Lifecycle of a volume and claim
PVs are resources in the cluster. PVCs are requests for those resources and also act
as claim checks to the resource. The interaction between PVs and PVCs follows this lifecycle:
### Provisioning
@ -42,7 +62,9 @@ There are two ways PVs may be provisioned: statically or dynamically.
#### Static
A cluster administrator creates a number of PVs. They carry the details of the
real storage, which is available for use by cluster users. They exist in the
Kubernetes API and are available for consumption.
#### Dynamic
@ -55,7 +77,8 @@ provisioning to occur. Claims that request the class `""` effectively disable
dynamic provisioning for themselves.
To enable dynamic storage provisioning based on storage class, the cluster administrator
needs to enable the `DefaultStorageClass`
[admission controller](/docs/reference/access-authn-authz/admission-controllers/#defaultstorageclass)
on the API server. This can be done, for example, by ensuring that `DefaultStorageClass` is
among the comma-delimited, ordered list of values for the `--enable-admission-plugins` flag of
the API server component. For more information on API server command-line flags,
@ -63,26 +86,51 @@ check [kube-apiserver](/docs/admin/kube-apiserver/) documentation.
### Binding
A user creates, or in the case of dynamic provisioning, has already created,
a PersistentVolumeClaim with a specific amount of storage requested and with
certain access modes. A control loop in the master watches for new PVCs, finds
a matching PV (if possible), and binds them together. If a PV was dynamically
provisioned for a new PVC, the loop will always bind that PV to the PVC. Otherwise,
the user will always get at least what they asked for, but the volume may be in
excess of what was requested. Once bound, PersistentVolumeClaim binds are exclusive,
regardless of how they were bound. A PVC to PV binding is a one-to-one mapping,
using a ClaimRef which is a bi-directional binding between the PersistentVolume
and the PersistentVolumeClaim.
Claims will remain unbound indefinitely if a matching volume does not exist.
Claims will be bound as matching volumes become available. For example, a
cluster provisioned with many 50Gi PVs would not match a PVC requesting 100Gi.
The PVC can be bound when a 100Gi PV is added to the cluster.
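As a minimal sketch of that example, a claim requesting 100Gi might look like this
(the name `example-pvc` is illustrative); it stays Pending until a large enough PV exists:
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-pvc      # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi     # will not bind to the 50Gi PVs mentioned above
```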
### Using
Pods use claims as volumes. The cluster inspects the claim to find the bound
volume and mounts that volume for a Pod. For volumes that support multiple
access modes, the user specifies which mode is desired when using their claim
as a volume in a Pod.
Once a user has a claim and that claim is bound, the bound PV belongs to the
user for as long as they need it. Users schedule Pods and access their claimed
PVs by including a `persistentVolumeClaim` section in a Pod's `volumes` block.
See [Claims As Volumes](#claims-as-volumes) for more details on this.
### Storage Object in Use Protection
The purpose of the Storage Object in Use Protection feature is to ensure that
PersistentVolumeClaims (PVCs) in active use by a Pod and PersistentVolume (PVs)
that are bound to PVCs are not removed from the system, as this may result in data loss.
{{< note >}}
PVC is in active use by a Pod when a Pod object exists that is using the PVC.
{{< /note >}}
If a user deletes a PVC in active use by a Pod, the PVC is not removed immediately.
PVC removal is postponed until the PVC is no longer actively used by any Pods. Also,
if an admin deletes a PV that is bound to a PVC, the PV is not removed immediately.
PV removal is postponed until the PV is no longer bound to a PVC.
You can see that a PVC is protected when the PVC's status is `Terminating` and the
`Finalizers` list includes `kubernetes.io/pvc-protection`:
```shell
kubectl describe pvc hostpath
@ -98,7 +146,8 @@ Finalizers: [kubernetes.io/pvc-protection]
...
```
You can see that a PV is protected when the PV's status is `Terminating` and
the `Finalizers` list includes `kubernetes.io/pv-protection` too:
```shell
kubectl describe pv task-pv-volume
@ -122,29 +171,48 @@ Events: <none>
### Reclaiming
When a user is done with their volume, they can delete the PVC objects from the
API that allows reclamation of the resource. The reclaim policy for a PersistentVolume
tells the cluster what to do with the volume after it has been released of its claim.
Currently, volumes can either be Retained, Recycled, or Deleted.
#### Retain
The `Retain` reclaim policy allows for manual reclamation of the resource.
When the PersistentVolumeClaim is deleted, the PersistentVolume still exists
and the volume is considered "released". But it is not yet available for
another claim because the previous claimant's data remains on the volume.
An administrator can manually reclaim the volume with the following steps.
1. Delete the PersistentVolume. The associated storage asset in external infrastructure
(such as an AWS EBS, GCE PD, Azure Disk, or Cinder volume) still exists after the PV is deleted.
1. Manually clean up the data on the associated storage asset accordingly.
1. Manually delete the associated storage asset.
If you want to reuse the same storage asset, create a new PersistentVolume with
the same storage asset definition.
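For example, if the released volume was backed by an NFS export, a replacement PV might reuse the
same storage asset definition, as in this sketch (name, server, and path are placeholders):
```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-reused              # hypothetical name
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  nfs:                         # same storage asset definition as the deleted PV
    server: nfs.example.com    # placeholder NFS server
    path: /exported/path       # placeholder export path
```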
#### Delete
For volume plugins that support the `Delete` reclaim policy, deletion removes
both the PersistentVolume object from Kubernetes, as well as the associated
storage asset in the external infrastructure, such as an AWS EBS, GCE PD,
Azure Disk, or Cinder volume. Volumes that were dynamically provisioned
inherit the [reclaim policy of their StorageClass](#reclaim-policy), which
defaults to `Delete`. The administrator should configure the StorageClass
according to users' expectations; otherwise, the PV must be edited or
patched after it is created. See
[Change the Reclaim Policy of a PersistentVolume](/docs/tasks/administer-cluster/change-pv-reclaim-policy/).
#### Recycle
{{< warning >}}
The `Recycle` reclaim policy is deprecated. Instead, the recommended approach
is to use dynamic provisioning.
{{< /warning >}}
If supported by the underlying volume plugin, the `Recycle` reclaim policy performs
a basic scrub (`rm -rf /thevolume/*`) on the volume and makes it available again for a new claim.
However, an administrator can configure a custom recycler Pod template using
the Kubernetes controller manager command line arguments as described in the
@ -173,7 +241,8 @@ spec:
mountPath: /scrub
```
However, the particular path specified in the custom recycler Pod template in the
`volumes` part is replaced with the particular path of the volume that is being recycled.
### PersistentVolume deletion protection finalizer
{{< feature-state for_k8s_version="v1.23" state="alpha" >}}
@ -181,10 +250,12 @@ However, the particular path specified in the custom recycler Pod template in th
Finalizers can be added on a PersistentVolume to ensure that PersistentVolumes
having a `Delete` reclaim policy are deleted only after the backing storage is deleted.
The newly introduced finalizers `kubernetes.io/pv-controller` and
`external-provisioner.volume.kubernetes.io/finalizer`
are only added to dynamically provisioned volumes.
The finalizer `kubernetes.io/pv-controller` is added to in-tree plugin volumes.
The following is an example
```shell
kubectl describe pv pvc-74a498d6-3929-47e8-8c02-078c1ece4d78
@ -213,6 +284,7 @@ Events: <none>
The finalizer `external-provisioner.volume.kubernetes.io/finalizer` is added for CSI volumes.
The following is an example:
```shell
Name: pvc-2f0bab97-85a8-4552-8044-eb8be45cf48d
Labels: <none>
@ -244,14 +316,17 @@ the `kubernetes.io/pv-controller` finalizer is replaced by the
### Reserving a PersistentVolume
The control plane can [bind PersistentVolumeClaims to matching PersistentVolumes](#binding)
in the cluster. However, if you want a PVC to bind to a specific PV, you need to pre-bind them.
By specifying a PersistentVolume in a PersistentVolumeClaim, you declare a binding
between that specific PV and PVC. If the PersistentVolume exists and has not reserved
PersistentVolumeClaims through its `claimRef` field, then the PersistentVolume and
PersistentVolumeClaim will be bound.
The binding happens regardless of some volume matching criteria, including node affinity.
The control plane still checks that [storage class](/docs/concepts/storage/storage-classes/),
access modes, and requested storage size are valid.
```yaml
apiVersion: v1
@ -265,7 +340,10 @@ spec:
...
```
This method does not guarantee any binding privileges to the PersistentVolume.
If other PersistentVolumeClaims could use the PV that you specify, you first
need to reserve that storage volume. Specify the relevant PersistentVolumeClaim
in the `claimRef` field of the PV so that other PVCs cannot bind to it.
```yaml
apiVersion: v1
@ -334,8 +412,9 @@ increased and that no resize is necessary.
{{< feature-state for_k8s_version="v1.24" state="stable" >}}
Support for expanding CSI volumes is enabled by default but it also requires a
specific CSI driver to support volume expansion. Refer to documentation of the
specific CSI driver for more information.
#### Resizing a volume containing a file system
@ -364,22 +443,33 @@ FlexVolume resize is possible only when the underlying driver supports resize.
{{< /note >}}
{{< note >}}
Expanding EBS volumes is a time-consuming operation.
Also, there is a per-volume quota of one modification every 6 hours.
{{< /note >}}
#### Recovering from Failure when Expanding Volumes
If a user specifies a new size that is too big to be satisfied by the underlying
storage system, expansion of the PVC will be continuously retried until the user or
cluster administrator takes some action. This can be undesirable, and hence
Kubernetes provides the following methods of recovering from such failures.
{{< tabs name="recovery_methods" >}}
{{% tab name="Manually with Cluster Administrator access" %}}
If expanding underlying storage fails, the cluster administrator can manually
recover the Persistent Volume Claim (PVC) state and cancel the resize requests.
Otherwise, the resize requests are continuously retried by the controller without
administrator intervention.
1. Mark the PersistentVolume (PV) that is bound to the PersistentVolumeClaim (PVC)
   with the `Retain` reclaim policy (see the sketch after this list).
2. Delete the PVC. Since the PV has the `Retain` reclaim policy, no data is lost
   when the PVC is recreated.
3. Delete the `claimRef` entry from the PV spec, so that a new PVC can bind to it.
   This should make the PV `Available`.
4. Re-create the PVC with a smaller size than the PV and set the `volumeName` field
   of the PVC to the name of the PV. This should bind the new PVC to the existing PV.
5. Don't forget to restore the reclaim policy of the PV.
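A minimal sketch of steps 1 and 3 using `kubectl patch` (the PV name `example-pv` is a
placeholder; check the result with `kubectl get pv` after each step):
```shell
# Step 1: switch the bound PV's reclaim policy to Retain
kubectl patch pv example-pv -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'

# Step 3: after deleting the PVC, drop the claimRef so that a new PVC can bind
kubectl patch pv example-pv --type=json -p '[{"op":"remove","path":"/spec/claimRef"}]'
```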
{{% /tab %}}
@ -387,7 +477,11 @@ If expanding underlying storage fails, the cluster administrator can manually re
{{% feature-state for_k8s_version="v1.23" state="alpha" %}}
{{< note >}}
Recovery from failing PVC expansion by users is available as an alpha feature
since Kubernetes 1.23. The `RecoverVolumeExpansionFailure` feature must be
enabled for this feature to work. Refer to the
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
documentation for more information.
{{< /note >}}
If the feature gate `RecoverVolumeExpansionFailure` is
@ -397,7 +491,8 @@ smaller proposed size, edit `.spec.resources` for that PVC and choose a value th
value you previously tried.
This is useful if expansion to a higher value did not succeed because of capacity constraint.
If that has happened, or you suspect that it might have, you can retry expansion by specifying a
size that is within the capacity limits of the underlying storage provider. You can monitor the
status of the resize operation by watching `.status.resizeStatus` and events on the PVC.
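For example, you might inspect the resize status and the related events like this (a sketch;
`example-pvc` is a placeholder name, and the field is only populated while the feature gate is enabled):
```shell
# Print the PVC's resize status field (empty when no resize is in progress)
kubectl get pvc example-pvc -o jsonpath='{.status.resizeStatus}{"\n"}'

# Expansion-related events appear in the describe output
kubectl describe pvc example-pvc
```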
Note that,
although you can specify a lower amount of storage than what was requested previously,
@ -406,7 +501,6 @@ Kubernetes does not support shrinking a PVC to less than its current size.
{{% /tab %}}
{{% /tabs %}}
## Types of Persistent Volumes
PersistentVolume types are implemented as plugins. Kubernetes currently supports the following plugins:
@ -423,7 +517,8 @@ PersistentVolume types are implemented as plugins. Kubernetes currently supports
* [`nfs`](/docs/concepts/storage/volumes/#nfs) - Network File System (NFS) storage
* [`rbd`](/docs/concepts/storage/volumes/#rbd) - Rados Block Device (RBD) volume
The following types of PersistentVolume are deprecated.
This means that support is still available but will be removed in a future Kubernetes release.
* [`awsElasticBlockStore`](/docs/concepts/storage/volumes/#awselasticblockstore) - AWS Elastic Block Store (EBS)
(**deprecated** in v1.17)
@ -483,14 +578,21 @@ spec:
```
{{< note >}}
Helper programs relating to the volume type may be required for consumption of
a PersistentVolume within a cluster. In this example, the PersistentVolume is
of type NFS and the helper program `/sbin/mount.nfs` is required to support the
mounting of NFS filesystems.
{{< /note >}}
### Capacity
Generally, a PV will have a specific storage capacity. This is set using the PV's
`capacity` attribute. Read the glossary term
[Quantity](/docs/reference/glossary/?all=true#term-quantity) to understand the units
expected by `capacity`.
Currently, storage size is the only resource that can be set or requested.
Future attributes may include IOPS, throughput, etc.
### Volume Mode
@ -515,12 +617,18 @@ for an example on how to use a volume with `volumeMode: Block` in a Pod.
### Access Modes
A PersistentVolume can be mounted on a host in any way supported by the resource
provider. As shown in the table below, providers will have different capabilities
and each PV's access modes are set to the specific modes supported by that particular
volume. For example, NFS can support multiple read/write clients, but a specific
NFS PV might be exported on the server as read-only. Each PV gets its own set of
access modes describing that specific PV's capabilities.
The access modes are:
`ReadWriteOnce`
: the volume can be mounted as read-write by a single node. ReadWriteOnce access
mode still can allow multiple pods to access the volume when the pods are running on the same node.
`ReadOnlyMany`
: the volume can be mounted as read-only by many nodes.
@ -529,12 +637,14 @@ The access modes are:
: the volume can be mounted as read-write by many nodes.
`ReadWriteOncePod`
: the volume can be mounted as read-write by a single Pod. Use the ReadWriteOncePod
access mode if you want to ensure that only one Pod across the whole cluster can
read that PVC or write to it. This is only supported for CSI volumes and
Kubernetes version 1.22+.
The blog article
[Introducing Single Pod Access Mode for PersistentVolumes](/blog/2021/09/13/read-write-once-pod-access-mode-alpha/)
covers this in more detail.
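A sketch of a claim that requests this mode (the name and size are illustrative, and the volume
must be backed by a CSI driver that supports it):
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: single-writer-pvc   # hypothetical name
spec:
  accessModes:
    - ReadWriteOncePod      # only one Pod in the whole cluster may use this claim
  resources:
    requests:
      storage: 1Gi
```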
In the CLI, the access modes are abbreviated to:
@ -547,13 +657,15 @@ In the CLI, the access modes are abbreviated to:
Kubernetes uses volume access modes to match PersistentVolumeClaims and PersistentVolumes.
In some cases, the volume access modes also constrain where the PersistentVolume can be mounted.
Volume access modes do **not** enforce write protection once the storage has been mounted.
Even if the access modes are specified as ReadWriteOnce, ReadOnlyMany, or ReadWriteMany,
they don't set any constraints on the volume. For example, even if a PersistentVolume is
created as ReadOnlyMany, there is no guarantee that it will be read-only. If the access modes
are specified as ReadWriteOncePod, the volume is constrained and can be mounted on only a single Pod.
{{< /note >}}
> __Important!__ A volume can only be mounted using one access mode at a time,
> even if it supports many. For example, a GCEPersistentDisk can be mounted as
> ReadWriteOnce by a single node or ReadOnlyMany by many nodes, but not at the same time.
| Volume Plugin | ReadWriteOnce | ReadOnlyMany | ReadWriteMany | ReadWriteOncePod |
| :--- | :---: | :---: | :---: | - |
@ -593,13 +705,16 @@ Current reclaim policies are:
* Retain -- manual reclamation
* Recycle -- basic scrub (`rm -rf /thevolume/*`)
* Delete -- associated storage asset such as AWS EBS, GCE PD, Azure Disk,
or OpenStack Cinder volume is deleted
Currently, only NFS and HostPath support recycling. AWS EBS, GCE PD, Azure Disk,
and Cinder volumes support deletion.
### Mount Options
A Kubernetes administrator can specify additional mount options for when a
Persistent Volume is mounted on a node.
{{< note >}}
Not all Persistent Volume types support mount options.
@ -627,10 +742,19 @@ it will become fully deprecated in a future Kubernetes release.
### Node Affinity
{{< note >}}
For most volume types, you do not need to set this field. It is automatically
populated for [AWS EBS](/docs/concepts/storage/volumes/#awselasticblockstore),
[GCE PD](/docs/concepts/storage/volumes/#gcepersistentdisk) and
[Azure Disk](/docs/concepts/storage/volumes/#azuredisk) volume block types. You
need to explicitly set this for [local](/docs/concepts/storage/volumes/#local) volumes.
{{< /note >}}
A PV can specify node affinity to define constraints that limit what nodes this
volume can be accessed from. Pods that use a PV will only be scheduled to nodes
that are selected by the node affinity. To specify node affinity, set
`nodeAffinity` in the `.spec` of a PV. The
[PersistentVolume](/docs/reference/kubernetes-api/config-and-storage-resources/persistent-volume-v1/#PersistentVolumeSpec)
API reference has more details on this field.
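A sketch of a `local` PV restricted to a single node through `nodeAffinity` (the path and node
name are placeholders):
```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-local-pv       # hypothetical name
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  local:
    path: /mnt/disks/ssd1      # placeholder path on the node
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - node-1       # placeholder node name
```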
### Phase
@ -671,24 +795,35 @@ spec:
### Access Modes
Claims use [the same conventions as volumes](#access-modes) when requesting
storage with specific access modes.
### Volume Modes
Claims use [the same convention as volumes](#volume-mode) to indicate the
consumption of the volume as either a filesystem or block device.
### Resources
Claims, like Pods, can request specific quantities of a resource. In this case,
the request is for storage. The same
[resource model](https://git.k8s.io/design-proposals-archive/scheduling/resources.md)
applies to both volumes and claims.
### Selector
Claims can specify a
[label selector](/docs/concepts/overview/working-with-objects/labels/#label-selectors)
to further filter the set of volumes. Only the volumes whose labels match the selector
can be bound to the claim. The selector can consist of two fields:
* `matchLabels` - the volume must have a label with this value
* `matchExpressions` - a list of requirements made by specifying key, list of values,
and operator that relates the key and values. Valid operators include In, NotIn,
Exists, and DoesNotExist.
All of the requirements, from both `matchLabels` and `matchExpressions`, are
ANDed together; they must all be satisfied in order to match.
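A sketch of a claim that uses both selector fields (the label keys and values are illustrative):
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: selective-pvc          # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 8Gi
  selector:
    matchLabels:
      release: "stable"        # the PV must carry this label
    matchExpressions:
      - key: environment
        operator: In
        values:
          - dev
```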
### Class
@ -738,22 +873,38 @@ In the past, the annotation `volume.beta.kubernetes.io/storage-class` was used i
of `storageClassName` attribute. This annotation is still working; however,
it won't be supported in a future Kubernetes release.
#### Retroactive default StorageClass assignment
{{< feature-state for_k8s_version="v1.26" state="beta" >}}
You can create a PersistentVolumeClaim without specifying a `storageClassName`
for the new PVC, and you can do so even when no default StorageClass exists
in your cluster. In this case, the new PVC is created as you defined it, and the
`storageClassName` of that PVC remains unset until a default becomes available.
When a default StorageClass becomes available, the control plane identifies any
existing PVCs without `storageClassName`. For the PVCs that either have an empty
value for `storageClassName` or do not have this key, the control plane then
updates those PVCs to set `storageClassName` to match the new default StorageClass.
If you have an existing PVC where the `storageClassName` is `""`, and you configure
a default StorageClass, then this PVC will not get updated.
In order to keep binding to PVs with `storageClassName` set to `""` (while a default StorageClass is present), you need to set the `storageClassName` of the associated PVC to `""`.
In order to keep binding to PVs with `storageClassName` set to `""`
(while a default StorageClass is present), you need to set the `storageClassName`
of the associated PVC to `""`.
This behavior helps administrators change default StorageClass by removing the
old one first and then creating or setting another one. This brief window while
there is no default causes PVCs without `storageClassName` created at that time
to not have any default, but due to the retroactive default StorageClass
assignment this way of changing defaults is safe.
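A sketch of such a claim (the name is illustrative); `storageClassName` is simply omitted and is
filled in retroactively once a default StorageClass exists:
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-default-class      # hypothetical name
spec:
  # storageClassName is intentionally omitted; the control plane sets it
  # retroactively when a default StorageClass becomes available.
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
```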
## Claims As Volumes
Pods access storage by using the claim as a volume. Claims must exist in the
same namespace as the Pod using the claim. The cluster finds the claim in the
Pod's namespace and uses it to get the PersistentVolume backing the claim.
The volume is then mounted to the host and into the Pod.
```yaml
apiVersion: v1
@ -775,12 +926,15 @@ spec:
### A Note on Namespaces
PersistentVolume binds are exclusive, and since PersistentVolumeClaims are
namespaced objects, mounting claims with "Many" modes (`ROX`, `RWX`) is only
possible within one namespace.
### PersistentVolumes typed `hostPath`
A `hostPath` PersistentVolume uses a file or directory on the Node to emulate
network-attached storage. See
[an example of `hostPath` typed volume](/docs/tasks/configure-pod-container/configure-persistent-volume-storage/#create-a-persistentvolume).
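A minimal `hostPath` PV sketch (the name, size, and path are placeholders; `hostPath` is only
suitable for development and testing on a single node):
```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-hostpath-pv    # hypothetical name
spec:
  capacity:
    storage: 2Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /tmp/data            # placeholder directory on the node
```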
## Raw Block Volume Support
@ -819,6 +973,7 @@ spec:
lun: 0
readOnly: false
```
### PersistentVolumeClaim requesting a Raw Block Volume {#persistent-volume-claim-requesting-a-raw-block-volume}
```yaml
@ -858,14 +1013,18 @@ spec:
```
{{< note >}}
When adding a raw block device for a Pod, you specify the device path in the
container instead of a mount path.
{{< /note >}}
### Binding Block Volumes
If a user requests a raw block volume by indicating this using the `volumeMode`
field in the PersistentVolumeClaim spec, the binding rules differ slightly from
previous releases that didn't consider this mode as part of the spec.
The following table lists the possible combinations that a user and an admin might specify
when requesting a raw block device, and whether the volume will be bound or not for each
combination. Volume binding matrix for statically provisioned volumes:
| PV volumeMode | PVC volumeMode | Result |
| --------------|:---------------:| ----------------:|
@ -880,15 +1039,19 @@ Volume binding matrix for statically provisioned volumes:
| Filesystem | unspecified | BIND |
{{< note >}}
Only statically provisioned volumes are supported for the alpha release. Administrators
should take care to consider these values when working with raw block devices.
{{< /note >}}
## Volume Snapshot and Restore Volume from Snapshot Support
{{< feature-state for_k8s_version="v1.20" state="stable" >}}
Volume snapshots only support the out-of-tree CSI volume plugins.
For details, see [Volume Snapshots](/docs/concepts/storage/volume-snapshots/).
In-tree volume plugins are deprecated. You can read about the deprecated volume
plugins in the
[Volume Plugin FAQ](https://github.com/kubernetes/community/blob/master/sig-storage/volume-plugin-faq.md).
### Create a PersistentVolumeClaim from a Volume Snapshot {#create-persistent-volume-claim-from-volume-snapshot}
@ -912,7 +1075,8 @@ spec:
## Volume Cloning
[Volume Cloning](/docs/concepts/storage/volume-pvc-datasource/)
is only available for CSI volume plugins.
### Create PersistentVolumeClaim from an existing PVC {#create-persistent-volume-claim-from-an-existing-pvc}
@ -949,27 +1113,32 @@ same namespace, except for core objects other than PVCs. For clusters that have
gate enabled, use of the `dataSourceRef` is preferred over `dataSource`.
## Cross namespace data sources
{{< feature-state for_k8s_version="v1.26" state="alpha" >}}
Kubernetes supports cross namespace volume data sources.
To use cross namespace volume data sources, you must enable the `AnyVolumeDataSource`
and `CrossNamespaceVolumeDataSource`
[feature gates](/docs/reference/command-line-tools-reference/feature-gates/) for
the kube-apiserver and the kube-controller-manager.
Also, you must enable the `CrossNamespaceVolumeDataSource` feature gate for the csi-provisioner.
Enabling the `CrossNamespaceVolumeDataSource` feature gate allows you to specify
a namespace in the `dataSourceRef` field.
{{< note >}}
When you specify a namespace for a volume data source, Kubernetes checks for a
ReferenceGrant in the other namespace before accepting the reference.
ReferenceGrant is part of the `gateway.networking.k8s.io` extension APIs.
See [ReferenceGrant](https://gateway-api.sigs.k8s.io/api-types/referencegrant/)
in the Gateway API documentation for details.
This means that you must extend your Kubernetes cluster with at least ReferenceGrant from the
Gateway API before you can use this mechanism.
{{< /note >}}
## Data source references
The `dataSourceRef` field behaves almost the same as the `dataSource` field. If one is
specified while the other is not, the API server will give both fields the same value. Neither
field can be changed after creation, and attempting to specify different values for the two
fields will result in a validation error. Therefore the two fields will always have the same
@ -986,7 +1155,8 @@ users should be aware of:
When the `CrossNamespaceVolumeDataSource` feature is enabled, there are additional differences:
* The `dataSource` field only allows local objects, while the `dataSourceRef` field allows
objects in any namespace.
* When namespace is specified, `dataSource` and `dataSourceRef` are not synced.
Users should always use `dataSourceRef` on clusters that have the feature gate enabled, and
@ -1030,10 +1200,13 @@ responsibility of that populator controller to report Events that relate to volu
the process.
### Using a cross-namespace volume data source
{{< feature-state for_k8s_version="v1.26" state="alpha" >}}
Create a ReferenceGrant to allow the namespace owner to accept the reference.
You define a populated volume by specifying a cross namespace volume data source
using the `dataSourceRef` field. You must already have a valid ReferenceGrant
in the source namespace:
```yaml
apiVersion: gateway.networking.k8s.io/v1beta1

View File

@ -62,22 +62,22 @@ volumeBindingMode: Immediate
Each StorageClass has a provisioner that determines what volume plugin is used
for provisioning PVs. This field must be specified.
| Volume Plugin | Internal Provisioner | Config Example |
| :------------------- | :------------------: | :-----------------------------------: |
| AWSElasticBlockStore | &#x2713; | [AWS EBS](#aws-ebs) |
| AzureFile | &#x2713; | [Azure File](#azure-file) |
| AzureDisk | &#x2713; | [Azure Disk](#azure-disk) |
| CephFS | - | - |
| Cinder | &#x2713; | [OpenStack Cinder](#openstack-cinder) |
| FC | - | - |
| FlexVolume | - | - |
| GCEPersistentDisk | &#x2713; | [GCE PD](#gce-pd) |
| iSCSI | - | - |
| NFS | - | [NFS](#nfs) |
| RBD | &#x2713; | [Ceph RBD](#ceph-rbd) |
| VsphereVolume | &#x2713; | [vSphere](#vsphere) |
| PortworxVolume | &#x2713; | [Portworx Volume](#portworx-volume) |
| Local | - | [Local](#local) |
You are not restricted to specifying the "internal" provisioners
listed here (whose names are prefixed with "kubernetes.io" and shipped
@ -109,29 +109,28 @@ whatever reclaim policy they were assigned at creation.
{{< feature-state for_k8s_version="v1.11" state="beta" >}}
PersistentVolumes can be configured to be expandable. This feature, when set to `true`,
allows users to resize the volume by editing the corresponding PVC object.
The following types of volumes support volume expansion, when the underlying
StorageClass has the field `allowVolumeExpansion` set to true.
{{< table caption = "Table of Volume types and the version of Kubernetes they require" >}}
| Volume type | Required Kubernetes version |
| :------------------- | :-------------------------- |
| gcePersistentDisk | 1.11 |
| awsElasticBlockStore | 1.11 |
| Cinder | 1.11 |
| rbd | 1.11 |
| Azure File | 1.11 |
| Azure Disk | 1.11 |
| Portworx | 1.11 |
| FlexVolume | 1.13 |
| CSI | 1.14 (alpha), 1.16 (beta) |
{{< /table >}}
{{< note >}}
You can only use the volume expansion feature to grow a Volume, not to shrink it.
{{< /note >}}
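For reference, a StorageClass that permits expansion sets `allowVolumeExpansion` at the top level,
as in this sketch (the name is illustrative; pick any provisioner from the table above that
supports expansion):
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: expandable-example           # hypothetical name
provisioner: kubernetes.io/gce-pd    # example provisioner that supports expansion
allowVolumeExpansion: true
```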
@ -168,14 +167,14 @@ and [taints and tolerations](/docs/concepts/scheduling-eviction/taint-and-tolera
The following plugins support `WaitForFirstConsumer` with dynamic provisioning:
- [AWSElasticBlockStore](#aws-ebs)
- [GCEPersistentDisk](#gce-pd)
- [AzureDisk](#azure-disk)
The following plugins support `WaitForFirstConsumer` with pre-created PersistentVolume binding:
- All of the above
- [Local](#local)
{{< feature-state state="stable" for_k8s_version="v1.17" >}}
[CSI volumes](/docs/concepts/storage/volumes/#csi) are also supported with dynamic provisioning
@ -183,10 +182,10 @@ and pre-created PVs, but you'll need to look at the documentation for a specific
to see its supported topology keys and examples.
{{< note >}}
If you choose to use `WaitForFirstConsumer`, do not use `nodeName` in the Pod spec
to specify node affinity. If `nodeName` is used in this case, the scheduler will be bypassed and the PVC will remain in a `Pending` state.
Instead, you can use node selector for hostname in this case as shown below.
{{< /note >}}
```yaml
@ -243,7 +242,7 @@ allowedTopologies:
Storage Classes have parameters that describe volumes belonging to the storage
class. Different parameters may be accepted depending on the `provisioner`. For
example, the value `io1`, for the parameter `type`, and the parameter
`iopsPerGB` are specific to EBS. When a parameter is omitted, some default is
used.
@ -265,26 +264,26 @@ parameters:
fsType: ext4
```
- `type`: `io1`, `gp2`, `sc1`, `st1`. See
[AWS docs](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSVolumeTypes.html)
for details. Default: `gp2`.
- `zone` (Deprecated): AWS zone. If neither `zone` nor `zones` is specified, volumes are
generally round-robin-ed across all active zones where Kubernetes cluster
has a node. `zone` and `zones` parameters must not be used at the same time.
- `zones` (Deprecated): A comma separated list of AWS zone(s). If neither `zone` nor `zones`
is specified, volumes are generally round-robin-ed across all active zones
where Kubernetes cluster has a node. `zone` and `zones` parameters must not
be used at the same time.
- `iopsPerGB`: only for `io1` volumes. I/O operations per second per GiB. AWS
volume plugin multiplies this with size of requested volume to compute IOPS
of the volume and caps it at 20 000 IOPS (maximum supported by AWS, see
[AWS docs](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSVolumeTypes.html)).
A string is expected here, i.e. `"10"`, not `10`.
- `fsType`: fsType that is supported by kubernetes. Default: `"ext4"`.
- `encrypted`: denotes whether the EBS volume should be encrypted or not.
Valid values are `"true"` or `"false"`. A string is expected here,
i.e. `"true"`, not `true`.
- `kmsKeyId`: optional. The full Amazon Resource Name of the key to use when
encrypting the volume. If none is supplied but `encrypted` is true, a key is
generated by AWS. See AWS docs for valid ARN value.
@ -307,17 +306,17 @@ parameters:
replication-type: none
```
- `type`: `pd-standard` or `pd-ssd`. Default: `pd-standard`
- `zone` (Deprecated): GCE zone. If neither `zone` nor `zones` is specified, volumes are
generally round-robin-ed across all active zones where Kubernetes cluster has
a node. `zone` and `zones` parameters must not be used at the same time.
- `zones` (Deprecated): A comma separated list of GCE zone(s). If neither `zone` nor `zones`
is specified, volumes are generally round-robin-ed across all active zones
where Kubernetes cluster has a node. `zone` and `zones` parameters must not
be used at the same time.
- `fstype`: `ext4` or `xfs`. Default: `ext4`. The defined filesystem type must be supported by the host operating system.
- `replication-type`: `none` or `regional-pd`. Default: `none`.
If `replication-type` is set to `none`, a regular (zonal) PD will be provisioned.
@ -350,14 +349,15 @@ parameters:
readOnly: "false"
```
- `server`: Server is the hostname or IP address of the NFS server.
- `path`: Path that is exported by the NFS server.
- `readOnly`: A flag indicating whether the storage will be mounted as read only (default false).
Kubernetes doesn't include an internal NFS provisioner. You need to use an external provisioner to create a StorageClass for NFS.
Here are some examples:
- [NFS Ganesha server and external provisioner](https://github.com/kubernetes-sigs/nfs-ganesha-server-and-external-provisioner)
- [NFS subdir external provisioner](https://github.com/kubernetes-sigs/nfs-subdir-external-provisioner)
### OpenStack Cinder
@ -371,7 +371,7 @@ parameters:
availability: nova
```
- `availability`: Availability Zone. If not specified, volumes are generally
round-robin-ed across all active zones where Kubernetes cluster has a node.
{{< note >}}
@ -381,7 +381,7 @@ This internal provisioner of OpenStack is deprecated. Please use [the external c
### vSphere
There are two types of provisioners for vSphere storage classes:
- [CSI provisioner](#vsphere-provisioner-csi): `csi.vsphere.vmware.com`
- [vCP provisioner](#vcp-provisioner): `kubernetes.io/vsphere-volume`
@ -392,73 +392,73 @@ In-tree provisioners are [deprecated](/blog/2019/12/09/kubernetes-1-17-feature-c
The vSphere CSI StorageClass provisioner works with Tanzu Kubernetes clusters. For an example, refer to the [vSphere CSI repository](https://github.com/kubernetes-sigs/vsphere-csi-driver/blob/master/example/vanilla-k8s-RWM-filesystem-volumes/example-sc.yaml).
#### vCP Provisioner
The following examples use the VMware Cloud Provider (vCP) StorageClass provisioner.
1. Create a StorageClass with a user specified disk format.
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: fast
provisioner: kubernetes.io/vsphere-volume
parameters:
diskformat: zeroedthick
```
`diskformat`: `thin`, `zeroedthick` and `eagerzeroedthick`. Default: `"thin"`.
2. Create a StorageClass with a disk format on a user specified datastore.
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: fast
provisioner: kubernetes.io/vsphere-volume
parameters:
diskformat: zeroedthick
datastore: VSANDatastore
```
`datastore`: The user can also specify the datastore in the StorageClass.
The volume will be created on the datastore specified in the StorageClass,
which in this case is `VSANDatastore`. This field is optional. If the
datastore is not specified, then the volume will be created on the datastore
specified in the vSphere config file used to initialize the vSphere Cloud
Provider.
3. Storage Policy Management inside Kubernetes
- Using existing vCenter SPBM policy
One of the most important features of vSphere for Storage Management is
policy-based management. Storage Policy Based Management (SPBM) is a
storage policy framework that provides a single unified control plane
across a broad range of data services and storage solutions. SPBM enables
vSphere administrators to overcome upfront storage provisioning challenges,
such as capacity planning, differentiated service levels and managing
capacity headroom.
The SPBM policies can be specified in the StorageClass using the
`storagePolicyName` parameter.
- Virtual SAN policy support inside Kubernetes
vSphere Infrastructure (VI) admins have the ability to specify custom
Virtual SAN Storage Capabilities during dynamic volume provisioning. You
can now define storage requirements, such as performance and availability,
in the form of storage capabilities during dynamic volume provisioning.
The storage capability requirements are converted into a Virtual SAN
policy which are then pushed down to the Virtual SAN layer when a
persistent volume (virtual disk) is being created. The virtual disk is
distributed across the Virtual SAN datastore to meet the requirements.
You can see [Storage Policy Based Management for dynamic provisioning of volumes](https://github.com/vmware-archive/vsphere-storage-for-kubernetes/blob/fa4c8b8ad46a85b6555d715dd9d27ff69839df53/documentation/policy-based-mgmt.md)
for more details on how to use storage policies for persistent volumes
management.
There are a few
[vSphere examples](https://github.com/kubernetes/examples/tree/master/staging/volumes/vsphere)
@ -486,29 +486,30 @@ parameters:
imageFeatures: "layering"
```
- `monitors`: Ceph monitors, comma delimited. This parameter is required.
- `adminId`: Ceph client ID that is capable of creating images in the pool.
Default is "admin".
- `adminSecretName`: Secret Name for `adminId`. This parameter is required.
The provided secret must have type "kubernetes.io/rbd".
- `adminSecretNamespace`: The namespace for `adminSecretName`. Default is "default".
- `pool`: Ceph RBD pool. Default is "rbd".
- `userId`: Ceph client ID that is used to map the RBD image. Default is the
same as `adminId`.
- `userSecretName`: The name of Ceph Secret for `userId` to map RBD image. It
must exist in the same namespace as PVCs. This parameter is required.
The provided secret must have type "kubernetes.io/rbd", for example created in this
way:
- `userSecretNamespace`: The namespace for `userSecretName`.
- `fsType`: fsType that is supported by kubernetes. Default: `"ext4"`.
- `imageFormat`: Ceph RBD image format, "1" or "2". Default is "2".
- `imageFeatures`: This parameter is optional and should only be used if you
set `imageFormat` to "2". Currently supported features are `layering` only.
Default is "", and no features are turned on.
@ -528,9 +529,9 @@ parameters:
storageAccount: azure_storage_account_name
```
* `skuName`: Azure storage account Sku tier. Default is empty.
* `location`: Azure storage account location. Default is empty.
* `storageAccount`: Azure storage account name. If a storage account is provided,
- `skuName`: Azure storage account Sku tier. Default is empty.
- `location`: Azure storage account location. Default is empty.
- `storageAccount`: Azure storage account name. If a storage account is provided,
it must reside in the same resource group as the cluster, and `location` is
ignored. If a storage account is not provided, a new storage account will be
created in the same resource group as the cluster.
@ -548,21 +549,21 @@ parameters:
kind: managed
```
* `storageaccounttype`: Azure storage account Sku tier. Default is empty.
* `kind`: Possible values are `shared`, `dedicated`, and `managed` (default).
- `storageaccounttype`: Azure storage account Sku tier. Default is empty.
- `kind`: Possible values are `shared`, `dedicated`, and `managed` (default).
When `kind` is `shared`, all unmanaged disks are created in a few shared
storage accounts in the same resource group as the cluster. When `kind` is
`dedicated`, a new dedicated storage account will be created for the new
unmanaged disk in the same resource group as the cluster. When `kind` is
`managed`, all managed disks are created in the same resource group as
the cluster.
* `resourceGroup`: Specify the resource group in which the Azure disk will be created.
It must be an existing resource group name. If it is unspecified, the disk will be
placed in the same resource group as the current Kubernetes cluster.
- `resourceGroup`: Specify the resource group in which the Azure disk will be created.
It must be an existing resource group name. If it is unspecified, the disk will be
placed in the same resource group as the current Kubernetes cluster.
- Premium VM can attach both Standard_LRS and Premium_LRS disks, while Standard
* Premium VM can attach both Standard_LRS and Premium_LRS disks, while Standard
VM can only attach Standard_LRS disks.
- Managed VM can only attach managed disks and unmanaged VM can only attach
* Managed VM can only attach managed disks and unmanaged VM can only attach
unmanaged disks.
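As a minimal sketch, a StorageClass that provisions managed Premium disks could look like this (the class name is a placeholder):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: managed-premium        # placeholder name
provisioner: kubernetes.io/azure-disk
parameters:
  storageaccounttype: Premium_LRS
  kind: managed                # managed disks, created in the cluster's resource group
```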
### Azure File
@ -579,29 +580,29 @@ parameters:
storageAccount: azure_storage_account_name
```
* `skuName`: Azure storage account Sku tier. Default is empty.
* `location`: Azure storage account location. Default is empty.
* `storageAccount`: Azure storage account name. Default is empty. If a storage
- `skuName`: Azure storage account Sku tier. Default is empty.
- `location`: Azure storage account location. Default is empty.
- `storageAccount`: Azure storage account name. Default is empty. If a storage
account is not provided, all storage accounts associated with the resource
group are searched to find one that matches `skuName` and `location`. If a
storage account is provided, it must reside in the same resource group as the
cluster, and `skuName` and `location` are ignored.
* `secretNamespace`: the namespace of the secret that contains the Azure Storage
- `secretNamespace`: the namespace of the secret that contains the Azure Storage
Account Name and Key. Default is the same as the Pod.
* `secretName`: the name of the secret that contains the Azure Storage Account Name and
- `secretName`: the name of the secret that contains the Azure Storage Account Name and
Key. Default is `azure-storage-account-<accountName>-secret`
* `readOnly`: a flag indicating whether the storage will be mounted as read only.
Defaults to false which means a read/write mount. This setting will impact the
- `readOnly`: a flag indicating whether the storage will be mounted as read only.
Defaults to false which means a read/write mount. This setting will impact the
`ReadOnly` setting in VolumeMounts as well.
During storage provisioning, a secret named by `secretName` is created for the
mounting credentials. If the cluster has enabled both
[RBAC](/docs/reference/access-authn-authz/rbac/) and
[Controller Roles](/docs/reference/access-authn-authz/rbac/#controller-roles),
add the `create` permission of resource `secret` for clusterrole
`system:controller:persistent-volume-binder`.
In a multi-tenancy context, it is strongly recommended to set the value for
`secretNamespace` explicitly, otherwise the storage account credentials may
be read by other users.
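A minimal sketch of an Azure File StorageClass that sets `secretNamespace` explicitly (the class name and location are placeholders):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: azurefile              # placeholder name
provisioner: kubernetes.io/azure-file
parameters:
  skuName: Standard_LRS
  location: eastus             # placeholder Azure region
  secretNamespace: kube-system # keep mounting credentials out of user namespaces
```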
@ -615,26 +616,25 @@ metadata:
provisioner: kubernetes.io/portworx-volume
parameters:
repl: "1"
snap_interval: "70"
priority_io: "high"
```
* `fs`: filesystem to be laid out: `none/xfs/ext4` (default: `ext4`).
* `block_size`: block size in Kbytes (default: `32`).
* `repl`: number of synchronous replicas to be provided in the form of
- `fs`: filesystem to be laid out: `none/xfs/ext4` (default: `ext4`).
- `block_size`: block size in Kbytes (default: `32`).
- `repl`: number of synchronous replicas to be provided in the form of
replication factor `1..3` (default: `1`) A string is expected here i.e.
`"1"` and not `1`.
* `priority_io`: determines whether the volume will be created from higher
- `priority_io`: determines whether the volume will be created from higher
performance or a lower priority storage `high/medium/low` (default: `low`).
* `snap_interval`: clock/time interval in minutes for when to trigger snapshots.
- `snap_interval`: clock/time interval in minutes for when to trigger snapshots.
Snapshots are incremental based on difference with the prior snapshot, 0
disables snaps (default: `0`). A string is expected here i.e.
`"70"` and not `70`.
* `aggregation_level`: specifies the number of chunks the volume would be
- `aggregation_level`: specifies the number of chunks the volume would be
distributed into, 0 indicates a non-aggregated volume (default: `0`). A string
is expected here i.e. `"0"` and not `0`
* `ephemeral`: specifies whether the volume should be cleaned-up after unmount
- `ephemeral`: specifies whether the volume should be cleaned-up after unmount
or should be persistent. `emptyDir` use case can set this value to true and
`persistent volumes` use case such as for databases like Cassandra should set
to false, `true/false` (default `false`). A string is expected here i.e.
@ -660,4 +660,3 @@ specified by the `WaitForFirstConsumer` volume binding mode.
Delaying volume binding allows the scheduler to consider all of a Pod's
scheduling constraints when choosing an appropriate PersistentVolume for a
PersistentVolumeClaim.
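A minimal sketch of a StorageClass that delays binding in this way (the class name is a placeholder, and the provisioner name stands in for whichever topology-aware provisioner you use):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: topology-aware                  # placeholder name
provisioner: example.com/csi-driver     # placeholder provisioner
volumeBindingMode: WaitForFirstConsumer
```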
View File
@ -227,7 +227,7 @@ $ kubectl get crd volumesnapshotcontent -o yaml
If you want to allow users to create a `PersistentVolumeClaim` from an existing
`VolumeSnapshot`, but with a different volume mode than the source, the annotation
`snapshot.storage.kubernetes.io/allowVolumeModeChange: "true"` needs to be added to
`snapshot.storage.kubernetes.io/allow-volume-mode-change: "true"` needs to be added to
the `VolumeSnapshotContent` that corresponds to the `VolumeSnapshot`.
For pre-provisioned snapshots, `spec.sourceVolumeMode` needs to be populated
@ -241,7 +241,7 @@ kind: VolumeSnapshotContent
metadata:
name: new-snapshot-content-test
annotations:
- snapshot.storage.kubernetes.io/allowVolumeModeChange: "true"
- snapshot.storage.kubernetes.io/allow-volume-mode-change: "true"
spec:
deletionPolicy: Delete
driver: hostpath.csi.k8s.io
View File
@ -549,7 +549,7 @@ spec:
<!-- maintenance note: OK to remove all mention of glusterfs once the v1.25 release of
Kubernetes has gone out of support -->
-
Kubernetes {{< skew currentVersion >}} does not include a `glusterfs` volume type.
The GlusterFS in-tree storage driver was deprecated in the Kubernetes v1.25 release
@ -1282,8 +1282,13 @@ in `Container.volumeMounts`. Its values are:
In similar fashion, no mounts created by the container will be visible on
the host. This is the default mode.
This mode is equal to `private` mount propagation as described in the
[Linux kernel documentation](https://www.kernel.org/doc/Documentation/filesystems/sharedsubtree.txt)
This mode is equal to `rprivate` mount propagation as described in
[`mount(8)`](https://man7.org/linux/man-pages/man8/mount.8.html)
However, the CRI runtime may choose `rslave` mount propagation (i.e.,
`HostToContainer`) instead, when `rprivate` propagation is not applicable.
cri-dockerd (Docker) is known to choose `rslave` mount propagation when the
mount source contains the Docker daemon's root directory (`/var/lib/docker`).
* `HostToContainer` - This volume mount will receive all subsequent mounts
that are mounted to this volume or any of its subdirectories.
@ -1296,7 +1301,7 @@ in `Container.volumeMounts`. Its values are:
propagation will see it.
This mode is equal to `rslave` mount propagation as described in the
[Linux kernel documentation](https://www.kernel.org/doc/Documentation/filesystems/sharedsubtree.txt)
[`mount(8)`](https://man7.org/linux/man-pages/man8/mount.8.html)
* `Bidirectional` - This volume mount behaves the same as the `HostToContainer` mount.
In addition, all volume mounts created by the container will be propagated
@ -1306,7 +1311,7 @@ in `Container.volumeMounts`. Its values are:
a Pod that needs to mount something on the host using a `hostPath` volume.
This mode is equal to `rshared` mount propagation as described in the
[Linux kernel documentation](https://www.kernel.org/doc/Documentation/filesystems/sharedsubtree.txt)
[`mount(8)`](https://man7.org/linux/man-pages/man8/mount.8.html)
{{< warning >}}
`Bidirectional` mount propagation can be dangerous. It can damage
View File
@ -274,7 +274,8 @@ This functionality requires a container runtime that supports this functionality
#### Field compatibility for Pod security context {#compatibility-v1-pod-spec-containers-securitycontext}
None of the Pod [`securityContext`](/docs/reference/kubernetes-api/workload-resources/pod-v1/#security-context) fields work on Windows.
Only the `securityContext.runAsNonRoot` and `securityContext.windowsOptions` from the Pod
[`securityContext`](/docs/reference/kubernetes-api/workload-resources/pod-v1/#security-context) fields work on Windows.
## Node problem detector
View File
@ -105,30 +105,24 @@ If you do not specify either, then the DaemonSet controller will create Pods on
## How Daemon Pods are scheduled
### Scheduled by default scheduler
A DaemonSet ensures that all eligible nodes run a copy of a Pod. The DaemonSet
controller creates a Pod for each eligible node and adds the
`spec.affinity.nodeAffinity` field of the Pod to match the target host. After
the Pod is created, the default scheduler typically takes over and then binds
the Pod to the target host by setting the `.spec.nodeName` field. If the new
Pod cannot fit on the node, the default scheduler may preempt (evict) some of
the existing Pods based on the
[priority](/docs/concepts/scheduling-eviction/pod-priority-preemption/#pod-priority)
of the new Pod.
{{< feature-state for_k8s_version="1.17" state="stable" >}}
The user can specify a different scheduler for the Pods of the DaemonSet, by
setting the `.spec.template.spec.schedulerName` field of the DaemonSet.
A DaemonSet ensures that all eligible nodes run a copy of a Pod. Normally, the
node that a Pod runs on is selected by the Kubernetes scheduler. However,
DaemonSet pods are created and scheduled by the DaemonSet controller instead.
That introduces the following issues:
* Inconsistent Pod behavior: Normal Pods waiting to be scheduled are created
and in `Pending` state, but DaemonSet pods are not created in `Pending`
state. This is confusing to the user.
* [Pod preemption](/docs/concepts/scheduling-eviction/pod-priority-preemption/)
is handled by default scheduler. When preemption is enabled, the DaemonSet controller
will make scheduling decisions without considering pod priority and preemption.
`ScheduleDaemonSetPods` allows you to schedule DaemonSets using the default
scheduler instead of the DaemonSet controller, by adding the `NodeAffinity` term
to the DaemonSet pods, instead of the `.spec.nodeName` term. The default
scheduler is then used to bind the pod to the target host. If node affinity of
the DaemonSet pod already exists, it is replaced (the original node affinity was
taken into account before selecting the target host). The DaemonSet controller only
performs these operations when creating or modifying DaemonSet pods, and no
changes are made to the `spec.template` of the DaemonSet.
The original node affinity specified at the
`.spec.template.spec.affinity.nodeAffinity` field (if specified) is taken into
consideration by the DaemonSet controller when evaluating the eligible nodes,
but is replaced on the created Pod with the node affinity that matches the name
of the eligible node.
```yaml
nodeAffinity:
@ -141,25 +135,40 @@ nodeAffinity:
- target-host-name
```
In addition, `node.kubernetes.io/unschedulable:NoSchedule` toleration is added
automatically to DaemonSet Pods. The default scheduler ignores
`unschedulable` Nodes when scheduling DaemonSet Pods.
### Taints and Tolerations
### Taints and tolerations
Although Daemon Pods respect
[taints and tolerations](/docs/concepts/scheduling-eviction/taint-and-toleration/),
the following tolerations are added to DaemonSet Pods automatically according to
the related features.
The DaemonSet controller automatically adds a set of {{< glossary_tooltip
text="tolerations" term_id="toleration" >}} to DaemonSet Pods:
| Toleration Key | Effect | Version | Description |
| ---------------------------------------- | ---------- | ------- | ----------- |
| `node.kubernetes.io/not-ready` | NoExecute | 1.13+ | DaemonSet pods will not be evicted when there are node problems such as a network partition. |
| `node.kubernetes.io/unreachable` | NoExecute | 1.13+ | DaemonSet pods will not be evicted when there are node problems such as a network partition. |
| `node.kubernetes.io/disk-pressure` | NoSchedule | 1.8+ | DaemonSet pods tolerate disk-pressure attributes by default scheduler. |
| `node.kubernetes.io/memory-pressure` | NoSchedule | 1.8+ | DaemonSet pods tolerate memory-pressure attributes by default scheduler. |
| `node.kubernetes.io/unschedulable` | NoSchedule | 1.12+ | DaemonSet pods tolerate unschedulable attributes by default scheduler. |
| `node.kubernetes.io/network-unavailable` | NoSchedule | 1.12+ | DaemonSet pods, who uses host network, tolerate network-unavailable attributes by default scheduler. |
{{< table caption="Tolerations for DaemonSet pods" >}}
| Toleration key | Effect | Details |
| --------------------------------------------------------------------------------------------------------------------- | ------------ | --------------------------------------------------------------------------------------------------------------------------------------------- |
| [`node.kubernetes.io/not-ready`](/docs/reference/labels-annotations-taints/#node-kubernetes-io-not-ready) | `NoExecute` | DaemonSet Pods can be scheduled onto nodes that are not healthy or ready to accept Pods. Any DaemonSet Pods running on such nodes will not be evicted. |
| [`node.kubernetes.io/unreachable`](/docs/reference/labels-annotations-taints/#node-kubernetes-io-unreachable) | `NoExecute` | DaemonSet Pods can be scheduled onto nodes that are unreachable from the node controller. Any DaemonSet Pods running on such nodes will not be evicted. |
| [`node.kubernetes.io/disk-pressure`](/docs/reference/labels-annotations-taints/#node-kubernetes-io-disk-pressure) | `NoSchedule` | DaemonSet Pods can be scheduled onto nodes with disk pressure issues. |
| [`node.kubernetes.io/memory-pressure`](/docs/reference/labels-annotations-taints/#node-kubernetes-io-memory-pressure) | `NoSchedule` | DaemonSet Pods can be scheduled onto nodes with memory pressure issues. |
| [`node.kubernetes.io/pid-pressure`](/docs/reference/labels-annotations-taints/#node-kubernetes-io-pid-pressure) | `NoSchedule` | DaemonSet Pods can be scheduled onto nodes with process pressure issues. |
| [`node.kubernetes.io/unschedulable`](/docs/reference/labels-annotations-taints/#node-kubernetes-io-unschedulable) | `NoSchedule` | DaemonSet Pods can be scheduled onto nodes that are unschedulable. |
| [`node.kubernetes.io/network-unavailable`](/docs/reference/labels-annotations-taints/#node-kubernetes-io-network-unavailable) | `NoSchedule` | **Only added for DaemonSet Pods that request host networking**, i.e., Pods having `spec.hostNetwork: true`. Such DaemonSet Pods can be scheduled onto nodes with unavailable network.|
{{< /table >}}
You can add your own tolerations to the Pods of a Daemonset as well, by
defining these in the Pod template of the DaemonSet.
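For example, the following sketch adds a user-supplied toleration so that the DaemonSet also runs on control plane nodes (the DaemonSet name and image are placeholders):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: example-daemonset            # placeholder name
spec:
  selector:
    matchLabels:
      name: example-daemonset
  template:
    metadata:
      labels:
        name: example-daemonset
    spec:
      tolerations:
      # user-supplied toleration, in addition to the tolerations the
      # DaemonSet controller adds automatically
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
        effect: NoSchedule
      containers:
      - name: agent
        image: registry.k8s.io/pause:3.9   # placeholder image
```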
Because the DaemonSet controller sets the
`node.kubernetes.io/unschedulable:NoSchedule` toleration automatically,
Kubernetes can run DaemonSet Pods on nodes that are marked as _unschedulable_.
If you use a DaemonSet to provide an important node-level function, such as
[cluster networking](/docs/concepts/cluster-administration/networking/), it is
helpful that Kubernetes places DaemonSet Pods on nodes before they are ready.
For example, without that special toleration, you could end up in a deadlock
situation where the node is not marked as ready because the network plugin is
not running there, and at the same time the network plugin is not running on
that node because the node is not yet ready.
## Communicating with Daemon Pods
View File
@ -794,7 +794,7 @@ These are some requirements and semantics of the API:
are evaluated in order. Once a rule matches a Pod failure, the remaining rules
are ignored. When no rule matches the Pod failure, the default
handling applies.
- you may want to restrict a rule to a specific container by specifing its name
- you may want to restrict a rule to a specific container by specifying its name
in `spec.podFailurePolicy.rules[*].containerName`, as in the sketch below. When not specified, the rule
applies to all containers. When specified, it should match one of the container
or `initContainer` names in the Pod template.
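A minimal sketch of a Job whose pod failure policy rule is restricted to the container named `main` (the Job name, image, and exit code are placeholders):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: job-with-pod-failure-policy     # placeholder name
spec:
  backoffLimit: 6
  podFailurePolicy:
    rules:
    - action: FailJob
      onExitCodes:
        containerName: main             # rule only applies to this container
        operator: In
        values: [42]
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: main
        image: docker.io/library/busybox:1.36   # placeholder image
        command: ["sh", "-c", "exit 42"]
```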
View File
@ -69,7 +69,7 @@ kubectl get rs
And see the frontend one you created:
```shell
```
NAME DESIRED CURRENT READY AGE
frontend 3 3 3 6s
```
@ -118,7 +118,7 @@ kubectl get pods
You should see Pod information similar to:
```shell
```
NAME READY STATUS RESTARTS AGE
frontend-b2zdv 1/1 Running 0 6m36s
frontend-vcmts 1/1 Running 0 6m36s
@ -160,7 +160,7 @@ While you can create bare Pods with no problems, it is strongly recommended to m
labels which match the selector of one of your ReplicaSets. The reason for this is because a ReplicaSet is not limited
to owning Pods specified by its template-- it can acquire other Pods in the manner specified in the previous sections.
Take the previous frontend ReplicaSet example, and the Pods specified in the following manifest:
{{< codenew file="pods/pod-rs.yaml" >}}
@ -229,9 +229,9 @@ As with all other Kubernetes API objects, a ReplicaSet needs the `apiVersion`, `
For ReplicaSets, the `kind` is always a ReplicaSet.
When the control plane creates new Pods for a ReplicaSet, the `.metadata.name` of the
ReplicaSet is part of the basis for naming those Pods. The name of a ReplicaSet must be a valid
[DNS subdomain](/docs/concepts/overview/working-with-objects/names#dns-subdomain-names)
value, but this can produce unexpected results for the Pod hostnames. For best compatibility,
the name should follow the more restrictive rules for a
[DNS label](/docs/concepts/overview/working-with-objects/names#dns-label-names).
@ -288,8 +288,8 @@ When using the REST API or the `client-go` library, you must set `propagationPol
```shell
kubectl proxy --port=8080
curl -X DELETE 'localhost:8080/apis/apps/v1/namespaces/default/replicasets/frontend' \
> -d '{"kind":"DeleteOptions","apiVersion":"v1","propagationPolicy":"Foreground"}' \
> -H "Content-Type: application/json"
-d '{"kind":"DeleteOptions","apiVersion":"v1","propagationPolicy":"Foreground"}' \
-H "Content-Type: application/json"
```
### Deleting just a ReplicaSet
@ -303,11 +303,11 @@ For example:
```shell
kubectl proxy --port=8080
curl -X DELETE 'localhost:8080/apis/apps/v1/namespaces/default/replicasets/frontend' \
> -d '{"kind":"DeleteOptions","apiVersion":"v1","propagationPolicy":"Orphan"}' \
> -H "Content-Type: application/json"
-d '{"kind":"DeleteOptions","apiVersion":"v1","propagationPolicy":"Orphan"}' \
-H "Content-Type: application/json"
```
Once the original is deleted, you can create a new ReplicaSet to replace it. As long
as the old and new `.spec.selector` are the same, then the new one will adopt the old Pods.
However, it will not make any effort to make existing Pods match a new, different pod template.
To update Pods to a new spec in a controlled way, use a
@ -335,19 +335,19 @@ prioritize scaling down pods based on the following general algorithm:
1. If the pods' creation times differ, the pod that was created more recently
comes before the older pod (the creation times are bucketed on an integer log scale
when the `LogarithmicScaleDown` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) is enabled)
If all of the above match, then selection is random.
### Pod deletion cost
{{< feature-state for_k8s_version="v1.22" state="beta" >}}
Using the [`controller.kubernetes.io/pod-deletion-cost`](/docs/reference/labels-annotations-taints/#pod-deletion-cost)
annotation, users can set a preference regarding which pods to remove first when downscaling a ReplicaSet.
The annotation should be set on the pod, the range is [-2147483647, 2147483647]. It represents the cost of
deleting a pod compared to other pods belonging to the same ReplicaSet. Pods with lower deletion
cost are preferred to be deleted before pods with higher deletion cost.
The implicit value for this annotation for pods that don't set it is 0; negative values are permitted.
Invalid values will be rejected by the API server.
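As a sketch, a Pod that the ReplicaSet should prefer to remove first on scale-down could carry the annotation like this (the Pod name and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: frontend-low-utilization        # placeholder name
  labels:
    tier: frontend
  annotations:
    # lower cost than sibling Pods, so this Pod is removed first on scale-down
    controller.kubernetes.io/pod-deletion-cost: "-100"
spec:
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9    # placeholder image
```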
@ -360,13 +360,13 @@ This feature is beta and enabled by default. You can disable it using the
- This is honored on a best-effort basis, so it does not offer any guarantees on pod deletion order.
- Users should avoid updating the annotation frequently, such as updating it based on a metric value,
because doing so will generate a significant number of pod updates on the apiserver.
{{< /note >}}
#### Example Use Case
The different pods of an application could have different utilization levels. On scale down, the application
may prefer to remove the pods with lower utilization. To avoid frequently updating the pods, the application
should update `controller.kubernetes.io/pod-deletion-cost` once before issuing a scale down (setting the
annotation to a value proportional to pod utilization level). This works if the application itself controls
the down scaling; for example, the driver pod of a Spark deployment.
@ -400,7 +400,7 @@ kubectl autoscale rs frontend --max=10 --min=3 --cpu-percent=50
[`Deployment`](/docs/concepts/workloads/controllers/deployment/) is an object which can own ReplicaSets and update
them and their Pods via declarative, server-side rolling updates.
While ReplicaSets can be used independently, today they're mainly used by Deployments as a mechanism to orchestrate Pod
creation, deletion and updates. When you use Deployments you don't have to worry about managing the ReplicaSets that
they create. Deployments own and manage their ReplicaSets.
As such, it is recommended to use Deployments when you want ReplicaSets.
@ -422,7 +422,7 @@ expected to terminate on their own (that is, batch jobs).
### DaemonSet
Use a [`DaemonSet`](/docs/concepts/workloads/controllers/daemonset/) instead of a ReplicaSet for Pods that provide a
machine-level function, such as machine monitoring or machine logging. These Pods have a lifetime that is tied
to a machine lifetime: the Pod needs to be running on the machine before other Pods start, and are
safe to terminate when the machine is otherwise ready to be rebooted/shutdown.
@ -444,4 +444,3 @@ As such, ReplicaSets are preferred over ReplicationControllers
object definition to understand the API for replica sets.
* Read about [PodDisruptionBudget](/docs/concepts/workloads/pods/disruptions/) and how
you can use it to manage application availability during disruptions.
View File
@ -1,75 +1,87 @@
---
reviewers:
- janetkuo
title: Automatic Clean-up for Finished Jobs
title: Automatic Cleanup for Finished Jobs
content_type: concept
weight: 70
description: >-
A time-to-live mechanism to clean up old Jobs that have finished execution.
---
<!-- overview -->
{{< feature-state for_k8s_version="v1.23" state="stable" >}}
TTL-after-finished {{<glossary_tooltip text="controller" term_id="controller">}} provides a
TTL (time to live) mechanism to limit the lifetime of resource objects that
have finished execution. TTL controller only handles
{{< glossary_tooltip text="Jobs" term_id="job" >}}.
When your Job has finished, it's useful to keep that Job in the API (and not immediately delete the Job)
so that you can tell whether the Job succeeded or failed.
Kubernetes' TTL-after-finished {{<glossary_tooltip text="controller" term_id="controller">}} provides a
TTL (time to live) mechanism to limit the lifetime of Job objects that
have finished execution.
<!-- body -->
## TTL-after-finished Controller
## Cleanup for finished Jobs
The TTL-after-finished controller is only supported for Jobs. A cluster operator can use this feature to clean
The TTL-after-finished controller is only supported for Jobs. You can use this mechanism to clean
up finished Jobs (either `Complete` or `Failed`) automatically by specifying the
`.spec.ttlSecondsAfterFinished` field of a Job, as in this
[example](/docs/concepts/workloads/controllers/job/#clean-up-finished-jobs-automatically).
The TTL-after-finished controller will assume that a job is eligible to be cleaned up
TTL seconds after the job has finished, in other words, when the TTL has expired. When the
TTL-after-finished controller cleans up a job, it will delete it cascadingly, that is to say it will delete
its dependent objects together with it. Note that when the job is deleted,
its lifecycle guarantees, such as finalizers, will be honored.
The TTL seconds can be set at any time. Here are some examples for setting the
The TTL-after-finished controller assumes that a Job is eligible to be cleaned up
TTL seconds after the Job has finished. The timer starts once the
status condition of the Job changes to show that the Job is either `Complete` or `Failed`; once the TTL has
expired, that Job becomes eligible for
[cascading](/docs/concepts/architecture/garbage-collection/#cascading-deletion) removal. When the
TTL-after-finished controller cleans up a job, it will delete it cascadingly, that is to say it will delete
its dependent objects together with it.
Kubernetes honors object lifecycle guarantees on the Job, such as waiting for
[finalizers](/docs/concepts/overview/working-with-objects/finalizers/).
You can set the TTL seconds at any time. Here are some examples for setting the
`.spec.ttlSecondsAfterFinished` field of a Job:
* Specify this field in the job manifest, so that a Job can be cleaned up
* Specify this field in the Job manifest, so that a Job can be cleaned up
automatically some time after it finishes.
* Set this field of existing, already finished jobs, to adopt this new
feature.
* Manually set this field of existing, already finished Jobs, so that they become eligible
for cleanup.
* Use a
[mutating admission webhook](/docs/reference/access-authn-authz/extensible-admission-controllers/#admission-webhooks)
to set this field dynamically at job creation time. Cluster administrators can
[mutating admission webhook](/docs/reference/access-authn-authz/admission-controllers/#mutatingadmissionwebhook)
to set this field dynamically at Job creation time. Cluster administrators can
use this to enforce a TTL policy for finished jobs.
* Use a
[mutating admission webhook](/docs/reference/access-authn-authz/extensible-admission-controllers/#admission-webhooks)
to set this field dynamically after the job has finished, and choose
different TTL values based on job status, labels, etc.
[mutating admission webhook](/docs/reference/access-authn-authz/admission-controllers/#mutatingadmissionwebhook)
to set this field dynamically after the Job has finished, and choose
different TTL values based on job status, labels, or other criteria. For this case, the webhook needs
to detect changes to the `.status` of the Job and only set a TTL when the Job
is being marked as completed.
* Write your own controller to manage the cleanup TTL for Jobs that match a particular
{{< glossary_tooltip term_id="selector" text="selector" >}}.
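A minimal sketch of a Job manifest that sets this field directly (the Job name, image, and command are placeholders):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: pi-with-ttl                    # placeholder name
spec:
  # the Job (and its dependent Pods) becomes eligible for deletion
  # 100 seconds after the Job finishes
  ttlSecondsAfterFinished: 100
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: pi
        image: docker.io/library/perl:5.34.0    # placeholder image
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
```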
## Caveat
## Caveats
### Updating TTL Seconds
### Updating TTL for finished Jobs
Note that the TTL period, e.g. `.spec.ttlSecondsAfterFinished` field of Jobs,
can be modified after the job is created or has finished. However, once the
Job becomes eligible to be deleted (when the TTL has expired), the system won't
guarantee that the Jobs will be kept, even if an update to extend the TTL
returns a successful API response.
You can modify the TTL period, e.g. `.spec.ttlSecondsAfterFinished` field of Jobs,
after the job is created or has finished. If you extend the TTL period after the
existing `ttlSecondsAfterFinished` period has expired, Kubernetes doesn't guarantee
to retain that Job, even if an update to extend the TTL returns a successful API
response.
### Time Skew
### Time skew
Because TTL-after-finished controller uses timestamps stored in the Kubernetes jobs to
Because the TTL-after-finished controller uses timestamps stored in the Kubernetes jobs to
determine whether the TTL has expired or not, this feature is sensitive to time
skew in the cluster, which may cause TTL-after-finish controller to clean up job objects
skew in your cluster, which may cause the control plane to clean up Job objects
at the wrong time.
Clocks aren't always correct, but the difference should be
very small. Please be aware of this risk when setting a non-zero TTL.
## {{% heading "whatsnext" %}}
* [Clean up Jobs automatically](/docs/concepts/workloads/controllers/job/#clean-up-finished-jobs-automatically)
* [Design doc](https://github.com/kubernetes/enhancements/blob/master/keps/sig-apps/592-ttl-after-finish/README.md)
* Read [Clean up Jobs automatically](/docs/concepts/workloads/controllers/job/#clean-up-finished-jobs-automatically)
* Refer to the [Kubernetes Enhancement Proposal](https://github.com/kubernetes/enhancements/blob/master/keps/sig-apps/592-ttl-after-finish/README.md)
(KEP) for adding this mechanism.
View File
@ -289,14 +289,31 @@ section.
## Privileged mode for containers
In Linux, any container in a Pod can enable privileged mode using the `privileged` (Linux) flag on the [security context](/docs/tasks/configure-pod-container/security-context/) of the container spec. This is useful for containers that want to use operating system administrative capabilities such as manipulating the network stack or accessing hardware devices.
If your cluster has the `WindowsHostProcessContainers` feature enabled, you can create a [Windows HostProcess pod](/docs/tasks/configure-pod-container/create-hostprocess-pod) by setting the `windowsOptions.hostProcess` flag on the security context of the pod spec. All containers in these pods must run as Windows HostProcess containers. HostProcess pods run directly on the host and can also be used to perform administrative tasks as is done with Linux privileged containers.
{{< note >}}
Your {{< glossary_tooltip text="container runtime" term_id="container-runtime" >}} must support the concept of a privileged container for this setting to be relevant.
{{< /note >}}
Any container in a pod can run in privileged mode to use operating system administrative capabilities
that would otherwise be inaccessible. This is available for both Windows and Linux.
### Linux privileged containers
In Linux, any container in a Pod can enable privileged mode using the `privileged` (Linux) flag
on the [security context](/docs/tasks/configure-pod-container/security-context/) of the
container spec. This is useful for containers that want to use operating system administrative
capabilities such as manipulating the network stack or accessing hardware devices.
### Windows privileged containers
{{< feature-state for_k8s_version="v1.26" state="stable" >}}
In Windows, you can create a [Windows HostProcess pod](/docs/tasks/configure-pod-container/create-hostprocess-pod)
by setting the `windowsOptions.hostProcess` flag on the security context of the pod spec. All containers in these
pods must run as Windows HostProcess containers. HostProcess pods run directly on the host and can also be used
to perform administrative tasks as is done with Linux privileged containers. In order to use this feature, the
`WindowsHostProcessContainers` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) must be enabled.
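A minimal sketch of such a pod spec (the name, image, and command are placeholders; `hostNetwork: true` is required for HostProcess pods):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hostprocess-example            # placeholder name
spec:
  securityContext:
    windowsOptions:
      hostProcess: true
      runAsUserName: "NT AUTHORITY\\SYSTEM"   # one of the permitted host user accounts
  hostNetwork: true
  containers:
  - name: hostprocess
    image: mcr.microsoft.com/windows/nanoserver:ltsc2022   # placeholder image
    command: ["powershell.exe", "-Command", "Start-Sleep -Seconds 3600"]  # placeholder command
  nodeSelector:
    kubernetes.io/os: windows
```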
## Static Pods
_Static Pods_ are managed directly by the kubelet daemon on a specific node,
View File
@ -63,6 +63,9 @@ via either mechanism are:
`metadata.labels['<KEY>']`
: the text value of the pod's {{< glossary_tooltip text="label" term_id="label" >}} named `<KEY>` (for example, `metadata.labels['mylabel']`)
The following information is available through environment variables
**but not as a downwardAPI volume fieldRef**:
`spec.serviceAccountName`
: the name of the pod's {{< glossary_tooltip text="service account" term_id="service-account" >}}
@ -75,8 +78,8 @@ via either mechanism are:
`status.podIP`
: the pod's primary IP address (usually, its IPv4 address)
In addition, the following information is available through
a `downwardAPI` volume `fieldRef`, but **not as environment variables**:
The following information is available through a `downwardAPI` volume
`fieldRef`, **but not as environment variables**:
`metadata.labels`
: all of the pod's labels, formatted as `label-key="escaped-label-value"` with one label per line
View File
@ -267,6 +267,11 @@ after successful sandbox creation and network configuration by the runtime
plugin). For a Pod without init containers, the kubelet sets the `Initialized`
condition to `True` before sandbox creation and network configuration starts.
### Pod scheduling readiness {#pod-scheduling-readiness-gate}
{{< feature-state for_k8s_version="v1.26" state="alpha" >}}
See [Pod Scheduling Readiness](/docs/concepts/scheduling-eviction/pod-scheduling-readiness/) for more information.
## Container probes
View File
@ -0,0 +1,117 @@
---
title: Pod Quality of Service Classes
content_type: concept
weight: 85
---
<!-- overview -->
This page introduces _Quality of Service (QoS) classes_ in Kubernetes, and explains
how Kubernetes assigns a QoS class to each Pod as a consequence of the resource
constraints that you specify for the containers in that Pod. Kubernetes relies on this
classification to make decisions about which Pods to evict when there are not enough
available resources on a Node.
<!-- body -->
## Quality of Service classes
Kubernetes classifies the Pods that you run and allocates each Pod into a specific
_quality of service (QoS) class_. Kubernetes uses that classification to influence how different
pods are handled. Kubernetes does this classification based on the
[resource requests](/docs/concepts/configuration/manage-resources-containers/)
of the {{< glossary_tooltip text="Containers" term_id="container" >}} in that Pod, along with
how those requests relate to resource limits.
This is known as {{< glossary_tooltip text="Quality of Service" term_id="qos-class" >}}
(QoS) class. Kubernetes assigns every Pod a QoS class based on the resource requests
and limits of its component Containers. QoS classes are used by Kubernetes to decide
which Pods to evict from a Node experiencing
[Node Pressure](/docs/concepts/scheduling-eviction/node-pressure-eviction/). The possible
QoS classes are `Guaranteed`, `Burstable`, and `BestEffort`. When a Node runs out of resources,
Kubernetes will first evict `BestEffort` Pods running on that Node, followed by `Burstable` and
finally `Guaranteed` Pods. When this eviction is due to resource pressure, only Pods exceeding
resource requests are candidates for eviction.
### Guaranteed
Pods that are `Guaranteed` have the strictest resource limits and are least likely
to face eviction. They are guaranteed not to be killed until they exceed their limits
or there are no lower-priority Pods that can be preempted from the Node. They may
not acquire resources beyond their specified limits. These Pods can also make
use of exclusive CPUs using the
[`static`](/docs/tasks/administer-cluster/cpu-management-policies/#static-policy) CPU management policy.
#### Criteria
For a Pod to be given a QoS class of `Guaranteed`:
* Every Container in the Pod must have a memory limit and a memory request.
* For every Container in the Pod, the memory limit must equal the memory request.
* Every Container in the Pod must have a CPU limit and a CPU request.
* For every Container in the Pod, the CPU limit must equal the CPU request.
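A minimal sketch of a Pod that meets these criteria (the name and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: qos-guaranteed                 # placeholder name
spec:
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9   # placeholder image
    resources:
      requests:
        cpu: 500m
        memory: 128Mi
      limits:
        cpu: 500m                      # CPU limit equals the CPU request
        memory: 128Mi                  # memory limit equals the memory request
```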
### Burstable
Pods that are `Burstable` have some lower-bound resource guarantees based on the request, but
do not require a specific limit. If a limit is not specified, it defaults to a
limit equivalent to the capacity of the Node, which allows the Pods to flexibly increase
their resources if resources are available. In the event of Pod eviction due to Node
resource pressure, these Pods are evicted only after all `BestEffort` Pods are evicted.
Because a `Burstable` Pod can include a Container that has no resource limits or requests, a Pod
that is `Burstable` can try to use any amount of node resources.
#### Criteria
A Pod is given a QoS class of `Burstable` if:
* The Pod does not meet the criteria for QoS class `Guaranteed`.
* At least one Container in the Pod has a memory or CPU request or limit.
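A minimal sketch of a Pod that would be classified as `Burstable` (the name and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: qos-burstable                  # placeholder name
spec:
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9   # placeholder image
    resources:
      requests:
        memory: 64Mi                   # a request without a matching limit: Burstable
```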
### BestEffort
Pods in the `BestEffort` QoS class can use node resources that aren't specifically assigned
to Pods in other QoS classes. For example, if you have a node with 16 CPU cores available to the
kubelet, and you assign 4 CPU cores to a `Guaranteed` Pod, then a Pod in the `BestEffort`
QoS class can try to use any amount of the remaining 12 CPU cores.
The kubelet prefers to evict `BestEffort` Pods if the node comes under resource pressure.
#### Criteria
A Pod has a QoS class of `BestEffort` if it doesn't meet the criteria for either `Guaranteed`
or `Burstable`. In other words, a Pod is `BestEffort` only if none of the Containers in the Pod have a
memory limit or a memory request, and none of the Containers in the Pod have a
CPU limit or a CPU request.
Containers in a Pod can request other resources (not CPU or memory) and still be classified as
`BestEffort`.
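A minimal sketch of a Pod that would be classified as `BestEffort` (the name and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: qos-besteffort                 # placeholder name
spec:
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9   # placeholder image
    # no CPU or memory requests or limits on any container
```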
## Some behavior is independent of QoS class {#class-independent-behavior}
Certain behavior is independent of the QoS class assigned by Kubernetes. For example:
* Any Container exceeding a resource limit will be killed and restarted by the kubelet without
affecting other Containers in that Pod.
* If a Container exceeds its resource request and the node it runs on faces
resource pressure, the Pod it is in becomes a candidate for [eviction](/docs/concepts/scheduling-eviction/node-pressure-eviction/).
If this occurs, all Containers in the Pod will be terminated. Kubernetes may create a
replacement Pod, usually on a different node.
* The resource request of a Pod is equal to the sum of the resource requests of
its component Containers, and the resource limit of a Pod is equal to the sum of
the resource limits of its component Containers.
* The kube-scheduler does not consider QoS class when selecting which Pods to
[preempt](/docs/concepts/scheduling-eviction/pod-priority-preemption/#preemption).
Preemption can occur when a cluster does not have enough resources to run all the Pods
you defined.
## {{% heading "whatsnext" %}}
* Learn about [resource management for Pods and Containers](/docs/concepts/configuration/manage-resources-containers/).
* Learn about [Node-pressure eviction](/docs/concepts/scheduling-eviction/node-pressure-eviction/).
* Learn about [Pod priority and preemption](/docs/concepts/scheduling-eviction/pod-priority-preemption/).
* Learn about [Pod disruptions](/docs/concepts/workloads/pods/disruptions/).
* Learn how to [assign memory resources to containers and pods](/docs/tasks/configure-pod-container/assign-memory-resource/).
* Learn how to [assign CPU resources to containers and pods](/docs/tasks/configure-pod-container/assign-cpu-resource/).
* Learn how to [configure Quality of Service for Pods](/docs/tasks/configure-pod-container/quality-service-pod/).
File diff suppressed because one or more lines are too long
(image file changed: 12 KiB)
View File
@ -373,21 +373,21 @@ An example request body:
```json
{
"apiVersion":"imagepolicy.k8s.io/v1alpha1",
"kind":"ImageReview",
"spec":{
"containers":[
"apiVersion": "imagepolicy.k8s.io/v1alpha1",
"kind": "ImageReview",
"spec": {
"containers": [
{
"image":"myrepo/myimage:v1"
"image": "myrepo/myimage:v1"
},
{
"image":"myrepo/myimage@sha256:beb6bd6a68f114c1dc2ea4b28db81bdf91de202a9014972bec5e4d9171d90ed"
"image": "myrepo/myimage@sha256:beb6bd6a68f114c1dc2ea4b28db81bdf91de202a9014972bec5e4d9171d90ed"
}
],
"annotations":{
"annotations": {
"mycluster.image-policy.k8s.io/ticket-1234": "break-glass"
},
"namespace":"mynamespace"
"namespace": "mynamespace"
}
}
```
@ -610,9 +610,9 @@ This file may be json or yaml and has the following format:
```yaml
podNodeSelectorPluginConfig:
clusterDefaultNodeSelector: name-of-node-selector
namespace1: name-of-node-selector
namespace2: name-of-node-selector
```
Reference the `PodNodeSelector` configuration file from the file provided to the API server's
@ -663,23 +663,15 @@ admission plugin, which allows preventing pods from running on specifically tain
{{< feature-state for_k8s_version="v1.25" state="stable" >}}
This is the replacement for the deprecated [PodSecurityPolicy](#podsecuritypolicy) admission controller
defined in the next section. This admission controller acts on creation and modification of the pod and
determines if it should be admitted based on the requested security context and the
[Pod Security Standards](/docs/concepts/security/pod-security-standards/).
The PodSecurity admission controller checks new Pods before they are
admitted, and determines whether each Pod should be admitted based on the requested security context and the restrictions on permitted
[Pod Security Standards](/docs/concepts/security/pod-security-standards/)
for the namespace that the Pod would be in.
See the [Pod Security Admission documentation](/docs/concepts/security/pod-security-admission/)
for more information.
See the [Pod Security Admission](/docs/concepts/security/pod-security-admission/)
documentation for more information.
### PodSecurityPolicy {#podsecuritypolicy}
{{< feature-state for_k8s_version="v1.21" state="deprecated" >}}
This admission controller acts on creation and modification of the pod and determines if it should be admitted
based on the requested security context and the available Pod Security Policies.
See also the [PodSecurityPolicy](/docs/concepts/security/pod-security-policy/) documentation
for more information.
PodSecurity replaced an older admission controller named PodSecurityPolicy.
### PodTolerationRestriction {#podtolerationrestriction}
@ -744,17 +736,37 @@ for more information.
### SecurityContextDeny {#securitycontextdeny}
This admission controller will deny any Pod that attempts to set certain escalating
[SecurityContext](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#securitycontext-v1-core)
fields, as shown in the
[Configure a Security Context for a Pod or Container](/docs/tasks/configure-pod-container/security-context/)
task.
If you don't use [Pod Security admission](/docs/concepts/security/pod-security-admission/),
[PodSecurityPolicies](/docs/concepts/security/pod-security-policy/), nor any external enforcement mechanism,
then you could use this admission controller to restrict the set of values a security context can take.
{{< feature-state for_k8s_version="v1.0" state="alpha" >}}
See [Pod Security Standards](/docs/concepts/security/pod-security-standards/) for more context on restricting
pod privileges.
{{< caution >}}
This admission controller plugin is **outdated** and **incomplete**; it may be
unusable or not do what you would expect. It was originally designed to prevent
the use of some, but not all, security-sensitive fields. For example, fields like
`privileged` were not filtered at creation, and the plugin was never updated to cover
more recent fields or newer APIs, such as the `ephemeralContainers` field for a
Pod.
The [Pod Security Admission](/docs/concepts/security/pod-security-admission/)
plugin enforcing the [Pod Security Standards](/docs/concepts/security/pod-security-standards/)
`Restricted` profile captures what this plugin was trying to achieve in a better
and up-to-date way.
{{< /caution >}}
This admission controller will deny any Pod that attempts to set the following
[SecurityContext](/docs/reference/kubernetes-api/workload-resources/pod-v1/#security-context)
fields:
- `.spec.securityContext.supplementalGroups`
- `.spec.securityContext.seLinuxOptions`
- `.spec.securityContext.runAsUser`
- `.spec.securityContext.fsGroup`
- `.spec.(init)Containers[*].securityContext.seLinuxOptions`
- `.spec.(init)Containers[*].securityContext.runAsUser`
For more historical context on this plugin, see
[The birth of PodSecurityPolicy](/blog/2022/08/23/podsecuritypolicy-the-historical-context/#the-birth-of-podsecuritypolicy)
from the Kubernetes blog article about PodSecurityPolicy and its removal. The
article details the PodSecurityPolicy historical context and the birth of the
`securityContext` field for Pods.
### ServiceAccount {#serviceaccount}
View File
@ -104,54 +104,54 @@ Kubernetes provides built-in signers that each have a well-known `signerName`:
1. `kubernetes.io/kube-apiserver-client`: signs certificates that will be honored as client certificates by the API server.
Never auto-approved by {{< glossary_tooltip term_id="kube-controller-manager" >}}.
1. Trust distribution: signed certificates must be honored as client certificates by the API server. The CA bundle is not distributed by any other means.
1. Permitted subjects - no subject restrictions, but approvers and signers may choose not to approve or sign.
Certain subjects like cluster-admin level users or groups vary between distributions and installations,
but deserve additional scrutiny before approval and signing.
The `CertificateSubjectRestriction` admission plugin is enabled by default to restrict `system:masters`,
but it is often not the only cluster-admin subject in a cluster.
1. Permitted x509 extensions - honors subjectAltName and key usage extensions and discards other extensions.
1. Permitted key usages - must include `["client auth"]`. Must not include key usages beyond `["digital signature", "key encipherment", "client auth"]`.
1. Expiration/certificate lifetime - for the kube-controller-manager implementation of this signer, set to the minimum
of the `--cluster-signing-duration` option or, if specified, the `spec.expirationSeconds` field of the CSR object.
1. CA bit allowed/disallowed - not allowed.
1. `kubernetes.io/kube-apiserver-client-kubelet`: signs client certificates that will be honored as client certificates by the
API server.
May be auto-approved by {{< glossary_tooltip term_id="kube-controller-manager" >}}.
1. Trust distribution: signed certificates must be honored as client certificates by the API server. The CA bundle
is not distributed by any other means.
1. Permitted subjects - organizations are exactly `["system:nodes"]`, common name starts with "`system:node:`".
1. Permitted x509 extensions - honors key usage extensions, forbids subjectAltName extensions and drops other extensions.
1. Permitted key usages - exactly `["key encipherment", "digital signature", "client auth"]`.
1. Expiration/certificate lifetime - for the kube-controller-manager implementation of this signer, set to the minimum
of the `--cluster-signing-duration` option or, if specified, the `spec.expirationSeconds` field of the CSR object.
1. CA bit allowed/disallowed - not allowed.
1. `kubernetes.io/kubelet-serving`: signs serving certificates that are honored as a valid kubelet serving certificate
by the API server, but has no other guarantees.
Never auto-approved by {{< glossary_tooltip term_id="kube-controller-manager" >}}.
1. Trust distribution: signed certificates must be honored by the API server as valid to terminate connections to a kubelet.
The CA bundle is not distributed by any other means.
1. Permitted subjects - organizations are exactly `["system:nodes"]`, common name starts with "`system:node:`".
1. Permitted x509 extensions - honors key usage and DNSName/IPAddress subjectAltName extensions, forbids EmailAddress and
URI subjectAltName extensions, drops other extensions. At least one DNS or IP subjectAltName must be present.
1. Permitted key usages - exactly `["key encipherment", "digital signature", "server auth"]`.
1. Expiration/certificate lifetime - for the kube-controller-manager implementation of this signer, set to the minimum
of the `--cluster-signing-duration` option or, if specified, the `spec.expirationSeconds` field of the CSR object.
1. CA bit allowed/disallowed - not allowed.
1. `kubernetes.io/legacy-unknown`: has no guarantees for trust at all. Some third-party distributions of Kubernetes
may honor client certificates signed by it. The stable CertificateSigningRequest API (version `certificates.k8s.io/v1` and later)
does not allow to set the `signerName` as `kubernetes.io/legacy-unknown`.
Never auto-approved by {{< glossary_tooltip term_id="kube-controller-manager" >}}.
1. Trust distribution: None. There is no standard trust or distribution for this signer in a Kubernetes cluster.
1. Permitted subjects - any
1. Permitted x509 extensions - honors subjectAltName and key usage extensions and discards other extensions.
1. Permitted key usages - any
1. Expiration/certificate lifetime - for the kube-controller-manager implementation of this signer, set to the minimum
of the `--cluster-signing-duration` option or, if specified, the `spec.expirationSeconds` field of the CSR object.
1. CA bit allowed/disallowed - not allowed.
{{< note >}}
Failures for all of these are only reported in kube-controller-manager logs.
@ -238,7 +238,11 @@ Some points to note:
- `usages` has to be '`client auth`'
- `expirationSeconds` could be made longer (i.e. `864000` for ten days) or shorter (i.e. `3600` for one hour)
- `request` is the base64 encoded value of the CSR file content.
You can get the content using this command: ```cat myuser.csr | base64 | tr -d "\n"```
You can get the content using this command:
```shell
cat myuser.csr | base64 | tr -d "\n"
```
### Approve certificate signing request
View File
@ -591,7 +591,7 @@ is not considered to match.
Use the object selector only if the webhook is opt-in, because end users may skip
the admission webhook by setting the labels.
This example shows a mutating webhook that would match a `CREATE` of any resource with the label `foo: bar`:
This example shows a mutating webhook that would match a `CREATE` of any resource (but not subresources) with the label `foo: bar`:
```yaml
apiVersion: admissionregistration.k8s.io/v1

View File

@ -193,9 +193,10 @@ spec:
matchResources:
namespaceSelector:
matchExpressions:
- key: environment,
operator: NotIn,
values: ["test"]
- key: environment
operator: NotIn
values:
- test
```
And have a parameter resource like:
@ -222,7 +223,7 @@ spec:
matchResources:
namespaceSelector:
matchExpressions:
- key: environment,
- key: environment
operator: Exists
```

View File

@ -383,7 +383,7 @@ Each feature gate is designed for enabling/disabling a specific feature:
to see the requesting subject's authentication information.
See [API access to authentication information for a client](/docs/reference/access-authn-authz/authentication/#self-subject-review)
for more details.
- `APIServerIdentity`: Assign each API server an ID in a cluster.
- `APIServerIdentity`: Assign each API server an ID in a cluster, using a [Lease](/docs/concepts/architecture/leases).
- `APIServerTracing`: Add support for distributed tracing in the API server.
See [Traces for Kubernetes System Components](/docs/concepts/cluster-administration/system-traces) for more details.
- `AdvancedAuditing`: Enable [advanced auditing](/docs/tasks/debug/debug-cluster/audit/#advanced-audit)
@ -697,15 +697,12 @@ Each feature gate is designed for enabling/disabling a specific feature:
- `RotateKubeletServerCertificate`: Enable the rotation of the server TLS certificate on the kubelet.
See [kubelet configuration](/docs/reference/access-authn-authz/kubelet-tls-bootstrapping/#kubelet-configuration)
for more details.
- `SELinuxMountReadWriteOncePod`: Speed up container startup by mounting volumes with the correct
SELinux label instead of changing each file on the volumes recursively. The initial implementation
focused on ReadWriteOncePod volumes.
- `SELinuxMountReadWriteOncePod`: Speeds up container startup by allowing kubelet to mount volumes
for a Pod directly with the correct SELinux label instead of changing each file on the volumes
recursively. The initial implementation focused on ReadWriteOncePod volumes.
- `SeccompDefault`: Enables the use of `RuntimeDefault` as the default seccomp profile
for all workloads.
The seccomp profile is specified in the `securityContext` of a Pod and/or a Container.
- `SELinuxMountReadWriteOncePod`: Allows kubelet to mount volumes for a Pod directly with the
right SELinux label instead of applying the SELinux label recursively on every file on the
volume.
- `ServerSideApply`: Enables the [Sever Side Apply (SSA)](/docs/reference/using-api/server-side-apply/)
feature on the API Server.
- `ServerSideFieldValidation`: Enables server-side field validation. This means the validation

View File

@ -38,13 +38,6 @@ kubelet [flags]
</colgroup>
<tbody>
<tr>
<td colspan="2">--add-dir-header</td>
</tr>
<tr>
<td></td><td style="line-height: 130%; word-wrap: break-word;">If true, adds the file directory to the header of the log messages (DEPRECATED: will be removed in a future release, see <a href="https://github.com/kubernetes/enhancements/tree/master/keps/sig-instrumentation/2845-deprecate-klog-specific-flags-in-k8s-components">here</a>.)</td>
</tr>
<tr>
<td colspan="2">--address string&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Default: 0.0.0.0 </td>
</tr>
@ -59,13 +52,6 @@ kubelet [flags]
<td></td><td style="line-height: 130%; word-wrap: break-word;">Comma-separated whitelist of unsafe sysctls or unsafe sysctl patterns (ending in <code>&ast;</code>). Use these at your own risk. (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's <code>--config</code> flag. See <a href="https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/">kubelet-config-file</a> for more information.)</td>
</tr>
<tr>
<td colspan="2">--alsologtostderr</td>
</tr>
<tr>
<td></td><td style="line-height: 130%; word-wrap: break-word;">Log to standard error as well as files (DEPRECATED: will be removed in a future release, see <a href="https://github.com/kubernetes/enhancements/tree/master/keps/sig-instrumentation/2845-deprecate-klog-specific-flags-in-k8s-components">here</a>.)</td>
</tr>
<tr>
<td colspan="2">--anonymous-auth&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Default: true</td>
</tr>
@ -91,7 +77,7 @@ kubelet [flags]
<td colspan="2">--authorization-mode string&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Default: <code>AlwaysAllow</code></td>
</tr>
<tr>
<td></td><td style="line-height: 130%; word-wrap: break-word;">Authorization mode for Kubelet server. Valid options are AlwaysAllow or Webhook. Webhook mode uses the SubjectAccessReview API to determine authorization. (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's --config flag. See <a href="https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/">kubelet-config-file</a> for more information.)</td>
<td></td><td style="line-height: 130%; word-wrap: break-word;">Authorization mode for Kubelet server. Valid options are AlwaysAllow or Webhook. Webhook mode uses the SubjectAccessReview API to determine authorization. (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's <code>--config</code> flag. See <a href="https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/">kubelet-config-file</a> for more information.)</td>
</tr>
<tr>
@ -140,7 +126,7 @@ kubelet [flags]
<td colspan="2">--cgroup-root string&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Default: <code>''</code></td>
</tr>
<tr>
<td></td><td style="line-height: 130%; word-wrap: break-word;">Optional root cgroup to use for pods. This is handled by the container runtime on a best effort basis. Default: '', which means use the container runtime default. (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's --config flag. See <a href="https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/">kubelet-config-file</a> for more information.)</td>
<td></td><td style="line-height: 130%; word-wrap: break-word;">Optional root cgroup to use for pods. This is handled by the container runtime on a best effort basis. Default: '', which means use the container runtime default. (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's <code>--config</code> flag. See <a href="https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/">kubelet-config-file</a> for more information.)</td>
</tr>
<tr>
@ -154,7 +140,7 @@ kubelet [flags]
<td colspan="2">--client-ca-file string</td>
</tr>
<tr>
<td></td><td style="line-height: 130%; word-wrap: break-word;">If set, any request presenting a client certificate signed by one of the authorities in the client-ca-file is authenticated with an identity corresponding to the <code>CommonName</code> of the client certificate. (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's --config flag. See <a href="https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/">kubelet-config-file</a> for more information.)</td>
<td></td><td style="line-height: 130%; word-wrap: break-word;">If set, any request presenting a client certificate signed by one of the authorities in the client-ca-file is authenticated with an identity corresponding to the <code>CommonName</code> of the client certificate. (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's <code>--config</code> flag. See <a href="https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/">kubelet-config-file</a> for more information.)</td>
</tr>
<tr>
@ -196,7 +182,7 @@ kubelet [flags]
<td colspan="2">--container-log-max-files int32&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Default: 5</td>
</tr>
<tr>
<td></td><td style="line-height: 130%; word-wrap: break-word;">&lt;Warning: Beta feature&gt; Set the maximum number of container log files that can be present for a container. The number must be &gt;= 2. This flag can only be used with <code>--container-runtime=remote</code>. (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's --config flag. See <a href="https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/">kubelet-config-file</a> for more information.)</td>
<td></td><td style="line-height: 130%; word-wrap: break-word;">&lt;Warning: Beta feature&gt; Set the maximum number of container log files that can be present for a container. The number must be &gt;= 2. This flag can only be used with <code>--container-runtime=remote</code>. (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's <code>--config</code> flag. See <a href="https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/">kubelet-config-file</a> for more information.)</td>
</tr>
<tr>
@ -249,7 +235,7 @@ kubelet [flags]
</tr>
<tr>
<td colspan="2">--cpu-manager-policy-options mapStringString</td>
<td colspan="2">--cpu-manager-policy-options string</td>
</tr>
<tr>
<td></td><td style="line-height: 130%; word-wrap: break-word;">A set of key=value CPU Manager policy options to use, to fine tune their behaviour. If not supplied, keep the default behaviour. (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's <code>--config</code> flag. See <a href="https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/">kubelet-config-file</a> for more information.)</td>
@ -287,7 +273,7 @@ kubelet [flags]
<td colspan="2">--enforce-node-allocatable strings&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Default: <code>pods</code></td>
</tr>
<tr>
<td></td><td style="line-height: 130%; word-wrap: break-word;">A comma separated list of levels of node allocatable enforcement to be enforced by kubelet. Acceptable options are <code>none</code>, <code>pods</code>, <code>system-reserved</code>, and <code>kube-reserved</code>. If the latter two options are specified, <code>--system-reserved-cgroup</code> and <code>--kube-reserved-cgroup</code> must also be set, respectively. If <code>none</code> is specified, no additional options should be set. See <a href="https://kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/">here</a> for more details. (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's --config flag. See <a href="https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/">kubelet-config-file</a> for more information.)</td>
<td></td><td style="line-height: 130%; word-wrap: break-word;">A comma separated list of levels of node allocatable enforcement to be enforced by kubelet. Acceptable options are <code>none</code>, <code>pods</code>, <code>system-reserved</code>, and <code>kube-reserved</code>. If the latter two options are specified, <code>--system-reserved-cgroup</code> and <code>--kube-reserved-cgroup</code> must also be set, respectively. If <code>none</code> is specified, no additional options should be set. See <a href="https://kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/">here</a> for more details. (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's <code>--config</code> flag. See <a href="https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/">kubelet-config-file</a> for more information.)</td>
</tr>
<tr>
@ -305,7 +291,7 @@ kubelet [flags]
</tr>
<tr>
<td colspan="2">--eviction-hard mapStringString&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Default: <code>imagefs.available<15%,memory.available<100Mi,nodefs.available<10%</code></td>
<td colspan="2">--eviction-hard string&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Default: <code>imagefs.available<15%,memory.available<100Mi,nodefs.available<10%</code></td>
</tr>
<tr>
<td></td><td style="line-height: 130%; word-wrap: break-word;">A set of eviction thresholds (e.g. <code>memory.available<1Gi</code>) that if met would trigger a pod eviction. On a Linux node, the default value also includes <code>nodefs.inodesFree<5%</code>. (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's <code>--config</code> flag. See <a href="https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/">kubelet-config-file</a> for more information.)</td>
@ -319,7 +305,7 @@ kubelet [flags]
</tr>
<tr>
<td colspan="2">--eviction-minimum-reclaim mapStringString</td>
<td colspan="2">--eviction-minimum-reclaim string</td>
</tr>
<tr>
<td></td><td style="line-height: 130%; word-wrap: break-word;">A set of minimum reclaims (e.g. <code>imagefs.available=2Gi</code>) that describes the minimum amount of resource the kubelet will reclaim when performing a pod eviction if that resource is under pressure. (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's <code>--config</code> flag. See <a href="https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/">kubelet-config-file</a> for more information.)</td>
@ -333,14 +319,14 @@ kubelet [flags]
</tr>
<tr>
<td colspan="2">--eviction-soft mapStringString</td>
<td colspan="2">--eviction-soft string</td>
</tr>
<tr>
<td></td><td style="line-height: 130%; word-wrap: break-word;">A set of eviction thresholds (e.g. <code>memory.available<1.5Gi</code>) that if met over a corresponding grace period would trigger a pod eviction. (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's <code>--config</code> flag. See <a href="https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/">kubelet-config-file</a> for more information.)</td>
</tr>
<tr>
<td colspan="2">--eviction-soft-grace-period mapStringString</td>
<td colspan="2">--eviction-soft-grace-period string</td>
</tr>
<tr>
<td></td><td style="line-height: 130%; word-wrap: break-word;">A set of eviction grace periods (e.g. <code>memory.available=1m30s</code>) that correspond to how long a soft eviction threshold must hold before triggering a pod eviction. (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's <code>--config</code> flag. See <a href="https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/">kubelet-config-file</a> for more information.)</td>
@ -360,13 +346,6 @@ kubelet [flags]
<td></td><td style="line-height: 130%; word-wrap: break-word;">When set to <code>true</code>, hard eviction thresholds will be ignored while calculating node allocatable. See <a href="https://kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/">here</a> for more details. (DEPRECATED: will be removed in 1.24 or later)</td>
</tr>
<tr>
<td colspan="2">--experimental-kernel-memcg-notification</td>
</tr>
<tr>
<td></td><td style="line-height: 130%; word-wrap: break-word;">Use kernelMemcgNotification configuration, this flag will be removed in 1.24 or later. (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's <code>--config</code> flag. See <a href="https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/">kubelet-config-file</a> for more information.)</td>
</tr>
<tr>
<td colspan="2">--experimental-mounter-path string&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Default: <code>mount</code></td>
</tr>
@ -389,39 +368,34 @@ kubelet [flags]
APIListChunking=true|false (BETA - default=true)<br/>
APIPriorityAndFairness=true|false (BETA - default=true)<br/>
APIResponseCompression=true|false (BETA - default=true)<br/>
APIServerIdentity=true|false (ALPHA - default=false)<br/>
APISelfSubjectReview=true|false (ALPHA - default=false)<br/>
APIServerIdentity=true|false (BETA - default=true)<br/>
APIServerTracing=true|false (ALPHA - default=false)<br/>
AggregatedDiscoveryEndpoint=true|false (ALPHA - default=false)<br/>
AllAlpha=true|false (ALPHA - default=false)<br/>
AllBeta=true|false (BETA - default=false)<br/>
AnyVolumeDataSource=true|false (BETA - default=true)<br/>
AppArmor=true|false (BETA - default=true)<br/>
CPUManager=true|false (BETA - default=true)<br/>
CPUManagerPolicyAlphaOptions=true|false (ALPHA - default=false)<br/>
CPUManagerPolicyBetaOptions=true|false (BETA - default=true)<br/>
CPUManagerPolicyOptions=true|false (BETA - default=true)<br/>
CSIInlineVolume=true|false (BETA - default=true)<br/>
CSIMigration=true|false (BETA - default=true)<br/>
CSIMigrationAWS=true|false (BETA - default=true)<br/>
CSIMigrationAzureFile=true|false (BETA - default=true)<br/>
CSIMigrationGCE=true|false (BETA - default=true)<br/>
CSIMigrationPortworx=true|false (ALPHA - default=false)<br/>
CSIMigrationPortworx=true|false (BETA - default=false)<br/>
CSIMigrationRBD=true|false (ALPHA - default=false)<br/>
CSIMigrationvSphere=true|false (BETA - default=false)<br/>
CSINodeExpandSecret=true|false (ALPHA - default=false)<br/>
CSIVolumeHealth=true|false (ALPHA - default=false)<br/>
ComponentSLIs=true|false (ALPHA - default=false)<br/>
ContainerCheckpoint=true|false (ALPHA - default=false)<br/>
ContextualLogging=true|false (ALPHA - default=false)<br/>
CronJobTimeZone=true|false (ALPHA - default=false)<br/>
CronJobTimeZone=true|false (BETA - default=true)<br/>
CrossNamespaceVolumeDataSource=true|false (ALPHA - default=false)<br/>
CustomCPUCFSQuotaPeriod=true|false (ALPHA - default=false)<br/>
CustomResourceValidationExpressions=true|false (ALPHA - default=false)<br/>
DaemonSetUpdateSurge=true|false (BETA - default=true)<br/>
DelegateFSGroupToCSIDriver=true|false (BETA - default=true)<br/>
DevicePlugins=true|false (BETA - default=true)<br/>
DisableAcceleratorUsageMetrics=true|false (BETA - default=true)<br/>
CustomResourceValidationExpressions=true|false (BETA - default=true)<br/>
DisableCloudProviders=true|false (ALPHA - default=false)<br/>
DisableKubeletCloudCredentialProviders=true|false (ALPHA - default=false)<br/>
DownwardAPIHugePages=true|false (BETA - default=true)<br/>
EndpointSliceTerminatingCondition=true|false (BETA - default=true)<br/>
EphemeralContainers=true|false (BETA - default=true)<br/>
ExpandedDNSConfig=true|false (ALPHA - default=false)<br/>
DynamicResourceAllocation=true|false (ALPHA - default=false)<br/>
EventedPLEG=true|false (ALPHA - default=false)<br/>
ExpandedDNSConfig=true|false (BETA - default=true)<br/>
ExperimentalHostUserNamespaceDefaulting=true|false (BETA - default=false)<br/>
GRPCContainerProbe=true|false (BETA - default=true)<br/>
GracefulNodeShutdown=true|false (BETA - default=true)<br/>
@ -429,7 +403,7 @@ GracefulNodeShutdownBasedOnPodPriority=true|false (BETA - default=true)<br/>
HPAContainerMetrics=true|false (ALPHA - default=false)<br/>
HPAScaleToZero=true|false (ALPHA - default=false)<br/>
HonorPVReclaimPolicy=true|false (ALPHA - default=false)<br/>
IdentifyPodOS=true|false (BETA - default=true)<br/>
IPTablesOwnershipCleanup=true|false (ALPHA - default=false)<br/>
InTreePluginAWSUnregister=true|false (ALPHA - default=false)<br/>
InTreePluginAzureDiskUnregister=true|false (ALPHA - default=false)<br/>
InTreePluginAzureFileUnregister=true|false (ALPHA - default=false)<br/>
@ -439,53 +413,65 @@ InTreePluginPortworxUnregister=true|false (ALPHA - default=false)<br/>
InTreePluginRBDUnregister=true|false (ALPHA - default=false)<br/>
InTreePluginvSphereUnregister=true|false (ALPHA - default=false)<br/>
JobMutableNodeSchedulingDirectives=true|false (BETA - default=true)<br/>
JobPodFailurePolicy=true|false (BETA - default=true)<br/>
JobReadyPods=true|false (BETA - default=true)<br/>
JobTrackingWithFinalizers=true|false (BETA - default=false)<br/>
KubeletCredentialProviders=true|false (BETA - default=true)<br/>
KMSv2=true|false (ALPHA - default=false)<br/>
KubeletInUserNamespace=true|false (ALPHA - default=false)<br/>
KubeletPodResources=true|false (BETA - default=true)<br/>
KubeletPodResourcesGetAllocatable=true|false (BETA - default=true)<br/>
LegacyServiceAccountTokenNoAutoGeneration=true|false (BETA - default=true)<br/>
LocalStorageCapacityIsolation=true|false (BETA - default=true)<br/>
KubeletTracing=true|false (ALPHA - default=false)<br/>
LegacyServiceAccountTokenTracking=true|false (ALPHA - default=false)<br/>
LocalStorageCapacityIsolationFSQuotaMonitoring=true|false (ALPHA - default=false)<br/>
LogarithmicScaleDown=true|false (BETA - default=true)<br/>
LoggingAlphaOptions=true|false (ALPHA - default=false)<br/>
LoggingBetaOptions=true|false (BETA - default=true)<br/>
MatchLabelKeysInPodTopologySpread=true|false (ALPHA - default=false)<br/>
MaxUnavailableStatefulSet=true|false (ALPHA - default=false)<br/>
MemoryManager=true|false (BETA - default=true)<br/>
MemoryQoS=true|false (ALPHA - default=false)<br/>
MinDomainsInPodTopologySpread=true|false (ALPHA - default=false)<br/>
MixedProtocolLBService=true|false (BETA - default=true)<br/>
NetworkPolicyEndPort=true|false (BETA - default=true)<br/>
MinDomainsInPodTopologySpread=true|false (BETA - default=false)<br/>
MinimizeIPTablesRestore=true|false (ALPHA - default=false)<br/>
MultiCIDRRangeAllocator=true|false (ALPHA - default=false)<br/>
NetworkPolicyStatus=true|false (ALPHA - default=false)<br/>
NodeOutOfServiceVolumeDetach=true|false (ALPHA - default=false)<br/>
NodeInclusionPolicyInPodTopologySpread=true|false (BETA - default=true)<br/>
NodeOutOfServiceVolumeDetach=true|false (BETA - default=true)<br/>
NodeSwap=true|false (ALPHA - default=false)<br/>
OpenAPIEnums=true|false (BETA - default=true)<br/>
OpenAPIV3=true|false (BETA - default=true)<br/>
PDBUnhealthyPodEvictionPolicy=true|false (ALPHA - default=false)<br/>
PodAndContainerStatsFromCRI=true|false (ALPHA - default=false)<br/>
PodDeletionCost=true|false (BETA - default=true)<br/>
PodSecurity=true|false (BETA - default=true)<br/>
ProbeTerminationGracePeriod=true|false (BETA - default=false)<br/>
PodDisruptionConditions=true|false (BETA - default=true)<br/>
PodHasNetworkCondition=true|false (ALPHA - default=false)<br/>
PodSchedulingReadiness=true|false (ALPHA - default=false)<br/>
ProbeTerminationGracePeriod=true|false (BETA - default=true)<br/>
ProcMountType=true|false (ALPHA - default=false)<br/>
ProxyTerminatingEndpoints=true|false (ALPHA - default=false)<br/>
ProxyTerminatingEndpoints=true|false (BETA - default=true)<br/>
QOSReserved=true|false (ALPHA - default=false)<br/>
ReadWriteOncePod=true|false (ALPHA - default=false)<br/>
RecoverVolumeExpansionFailure=true|false (ALPHA - default=false)<br/>
RemainingItemCount=true|false (BETA - default=true)<br/>
RetroactiveDefaultStorageClass=true|false (BETA - default=true)<br/>
RotateKubeletServerCertificate=true|false (BETA - default=true)<br/>
SeccompDefault=true|false (ALPHA - default=false)<br/>
ServerSideFieldValidation=true|false (ALPHA - default=false)<br/>
ServiceIPStaticSubrange=true|false (ALPHA - default=false)<br/>
ServiceInternalTrafficPolicy=true|false (BETA - default=true)<br/>
SELinuxMountReadWriteOncePod=true|false (ALPHA - default=false)<br/>
SeccompDefault=true|false (BETA - default=true)<br/>
ServerSideFieldValidation=true|false (BETA - default=true)<br/>
SizeMemoryBackedVolumes=true|false (BETA - default=true)<br/>
StatefulSetAutoDeletePVC=true|false (ALPHA - default=false)<br/>
StatefulSetMinReadySeconds=true|false (BETA - default=true)<br/>
StatefulSetStartOrdinal=true|false (ALPHA - default=false)<br/>
StorageVersionAPI=true|false (ALPHA - default=false)<br/>
StorageVersionHash=true|false (BETA - default=true)<br/>
TopologyAwareHints=true|false (BETA - default=true)<br/>
TopologyManager=true|false (BETA - default=true)<br/>
TopologyManagerPolicyAlphaOptions=true|false (ALPHA - default=false)<br/>
TopologyManagerPolicyBetaOptions=true|false (BETA - default=false)<br/>
TopologyManagerPolicyOptions=true|false (ALPHA - default=false)<br/>
UserNamespacesStatelessPodsSupport=true|false (ALPHA - default=false)<br/>
ValidatingAdmissionPolicy=true|false (ALPHA - default=false)<br/>
VolumeCapacityPriority=true|false (ALPHA - default=false)<br/>
WinDSR=true|false (ALPHA - default=false)<br/>
WinOverlay=true|false (BETA - default=true)<br/>
WindowsHostProcessContainers=true|false (BETA - default=true)<br/>
WindowsHostNetwork=true|false (ALPHA - default=true)<br/>
(DEPRECATED: This parameter should be set via the config file specified by the Kubelet's <code>--config</code> flag. See <a href="https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/">kubelet-config-file</a> for more information.)</td>
</tr>
@ -623,7 +609,7 @@ WindowsHostProcessContainers=true|false (BETA - default=true)<br/>
</tr>
<tr>
<td colspan="2">--kube-reserved mapStringString&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Default: &lt;None&gt;</td>
<td colspan="2">--kube-reserved string&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Default: &lt;None&gt;</td>
</tr>
<tr>
<td></td><td style="line-height: 130%; word-wrap: break-word;">A set of <code>&lt;resource name&gt;=&lt;resource quantity&gt;</code> (e.g. <code>cpu=200m,memory=500Mi,ephemeral-storage=1Gi,pid='100'</code>) pairs that describe resources reserved for kubernetes system components. Currently <code>cpu</code>, <code>memory</code> and local <code>ephemeral-storage</code> for root file system are supported. See <a href="http://kubernetes.io/docs/user-guide/compute-resources">here</a> for more detail. (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's <code>--config</code> flag. See <a href="https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/">kubelet-config-file</a> for more information.)</td>
@ -650,6 +636,13 @@ WindowsHostProcessContainers=true|false (BETA - default=true)<br/>
<td></td><td style="line-height: 130%; word-wrap: break-word;">Optional absolute name of cgroups to create and run the Kubelet in. (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's <code>--config</code> flag. See <a href="https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/">kubelet-config-file</a> for more information.)</td>
</tr>
<tr>
<td colspan="2">--local-storage-capacity-isolation&gt;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Default: <code>true</code></td>
</tr>
<tr>
<td></td><td style="line-height: 130%; word-wrap: break-word;">If true, local ephemeral storage isolation is enabled. Otherwise, local storage isolation feature will be disabled. (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's <code>--config</code> flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.)</td>
</tr>
<tr>
<td colspan="2">--lock-file string</td>
</tr>
@ -657,34 +650,6 @@ WindowsHostProcessContainers=true|false (BETA - default=true)<br/>
<td></td><td style="line-height: 130%; word-wrap: break-word;">&lt;Warning: Alpha feature&gt; The path to file for kubelet to use as a lock file.</td>
</tr>
<tr>
<td colspan="2">--log-backtrace-at &lt;A string of format 'file:line'&gt;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Default: <code>":0"</code></td>
</tr>
<tr>
<td></td><td style="line-height: 130%; word-wrap: break-word;">When logging hits line <code><file>:<N></code>, emit a stack trace. (DEPRECATED: will be removed in a future release, see <a href="https://github.com/kubernetes/enhancements/tree/master/keps/sig-instrumentation/2845-deprecate-klog-specific-flags-in-k8s-components">here</a>.)</td>
</tr>
<tr>
<td colspan="2">--log-dir string</td>
</tr>
<tr>
<td></td><td style="line-height: 130%; word-wrap: break-word;">If non-empty, write log files in this directory. (DEPRECATED: will be removed in a future release, see <a href="https://github.com/kubernetes/enhancements/tree/master/keps/sig-instrumentation/2845-deprecate-klog-specific-flags-in-k8s-components">here</a>.)</td>
</tr>
<tr>
<td colspan="2">--log-file string</td>
</tr>
<tr>
<td></td><td style="line-height: 130%; word-wrap: break-word;">If non-empty, use this log file. (DEPRECATED: will be removed in a future release, see <a href="https://github.com/kubernetes/enhancements/tree/master/keps/sig-instrumentation/2845-deprecate-klog-specific-flags-in-k8s-components">here</a>.)</td>
</tr>
<tr>
<td colspan="2">--log-file-max-size uint&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Default: 1800</td>
</tr>
<tr>
<td></td><td style="line-height: 130%; word-wrap: break-word;">Defines the maximum size a log file can grow to. Unit is megabytes. If the value is 0, the maximum file size is unlimited. (DEPRECATED: will be removed in a future release, see <a href="https://github.com/kubernetes/enhancements/tree/master/keps/sig-instrumentation/2845-deprecate-klog-specific-flags-in-k8s-components">here</a>.)</td>
</tr>
<tr>
<td colspan="2">--log-flush-frequency duration&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Default: <code>5s</code></td>
</tr>
@ -696,28 +661,21 @@ WindowsHostProcessContainers=true|false (BETA - default=true)<br/>
<td colspan="2">--log-json-info-buffer-size string&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Default: <code>'0'</code></td>
</tr>
<tr>
<td></td><td style="line-height: 130%; word-wrap: break-word;">[Experimental] In JSON format with split output streams, the info messages can be buffered for a while to increase performance. The default value of zero bytes disables buffering. The size can be specified as number of bytes (512), multiples of 1000 (1K), multiples of 1024 (2Ki), or powers of those (3M, 4G, 5Mi, 6Gi). (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's <code>--config</code> flag. See <a href="https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/">kubelet-config-file</a> for more information.)</td>
<td></td><td style="line-height: 130%; word-wrap: break-word;">[Alpha] In JSON format with split output streams, the info messages can be buffered for a while to increase performance. The default value of zero bytes disables buffering. The size can be specified as number of bytes (512), multiples of 1000 (1K), multiples of 1024 (2Ki), or powers of those (3M, 4G, 5Mi, 6Gi). Enable the <code>LoggingAlphaOptions</code> feature gate to use this. (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's <code>--config</code> flag. See <a href="https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/">kubelet-config-file</a> for more information.)</td>
</tr>
<tr>
<td colspan="2">--log-json-split-stream</td>
</tr>
<tr>
<td></td><td style="line-height: 130%; word-wrap: break-word;">[Experimental] In JSON format, write error messages to stderr and info messages to stdout. The default is to write a single stream to stdout. (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's <code>--config</code> flag. See <a href="https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/">kubelet-config-file</a> for more information.)</td>
<td></td><td style="line-height: 130%; word-wrap: break-word;">[Alpha] In JSON format, write error messages to stderr and info messages to stdout. The default is to write a single stream to stdout. Enable the <code>LoggingAlphaOptions</code> feature gate to use this. (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's <code>--config</code> flag. See <a href="https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/">kubelet-config-file</a> for more information.)</td>
</tr>
<tr>
<td colspan="2">--logging-format string&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Default: <code>text</code></td>
</tr>
<tr>
<td></td><td style="line-height: 130%; word-wrap: break-word;">Sets the log format. Permitted formats: <code>text</code>, <code>json</code>.<br/>Non-default formats don't honor these flags: <code>--add-dir-header</code>, <code>--alsologtostderr</code>, <code>--log-backtrace-at</code>, <code>--log-dir</code>, <code>--log-file</code>, <code>--log-file-max-size</code>, <code>--logtostderr</code>, <code>--skip_headers</code>, <code>--skip_log_headers</code>, <code>--stderrthreshold</code>, <code>--log-flush-frequency</code>.<br/>Non-default choices are currently alpha and subject to change without warning. (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's <code>--config</code> flag. See <a href="https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/">kubelet-config-file</a> for more information.)</td>
</tr>
<tr>
<td colspan="2">--logtostderr&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Default: <code>true</code></td>
</tr>
<tr>
<td></td><td style="line-height: 130%; word-wrap: break-word;">log to standard error instead of files. (DEPRECATED: will be removed in a future release, see <a href="https://github.com/kubernetes/enhancements/tree/master/keps/sig-instrumentation/2845-deprecate-klog-specific-flags-in-k8s-components">here</a>.)</td>
<td></td><td style="line-height: 130%; word-wrap: break-word;">Sets the log format. Permitted formats: <code>text</code>, <code>json</code> (gated by <code>LoggingBetaOptions</code>). (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's <code>--config</code> flag. See <a href="https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/">kubelet-config-file</a> for more information.)</td>
</tr>
<tr>
@ -805,7 +763,7 @@ WindowsHostProcessContainers=true|false (BETA - default=true)<br/>
</tr>
<tr>
<td colspan="2">--node-labels mapStringString</td>
<td colspan="2">--node-labels &lt;key=value pairs&gt;</td>
</tr>
<tr>
<td></td><td style="line-height: 130%; word-wrap: break-word;">&lt;Warning: Alpha feature&gt;Labels to add when registering the node in the cluster. Labels must be <code>key=value pairs</code> separated by <code>','</code>. Labels in the <code>'kubernetes.io'</code> namespace must begin with an allowed prefix (<code>'kubelet.kubernetes.io'</code>, <code>'node.kubernetes.io'</code>) or be in the specifically allowed set (<code>'beta.kubernetes.io/arch'</code>, <code>'beta.kubernetes.io/instance-type'</code>, <code>'beta.kubernetes.io/os'</code>, <code>'failure-domain.beta.kubernetes.io/region'</code>, <code>'failure-domain.beta.kubernetes.io/zone'</code>, <code>'kubernetes.io/arch'</code>, <code>'kubernetes.io/hostname'</code>, <code>'kubernetes.io/os'</code>, <code>'node.kubernetes.io/instance-type'</code>, <code>'topology.kubernetes.io/region'</code>, <code>'topology.kubernetes.io/zone'</code>)</td>
@ -825,13 +783,6 @@ WindowsHostProcessContainers=true|false (BETA - default=true)<br/>
<td></td><td style="line-height: 130%; word-wrap: break-word;">Specifies how often kubelet posts node status to master. Note: be cautious when changing the constant, it must work with <code>nodeMonitorGracePeriod</code> in Node controller. (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's <code>--config</code> flag. See <a href="https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/">kubelet-config-file</a> for more information.)</td>
</tr>
<tr>
<td colspan="2">--one-output</td>
</tr>
<tr>
<td></td><td style="line-height: 130%; word-wrap: break-word;">If true, only write logs to their native severity level (vs also writing to each lower severity level). (DEPRECATED: will be removed in a future release, see <a href="https://github.com/kubernetes/enhancements/tree/master/keps/sig-instrumentation/2845-deprecate-klog-specific-flags-in-k8s-components">here</a>.)</td>
</tr>
<tr>
<td colspan="2">--oom-score-adj int32&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Default: -999</td>
</tr>
@ -847,10 +798,10 @@ WindowsHostProcessContainers=true|false (BETA - default=true)<br/>
</tr>
<tr>
<td colspan="2">--pod-infra-container-image string&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Default: <code>registry.k8s.io/pause:3.6</code></td>
<td colspan="2">--pod-infra-container-image string&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Default: <code>registry.k8s.io/pause:3.9</code></td>
</tr>
<tr>
<td></td><td style="line-height: 130%; word-wrap: break-word;">Specified image will not be pruned by the image garbage collector. When container-runtime is set to <code>docker</code>, all containers in each pod will use the network/IPC namespaces from this image. Other CRI implementations have their own configuration to set this image.</td>
<td></td><td style="line-height: 130%; word-wrap: break-word;">Specified image will not be pruned by the image garbage collector. CRI implementations have their own configuration to set this image. (DEPRECATED: will be removed in 1.27. Image garbage collector will get sandbox image information from CRI.)</td>
</tr>
<tr>
@ -885,7 +836,7 @@ WindowsHostProcessContainers=true|false (BETA - default=true)<br/>
<td colspan="2">--protect-kernel-defaults</td>
</tr>
<tr>
<td></td><td style="line-height: 130%; word-wrap: break-word;"> Default kubelet behaviour for kernel tuning. If set, kubelet errors if any of kernel tunables is different than kubelet defaults. (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's <code>--config</code> flag. See <a href="https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/">kubelet-config-file</a> for more information.)</td>
<td></td><td style="line-height: 130%; word-wrap: break-word;">Default kubelet behaviour for kernel tuning. If set, kubelet errors if any of kernel tunables is different than kubelet defaults. (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's <code>--config</code> flag. See <a href="https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/">kubelet-config-file</a> for more information.)</td>
</tr>
<tr>
@ -896,7 +847,7 @@ WindowsHostProcessContainers=true|false (BETA - default=true)<br/>
</tr>
<tr>
<td colspan="2">--qos-reserved mapStringString</td>
<td colspan="2">--qos-reserved string</td>
</tr>
<tr>
<td></td><td style="line-height: 130%; word-wrap: break-word;">&lt;Warning: Alpha feature&gt; A set of <code>&lt;resource name&gt;=&lt;percentage&gt;</code> (e.g. <code>memory=50%</code>) pairs that describe how pod resource requests are reserved at the QoS level. Currently only <code>memory</code> is supported. Requires the <code>QOSReserved</code> feature gate to be enabled. (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's <code>--config</code> flag. See <a href="https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/">kubelet-config-file</a> for more information.)</td>
@ -913,7 +864,7 @@ WindowsHostProcessContainers=true|false (BETA - default=true)<br/>
<td colspan="2">--register-node&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Default: <code>true</code></td>
</tr>
<tr>
<td></td><td style="line-height: 130%; word-wrap: break-word;">Register the node with the API server. If <code>--kubeconfig</code> is not provided, this flag is irrelevant, as the Kubelet won't have an API server to register with. (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's --config flag. See <a href="https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/">kubelet-config-file</a> for more information.)</td>
<td></td><td style="line-height: 130%; word-wrap: break-word;">Register the node with the API server. If <code>--kubeconfig</code> is not provided, this flag is irrelevant, as the Kubelet won't have an API server to register with. (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's <code>--config</code> flag. See <a href="https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/">kubelet-config-file</a> for more information.)</td>
</tr>
<tr>
@ -924,10 +875,10 @@ WindowsHostProcessContainers=true|false (BETA - default=true)<br/>
</tr>
<tr>
<td colspan="2">--register-with-taints mapStringString</td>
<td colspan="2">--register-with-taints string</td>
</tr>
<tr>
<td></td><td style="line-height: 130%; word-wrap: break-word;">Register the node with the given list of taints (comma separated <code>&lt;key&gt;=&lt;value&gt;:&lt;effect&gt;</code>). No-op if <code>--register-node</code> is <code>false</code>. (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's --config flag. See <a href="https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/">kubelet-config-file</a> for more information.)</td>
<td></td><td style="line-height: 130%; word-wrap: break-word;">Register the node with the given list of taints (comma separated <code>&lt;key&gt;=&lt;value&gt;:&lt;effect&gt;</code>). No-op if <code>--register-node</code> is <code>false</code>. (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's <code>--config</code> flag. See <a href="https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/">kubelet-config-file</a> for more information.)</td>
</tr>
<tr>
@ -976,21 +927,21 @@ WindowsHostProcessContainers=true|false (BETA - default=true)<br/>
<td colspan="2">--rotate-certificates</td>
</tr>
<tr>
<td></td><td style="line-height: 130%; word-wrap: break-word;">&lt;Warning: Beta feature&gt; Auto rotate the kubelet client certificates by requesting new certificates from the <code>kube-apiserver</code> when the certificate expiration approaches. (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's <code>--config</code> flag. See <a href="https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/">kubelet-config-file</a> for more information.)</td>
<td></td><td style="line-height: 130%; word-wrap: break-word;">Auto rotate the kubelet client certificates by requesting new certificates from the <code>kube-apiserver</code> when the certificate expiration approaches. (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's <code>--config</code> flag. See <a href="https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/">kubelet-config-file</a> for more information.)</td>
</tr>
<tr>
<td colspan="2">--rotate-server-certificates</td>
</tr>
<tr>
<td></td><td style="line-height: 130%; word-wrap: break-word;">Auto-request and rotate the kubelet serving certificates by requesting new certificates from the <code>kube-apiserver</code> when the certificate expiration approaches. Requires the <code>RotateKubeletServerCertificate</code> feature gate to be enabled, and approval of the submitted <code>CertificateSigningRequest</code> objects. (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's <code>--config</code> flag. See <a href="https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/">kubelet-config-file</a> for more information.)</td>
<td></td><td style="line-height: 130%; word-wrap: break-word;">&lt;Warning: Beta feature&gt; Auto-request and rotate the kubelet serving certificates by requesting new certificates from the <code>kube-apiserver</code> when the certificate expiration approaches. Requires the <code>RotateKubeletServerCertificate</code> feature gate to be enabled, and approval of the submitted <code>CertificateSigningRequest</code> objects. (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's <code>--config</code> flag. See <a href="https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/">kubelet-config-file</a> for more information.)</td>
</tr>
<tr>
<td colspan="2">--runonce</td>
</tr>
<tr>
<td></td><td style="line-height: 130%; word-wrap: break-word;">If <code>true</code>, exit after spawning pods from local manifests or remote urls. Exclusive with <code>--enable-server</code> (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's --config flag. See <a href="https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/">kubelet-config-file</a> for more information.)</td>
<td></td><td style="line-height: 130%; word-wrap: break-word;">If <code>true</code>, exit after spawning pods from local manifests or remote urls. Exclusive with <code>--enable-server</code> (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's <code>--config</code> flag. See <a href="https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/">kubelet-config-file</a> for more information.)</td>
</tr>
<tr>
@ -1011,7 +962,7 @@ WindowsHostProcessContainers=true|false (BETA - default=true)<br/>
<td colspan="2">--seccomp-default string</td>
</tr>
<tr>
<td></td><td style="line-height: 130%; word-wrap: break-word;">&lt;Warning: Alpha feature&gt; Enable the use of <code>RuntimeDefault</code> as the default seccomp profile for all workloads. The <code>SeccompDefault</code> feature gate must be enabled to allow this flag, which is disabled by default.</td>
<td></td><td style="line-height: 130%; word-wrap: break-word;">&lt;Warning: Beta feature&gt; Enable the use of <code>RuntimeDefault</code> as the default seccomp profile for all workloads. The <code>SeccompDefault</code> feature gate must be enabled to allow this flag.</td>
</tr>
<tr>
@ -1021,27 +972,6 @@ WindowsHostProcessContainers=true|false (BETA - default=true)<br/>
<td></td><td style="line-height: 130%; word-wrap: break-word;">Pull images one at a time. We recommend *not* changing the default value on nodes that run docker daemon with version &lt; 1.9 or an <code>aufs</code> storage backend. Issue #10959 has more details. (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's <code>--config</code> flag. See <a href="https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/">kubelet-config-file</a> for more information.)</td>
</tr>
<tr>
<td colspan="2">--skip-headers</td>
</tr>
<tr>
<td></td><td style="line-height: 130%; word-wrap: break-word;">If <code>true</code>, avoid header prefixes in the log messages. (DEPRECATED: will be removed in a future release, see <a href="https://github.com/kubernetes/enhancements/tree/master/keps/sig-instrumentation/2845-deprecate-klog-specific-flags-in-k8s-components">here</a>.)</td>
</tr>
<tr>
<td colspan="2">--skip-log-headers</td>
</tr>
<tr>
<td></td><td style="line-height: 130%; word-wrap: break-word;">If <code>true</code>, avoid headers when opening log files. (DEPRECATED: will be removed in a future release, see <a href="https://github.com/kubernetes/enhancements/tree/master/keps/sig-instrumentation/2845-deprecate-klog-specific-flags-in-k8s-components">here</a>.)</td>
</tr>
<tr>
<td colspan="2">--stderrthreshold int&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Default: 2</td>
</tr>
<tr>
<td></td><td style="line-height: 130%; word-wrap: break-word;">logs at or above this threshold go to stderr. (DEPRECATED: will be removed in a future release, see <a href="https://github.com/kubernetes/enhancements/tree/master/keps/sig-instrumentation/2845-deprecate-klog-specific-flags-in-k8s-components">here</a>.)</td>
</tr>
<tr>
<td colspan="2">--streaming-connection-idle-timeout duration&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Default: <code>4h0m0s</code></td>
</tr>
@ -1064,7 +994,7 @@ WindowsHostProcessContainers=true|false (BETA - default=true)<br/>
</tr>
<tr>
<td colspan="2">--system-reserved mapStringString&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Default: &lt;none&gt;</td>
<td colspan="2">--system-reserved string&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Default: &lt;none&gt;</td>
</tr>
<tr>
<td></td><td style="line-height: 130%; word-wrap: break-word;">A set of <code>&lt;resource name&gt;=&lt;resource quantity&gt;</code> (e.g. <code>cpu=200m,memory=500Mi,ephemeral-storage=1Gi,pid='100'</code>) pairs that describe resources reserved for non-kubernetes components. Currently only <code>cpu</code> and <code>memory</code> are supported. See <a href="http://kubernetes.io/docs/user-guide/compute-resources">here</a> for more detail. (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's <code>--config</code> flag. See <a href="https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/">kubelet-config-file</a> for more information.)</td>
@ -1085,15 +1015,15 @@ WindowsHostProcessContainers=true|false (BETA - default=true)<br/>
</tr>
<tr>
<td colspan="2">--tls-cipher-suites strings</td>
<td colspan="2">--tls-cipher-suites string</td>
</tr>
<tr>
<td></td><td style="line-height: 130%; word-wrap: break-word;">Comma-separated list of cipher suites for the server. If omitted, the default Go cipher suites will be used.<br/>
Preferred values:
`TLS_AES_128_GCM_SHA256`, `TLS_AES_256_GCM_SHA384`, `TLS_CHACHA20_POLY1305_SHA256`, `TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA`, `TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256`, `TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA`, `TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384`, `TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305`, `TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256`, `TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA`, `TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256`, `TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA`, `TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384`, `TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305`, `TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256`, `TLS_RSA_WITH_AES_128_CBC_SHA`, `TLS_RSA_WITH_AES_128_GCM_SHA256`, `TLS_RSA_WITH_AES_256_CBC_SHA`, `TLS_RSA_WITH_AES_256_GCM_SHA384`<br/>
<code>TLS_AES_128_GCM_SHA256</code>, <code>TLS_AES_256_GCM_SHA384</code>, <code>TLS_CHACHA20_POLY1305_SHA256</code>, <code>TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA</code>, <code>TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256</code>, <code>TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA</code>, <code>TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384</code>, <code>TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305</code>, <code>TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256</code>, <code>TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA</code>, <code>TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256</code>, <code>TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA</code>, <code>TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384</code>, <code>TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305</code>, <code>TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256</code>, <code>TLS_RSA_WITH_AES_128_CBC_SHA</code>, <code>TLS_RSA_WITH_AES_128_GCM_SHA256</code>, <code>TLS_RSA_WITH_AES_256_CBC_SHA</code>, <code>TLS_RSA_WITH_AES_256_GCM_SHA384</code><br/>
Insecure values:
`TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256`, `TLS_ECDHE_ECDSA_WITH_RC4_128_SHA`, `TLS_ECDHE_RSA_WITH_3DES_EDE_CBC_SHA`, `TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256`, `TLS_ECDHE_RSA_WITH_RC4_128_SHA`, `TLS_RSA_WITH_3DES_EDE_CBC_SHA`, `TLS_RSA_WITH_AES_128_CBC_SHA256`, `TLS_RSA_WITH_RC4_128_SHA`.<br/>
(DEPRECATED: This parameter should be set via the config file specified by the Kubelet's `--config` flag. See <a href="https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/">kubelet-config-file</a> for more information.)
<code>TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256</code>, <code>TLS_ECDHE_ECDSA_WITH_RC4_128_SHA</code>, <code>TLS_ECDHE_RSA_WITH_3DES_EDE_CBC_SHA</code>, <code>TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256</code>, <code>TLS_ECDHE_RSA_WITH_RC4_128_SHA</code>, <code>TLS_RSA_WITH_3DES_EDE_CBC_SHA</code>, <code>TLS_RSA_WITH_AES_128_CBC_SHA256</code>, <code>TLS_RSA_WITH_RC4_128_SHA</code>.<br/>
(DEPRECATED: This parameter should be set via the config file specified by the Kubelet's <code>--config</code> flag. See <a href="https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/">kubelet-config-file</a> for more information.)
</tr>
<tr>
@ -1118,11 +1048,18 @@ Insecure values:
<td></td><td style="line-height: 130%; word-wrap: break-word;">Topology Manager policy to use. Possible values: <code>'none'</code>, <code>'best-effort'</code>, <code>'restricted'</code>, <code>'single-numa-node'</code>. (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's <code>--config</code> flag. See <a href="https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/">kubelet-config-file</a> for more information.)</td>
</tr>
<tr>
<td colspan="2">--topology-manager-policy-options string</td>
</tr>
<tr>
<td></td><td style="line-height: 130%; word-wrap: break-word;">A set of key=value Topology Manager policy options to use, to fine tune their behaviour. If not supplied, keep the default behaviour. (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's <code>--config</code> flag. See <a href="https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/">kubelet-config-file</a> for more information.)</td>
</tr>
<tr>
<td colspan="2">--topology-manager-scope string&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Default: <code>container</code></td>
</tr>
<tr>
<td></td><td style="line-height: 130%; word-wrap: break-word;">Scope to which topology hints applied. Topology Manager collects hints from Hint Providers and applies them to defined scope to ensure the pod admission. Possible values: <code>'container'</code>, <code>'pod'</code>. (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's --config flag. See <a href="https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/">kubelet-config-file</a> for more information.)</td>
<td></td><td style="line-height: 130%; word-wrap: break-word;">Scope to which topology hints are applied. Topology Manager collects hints from Hint Providers and applies them to the defined scope to ensure the pod admission. Possible values: <code>'container'</code>, <code>'pod'</code>. (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's <code>--config</code> flag. See <a href="https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/">kubelet-config-file</a> for more information.)</td>
</tr>
<tr>

View File

@ -0,0 +1,23 @@
---
title: Feature gate
id: feature-gate
date: 2023-01-12
full_link: /docs/reference/command-line-tools-reference/feature-gates/
short_description: >
A way to control whether or not a particular Kubernetes feature is enabled.
aka:
tags:
- fundamental
- operation
---
Feature gates are a set of keys (opaque string values) that you can use to control which
Kubernetes features are enabled in your cluster.
<!--more-->
You can turn these features on or off using the `--feature-gates` command line flag on each Kubernetes component.
Each Kubernetes component lets you enable or disable a set of feature gates that are relevant to that component.
The Kubernetes documentation lists all current
[feature gates](/docs/reference/command-line-tools-reference/feature-gates/) and what they control.
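As a quick illustration (not part of the definition itself), a gate can be set on a component's command line or, for components that read a configuration file, in that file. The sketch below assumes the kubelet and the `SeccompDefault` gate:

```yaml
# Kubelet configuration file; equivalent to starting the kubelet with
# --feature-gates=SeccompDefault=true
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  SeccompDefault: true
```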

View File

@ -0,0 +1,20 @@
---
title: JSON Web Token (JWT)
id: jwt
date: 2023-01-17
full_link: https://www.rfc-editor.org/rfc/rfc7519
short_description: >
A means of representing claims to be transferred between two parties.
aka:
tags:
- security
- architecture
---
A means of representing claims to be transferred between two parties.
<!--more-->
JWTs can be digitally signed and encrypted. Kubernetes uses JWTs as
authentication tokens to verify the identity of entities that want to perform
actions in a cluster.
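As one concrete place these tokens appear (a sketch; the audience and names are illustrative), a Pod can request a service account token through a projected volume, and the file it receives is a signed JWT:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: jwt-demo
spec:
  containers:
  - name: app
    image: nginx                      # illustrative image
    volumeMounts:
    - name: api-token
      mountPath: /var/run/secrets/tokens
      readOnly: true
  volumes:
  - name: api-token
    projected:
      sources:
      - serviceAccountToken:
          path: api-token
          audience: https://example.com   # illustrative audience
          expirationSeconds: 3600
```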

View File

@ -2,7 +2,7 @@
title: QoS Class
id: qos-class
date: 2019-04-15
full_link:
full_link: /docs/concepts/workloads/pods/pod-qos/
short_description: >
QoS Class (Quality of Service Class) provides a way for Kubernetes to classify pods within the cluster into several classes and make decisions about scheduling and eviction.

View File

@ -171,6 +171,16 @@ There are two possible values:
- `onstart`: The APIService should be reconciled when an API server starts up, but not otherwise.
- `true`: The API server should reconcile this APIService continuously.
### service.alpha.kubernetes.io/tolerate-unready-endpoints (deprecated)
Used on: StatefulSet
This annotation on a Service indicates whether the Endpoints controller should create Endpoints for unready Pods.
Endpoints of these Services retain their DNS records and continue receiving
traffic for the Service from the moment the kubelet starts all containers in the pod
and marks it _Running_, until the kubelet stops all containers and deletes the pod from
the API server.
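Purely as an illustration (all names are hypothetical), the deprecated annotation is placed on a Service, typically the headless Service governing a StatefulSet; on current clusters, prefer setting the Service's `spec.publishNotReadyAddresses` field instead:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: demo-headless
  annotations:
    service.alpha.kubernetes.io/tolerate-unready-endpoints: "true"
spec:
  clusterIP: None                 # headless Service
  selector:
    app: demo
  ports:
  - port: 80
```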
### kubernetes.io/hostname {#kubernetesiohostname}
Example: `kubernetes.io/hostname: "ip-172-20-114-199.ec2.internal"`
@ -310,6 +320,50 @@ See [topology.kubernetes.io/zone](#topologykubernetesiozone).
{{< note >}} Starting in v1.17, this label is deprecated in favor of [topology.kubernetes.io/zone](#topologykubernetesiozone). {{< /note >}}
### pv.kubernetes.io/bind-completed {#pv-kubernetesiobind-completed}
Example: `pv.kubernetes.io/bind-completed: "yes"`
Used on: PersistentVolumeClaim
When this annotation is set on a PersistentVolumeClaim (PVC), it indicates that the lifecycle
of the PVC has passed through initial binding setup. When present, that information changes
how the control plane interprets the state of PVC objects.
The value of this annotation does not matter to Kubernetes.
### pv.kubernetes.io/bound-by-controller {#pv-kubernetesioboundby-controller}
Example: `pv.kubernetes.io/bound-by-controller: "yes"`
Used on: PersistentVolume, PersistentVolumeClaim
If this annotation is set on a PersistentVolume or PersistentVolumeClaim, it indicates that a storage binding
(PersistentVolume → PersistentVolumeClaim, or PersistentVolumeClaim → PersistentVolume) was installed
by the {{< glossary_tooltip text="controller" term_id="controller" >}}.
If the annotation isn't set, and there is a storage binding in place, the absence of that annotation means that
the binding was done manually. The value of this annotation does not matter.
### pv.kubernetes.io/provisioned-by {#pv-kubernetesiodynamically-provisioned}
Example: `pv.kubernetes.io/provisioned-by: "kubernetes.io/rbd"`
Used on: PersistentVolume
This annotation is added to a PersistentVolume(PV) that has been dynamically provisioned by Kubernetes.
Its value is the name of the volume plugin that created the volume. It serves both users (to show where a PV
comes from) and Kubernetes (to recognize dynamically provisioned PVs in its decisions).
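For example (a sketch; `<pv-name>` is a placeholder), you can inspect a PersistentVolume's annotations to see which plugin provisioned it:

```shell
# Print the annotations of a PersistentVolume, including
# pv.kubernetes.io/provisioned-by for dynamically provisioned volumes.
kubectl get pv <pv-name> -o jsonpath='{.metadata.annotations}'
```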
### pv.kubernetes.io/migrated-to {#pv-kubernetesio-migratedto}
Example: `pv.kubernetes.io/migrated-to: pd.csi.storage.gke.io`
Used on: PersistentVolume, PersistentVolumeClaim
This annotation is added to a PersistentVolume (PV) or PersistentVolumeClaim (PVC) that is supposed to be
dynamically provisioned or deleted by its corresponding CSI driver through the `CSIMigration` feature gate.
When this annotation is set, the Kubernetes components will "stand down" and the `external-provisioner`
will act on the objects.
### statefulset.kubernetes.io/pod-name {#statefulsetkubernetesiopod-name}
Example:
@ -393,6 +447,12 @@ Used on: PersistentVolumeClaim
This annotation is added to a PVC that requires dynamic provisioning.
### volume.kubernetes.io/selected-node
Used on: PersistentVolumeClaim
This annotation is added to a PVC for which the scheduler has triggered dynamic provisioning. Its value is the name of the selected node.
### volumes.kubernetes.io/controller-managed-attach-detach
Used on: Node
@ -784,9 +844,9 @@ you through the steps you follow to apply a seccomp profile to a Pod or to one o
its containers. That tutorial covers the supported mechanism for configuring seccomp in Kubernetes,
based on setting `securityContext` within the Pod's `.spec`.
### snapshot.storage.kubernetes.io/allowVolumeModeChange
### snapshot.storage.kubernetes.io/allow-volume-mode-change
Example: `snapshot.storage.kubernetes.io/allowVolumeModeChange: "true"`
Example: `snapshot.storage.kubernetes.io/allow-volume-mode-change: "true"`
Used on: VolumeSnapshotContent

View File

@ -6,7 +6,8 @@ weight: 50
<!-- overview -->
Every {{< glossary_tooltip term_id="node" text="node" >}} in a Kubernetes
cluster runs a [kube-proxy](/docs/reference/command-line-tools-reference/kube-proxy/)
{{< glossary_tooltip term_id="cluster" text="cluster" >}} runs a
[kube-proxy](/docs/reference/command-line-tools-reference/kube-proxy/)
(unless you have deployed your own alternative component in place of `kube-proxy`).
The `kube-proxy` component is responsible for implementing a _virtual IP_
@ -39,8 +40,10 @@ network proxying service on a computer. Although the `kube-proxy` executable su
to use as-is.
<a id="example"></a>
Some of the details in this reference refer to an example: the back end Pods for a stateless
image-processing workload, running with three replicas. Those replicas are
Some of the details in this reference refer to an example: the backend
{{< glossary_tooltip term_id="pod" text="Pods" >}} for a stateless
image-processing workload, running with
three replicas. Those replicas are
fungible&mdash;frontends do not care which backend they use. While the actual Pods that
compose the backend set may change, the frontend clients should not need to be aware of that,
nor should they need to keep track of the set of backends themselves.
@ -61,8 +64,10 @@ Note that the kube-proxy starts up in different modes, which are determined by i
### `iptables` proxy mode {#proxy-mode-iptables}
In this mode, kube-proxy watches the Kubernetes control plane for the addition and
removal of Service and EndpointSlice objects. For each Service, it installs
In this mode, kube-proxy watches the Kubernetes
{{< glossary_tooltip term_id="control-plane" text="control plane" >}} for the addition and
removal of Service and EndpointSlice {{< glossary_tooltip term_id="object" text="objects." >}}
For each Service, it installs
iptables rules, which capture traffic to the Service's `clusterIP` and `port`,
and redirect that traffic to one of the Service's
backend sets. For each endpoint, it installs iptables rules which
@ -84,7 +89,7 @@ to verify that backend Pods are working OK, so that kube-proxy in iptables mode
only sees backends that test out as healthy. Doing this means you avoid
having traffic sent via kube-proxy to a Pod that's known to have failed.
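If you want to see the rules that this mode installs, one hedged way (assuming root access on a Linux node where kube-proxy runs in iptables mode) is:

```shell
# List the NAT rules installed by kube-proxy in iptables mode.
# KUBE-SERVICES, KUBE-SVC-* and KUBE-SEP-* are the chains kube-proxy manages.
sudo iptables-save -t nat | grep -E 'KUBE-(SERVICES|SVC|SEP)' | head
```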
{{< figure src="/images/docs/services-iptables-overview.svg" title="Services overview diagram for iptables proxy" class="diagram-medium" >}}
{{< figure src="/images/docs/services-iptables-overview.svg" title="Virtual IP mechanism for Services, using iptables mode" class="diagram-medium" >}}
#### Example {#packet-processing-iptables}
@ -134,11 +139,13 @@ attempts to resynchronize iptables rules with the kernel. If it is
every time any Service or Endpoint changes. This works fine in very
small clusters, but it results in a lot of redundant work when lots of
things change in a small time period. For example, if you have a
Service backed by a Deployment with 100 pods, and you delete the
Service backed by a {{< glossary_tooltip term_id="deployment" text="Deployment" >}}
with 100 pods, and you delete the
Deployment, then with `minSyncPeriod: 0s`, kube-proxy would end up
removing the Service's Endpoints from the iptables rules one by one,
for a total of 100 updates. With a larger `minSyncPeriod`, multiple
Pod deletion events would get aggregated together, so kube-proxy might
Pod deletion events would get aggregated
together, so kube-proxy might
instead end up making, say, 5 updates, each removing 20 endpoints,
which will be much more efficient in terms of CPU, and result in the
full set of changes being synchronized faster.
@ -182,7 +189,8 @@ enable the `MinimizeIPTablesRestore` [feature
gate](/docs/reference/command-line-tools-reference/feature-gates/) for
kube-proxy with `--feature-gates=MinimizeIPTablesRestore=true,…`.
If you enable that feature gate and you were previously overriding
If you enable that feature gate and
you were previously overriding
`minSyncPeriod`, you should try removing that override and letting
kube-proxy use the default value (`1s`) or at least a smaller value
than you were using before.
@ -229,7 +237,7 @@ kernel modules are available. If the IPVS kernel modules are not detected, then
falls back to running in iptables proxy mode.
{{< /note >}}
{{< figure src="/images/docs/services-ipvs-overview.svg" title="Services overview diagram for IPVS proxy" class="diagram-medium" >}}
{{< figure src="/images/docs/services-ipvs-overview.svg" title="Virtual IP address mechanism for Services, using IPVS mode" class="diagram-medium" >}}
## Session affinity
@ -274,7 +282,7 @@ someone else's choice. That is an isolation failure.
In order to allow you to choose a port number for your Services, we must
ensure that no two Services can collide. Kubernetes does that by allocating each
Service its own IP address from within the `service-cluster-ip-range`
CIDR range that is configured for the API server.
CIDR range that is configured for the {{< glossary_tooltip term_id="kube-apiserver" text="API Server" >}}.
To ensure each Service receives a unique IP, an internal allocator atomically
updates a global allocation map in {{< glossary_tooltip term_id="etcd" >}}
@ -353,7 +361,8 @@ N to 0 replicas of that deployment. In some cases, external load balancers can s
a node with 0 replicas in between health check probes. Routing traffic to terminating endpoints
ensures that Nodes that are scaling down Pods can gracefully receive and drain traffic to
those terminating Pods. By the time the Pod completes termination, the external load balancer
should have seen the node's health check failing and fully removed the node from the backend pool.
should have seen the node's health check failing and fully removed the node from the backend
pool.
## {{% heading "whatsnext" %}}

View File

@ -77,7 +77,7 @@ their authors, not the Kubernetes team.
| Ruby | [github.com/abonas/kubeclient](https://github.com/abonas/kubeclient) |
| Ruby | [github.com/k8s-ruby/k8s-ruby](https://github.com/k8s-ruby/k8s-ruby) |
| Ruby | [github.com/kontena/k8s-client](https://github.com/kontena/k8s-client) |
| Rust | [github.com/clux/kube-rs](https://github.com/clux/kube-rs) |
| Rust | [github.com/kube-rs/kube](https://github.com/kube-rs/kube) |
| Rust | [github.com/ynqa/kubernetes-rust](https://github.com/ynqa/kubernetes-rust) |
| Scala | [github.com/hagay3/skuber](https://github.com/hagay3/skuber) |
| Scala | [github.com/hnaderi/scala-k8s](https://github.com/hnaderi/scala-k8s) |

View File

@ -366,12 +366,26 @@ There are two solutions:
First, the user defines a new configuration containing only the `replicas` field:
{{< codenew file="application/ssa/nginx-deployment-replicas-only.yaml" >}}
```yaml
# Save this file as 'nginx-deployment-replicas-only.yaml'.
apiVersion: apps/v1
kind: Deployment
metadata:
name: nginx-deployment
spec:
replicas: 3
```
{{< note >}}
The YAML file for SSA in this case only contains the fields you want to change.
You are not supposed to provide a fully compliant Deployment manifest if you only
want to modify the `spec.replicas` field using SSA.
{{< /note >}}
The user applies that configuration using the field manager name `handover-to-hpa`:
```shell
kubectl apply -f https://k8s.io/examples/application/ssa/nginx-deployment-replicas-only.yaml \
kubectl apply -f nginx-deployment-replicas-only.yaml \
--server-side --field-manager=handover-to-hpa \
--validate=false
```
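To confirm that the handover took effect, one way (assuming `kubectl` v1.21 or later) is to look for the new field manager in the object's `managedFields`:

```shell
# Show the managedFields entry owned by the "handover-to-hpa" field manager.
kubectl get deployment nginx-deployment --show-managed-fields -o yaml \
  | grep -A 6 'manager: handover-to-hpa'
```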

View File

@ -9,13 +9,13 @@ weight: 10
A cluster is a set of {{< glossary_tooltip text="nodes" term_id="node" >}} (physical
or virtual machines) running Kubernetes agents, managed by the
{{< glossary_tooltip text="control plane" term_id="control-plane" >}}.
Kubernetes {{< param "version" >}} supports clusters with up to 5000 nodes. More specifically,
Kubernetes {{< param "version" >}} supports clusters with up to 5,000 nodes. More specifically,
Kubernetes is designed to accommodate configurations that meet *all* of the following criteria:
* No more than 110 pods per node
* No more than 5000 nodes
* No more than 150000 total pods
* No more than 300000 total containers
* No more than 5,000 nodes
* No more than 150,000 total pods
* No more than 300,000 total containers
You can scale your cluster by adding or removing nodes. The way you do this depends
on how your cluster is deployed.
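As a quick check against the criteria above (a sketch; output depends on your cluster), you can count the nodes and inspect each node's Pod capacity:

```shell
# Count the nodes in the cluster and show each node's Pod capacity.
kubectl get nodes --no-headers | wc -l
kubectl get nodes -o custom-columns=NAME:.metadata.name,PODS:.status.capacity.pods
```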

View File

@ -26,15 +26,15 @@ etcd cluster of three members that can be used by kubeadm during cluster creatio
## {{% heading "prerequisites" %}}
* Three hosts that can talk to each other over TCP ports 2379 and 2380. This
- Three hosts that can talk to each other over TCP ports 2379 and 2380. This
document assumes these default ports. However, they are configurable through
the kubeadm config file.
* Each host must have systemd and a bash compatible shell installed.
* Each host must [have a container runtime, kubelet, and kubeadm installed](/docs/setup/production-environment/tools/kubeadm/install-kubeadm/).
* Each host should have access to the Kubernetes container image registry (`registry.k8s.io`) or list/pull the required etcd image using
`kubeadm config images list/pull`. This guide will set up etcd instances as
[static pods](/docs/tasks/configure-pod-container/static-pod/) managed by a kubelet.
* Some infrastructure to copy files between hosts. For example `ssh` and `scp`
- Each host must have systemd and a bash compatible shell installed.
- Each host must [have a container runtime, kubelet, and kubeadm installed](/docs/setup/production-environment/tools/kubeadm/install-kubeadm/).
- Each host should have access to the Kubernetes container image registry (`registry.k8s.io`) or list/pull the required etcd image using
`kubeadm config images list/pull`. This guide will set up etcd instances as
[static pods](/docs/tasks/configure-pod-container/static-pod/) managed by a kubelet.
- Some infrastructure to copy files between hosts. For example `ssh` and `scp`
can satisfy this requirement.
<!-- steps -->
@ -42,7 +42,7 @@ etcd cluster of three members that can be used by kubeadm during cluster creatio
## Setting up the cluster
The general approach is to generate all certs on one node and only distribute
the *necessary* files to the other nodes.
the _necessary_ files to the other nodes.
{{< note >}}
kubeadm contains all the necessary cryptographic machinery to generate
@ -59,242 +59,241 @@ on Kubernetes dual-stack support see [Dual-stack support with kubeadm](/docs/set
1. Configure the kubelet to be a service manager for etcd.
{{< note >}}You must do this on every host where etcd should be running.{{< /note >}}
Since etcd was created first, you must override the service priority by creating a new unit file
that has higher precedence than the kubeadm-provided kubelet unit file.
Since etcd was created first, you must override the service priority by creating a new unit file
that has higher precedence than the kubeadm-provided kubelet unit file.
```sh
cat << EOF > /etc/systemd/system/kubelet.service.d/20-etcd-service-manager.conf
[Service]
ExecStart=
# Replace "systemd" with the cgroup driver of your container runtime. The default value in the kubelet is "cgroupfs".
# Replace the value of "--container-runtime-endpoint" for a different container runtime if needed.
ExecStart=/usr/bin/kubelet --address=127.0.0.1 --pod-manifest-path=/etc/kubernetes/manifests --cgroup-driver=systemd --container-runtime=remote --container-runtime-endpoint=unix:///var/run/containerd/containerd.sock
Restart=always
EOF
```sh
cat << EOF > /etc/systemd/system/kubelet.service.d/20-etcd-service-manager.conf
[Service]
ExecStart=
# Replace "systemd" with the cgroup driver of your container runtime. The default value in the kubelet is "cgroupfs".
# Replace the value of "--container-runtime-endpoint" for a different container runtime if needed.
ExecStart=/usr/bin/kubelet --address=127.0.0.1 --pod-manifest-path=/etc/kubernetes/manifests --cgroup-driver=systemd --container-runtime=remote --container-runtime-endpoint=unix:///var/run/containerd/containerd.sock
Restart=always
EOF
systemctl daemon-reload
systemctl restart kubelet
```
systemctl daemon-reload
systemctl restart kubelet
```
Check the kubelet status to ensure it is running.
Check the kubelet status to ensure it is running.
```sh
systemctl status kubelet
```
```sh
systemctl status kubelet
```
1. Create configuration files for kubeadm.
Generate one kubeadm configuration file for each host that will have an etcd
member running on it using the following script.
Generate one kubeadm configuration file for each host that will have an etcd
member running on it using the following script.
```sh
# Update HOST0, HOST1 and HOST2 with the IPs of your hosts
export HOST0=10.0.0.6
export HOST1=10.0.0.7
export HOST2=10.0.0.8
```sh
# Update HOST0, HOST1 and HOST2 with the IPs of your hosts
export HOST0=10.0.0.6
export HOST1=10.0.0.7
export HOST2=10.0.0.8
# Update NAME0, NAME1 and NAME2 with the hostnames of your hosts
export NAME0="infra0"
export NAME1="infra1"
export NAME2="infra2"
# Update NAME0, NAME1 and NAME2 with the hostnames of your hosts
export NAME0="infra0"
export NAME1="infra1"
export NAME2="infra2"
# Create temp directories to store files that will end up on other hosts
mkdir -p /tmp/${HOST0}/ /tmp/${HOST1}/ /tmp/${HOST2}/
# Create temp directories to store files that will end up on other hosts
mkdir -p /tmp/${HOST0}/ /tmp/${HOST1}/ /tmp/${HOST2}/
HOSTS=(${HOST0} ${HOST1} ${HOST2})
NAMES=(${NAME0} ${NAME1} ${NAME2})
HOSTS=(${HOST0} ${HOST1} ${HOST2})
NAMES=(${NAME0} ${NAME1} ${NAME2})
for i in "${!HOSTS[@]}"; do
HOST=${HOSTS[$i]}
NAME=${NAMES[$i]}
cat << EOF > /tmp/${HOST}/kubeadmcfg.yaml
---
apiVersion: "kubeadm.k8s.io/v1beta3"
kind: InitConfiguration
nodeRegistration:
name: ${NAME}
localAPIEndpoint:
advertiseAddress: ${HOST}
---
apiVersion: "kubeadm.k8s.io/v1beta3"
kind: ClusterConfiguration
etcd:
local:
serverCertSANs:
- "${HOST}"
peerCertSANs:
- "${HOST}"
extraArgs:
initial-cluster: ${NAMES[0]}=https://${HOSTS[0]}:2380,${NAMES[1]}=https://${HOSTS[1]}:2380,${NAMES[2]}=https://${HOSTS[2]}:2380
initial-cluster-state: new
name: ${NAME}
listen-peer-urls: https://${HOST}:2380
listen-client-urls: https://${HOST}:2379
advertise-client-urls: https://${HOST}:2379
initial-advertise-peer-urls: https://${HOST}:2380
EOF
done
```
for i in "${!HOSTS[@]}"; do
HOST=${HOSTS[$i]}
NAME=${NAMES[$i]}
cat << EOF > /tmp/${HOST}/kubeadmcfg.yaml
---
apiVersion: "kubeadm.k8s.io/v1beta3"
kind: InitConfiguration
nodeRegistration:
name: ${NAME}
localAPIEndpoint:
advertiseAddress: ${HOST}
---
apiVersion: "kubeadm.k8s.io/v1beta3"
kind: ClusterConfiguration
etcd:
local:
serverCertSANs:
- "${HOST}"
peerCertSANs:
- "${HOST}"
extraArgs:
initial-cluster: ${NAMES[0]}=https://${HOSTS[0]}:2380,${NAMES[1]}=https://${HOSTS[1]}:2380,${NAMES[2]}=https://${HOSTS[2]}:2380
initial-cluster-state: new
name: ${NAME}
listen-peer-urls: https://${HOST}:2380
listen-client-urls: https://${HOST}:2379
advertise-client-urls: https://${HOST}:2379
initial-advertise-peer-urls: https://${HOST}:2380
EOF
done
```
1. Generate the certificate authority.
If you already have a CA then the only action that is copying the CA's `crt` and
`key` file to `/etc/kubernetes/pki/etcd/ca.crt` and
`/etc/kubernetes/pki/etcd/ca.key`. After those files have been copied,
proceed to the next step, "Create certificates for each member".
If you already have a CA then the only action required is copying the CA's `crt` and
`key` file to `/etc/kubernetes/pki/etcd/ca.crt` and
`/etc/kubernetes/pki/etcd/ca.key`. After those files have been copied,
proceed to the next step, "Create certificates for each member".
If you do not already have a CA then run this command on `$HOST0` (where you
generated the configuration files for kubeadm).
If you do not already have a CA then run this command on `$HOST0` (where you
generated the configuration files for kubeadm).
```
kubeadm init phase certs etcd-ca
```
```
kubeadm init phase certs etcd-ca
```
This creates two files:
This creates two files:
- `/etc/kubernetes/pki/etcd/ca.crt`
- `/etc/kubernetes/pki/etcd/ca.key`
- `/etc/kubernetes/pki/etcd/ca.crt`
- `/etc/kubernetes/pki/etcd/ca.key`
1. Create certificates for each member.
```sh
kubeadm init phase certs etcd-server --config=/tmp/${HOST2}/kubeadmcfg.yaml
kubeadm init phase certs etcd-peer --config=/tmp/${HOST2}/kubeadmcfg.yaml
kubeadm init phase certs etcd-healthcheck-client --config=/tmp/${HOST2}/kubeadmcfg.yaml
kubeadm init phase certs apiserver-etcd-client --config=/tmp/${HOST2}/kubeadmcfg.yaml
cp -R /etc/kubernetes/pki /tmp/${HOST2}/
# cleanup non-reusable certificates
find /etc/kubernetes/pki -not -name ca.crt -not -name ca.key -type f -delete
```sh
kubeadm init phase certs etcd-server --config=/tmp/${HOST2}/kubeadmcfg.yaml
kubeadm init phase certs etcd-peer --config=/tmp/${HOST2}/kubeadmcfg.yaml
kubeadm init phase certs etcd-healthcheck-client --config=/tmp/${HOST2}/kubeadmcfg.yaml
kubeadm init phase certs apiserver-etcd-client --config=/tmp/${HOST2}/kubeadmcfg.yaml
cp -R /etc/kubernetes/pki /tmp/${HOST2}/
# cleanup non-reusable certificates
find /etc/kubernetes/pki -not -name ca.crt -not -name ca.key -type f -delete
kubeadm init phase certs etcd-server --config=/tmp/${HOST1}/kubeadmcfg.yaml
kubeadm init phase certs etcd-peer --config=/tmp/${HOST1}/kubeadmcfg.yaml
kubeadm init phase certs etcd-healthcheck-client --config=/tmp/${HOST1}/kubeadmcfg.yaml
kubeadm init phase certs apiserver-etcd-client --config=/tmp/${HOST1}/kubeadmcfg.yaml
cp -R /etc/kubernetes/pki /tmp/${HOST1}/
find /etc/kubernetes/pki -not -name ca.crt -not -name ca.key -type f -delete
kubeadm init phase certs etcd-server --config=/tmp/${HOST1}/kubeadmcfg.yaml
kubeadm init phase certs etcd-peer --config=/tmp/${HOST1}/kubeadmcfg.yaml
kubeadm init phase certs etcd-healthcheck-client --config=/tmp/${HOST1}/kubeadmcfg.yaml
kubeadm init phase certs apiserver-etcd-client --config=/tmp/${HOST1}/kubeadmcfg.yaml
cp -R /etc/kubernetes/pki /tmp/${HOST1}/
find /etc/kubernetes/pki -not -name ca.crt -not -name ca.key -type f -delete
kubeadm init phase certs etcd-server --config=/tmp/${HOST0}/kubeadmcfg.yaml
kubeadm init phase certs etcd-peer --config=/tmp/${HOST0}/kubeadmcfg.yaml
kubeadm init phase certs etcd-healthcheck-client --config=/tmp/${HOST0}/kubeadmcfg.yaml
kubeadm init phase certs apiserver-etcd-client --config=/tmp/${HOST0}/kubeadmcfg.yaml
# No need to move the certs because they are for HOST0
kubeadm init phase certs etcd-server --config=/tmp/${HOST0}/kubeadmcfg.yaml
kubeadm init phase certs etcd-peer --config=/tmp/${HOST0}/kubeadmcfg.yaml
kubeadm init phase certs etcd-healthcheck-client --config=/tmp/${HOST0}/kubeadmcfg.yaml
kubeadm init phase certs apiserver-etcd-client --config=/tmp/${HOST0}/kubeadmcfg.yaml
# No need to move the certs because they are for HOST0
# clean up certs that should not be copied off this host
find /tmp/${HOST2} -name ca.key -type f -delete
find /tmp/${HOST1} -name ca.key -type f -delete
```
# clean up certs that should not be copied off this host
find /tmp/${HOST2} -name ca.key -type f -delete
find /tmp/${HOST1} -name ca.key -type f -delete
```
1. Copy certificates and kubeadm configs.
The certificates have been generated and now they must be moved to their
respective hosts.
The certificates have been generated and now they must be moved to their
respective hosts.
```sh
USER=ubuntu
HOST=${HOST1}
scp -r /tmp/${HOST}/* ${USER}@${HOST}:
ssh ${USER}@${HOST}
USER@HOST $ sudo -Es
root@HOST $ chown -R root:root pki
root@HOST $ mv pki /etc/kubernetes/
```
```sh
USER=ubuntu
HOST=${HOST1}
scp -r /tmp/${HOST}/* ${USER}@${HOST}:
ssh ${USER}@${HOST}
USER@HOST $ sudo -Es
root@HOST $ chown -R root:root pki
root@HOST $ mv pki /etc/kubernetes/
```
1. Ensure all expected files exist.
The complete list of required files on `$HOST0` is:
The complete list of required files on `$HOST0` is:
```
/tmp/${HOST0}
└── kubeadmcfg.yaml
---
/etc/kubernetes/pki
├── apiserver-etcd-client.crt
├── apiserver-etcd-client.key
└── etcd
├── ca.crt
├── ca.key
├── healthcheck-client.crt
├── healthcheck-client.key
├── peer.crt
├── peer.key
├── server.crt
└── server.key
```
```
/tmp/${HOST0}
└── kubeadmcfg.yaml
---
/etc/kubernetes/pki
├── apiserver-etcd-client.crt
├── apiserver-etcd-client.key
└── etcd
├── ca.crt
├── ca.key
├── healthcheck-client.crt
├── healthcheck-client.key
├── peer.crt
├── peer.key
├── server.crt
└── server.key
```
On `$HOST1`:
On `$HOST1`:
```
$HOME
└── kubeadmcfg.yaml
---
/etc/kubernetes/pki
├── apiserver-etcd-client.crt
├── apiserver-etcd-client.key
└── etcd
├── ca.crt
├── healthcheck-client.crt
├── healthcheck-client.key
├── peer.crt
├── peer.key
├── server.crt
└── server.key
```
```
$HOME
└── kubeadmcfg.yaml
---
/etc/kubernetes/pki
├── apiserver-etcd-client.crt
├── apiserver-etcd-client.key
└── etcd
├── ca.crt
├── healthcheck-client.crt
├── healthcheck-client.key
├── peer.crt
├── peer.key
├── server.crt
└── server.key
```
On `$HOST2`:
On `$HOST2`:
```
$HOME
└── kubeadmcfg.yaml
---
/etc/kubernetes/pki
├── apiserver-etcd-client.crt
├── apiserver-etcd-client.key
└── etcd
├── ca.crt
├── healthcheck-client.crt
├── healthcheck-client.key
├── peer.crt
├── peer.key
├── server.crt
└── server.key
```
```
$HOME
└── kubeadmcfg.yaml
---
/etc/kubernetes/pki
├── apiserver-etcd-client.crt
├── apiserver-etcd-client.key
└── etcd
├── ca.crt
├── healthcheck-client.crt
├── healthcheck-client.key
├── peer.crt
├── peer.key
├── server.crt
└── server.key
```
1. Create the static pod manifests.
Now that the certificates and configs are in place it's time to create the
manifests. On each host run the `kubeadm` command to generate a static manifest
for etcd.
Now that the certificates and configs are in place it's time to create the
manifests. On each host run the `kubeadm` command to generate a static manifest
for etcd.
```sh
root@HOST0 $ kubeadm init phase etcd local --config=/tmp/${HOST0}/kubeadmcfg.yaml
root@HOST1 $ kubeadm init phase etcd local --config=$HOME/kubeadmcfg.yaml
root@HOST2 $ kubeadm init phase etcd local --config=$HOME/kubeadmcfg.yaml
```
```sh
root@HOST0 $ kubeadm init phase etcd local --config=/tmp/${HOST0}/kubeadmcfg.yaml
root@HOST1 $ kubeadm init phase etcd local --config=$HOME/kubeadmcfg.yaml
root@HOST2 $ kubeadm init phase etcd local --config=$HOME/kubeadmcfg.yaml
```
1. Optional: Check the cluster health.
If `etcdctl` isn't available, you can run this tool inside a container image.
You would do that directly with your container runtime using a tool such as
`crictl run` and not through Kubernetes.
```sh
docker run --rm -it \
--net host \
-v /etc/kubernetes:/etc/kubernetes registry.k8s.io/etcd:${ETCD_TAG} etcdctl \
ETCDCTL_API=3 etcdctl \
--cert /etc/kubernetes/pki/etcd/peer.crt \
--key /etc/kubernetes/pki/etcd/peer.key \
--cacert /etc/kubernetes/pki/etcd/ca.crt \
--endpoints https://${HOST0}:2379 endpoint health --cluster
--endpoints https://${HOST0}:2379 endpoint health
...
https://[HOST0 IP]:2379 is healthy: successfully committed proposal: took = 16.283339ms
https://[HOST1 IP]:2379 is healthy: successfully committed proposal: took = 19.44402ms
https://[HOST2 IP]:2379 is healthy: successfully committed proposal: took = 35.926451ms
```
- Set `${ETCD_TAG}` to the version tag of your etcd image. For example `3.4.3-0`. To see the etcd image and tag that kubeadm uses, execute `kubeadm config images list --kubernetes-version ${K8S_VERSION}`, where `${K8S_VERSION}` is for example `v1.17.0`.
- Set `${HOST0}` to the IP address of the host you are testing.
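As a further optional check (a sketch reusing the certificate flags and `${HOST0}` from the command above), you can also list the members to confirm that all three joined the cluster:

```sh
# List the etcd members registered in the cluster.
ETCDCTL_API=3 etcdctl \
--cert /etc/kubernetes/pki/etcd/peer.crt \
--key /etc/kubernetes/pki/etcd/peer.key \
--cacert /etc/kubernetes/pki/etcd/ca.crt \
--endpoints https://${HOST0}:2379 member list
```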
## {{% heading "whatsnext" %}}
Once you have an etcd cluster with 3 working members, you can continue setting up a
highly available control plane using the
[external etcd method with kubeadm](/docs/setup/production-environment/tools/kubeadm/high-availability/).

View File

@ -7,20 +7,18 @@ min-kubernetes-server-version: 1.19
<!-- overview -->
An [Ingress](/docs/concepts/services-networking/ingress/) is an API object that defines rules which allow external access
to services in a cluster. An [Ingress controller](/docs/concepts/services-networking/ingress-controllers/) fulfills the rules set in the Ingress.
This page shows you how to set up a simple Ingress which routes requests to Service web or web2 depending on the HTTP URI.
An [Ingress](/docs/concepts/services-networking/ingress/) is an API object that defines rules
which allow external access to services in a cluster. An
[Ingress controller](/docs/concepts/services-networking/ingress-controllers/)
fulfills the rules set in the Ingress.
This page shows you how to set up a simple Ingress which routes requests to Service 'web' or
'web2' depending on the HTTP URI.
## {{% heading "prerequisites" %}}
{{< include "task-tutorial-prereqs.md" >}} {{< version-check >}}
If you are using an older Kubernetes version, switch to the documentation
for that version.
If you are using an older Kubernetes version, switch to the documentation for that version.
### Create a Minikube cluster
@ -37,49 +35,60 @@ Locally
1. To enable the NGINX Ingress controller, run the following command:
```shell
minikube addons enable ingress
```
```shell
minikube addons enable ingress
```
1. Verify that the NGINX Ingress controller is running
{{< tabs name="tab_with_md" >}}
{{% tab name="minikube v1.19 or later" %}}
```shell
kubectl get pods -n ingress-nginx
```
{{< note >}}It can take up to a minute before you see these pods running OK.{{< /note >}}
```shell
kubectl get pods -n ingress-nginx
```
{{< note >}}
It can take up to a minute before you see these pods running OK.
{{< /note >}}
The output is similar to:
```
NAME READY STATUS RESTARTS AGE
ingress-nginx-admission-create-g9g49 0/1 Completed 0 11m
ingress-nginx-admission-patch-rqp78 0/1 Completed 1 11m
ingress-nginx-controller-59b45fb494-26npt 1/1 Running 0 11m
```
```none
NAME READY STATUS RESTARTS AGE
ingress-nginx-admission-create-g9g49 0/1 Completed 0 11m
ingress-nginx-admission-patch-rqp78 0/1 Completed 1 11m
ingress-nginx-controller-59b45fb494-26npt 1/1 Running 0 11m
```
{{% /tab %}}
{{% tab name="minikube v1.18.1 or earlier" %}}
```shell
kubectl get pods -n kube-system
```
{{< note >}}It can take up to a minute before you see these pods running OK.{{< /note >}}
```shell
kubectl get pods -n kube-system
```
{{< note >}}
It can take up to a minute before you see these pods running OK.
{{< /note >}}
The output is similar to:
```
NAME READY STATUS RESTARTS AGE
default-http-backend-59868b7dd6-xb8tq 1/1 Running 0 1m
kube-addon-manager-minikube 1/1 Running 0 3m
kube-dns-6dcb57bcc8-n4xd4 3/3 Running 0 2m
kubernetes-dashboard-5498ccf677-b8p5h 1/1 Running 0 2m
nginx-ingress-controller-5984b97644-rnkrg 1/1 Running 0 1m
storage-provisioner 1/1 Running 0 2m
```
```none
NAME READY STATUS RESTARTS AGE
default-http-backend-59868b7dd6-xb8tq 1/1 Running 0 1m
kube-addon-manager-minikube 1/1 Running 0 3m
kube-dns-6dcb57bcc8-n4xd4 3/3 Running 0 2m
kubernetes-dashboard-5498ccf677-b8p5h 1/1 Running 0 2m
nginx-ingress-controller-5984b97644-rnkrg 1/1 Running 0 1m
storage-provisioner 1/1 Running 0 2m
```
Make sure that you see a Pod with a name that starts with `nginx-ingress-controller-`.
Make sure that you see a Pod with a name that starts with `nginx-ingress-controller-`.
{{% /tab %}}
{{< /tabs >}}
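If you prefer not to re-run `kubectl get pods` manually, one hedged alternative (assuming minikube v1.19 or later, so the `ingress-nginx` namespace, and that the addon labels its controller Pod with `app.kubernetes.io/component=controller`) is to wait for readiness:

```shell
# Block until the Ingress controller Pod reports Ready (up to 120 seconds).
kubectl -n ingress-nginx wait --for=condition=Ready pod \
  -l app.kubernetes.io/component=controller --timeout=120s
```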
## Deploy a hello, world app
@ -92,7 +101,7 @@ storage-provisioner 1/1 Running 0 2m
The output should be:
```
```none
deployment.apps/web created
```
@ -104,19 +113,19 @@ storage-provisioner 1/1 Running 0 2m
The output should be:
```
```none
service/web exposed
```
1. Verify the Service is created and is available on a node port:
```shell
```shell
kubectl get service web
```
The output is similar to:
```
```none
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
web NodePort 10.104.133.249 <none> 8080:31637/TCP 12m
```
@ -129,26 +138,31 @@ storage-provisioner 1/1 Running 0 2m
The output is similar to:
```
```none
http://172.17.0.15:31637
```
{{< note >}}Katacoda environment only: at the top of the terminal panel, click the plus sign, and then click **Select port to view on Host 1**. Enter the NodePort, in this case `31637`, and then click **Display Port**.{{< /note >}}
{{< note >}}
Katacoda environment only: at the top of the terminal panel, click the plus sign,
and then click **Select port to view on Host 1**. Enter the NodePort value,
in this case `31637`, and then click **Display Port**.
{{< /note >}}
The output is similar to:
```
```none
Hello, world!
Version: 1.0.0
Hostname: web-55b8c6998d-8k564
```
You can now access the sample app via the Minikube IP address and NodePort. The next step lets you access
the app using the Ingress resource.
You can now access the sample application via the Minikube IP address and NodePort.
The next step lets you access the application using the Ingress resource.
## Create an Ingress
The following manifest defines an Ingress that sends traffic to your Service via hello-world.info.
The following manifest defines an Ingress that sends traffic to your Service via
`hello-world.info`.
1. Create `example-ingress.yaml` from the following file:
@ -162,7 +176,7 @@ The following manifest defines an Ingress that sends traffic to your Service via
The output should be:
```
```none
ingress.networking.k8s.io/example-ingress created
```
@ -172,11 +186,13 @@ The following manifest defines an Ingress that sends traffic to your Service via
kubectl get ingress
```
{{< note >}}This can take a couple of minutes.{{< /note >}}
{{< note >}}
This can take a couple of minutes.
{{< /note >}}
You should see an IPv4 address in the ADDRESS column; for example:
You should see an IPv4 address in the `ADDRESS` column; for example:
```
```none
NAME CLASS HOSTS ADDRESS PORTS AGE
example-ingress <none> hello-world.info 172.17.0.15 80 38s
```
@ -184,30 +200,35 @@ The following manifest defines an Ingress that sends traffic to your Service via
1. Add the following line to the bottom of the `/etc/hosts` file on
your computer (you will need administrator access):
```
```none
172.17.0.15 hello-world.info
```
{{< note >}}If you are running Minikube locally, use `minikube ip` to get the external IP. The IP address displayed within the ingress list will be the internal IP.{{< /note >}}
{{< note >}}
If you are running Minikube locally, use `minikube ip` to get the external IP.
The IP address displayed within the ingress list will be the internal IP.
{{< /note >}}
After you make this change, your web browser sends requests for
hello-world.info URLs to Minikube.
After you make this change, your web browser sends requests for
`hello-world.info` URLs to Minikube.
1. Verify that the Ingress controller is directing traffic:
```shell
curl hello-world.info
```
```shell
curl hello-world.info
```
You should see:
You should see:
```
Hello, world!
Version: 1.0.0
Hostname: web-55b8c6998d-8k564
```
```none
Hello, world!
Version: 1.0.0
Hostname: web-55b8c6998d-8k564
```
{{< note >}}If you are running Minikube locally, you can visit hello-world.info from your browser.{{< /note >}}
{{< note >}}
If you are running Minikube locally, you can visit `hello-world.info` from your browser.
{{< /note >}}
## Create a second Deployment
@ -216,9 +237,10 @@ The following manifest defines an Ingress that sends traffic to your Service via
```shell
kubectl create deployment web2 --image=gcr.io/google-samples/hello-app:2.0
```
The output should be:
```
```none
deployment.apps/web2 created
```
@ -230,7 +252,7 @@ The following manifest defines an Ingress that sends traffic to your Service via
The output should be:
```
```none
service/web2 exposed
```
@ -240,13 +262,13 @@ The following manifest defines an Ingress that sends traffic to your Service via
following lines at the end:
```yaml
- path: /v2
pathType: Prefix
backend:
service:
name: web2
port:
number: 8080
- path: /v2
pathType: Prefix
backend:
service:
name: web2
port:
number: 8080
```
1. Apply the changes:
@ -257,7 +279,7 @@ The following manifest defines an Ingress that sends traffic to your Service via
You should see:
```
```none
ingress.networking/example-ingress configured
```
@ -271,7 +293,7 @@ The following manifest defines an Ingress that sends traffic to your Service via
The output is similar to:
```
```none
Hello, world!
Version: 1.0.0
Hostname: web-55b8c6998d-8k564
@ -285,16 +307,16 @@ The following manifest defines an Ingress that sends traffic to your Service via
The output is similar to:
```
```none
Hello, world!
Version: 2.0.0
Hostname: web2-75cd47646f-t8cjk
```
{{< note >}}If you are running Minikube locally, you can visit hello-world.info and hello-world.info/v2 from your browser.{{< /note >}}
{{< note >}}
If you are running Minikube locally, you can visit `hello-world.info` and
`hello-world.info/v2` from your browser.
{{< /note >}}
## {{% heading "whatsnext" %}}

View File

@ -10,26 +10,15 @@ This page shows how to create a Kubernetes Service object that external
clients can use to access an application running in a cluster. The Service
provides load balancing for an application that has two running instances.
## {{% heading "prerequisites" %}}
{{< include "task-tutorial-prereqs.md" >}}
## {{% heading "objectives" %}}
* Run two instances of a Hello World application.
* Create a Service object that exposes a node port.
* Use the Service object to access the running application.
- Run two instances of a Hello World application.
- Create a Service object that exposes a node port.
- Use the Service object to access the running application.
<!-- lessoncontent -->
@ -41,9 +30,11 @@ Here is the configuration file for the application Deployment:
1. Run a Hello World application in your cluster:
Create the application Deployment using the file above:
```shell
kubectl apply -f https://k8s.io/examples/service/access/hello-application.yaml
```
The preceding command creates a
{{< glossary_tooltip text="Deployment" term_id="deployment" >}}
and an associated
@ -52,30 +43,35 @@ Here is the configuration file for the application Deployment:
{{< glossary_tooltip text="Pods" term_id="pod" >}}
each of which runs the Hello World application.
1. Display information about the Deployment:
```shell
kubectl get deployments hello-world
kubectl describe deployments hello-world
```
1. Display information about your ReplicaSet objects:
```shell
kubectl get replicasets
kubectl describe replicasets
```
1. Create a Service object that exposes the deployment:
```shell
kubectl expose deployment hello-world --type=NodePort --name=example-service
```
1. Display information about the Service:
```shell
kubectl describe services example-service
```
The output is similar to this:
```shell
```none
Name: example-service
Namespace: default
Labels: run=load-balancer-example
@ -90,19 +86,24 @@ Here is the configuration file for the application Deployment:
Session Affinity: None
Events: <none>
```
Make a note of the NodePort value for the service. For example,
in the preceding output, the NodePort value is 31496.
1. List the pods that are running the Hello World application:
```shell
kubectl get pods --selector="run=load-balancer-example" --output=wide
```
The output is similar to this:
```shell
```none
NAME READY STATUS ... IP NODE
hello-world-2895499144-bsbk5 1/1 Running ... 10.200.1.4 worker1
hello-world-2895499144-m1pwt 1/1 Running ... 10.200.2.5 worker2
```
1. Get the public IP address of one of your nodes that is running
a Hello World pod. How you get this address depends on how you set
up your cluster. For example, if you are using Minikube, you can
@ -117,13 +118,16 @@ Here is the configuration file for the application Deployment:
cloud providers offer different ways of configuring firewall rules.
1. Use the node address and node port to access the Hello World application:
```shell
curl http://<public-node-ip>:<node-port>
```
where `<public-node-ip>` is the public IP address of your node,
and `<node-port>` is the NodePort value for your service. The
response to a successful request is a hello message:
```shell
```none
Hello Kubernetes!
```
@ -133,12 +137,8 @@ As an alternative to using `kubectl expose`, you can use a
[service configuration file](/docs/concepts/services-networking/service/)
to create a Service.
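A minimal sketch of such a configuration, applied here via a heredoc (it assumes the Pods carry the `run=load-balancer-example` label used earlier and listen on port 8080):

```shell
# Create the same NodePort Service declaratively instead of with "kubectl expose".
kubectl apply -f - <<EOF
apiVersion: v1
kind: Service
metadata:
  name: example-service
spec:
  type: NodePort
  selector:
    run: load-balancer-example
  ports:
  - port: 8080
    targetPort: 8080
EOF
```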
## {{% heading "cleanup" %}}
To delete the Service, enter this command:
kubectl delete services example-service
@ -148,9 +148,6 @@ the Hello World application, enter this command:
kubectl delete deployment hello-world
## {{% heading "whatsnext" %}}
Follow the

View File

@ -100,4 +100,4 @@ release with a newer device plugin API version, device plugins must be upgraded
both version before the node is upgraded in order to guarantee that device allocations
continue to complete successfully during the upgrade.
Refer to [API compatiblity](/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins.md/#api-compatibility) and [Kubelet Device Manager API Versions](/docs/reference/node/device-plugin-api-versions.md) for more details.
Refer to [API compatibility](/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/#api-compatibility) and [Kubelet Device Manager API Versions](/docs/reference/node/device-plugin-api-versions/) for more details.

View File

@ -4,24 +4,16 @@ content_type: task
weight: 70
---
<!-- overview -->
This page shows how to specify extended resources for a Node.
Extended resources allow cluster administrators to advertise node-level
resources that would otherwise be unknown to Kubernetes.
## {{% heading "prerequisites" %}}
{{< include "task-tutorial-prereqs.md" >}} {{< version-check >}}
<!-- steps -->
## Get the names of your Nodes
@ -39,7 +31,7 @@ the Kubernetes API server. For example, suppose one of your Nodes has four dongl
attached. Here's an example of a PATCH request that advertises four dongle resources
for your Node.
```shell
```
PATCH /api/v1/nodes/<your-node-name>/status HTTP/1.1
Accept: application/json
Content-Type: application/json-patch+json
@ -69,9 +61,9 @@ Replace `<your-node-name>` with the name of your Node:
```shell
curl --header "Content-Type: application/json-patch+json" \
--request PATCH \
--data '[{"op": "add", "path": "/status/capacity/example.com~1dongle", "value": "4"}]' \
http://localhost:8001/api/v1/nodes/<your-node-name>/status
--request PATCH \
--data '[{"op": "add", "path": "/status/capacity/example.com~1dongle", "value": "4"}]' \
http://localhost:8001/api/v1/nodes/<your-node-name>/status
```
{{< note >}}
@ -100,9 +92,9 @@ Once again, the output shows the dongle resource:
```yaml
Capacity:
cpu: 2
memory: 2049008Ki
example.com/dongle: 4
cpu: 2
memory: 2049008Ki
example.com/dongle: 4
```
Now, application developers can create Pods that request a certain
@ -178,9 +170,9 @@ Replace `<your-node-name>` with the name of your Node:
```shell
curl --header "Content-Type: application/json-patch+json" \
--request PATCH \
--data '[{"op": "remove", "path": "/status/capacity/example.com~1dongle"}]' \
http://localhost:8001/api/v1/nodes/<your-node-name>/status
--request PATCH \
--data '[{"op": "remove", "path": "/status/capacity/example.com~1dongle"}]' \
http://localhost:8001/api/v1/nodes/<your-node-name>/status
```
Verify that the dongle advertisement has been removed:
@ -191,20 +183,13 @@ kubectl describe node <your-node-name> | grep dongle
(you should not see any output)
## {{% heading "whatsnext" %}}
### For application developers
* [Assign Extended Resources to a Container](/docs/tasks/configure-pod-container/extended-resource/)
- [Assign Extended Resources to a Container](/docs/tasks/configure-pod-container/extended-resource/)
### For cluster administrators
* [Configure Minimum and Maximum Memory Constraints for a Namespace](/docs/tasks/administer-cluster/manage-resources/memory-constraint-namespace/)
* [Configure Minimum and Maximum CPU Constraints for a Namespace](/docs/tasks/administer-cluster/manage-resources/cpu-constraint-namespace/)
- [Configure Minimum and Maximum Memory Constraints for a Namespace](/docs/tasks/administer-cluster/manage-resources/memory-constraint-namespace/)
- [Configure Minimum and Maximum CPU Constraints for a Namespace](/docs/tasks/administer-cluster/manage-resources/cpu-constraint-namespace/)

View File

@ -83,8 +83,8 @@ providers:
#
# A match exists between an image and a matchImage when all of the below are true:
# - Both contain the same number of domain parts and each part matches.
# - The URL path of an imageMatch must be a prefix of the target image URL path.
# - If the imageMatch contains a port, then the port must match in the image as well.
# - The URL path of a matchImages entry must be a prefix of the target image URL path.
# - If the matchImages contains a port, then the port must match in the image as well.
#
# Example values of matchImages:
# - 123456789.dkr.ecr.us-east-1.amazonaws.com
@ -143,7 +143,7 @@ A match exists between an image name and a `matchImage` entry when all of the be
* Both contain the same number of domain parts and each part matches.
* The URL path of the match image must be a prefix of the target image URL path.
* If the imageMatch contains a port, then the port must match in the image as well.
* If the matchImages contains a port, then the port must match in the image as well.
Some example values of `matchImages` patterns are:

View File

@ -111,11 +111,11 @@ This is the default policy and does not affect the memory allocation in any way.
It acts the same as if the Memory Manager is not present at all.
The `None` policy returns the default topology hint. This special hint denotes that the Hint Provider
(Memory Manger in this case) has no preference for NUMA affinity with any resource.
(Memory Manager in this case) has no preference for NUMA affinity with any resource.
#### Static policy {#policy-static}
In the case of the `Guaranteed` pod, the `Static` Memory Manger policy returns topology hints
In the case of the `Guaranteed` pod, the `Static` Memory Manager policy returns topology hints
relating to the set of NUMA nodes where the memory can be guaranteed,
and reserves the memory through updating the internal [NodeMap][2] object.
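As a rough sketch (not a complete kubelet invocation; the reservation value is illustrative and must fit your node's NUMA topology and the other reservation flags you use), the `Static` policy is selected with kubelet flags such as:

```shell
# Select the Static Memory Manager policy and pre-reserve memory on NUMA node 0.
# Other required kubelet flags and reservations are omitted from this sketch.
kubelet --memory-manager-policy=Static \
  --reserved-memory='0:memory=1Gi'
```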

View File

@ -43,7 +43,7 @@ Decide whether you want to deploy a [cloud](#creating-a-calico-cluster-with-goog
## Creating a local Calico cluster with kubeadm
To get a local single-host Calico cluster in fifteen minutes using kubeadm, refer to the
[Calico Quickstart](https://docs.projectcalico.org/latest/getting-started/kubernetes/).
[Calico Quickstart](https://projectcalico.docs.tigera.io/getting-started/kubernetes/).

Some files were not shown because too many files have changed in this diff.