Merge branch 'kubernetes:main' into main

pull/40385/head
BAKPATINA Vincent de Paul 2023-02-21 22:47:15 +00:00 committed by GitHub
commit b9305a7dce
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
374 changed files with 13015 additions and 4707 deletions

View File

@ -44,6 +44,7 @@ aliases:
- divya-mohan0209
- kbhawkey
- mehabhalodiya
- mengjiao-liu
- natalisucks
- nate-double-u
- onlydole
@ -143,6 +144,7 @@ aliases:
- chenxuc
- howieyuen
# idealhack
- kinzhi
- mengjiao-liu
- my-git9
# pigletfly
@ -166,6 +168,7 @@ aliases:
- edsoncelio
- femrtnz
- jcjesus
- mrerlison
- rikatz
- stormqueen1990
- yagonobre

View File

@ -878,3 +878,17 @@ div.alert > em.javascript-required {
color: #fff;
background: #326de6;
}
// Adjust Bing search result page
#bing-results-container {
  padding: 1em;
}
#bing-pagination-container {
  padding: 1em;
  margin-bottom: 1em;
  a.bing-page-anchor {
    padding: 0.5em;
    margin: 0.25em;
  }
}

View File

@ -30,7 +30,9 @@ Whether testing locally or running a global enterprise, Kubernetes flexibility g
{{% blocks/feature image="suitcase" %}}
#### Run K8s Anywhere
Kubernetes is open source giving you the freedom to take advantage of on-premises, hybrid, or public cloud infrastructure, letting you effortlessly move workloads to where it matters to you.
Kubernetes is open source giving you the freedom to take advantage of on-premises, hybrid, or public cloud infrastructure, letting you effortlessly move workloads to where it matters to you.
To download Kubernetes, visit the [download](/releases/download/) section.
{{% /blocks/feature %}}

View File

@ -143,7 +143,7 @@ When a default StorageClass exists and a user creates a PersistentVolumeClaim wi
Kubernetes 1.4 maintains backwards compatibility with the alpha version of the dynamic provisioning feature to allow for a smoother transition to the beta version. The alpha behavior is triggered by the existance of the alpha dynamic provisioning annotation (volume. **alpha**.kubernetes.io/storage-class). Keep in mind that if the beta annotation (volume. **beta**.kubernetes.io/storage-class) is present, it takes precedence, and triggers the beta behavior.
Kubernetes 1.4 maintains backwards compatibility with the alpha version of the dynamic provisioning feature to allow for a smoother transition to the beta version. The alpha behavior is triggered by the existence of the alpha dynamic provisioning annotation (volume. **alpha**.kubernetes.io/storage-class). Keep in mind that if the beta annotation (volume. **beta**.kubernetes.io/storage-class) is present, it takes precedence, and triggers the beta behavior.

View File

@ -192,7 +192,7 @@ To modify/add your own DAGs, you can use `kubectl cp` to upload local files into
# Get Involved
This feature is just the beginning of multiple major efforts to improves Apache Airflow integration into Kubernetes. The Kubernetes Operator has been merged into the [1.10 release branch of Airflow](https://github.com/apache/incubator-airflow/tree/v1-10-test) (the executor in experimental mode), along with a fully k8s native scheduler called the Kubernetes Executor (article to come). These features are still in a stage where early adopters/contributers can have a huge influence on the future of these features.
This feature is just the beginning of multiple major efforts to improves Apache Airflow integration into Kubernetes. The Kubernetes Operator has been merged into the [1.10 release branch of Airflow](https://github.com/apache/incubator-airflow/tree/v1-10-test) (the executor in experimental mode), along with a fully k8s native scheduler called the Kubernetes Executor (article to come). These features are still in a stage where early adopters/contributors can have a huge influence on the future of these features.
For those interested in joining these efforts, I'd recommend checking out these steps:

View File

@ -460,7 +460,7 @@ Now you can configure your DHCP. Basically you should set the `next-server` and
I use ISC-DHCP server, and here is an example `dhcpd.conf`:
```
shared-network ltsp-netowrk {
shared-network ltsp-network {
subnet 10.9.0.0 netmask 255.255.0.0 {
authoritative;
default-lease-time -1;

View File

@ -8,7 +8,7 @@ date: 2018-12-12
Kubernetes provides great primitives for deploying applications to a cluster: it can be as simple as `kubectl create -f app.yaml`. Deploying apps across multiple clusters has never been that simple. How should app workloads be distributed? Should the app resources be replicated into all clusters, replicated into selected clusters, or partitioned into clusters? How is access to the clusters managed? What happens if some of the resources that a user wants to distribute pre-exist, in some or all of the clusters, in some form?
In SIG Multicluster, our journey has revealed that there are multiple possible models to solve these problems and there probably is no single best-fit, all-scenario solution. [Federation](/docs/concepts/cluster-administration/federation/), however, is the single biggest Kubernetes open source sub-project, and has seen the maximum interest and contribution from the community in this problem space. The project initially reused the Kubernetes API to do away with any added usage complexity for an existing Kubernetes user. This approach was not viable, because of the problems summarised below:
In SIG Multicluster, our journey has revealed that there are multiple possible models to solve these problems and there probably is no single best-fit, all-scenario solution. [Kubernetes Cluster Federation (KubeFed for short)](https://github.com/kubernetes-sigs/kubefed), however, is the single biggest Kubernetes open source sub-project, and has seen the maximum interest and contribution from the community in this problem space. The project initially reused the Kubernetes API to do away with any added usage complexity for an existing Kubernetes user. This approach was not viable, because of the problems summarised below:
* Difficulties in re-implementing the Kubernetes API at the cluster level, as federation-specific extensions were stored in annotations.
* Limited flexibility in federated types, placement and reconciliation, due to 1:1 emulation of the Kubernetes API.

View File

@ -27,7 +27,7 @@ Our goal is for Kubernetes docs to be a trustworthy guide to Kubernetes features
### Re-homing content
Some content will be removed that readers may find helpful. To make sure readers have continous access to information, we're giving stakeholders until the [1.19 release deadline for docs](https://github.com/kubernetes/sig-release/tree/master/releases/release-1.19), **July 9th, 2020** to re-home any content slated for removal.
Some content will be removed that readers may find helpful. To make sure readers have continuous access to information, we're giving stakeholders until the [1.19 release deadline for docs](https://github.com/kubernetes/sig-release/tree/master/releases/release-1.19), **July 9th, 2020** to re-home any content slated for removal.
Over the next few months you'll see less third party content in the docs as contributors open PRs to remove content.

View File

@ -520,7 +520,7 @@ And the real strength of WSL2 integration, the port `8443` once open on WSL2 dis
Working on the command line is always good and very insightful. However, when dealing with Kubernetes we might want, at some point, to have a visual overview.
For that, Minikube embeded the [Kubernetes Dashboard](https://github.com/kubernetes/dashboard). Thanks to it, running and accessing the Dashboard is very simple:
For that, Minikube embedded the [Kubernetes Dashboard](https://github.com/kubernetes/dashboard). Thanks to it, running and accessing the Dashboard is very simple:
```bash
# Enable the Dashboard service

View File

@ -198,7 +198,7 @@ GUINEVERE SAENGER: I would want Jorge to be really on top of making sure that ev
Greater communication of timelines and just giving people more time and space to be able to get in their changes, or at least, seemingly give them more time and space by sending early warnings, is going to be helpful. Of course, he's going to have a slightly longer release, too, than I did. This might be related to a unique Q4 challenge. Overall, I would encourage him to take more breaks, to rely more on his release shadows, and split out the work in a fashion that allows everyone to have a turn and everyone to have a break as well.
**ADAM GLICK: What would your advice be to someone who is hearing your experience and is inspired to get involved with the Kubernetes release or contributer process?**
**ADAM GLICK: What would your advice be to someone who is hearing your experience and is inspired to get involved with the Kubernetes release or contributor process?**
GUINEVERE SAENGER: Those are two separate questions. So let me tackle the Kubernetes release question first. Kubernetes [SIG Release](https://github.com/kubernetes/sig-release/#readme) has, in my opinion, a really excellent onboarding program for new members. We have what is called the [Release Team Shadow Program](https://github.com/kubernetes/sig-release/blob/master/release-team/shadows.md). We also have the Release Engineering Shadow Program, or the Release Management Shadow Program. Those are two separate subprojects within SIG Release. And each subproject has a team of roles, and each role can have two to four shadows that are basically people who are part of that role team, and they are learning that role as they are doing it.

View File

@ -81,7 +81,7 @@ If the `ServerSideFieldValidation` feature gate is enabled starting 1.23, users
With the feature gate enabled, we also introduce the `fieldValidation` query parameter so that users can specify the desired behavior of the server on a per request basis. Valid values for the `fieldValidation` query parameter are:
- Ignore (default when feature gate is disabled, same as pre-1.23 behavior of dropping/ignoring unkonwn fields)
- Ignore (default when feature gate is disabled, same as pre-1.23 behavior of dropping/ignoring unknown fields)
- Warn (default when feature gate is enabled).
- Strict (this will fail the request with an Invalid Request error)
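As an illustration of how a client might set this parameter explicitly (a minimal sketch, not taken from the original post; the local manifest `pod.yaml` and the target namespace are hypothetical), you can talk to the API server through `kubectl proxy` and pass `fieldValidation` as a query parameter on the create request:
```shell
# Start a local proxy to the API server (runs in the background).
kubectl proxy --port=8001 &
# Create a Pod with strict server-side field validation: unknown or
# duplicate fields cause the request to be rejected as invalid.
curl -X POST 'http://127.0.0.1:8001/api/v1/namespaces/default/pods?fieldValidation=Strict' \
  -H 'Content-Type: application/yaml' \
  --data-binary @pod.yaml
```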

View File

@ -5,6 +5,7 @@ linkTitle: "Dockershim Removal FAQ"
date: 2022-02-17
slug: dockershim-faq
aliases: [ '/dockershim' ]
evergreen: true
---
**This supersedes the original

View File

@ -32,7 +32,7 @@ Caleb is also a co-organizer of the [CloudNative NZ](https://www.meetup.com/clou
## [Dylan Graham](https://github.com/DylanGraham)
Dylan Graham is a cloud engineer from Adeliade, Australia. He has been contributing to the upstream Kubernetes project since 2018.
Dylan Graham is a cloud engineer from Adelaide, Australia. He has been contributing to the upstream Kubernetes project since 2018.
He stated that being a part of such a large-scale project was initially overwhelming, but that the community's friendliness and openness assisted him in getting through it.

View File

@ -18,7 +18,7 @@ case where you're using the `OrderedReady` Pod management policy for a StatefulS
Here are some examples:
- I am using a StatefulSet to orchestrate a multi-instance, cache based application where the size of the cache is large. The cache
starts cold and requires some siginificant amount of time before the container can start. There could be more initial startup tasks
starts cold and requires some significant amount of time before the container can start. There could be more initial startup tasks
that are required. A RollingUpdate on this StatefulSet would take a lot of time before the application is fully updated. If the
StatefulSet supported updating more than one pod at a time, it would result in a much faster update.

View File

@ -145,7 +145,7 @@ workstream within the Gateway API subproject focused on Gateway API for Mesh
Management and Administration.
This group will deliver [enhancement
proposals](https://gateway-api.sigs.k8s.io/v1beta1/contributing/gep/) consisting
proposals](https://gateway-api.sigs.k8s.io/geps/overview/) consisting
of resources, additions, and modifications to the Gateway API specification for
mesh and mesh-adjacent use-cases.

View File

@ -89,7 +89,7 @@ To use cgroup v2 with Kubernetes, you must meet the following requirements:
* The kubelet and the container runtime are configured to use the [systemd cgroup driver](/docs/setup/production-environment/container-runtimes#systemd-cgroup-driver)
The kubelet and container runtime use a [cgroup driver](/docs/setup/production-environment/container-runtimes#cgroup-drivers)
to set cgroup paramaters. When using cgroup v2, it's strongly recommended that both
to set cgroup parameters. When using cgroup v2, it's strongly recommended that both
the kubelet and your container runtime use the
[systemd cgroup driver](/docs/setup/production-environment/container-runtimes#systemd-cgroup-driver),
so that there's a single cgroup manager on the system. To configure the kubelet

View File

@ -438,7 +438,7 @@ kubectl apply -f crds/stable.example.com_appendonlylists.yaml
customresourcedefinition.apiextensions.k8s.io/appendonlylists.stable.example.com created
```
Creating an inital list with one element inside should succeed without problem:
Creating an initial list with one element inside should succeed without problem:
```shell
kubectl apply -f - <<EOF
---

View File

@ -30,7 +30,7 @@ cloud computing resources.
In this release we want to recognise the importance of all these building blocks on which Kubernetes
is developed and used, while at the same time raising awareness on the importance of taking the
energy consumption footprint into account: environmental sustainability is an inescapable concern of
creators and users of any software solution, and the environmental footprint of sofware, like
creators and users of any software solution, and the environmental footprint of software, like
Kubernetes, an area which we believe will play a significant role in future releases.
As a community, we always work to make each new release process better than before (in this release,

View File

@ -90,7 +90,7 @@ This dependency made the tracking of Job status unreliable, because Pods can be
deleted from the API for a number of reasons, including:
- The garbage collector removing orphan Pods when a Node goes down.
- The garbage collector removing terminated Pods when they reach a threshold.
- The Kubernetes scheduler preempting a Pod to accomodate higher priority Pods.
- The Kubernetes scheduler preempting a Pod to accommodate higher priority Pods.
- The taint manager evicting a Pod that doesn't tolerate a `NoExecute` taint.
- External controllers, not included as part of Kubernetes, or humans deleting
Pods.

Binary file not shown.

Before

Width:  |  Height:  |  Size: 116 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 427 KiB

View File

@ -8,7 +8,7 @@ slug: security-behavior-analysis
**Author:**
David Hadas (IBM Research Labs)
_This post warns Devops from a false sense of security. Following security best practices when developing and configuring microservices do not result in non-vulnerable microservices. The post shows that although all deployed microservices are vulnerable, there is much that can be done to ensure microservices are not exploited. It explains how analyzing the behavior of clients and services from a security standpoint, named here **"Security-Behavior Analysis"**, can protect the deployed vulnerable microservices. It points to [Guard](http://knative.dev/security-guard), an open source project offering security-behavior monitoring and control of Kubernetes microservices presumed vulnerable._
_This post warns Devops from a false sense of security. Following security best practices when developing and configuring microservices do not result in non-vulnerable microservices. The post shows that although all deployed microservices are vulnerable, there is much that can be done to ensure microservices are not exploited. It explains how analyzing the behavior of clients and services from a security standpoint, named here **"Security-Behavior Analytics"**, can protect the deployed vulnerable microservices. It points to [Guard](http://knative.dev/security-guard), an open source project offering security-behavior monitoring and control of Kubernetes microservices presumed vulnerable._
As cyber attacks continue to intensify in sophistication, organizations deploying cloud services continue to grow their cyber investments aiming to produce safe and non-vulnerable services. However, the year-by-year growth in cyber investments does not result in a parallel reduction in cyber incidents. Instead, the number of cyber incidents continues to grow annually. Evidently, organizations are doomed to fail in this struggle - no matter how much effort is made to detect and remove cyber weaknesses from deployed services, it seems offenders always have the upper hand.
@ -22,7 +22,7 @@ In other words, consciously accept that you will never create completely invulne
Being vulnerable does not necessarily mean that your service will be exploited. Though your services are vulnerable in some ways unknown to you, offenders still need to identify these vulnerabilities and then exploit them. If offenders fail to exploit your service vulnerabilities, you win! In other words, having a vulnerability that can't be exploited represents a risk that can't be realized.
{{< figure src="Example.png" alt="Image of an example of offender gaining foothold in a service" class="diagram-large" caption="Figure 1. An Offender gaining foothold in a vulnerable service" >}}
{{< figure src="security_behavior_figure_1.svg" alt="Image of an example of offender gaining foothold in a service" class="diagram-large" caption="Figure 1. An Offender gaining foothold in a vulnerable service" >}}
The above diagram shows an example in which the offender does not yet have a foothold in the service; that is, it is assumed that your service does not run code controlled by the offender on day 1. In our example the service has vulnerabilities in the API exposed to clients. To gain an initial foothold the offender uses a malicious client to try and exploit one of the service API vulnerabilities. The malicious client sends an exploit that triggers some unplanned behavior of the service.
@ -55,7 +55,7 @@ Fortunately, microservice architecture is well suited to security-behavior monit
Kubernetes is often used to support workloads designed with microservice architecture. By design, microservices aim to follow the UNIX philosophy of "Do One Thing And Do It Well". Each microservice has a bounded context and a clear interface. In other words, you can expect the microservice clients to send relatively regular requests and the microservice to present a relatively regular behavior as a response to these requests. Consequently, a microservice architecture is an excellent candidate for security-behavior monitoring.
{{< figure src="Microservices.png" alt="Image showing why microservices are well suited for security-behavior monitoring" class="diagram-large" caption="Figure 2. Microservices are well suited for security-behavior monitoring" >}}
{{< figure src="security_behavior_figure_2.svg" alt="Image showing why microservices are well suited for security-behavior monitoring" class="diagram-large" caption="Figure 2. Microservices are well suited for security-behavior monitoring" >}}
The diagram above clarifies how dividing a monolithic service into a set of microservices improves our ability to perform security-behavior monitoring and control. In a monolithic service approach, different client requests are intertwined, resulting in a diminished ability to identify irregular client behaviors. Without prior knowledge, an observer of the intertwined client requests will find it hard to distinguish between types of requests and their related characteristics. Further, internal client requests are not exposed to the observer. Lastly, the aggregated behavior of the monolithic service is a compound of the many different internal behaviors of its components, making it hard to identify irregular service behavior.

File diff suppressed because one or more lines are too long

After

Width:  |  Height:  |  Size: 379 KiB

File diff suppressed because one or more lines are too long

After

Width:  |  Height:  |  Size: 421 KiB

View File

@ -8,54 +8,134 @@ canonicalUrl: https://www.kubernetes.dev/blog/2023/02/03/sig-instrumentation-spo
**Author:** Imran Noor Mohamed (Delivery Hero)
Observability requires the right data at the right time for the right consumer (human or piece of software) to make the right decision. In the context of Kubernetes, having best practices for cluster observability across all Kubernetes components is crucial.
Observability requires the right data at the right time for the right consumer
(human or piece of software) to make the right decision. In the context of Kubernetes,
having best practices for cluster observability across all Kubernetes components is crucial.
SIG Instrumentation helps to address this issue by providing best practices and tools that all other SIGs use to instrument Kubernetes components-like the *API server*, *scheduler*, *kubelet* and *kube-controller-manager*.
SIG Instrumentation helps to address this issue by providing best practices and tools
that all other SIGs use to instrument Kubernetes components-like the *API server*,
*scheduler*, *kubelet* and *kube-controller-manager*.
In this SIG Instrumentation spotlight, [Imran Noor Mohamed](https://www.linkedin.com/in/imrannoormohamed/), SIG ContribEx-Comms tech lead talked with [Elana Hashman](https://twitter.com/ehashdn), and [Han Kang](https://www.linkedin.com/in/hankang), chairs of SIG Instrumentation, on how the SIG is organized, what are the current challenges and how anyone can get involved and contribute.
In this SIG Instrumentation spotlight, [Imran Noor Mohamed](https://www.linkedin.com/in/imrannoormohamed/),
SIG ContribEx-Comms tech lead talked with [Elana Hashman](https://twitter.com/ehashdn),
and [Han Kang](https://www.linkedin.com/in/hankang), chairs of SIG Instrumentation,
on how the SIG is organized, what are the current challenges and how anyone can get involved and contribute.
## About SIG Instrumentation
**Imran (INM)**: Hello, thank you for the opportunity of learning more about SIG Instrumentation. Could you tell us a bit about yourself, your role, and how you got involved in SIG Instrumentation?
**Imran (INM)**: Hello, thank you for the opportunity of learning more about SIG Instrumentation.
Could you tell us a bit about yourself, your role, and how you got involved in SIG Instrumentation?
**Han (HK)**: I started in SIG Instrumentation in 2018, and became a chair in 2020. I primarily got involved with SIG instrumentation due to a number of upstream issues with metrics which ended up affecting GKE in bad ways. As a result, we ended up launching an initiative to stabilize our metrics and make metrics a proper API.
**Han (HK)**: I started in SIG Instrumentation in 2018, and became a chair in 2020.
I primarily got involved with SIG instrumentation due to a number of upstream issues
with metrics which ended up affecting GKE in bad ways. As a result, we ended up
launching an initiative to stabilize our metrics and make metrics a proper API.
**Elana (EH)**: I also joined SIG Instrumentation in 2018 and became a chair at the same time as Han. I was working as a site reliability engineer (SRE) on bare metal Kubernetes clusters and was working to build out our observability stack. I encountered some issues with label joins where Kubernetes metrics didn't match kube-state-metrics ([KSM](https://github.com/kubernetes/kube-state-metrics)) and started participating in SIG meetings to improve things. I helped test performance improvements to kube-state-metrics and ultimately coauthored a KEP for overhauling metrics in the 1.14 release to improve usability.
**Elana (EH)**: I also joined SIG Instrumentation in 2018 and became a chair at the
same time as Han. I was working as a site reliability engineer (SRE) on bare metal
Kubernetes clusters and was working to build out our observability stack.
I encountered some issues with label joins where Kubernetes metrics didn't match
kube-state-metrics ([KSM](https://github.com/kubernetes/kube-state-metrics)) and
started participating in SIG meetings to improve things. I helped test performance
improvements to kube-state-metrics and ultimately coauthored a KEP for overhauling
metrics in the 1.14 release to improve usability.
**Imran (INM)**: Interesting! Does that mean SIG Instrumentation involves a lot of plumbing?
**Han (HK)**: I wouldn't say it involves a ton of plumbing, though it does touch basically every code base. We have our own dedicated directories for our metrics, logs, and tracing frameworks which we tend to work out of primarily. We do have to interact with other SIGs in order to propagate our changes which makes us more of a horizontal SIG.
**Han (HK)**: I wouldn't say it involves a ton of plumbing, though it does touch
basically every code base. We have our own dedicated directories for our metrics,
logs, and tracing frameworks which we tend to work out of primarily. We do have to
interact with other SIGs in order to propagate our changes which makes us more of
a horizontal SIG.
**Imran (INM)**: Speaking about interaction and coordination with other SIGs, could you describe how the SIG is organized?
**Imran (INM)**: Speaking about interaction and coordination with other SIGs, could
you describe how the SIG is organized?
**Elana (EH)**: In SIG Instrumentation, we have two chairs, Han and myself, as well as two tech leads, David Ashpole and Damien Grisonnet. We all work together as the SIG's leads in order to run meetings, triage issues and PRs, review and approve KEPs, plan for each release, present at KubeCon and community meetings, and write our annual report. Within the SIG we also have a number of important subprojects, each of which is stewarded by its subproject owners. For example, Marek Siarkowicz is a subproject owner of [metrics-server](https://github.com/kubernetes-sigs/metrics-server).
**Elana (EH)**: In SIG Instrumentation, we have two chairs, Han and myself, as well
as two tech leads, David Ashpole and Damien Grisonnet. We all work together as the
SIG's leads in order to run meetings, triage issues and PRs, review and approve KEPs,
plan for each release, present at KubeCon and community meetings, and write our annual
report. Within the SIG we also have a number of important subprojects, each of which is
stewarded by its subproject owners. For example, Marek Siarkowicz is a subproject owner
of [metrics-server](https://github.com/kubernetes-sigs/metrics-server).
Because we're a horizontal SIG, some of our projects have a wide scope and require coordination from a dedicated group of contributors. For example, in order to guide the Kubernetes migration to structured logging, we chartered the [Structured Logging](https://github.com/kubernetes/community/blob/master/wg-structured-logging/README.md) Working Group (WG), organized by Marek and Patrick Ohly. The WG doesn't own any code, but helps with various components such as the *kubelet*, *scheduler*, etc. in migrating their code to use structured logs.
Because we're a horizontal SIG, some of our projects have a wide scope and require
coordination from a dedicated group of contributors. For example, in order to guide
the Kubernetes migration to structured logging, we chartered the
[Structured Logging](https://github.com/kubernetes/community/blob/master/wg-structured-logging/README.md)
Working Group (WG), organized by Marek and Patrick Ohly. The WG doesn't own any code,
but helps with various components such as the *kubelet*, *scheduler*, etc. in migrating
their code to use structured logs.
**Imran (INM)**: Walking through the [charter](https://github.com/kubernetes/community/blob/master/sig-instrumentation/charter.md) alone it's clear that SIG Instrumentation has a lot of sub-projects. Could you highlight some important ones?
**Imran (INM)**: Walking through the
[charter](https://github.com/kubernetes/community/blob/master/sig-instrumentation/charter.md)
alone it's clear that SIG Instrumentation has a lot of sub-projects.
Could you highlight some important ones?
**Han (HK)**: We have many different sub-projects and we are in dire need of people who can come and help shepherd them. Our most important projects in-tree (that is, within the kubernetes/kubernetes repo) are metrics, tracing, and structured logging. Our most important projects out-of-tree are (a) KSM (kube-state-metrics) and (b) metrics-server.
**Han (HK)**: We have many different sub-projects and we are in dire need of
people who can come and help shepherd them. Our most important projects in-tree
(that is, within the kubernetes/kubernetes repo) are metrics, tracing, and
structured logging. Our most important projects out-of-tree are
(a) KSM (kube-state-metrics) and (b) metrics-server.
**Elana (EH)**: Echoing this, we would love to bring on more maintainers for kube-state-metrics and metrics-server. Our friends at WG Structured Logging are also looking for contributors. Other subprojects include klog, prometheus-adapter, and a new subproject that we just launched for collecting high-fidelity, scalable utilization metrics called [usage-metrics-collector](https://github.com/kubernetes-sigs/usage-metrics-collector). All are seeking new contributors!
**Elana (EH)**: Echoing this, we would love to bring on more maintainers for
kube-state-metrics and metrics-server. Our friends at WG Structured Logging are
also looking for contributors. Other subprojects include klog, prometheus-adapter,
and a new subproject that we just launched for collecting high-fidelity, scalable
utilization metrics called [usage-metrics-collector](https://github.com/kubernetes-sigs/usage-metrics-collector).
All are seeking new contributors!
## Current status and ongoing challenges
**Imran (INM)**: For release [1.26](https://github.com/kubernetes/sig-release/tree/master/releases/release-1.26) we can see that there are a relevant number of metrics, logs, and tracing [KEPs](https://www.k8s.dev/resources/keps/) in the pipeline. Would you like to point out important things for last release (maybe alpha & stable milestone candidates?)
**Imran (INM)**: For release [1.26](https://github.com/kubernetes/sig-release/tree/master/releases/release-1.26)
we can see that there are a relevant number of metrics, logs, and tracing
[KEPs](https://www.k8s.dev/resources/keps/) in the pipeline. Would you like to
point out important things for last release (maybe alpha & stable milestone candidates?)
**Han (HK)**: We can now generate [documentation](https://kubernetes.io/docs/reference/instrumentation/metrics/) for every single metric in the main Kubernetes code base! We have a pretty fancy static analysis pipeline that enables this functionality. We've also added feature metrics so that you can look at your metrics to determine which features are enabled in your cluster at a given time. Lastly, we added a component-sli endpoint, which should make it easy for people to create availability SLOs for *control-plane* components.
**Han (HK)**: We can now generate [documentation](https://kubernetes.io/docs/reference/instrumentation/metrics/)
for every single metric in the main Kubernetes code base! We have a pretty fancy
static analysis pipeline that enables this functionality. We've also added feature
metrics so that you can look at your metrics to determine which features are enabled
in your cluster at a given time. Lastly, we added a component-sli endpoint, which
should make it easy for people to create availability SLOs for *control-plane* components.
**Elana (EH)**: We've also been working on tracing KEPs for both the *API server* and *kubelet*, though neither graduated in 1.26. I'm also really excited about the work Han is doing with WG Reliability to extend and improve our metrics stability framework.
**Elana (EH)**: We've also been working on tracing KEPs for both the *API server*
and *kubelet*, though neither graduated in 1.26. I'm also really excited about the
work Han is doing with WG Reliability to extend and improve our metrics stability framework.
**Imran (INM)**: What do you think are the Kubernetes-specific challenges tackled by the SIG Instrumentation? What are the future efforts to solve them?
**Imran (INM)**: What do you think are the Kubernetes-specific challenges tackled by
the SIG Instrumentation? What are the future efforts to solve them?
**Han (HK)**: SIG instrumentation suffered a bit in the past from being a horizontal SIG. We did not have an obvious location to put our code and did not have a good mechanism to audit metrics that people would randomly add. We've fixed this over the years and now we have dedicated spots for our code and a reliable mechanism for auditing new metrics. We also now offer stability guarantees for metrics. We hope to have full-blown tracing up and down the kubernetes stack, and metric support via exemplars.
**Han (HK)**: SIG instrumentation suffered a bit in the past from being a horizontal SIG.
We did not have an obvious location to put our code and did not have a good mechanism to
audit metrics that people would randomly add. We've fixed this over the years and now we
have dedicated spots for our code and a reliable mechanism for auditing new metrics.
We also now offer stability guarantees for metrics. We hope to have full-blown tracing
up and down the kubernetes stack, and metric support via exemplars.
**Elana (EH)**: I think SIG Instrumentation is a really interesting SIG because it poses different kinds of opportunities to get involved than in other SIGs. You don't have to be a software developer to contribute to our SIG! All of our components and subprojects are focused on better understanding Kubernetes and its performance in production, which allowed me to get involved as one of the few SIG Chairs working as an SRE at that time. I like that we provide opportunities for newcomers to contribute through using, testing, and providing feedback on our subprojects, which is a lower barrier to entry. Because many of these projects are out-of-tree, I think one of our challenges is to figure out what's in scope for core Kubernetes SIGs instrumentation subprojects, what's missing, and then fill in the gaps.
**Elana (EH)**: I think SIG Instrumentation is a really interesting SIG because it
poses different kinds of opportunities to get involved than in other SIGs. You don't
have to be a software developer to contribute to our SIG! All of our components and
subprojects are focused on better understanding Kubernetes and its performance in
production, which allowed me to get involved as one of the few SIG Chairs working as
an SRE at that time. I like that we provide opportunities for newcomers to contribute
through using, testing, and providing feedback on our subprojects, which is a lower
barrier to entry. Because many of these projects are out-of-tree, I think one of our
challenges is to figure out what's in scope for core Kubernetes SIGs instrumentation
subprojects, what's missing, and then fill in the gaps.
## Community and contribution
**Imran (INM)**: Kubernetes values community over products. Any recommendation for anyone looking into getting involved in SIG Instrumentation work? Where should they start (new contributor-friendly areas within SIG?)
**Imran (INM)**: Kubernetes values community over products. Any recommendation
for anyone looking into getting involved in SIG Instrumentation work? Where
should they start (new contributor-friendly areas within SIG?)
**Han (HK) and Elana (EH)**: Come to our bi-weekly triage [meetings](https://github.com/kubernetes/community/tree/master/sig-instrumentation#meetings)! They aren't recorded and are a great place to ask questions and learn about our ongoing work. We strive to be a friendly community and one of the easiest SIGs to get started with. You can check out our latest KubeCon NA 2022 [SIG Instrumentation Deep Dive](https://youtu.be/JIzrlWtAA8Y) to get more insight into our work. We also invite you to join our Slack channel #sig-instrumentation and feel free to reach out to any of our SIG leads or subproject owners directly.
**Han (HK) and Elana (EH)**: Come to our bi-weekly triage
[meetings](https://github.com/kubernetes/community/tree/master/sig-instrumentation#meetings)!
They aren't recorded and are a great place to ask questions and learn about our ongoing work.
We strive to be a friendly community and one of the easiest SIGs to get started with.
You can check out our latest KubeCon NA 2022 [SIG Instrumentation Deep Dive](https://youtu.be/JIzrlWtAA8Y)
to get more insight into our work. We also invite you to join our Slack channel #sig-instrumentation
and feel free to reach out to any of our SIG leads or subproject owners directly.
Thank you so much for your time and insights into the workings of SIG Instrumentation!

View File

@ -0,0 +1,50 @@
---
layout: blog
title: "k8s.gcr.io Image Registry Will Be Frozen From the 3rd of April 2023"
date: 2023-02-06
slug: k8s-gcr-io-freeze-announcement
---
**Authors**: Mahamed Ali (Rackspace Technology)
The Kubernetes project runs a community-owned image registry called `registry.k8s.io` to host its container images. On the 3rd of April 2023, the old registry `k8s.gcr.io` will be frozen and no further images for Kubernetes and related subprojects will be pushed to the old registry.
This registry `registry.k8s.io` replaced the old one and has been generally available for several months. We have published a [blog post](/blog/2022/11/28/registry-k8s-io-faster-cheaper-ga/) about its benefits to the community and the Kubernetes project. This post also announced that future versions of Kubernetes will not be available in the old registry. Now that time has come.
What does this change mean for contributors:
- If you are a maintainer of a subproject, you will need to update your manifests and Helm charts to use the new registry.
What does this change mean for end users:
- The 1.27 Kubernetes release will not be published to the old registry.
- Patch releases for 1.24, 1.25, and 1.26 will no longer be published to the old registry from April. Please read the timelines below for details of the final patch releases in the old registry.
- Starting in 1.25, the default image registry has been set to `registry.k8s.io`. This value is overridable in `kubeadm` and `kubelet`, but setting it to `k8s.gcr.io` will fail for new releases after April as they won't be present in the old registry (see the kubeadm configuration sketch after this list).
- If you want to increase the reliability of your cluster and remove dependency on the community-owned registry or you are running Kubernetes in networks where external traffic is restricted, you should consider hosting local image registry mirrors. Some cloud vendors may offer hosted solutions for this.
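For example (a sketch, not part of the original announcement; the version number shown is illustrative), a kubeadm-managed cluster that pins the image registry explicitly would carry a `ClusterConfiguration` along these lines, with `imageRepository` pointing at `registry.k8s.io` rather than the frozen registry:
```yaml
# kubeadm ClusterConfiguration excerpt (illustrative values)
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.26.3
# Pull control plane images from the community registry instead of k8s.gcr.io.
imageRepository: registry.k8s.io
```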
## Timeline of the changes
- `k8s.gcr.io` will be frozen on the 3rd of April 2023
- 1.27 is expected to be released on the 12th of April 2023
- The last 1.23 release on `k8s.gcr.io` will be 1.23.18 (1.23 goes end-of-life before the freeze)
- The last 1.24 release on `k8s.gcr.io` will be 1.24.12
- The last 1.25 release on `k8s.gcr.io` will be 1.25.8
- The last 1.26 release on `k8s.gcr.io` will be 1.26.3
## What's next
Please make sure your cluster does not have dependencies on the old image registry. For example, you can run this command to list the images used by pods:
```shell
kubectl get pods --all-namespaces -o jsonpath="{.items[*].spec.containers[*].image}" |\
tr -s '[[:space:]]' '\n' |\
sort |\
uniq -c
```
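To narrow that output down to images that are still pulled from the frozen registry, you can filter it; here is a small sketch (this pipeline is only one way to do it):
```shell
# List only the images that still reference k8s.gcr.io (if any).
kubectl get pods --all-namespaces -o jsonpath="{.items[*].spec.containers[*].image}" |\
tr -s '[[:space:]]' '\n' |\
sort -u |\
grep '^k8s.gcr.io/'
```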
There may be other dependencies on the old image registry. Make sure you review any potential dependencies to keep your cluster healthy and up to date.
## Acknowledgments
__Change is hard__, and evolving our image-serving platform is needed to ensure a sustainable future for the project. We strive to make things better for everyone using Kubernetes. Many contributors from all corners of our community have been working long and hard to ensure we are making the best decisions possible, executing plans, and doing our best to communicate those plans.
Thanks to Aaron Crickenberger, Arnaud Meukam, Benjamin Elder, Caleb Woodbine, Davanum Srinivas, Mahamed Ali, and Tim Hockin from SIG K8s Infra, Brian McQueen, and Sergey Kanzhelev from SIG Node, Lubomir Ivanov from SIG Cluster Lifecycle, Adolfo García Veytia, Jeremy Rickard, Sascha Grunert, and Stephen Augustus from SIG Release, Bob Killen and Kaslin Fields from SIG Contribex, Tim Allclair from the Security Response Committee. Also a big thank you to our friends acting as liaisons with our cloud provider partners: Jay Pipes from Amazon and Jon Johnson Jr. from Google.

View File

@ -0,0 +1,36 @@
---
layout: blog
title: "Free Katacoda Kubernetes Tutorials Are Shutting Down"
date: 2023-02-14
slug: kubernetes-katacoda-tutorials-stop-from-2023-03-31
evergreen: true
---
**Author**: Natali Vlatko, SIG Docs Co-Chair for Kubernetes
[Katacoda](https://katacoda.com/kubernetes), the popular learning platform from O'Reilly that has been helping people learn all about
Java, Docker, Kubernetes, Python, Go, C++, and more, [shut down for public use in June 2022](https://www.oreilly.com/online-learning/leveraging-katacoda-technology.html).
However, tutorials specifically for Kubernetes, linked from the Kubernetes website for our project's
users and contributors, remained available and active after this change. Unfortunately, this will no
longer be the case, and Katacoda tutorials for learning Kubernetes will cease working after March 31st, 2023.
The Kubernetes Project wishes to thank O'Reilly Media for the many years it has supported the community
via the Katacoda learning platform. You can read more about [the decision to shutter katacoda.com](https://www.oreilly.com/online-learning/leveraging-katacoda-technology.html)
on O'Reilly's own site. With this change, we'll be focusing on the work needed to remove links to
their various tutorials. We have a general issue tracking this topic at [#33936](https://github.com/kubernetes/website/issues/33936) and [GitHub discussion](https://github.com/kubernetes/website/discussions/38878). We're also
interested in researching what other learning platforms could be beneficial for the Kubernetes community,
replacing Katacoda with a link to a platform or service that has a similar user experience. However,
this research will take time, so we're actively looking for volunteers to help with this work.
If a replacement is found, it will need to be supported by Kubernetes leadership, specifically,
SIG Contributor Experience, SIG Docs, and the Kubernetes Steering Committee.
The Katacoda shutdown affects 25 tutorial pages, their localizations, as well as the Katacoda
Scenario repository: [github.com/katacoda-scenarios/kubernetes-bootcamp-scenarios](https://github.com/katacoda-scenarios/kubernetes-bootcamp-scenarios). We recommend
that any links, guides, or documentation you have that points to the Katacoda learning platform be
updated immediately to reflect this change. While we have yet to find a replacement learning solution,
the Kubernetes website contains a lot of helpful documentation to support your continued learning and growth.
You can find all of our available documentation tutorials for Kubernetes at https://k8s.io/docs/tutorials/.
If you have any questions regarding the Katacoda shutdown, or subsequent link removal from Kubernetes
tutorial pages, please feel free to comment on the [general issue tracking the shutdown](https://github.com/kubernetes/website/issues/33936),
or visit the #sig-docs channel on the Kubernetes Slack.

View File

@ -103,6 +103,8 @@ updated to newer versions that support cgroup v2. For example:
* If you run [cAdvisor](https://github.com/google/cadvisor) as a stand-alone
DaemonSet for monitoring pods and containers, update it to v0.43.0 or later.
* If you use JDK, prefer to use JDK 11.0.16 and later or JDK 15 and later, which [fully support cgroup v2](https://bugs.openjdk.org/browse/JDK-8230305).
* If you are using the [uber-go/automaxprocs](https://github.com/uber-go/automaxprocs) package, make sure
the version you use is v1.5.1 or higher.
## Identify the cgroup version on Linux Nodes {#check-cgroup-version}

View File

@ -6,30 +6,32 @@ weight: 30
<!-- overview -->
Distributed systems often have a need for "leases", which provides a mechanism to lock shared resources and coordinate activity between nodes.
In Kubernetes, the "lease" concept is represented by `Lease` objects in the `coordination.k8s.io` API group, which are used for system-critical
capabilities like node heart beats and component-level leader election.
Distributed systems often have a need for _leases_, which provide a mechanism to lock shared resources
and coordinate activity between members of a set.
In Kubernetes, the lease concept is represented by [Lease](/docs/reference/kubernetes-api/cluster-resources/lease-v1/)
objects in the `coordination.k8s.io` {{< glossary_tooltip text="API Group" term_id="api-group" >}},
which are used for system-critical capabilities such as node heartbeats and component-level leader election.
<!-- body -->
## Node Heart Beats
## Node heartbeats {#node-heart-beats}
Kubernetes uses the Lease API to communicate kubelet node heart beats to the Kubernetes API server.
Kubernetes uses the Lease API to communicate kubelet node heartbeats to the Kubernetes API server.
For every `Node`, there is a `Lease` object with a matching name in the `kube-node-lease`
namespace. Under the hood, every kubelet heart beat is an UPDATE request to this `Lease` object, updating
namespace. Under the hood, every kubelet heartbeat is an **update** request to this `Lease` object, updating
the `spec.renewTime` field for the Lease. The Kubernetes control plane uses the time stamp of this field
to determine the availability of this `Node`.
See [Node Lease objects](/docs/concepts/architecture/nodes/#heartbeats) for more details.
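If you want to see this mechanism in action (a quick sketch; replace `<node-name>` with one of your node names), you can inspect the Lease that backs a node's heartbeat and watch its `spec.renewTime` advance:
```shell
# Show the Lease object that the kubelet keeps renewing for this node.
kubectl -n kube-node-lease get lease <node-name> -o yaml
```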
## Leader Election
## Leader election
Leases are also used in Kubernetes to ensure only one instance of a component is running at any given time.
Kubernetes also uses Leases to ensure only one instance of a component is running at any given time.
This is used by control plane components like `kube-controller-manager` and `kube-scheduler` in
HA configurations, where only one instance of the component should be actively running while the other
instances are on stand-by.
## API Server Identity
## API server identity
{{< feature-state for_k8s_version="v1.26" state="beta" >}}
@ -43,22 +45,23 @@ You can inspect Leases owned by each kube-apiserver by checking for lease object
with the name `kube-apiserver-<sha256-hash>`. Alternatively you can use the label selector `k8s.io/component=kube-apiserver`:
```shell
$ kubectl -n kube-system get lease -l k8s.io/component=kube-apiserver
kubectl -n kube-system get lease -l k8s.io/component=kube-apiserver
```
```
NAME HOLDER AGE
kube-apiserver-c4vwjftbvpc5os2vvzle4qg27a kube-apiserver-c4vwjftbvpc5os2vvzle4qg27a_9cbf54e5-1136-44bd-8f9a-1dcd15c346b4 5m33s
kube-apiserver-dz2dqprdpsgnm756t5rnov7yka kube-apiserver-dz2dqprdpsgnm756t5rnov7yka_84f2a85d-37c1-4b14-b6b9-603e62e4896f 4m23s
kube-apiserver-fyloo45sdenffw2ugwaz3likua kube-apiserver-fyloo45sdenffw2ugwaz3likua_c5ffa286-8a9a-45d4-91e7-61118ed58d2e 4m43s
```
The SHA256 hash used in the lease name is based on the OS hostname as seen by kube-apiserver. Each kube-apiserver should be
The SHA256 hash used in the lease name is based on the OS hostname as seen by that API server. Each kube-apiserver should be
configured to use a hostname that is unique within the cluster. New instances of kube-apiserver that use the same hostname
will take over existing Leases using a new holder identity, as opposed to instantiating new lease objects. You can check the
will take over existing Leases using a new holder identity, as opposed to instantiating new Lease objects. You can check the
hostname used by kube-apiserver by checking the value of the `kubernetes.io/hostname` label:
```shell
$ kubectl -n kube-system get lease kube-apiserver-c4vwjftbvpc5os2vvzle4qg27a -o yaml
kubectl -n kube-system get lease kube-apiserver-c4vwjftbvpc5os2vvzle4qg27a -o yaml
```
```yaml
apiVersion: coordination.k8s.io/v1
kind: Lease
@ -78,3 +81,23 @@ spec:
```
Expired leases from kube-apiservers that no longer exist are garbage collected by new kube-apiservers after 1 hour.
You can disable API server identity leases by disabling the `APIServerIdentity`
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/).
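For example (a sketch of the flag syntax only; how you pass flags to the API server depends on how your control plane is deployed):
```shell
# Illustrative: disable API server identity leases via the feature gate.
kube-apiserver --feature-gates=APIServerIdentity=false
```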
## Workloads {#custom-workload}
Your own workload can define its own use of Leases. For example, you might run a custom
{{< glossary_tooltip term_id="controller" text="controller" >}} where a primary or leader member
performs operations that its peers do not. You define a Lease so that the controller replicas can select
or elect a leader, using the Kubernetes API for coordination.
If you do use a Lease, it's a good practice to define a name for the Lease that is obviously linked to
the product or component. For example, if you have a component named Example Foo, use a Lease named
`example-foo`.
If a cluster operator or another end user could deploy multiple instances of a component, select a name
prefix and pick a mechanism (such as hash of the name of the Deployment) to avoid name collisions
for the Leases.
You can use another approach so long as it achieves the same outcome: different software products do
not conflict with one another.
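A minimal sketch of such a Lease (the name, namespace, holder identity, and timing values below are illustrative, not prescribed by Kubernetes) might look like this:
```yaml
apiVersion: coordination.k8s.io/v1
kind: Lease
metadata:
  # Name derived from the component ("Example Foo"), per the guidance above.
  name: example-foo
  namespace: example
spec:
  # The replica currently acting as leader writes its identity here
  # and keeps renewing the lease while it holds leadership.
  holderIdentity: example-foo-6cbf59c4b5-xkqtw
  leaseDurationSeconds: 15
  renewTime: "2023-02-21T22:47:15.000000Z"
```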

View File

@ -274,7 +274,7 @@ availability of each node, and to take action when failures are detected.
For nodes there are two forms of heartbeats:
* updates to the `.status` of a Node
* [Lease](/docs/reference/kubernetes-api/cluster-resources/lease-v1/) objects
* [Lease](/docs/concepts/architecture/leases/) objects
within the `kube-node-lease`
{{< glossary_tooltip term_id="namespace" text="namespace">}}.
Each Node has an associated Lease object.

View File

@ -17,11 +17,11 @@ This page lists some of the available add-ons and links to their respective inst
## Networking and Network Policy
* [ACI](https://www.github.com/noironetworks/aci-containers) provides integrated container networking and network security with Cisco ACI.
* [Antrea](https://antrea.io/) operates at Layer 3/4 to provide networking and security services for Kubernetes, leveraging Open vSwitch as the networking data plane.
* [Antrea](https://antrea.io/) operates at Layer 3/4 to provide networking and security services for Kubernetes, leveraging Open vSwitch as the networking data plane. Antrea is a [CNCF project at the Sandbox level](https://www.cncf.io/projects/antrea/).
* [Calico](https://docs.projectcalico.org/latest/introduction/) is a networking and network policy provider. Calico supports a flexible set of networking options so you can choose the most efficient option for your situation, including non-overlay and overlay networks, with or without BGP. Calico uses the same engine to enforce network policy for hosts, pods, and (if using Istio & Envoy) applications at the service mesh layer.
* [Canal](https://projectcalico.docs.tigera.io/getting-started/kubernetes/flannel/flannel) unites Flannel and Calico, providing networking and network policy.
* [Cilium](https://github.com/cilium/cilium) is a networking, observability, and security solution with an eBPF-based data plane. Cilium provides a simple flat Layer 3 network with the ability to span multiple clusters in either a native routing or overlay/encapsulation mode, and can enforce network policies on L3-L7 using an identity-based security model that is decoupled from network addressing. Cilium can act as a replacement for kube-proxy; it also offers additional, opt-in observability and security features.
* [CNI-Genie](https://github.com/cni-genie/CNI-Genie) enables Kubernetes to seamlessly connect to a choice of CNI plugins, such as Calico, Canal, Flannel, or Weave.
* [Cilium](https://github.com/cilium/cilium) is a networking, observability, and security solution with an eBPF-based data plane. Cilium provides a simple flat Layer 3 network with the ability to span multiple clusters in either a native routing or overlay/encapsulation mode, and can enforce network policies on L3-L7 using an identity-based security model that is decoupled from network addressing. Cilium can act as a replacement for kube-proxy; it also offers additional, opt-in observability and security features. Cilium is a [CNCF project at the Incubation level](https://www.cncf.io/projects/cilium/).
* [CNI-Genie](https://github.com/cni-genie/CNI-Genie) enables Kubernetes to seamlessly connect to a choice of CNI plugins, such as Calico, Canal, Flannel, or Weave. CNI-Genie is a [CNCF project at the Sandbox level](https://www.cncf.io/projects/cni-genie/).
* [Contiv](https://contivpp.io/) provides configurable networking (native L3 using BGP, overlay using vxlan, classic L2, and Cisco-SDN/ACI) for various use cases and a rich policy framework. Contiv project is fully [open sourced](https://github.com/contiv). The [installer](https://github.com/contiv/install) provides both kubeadm and non-kubeadm based installation options.
* [Contrail](https://www.juniper.net/us/en/products-services/sdn/contrail/contrail-networking/), based on [Tungsten Fabric](https://tungsten.io), is an open source, multi-cloud network virtualization and policy management platform. Contrail and Tungsten Fabric are integrated with orchestration systems such as Kubernetes, OpenShift, OpenStack and Mesos, and provide isolation modes for virtual machines, containers/pods and bare metal workloads.
* [Flannel](https://github.com/flannel-io/flannel#deploying-flannel-manually) is an overlay network provider that can be used with Kubernetes.

View File

@ -638,6 +638,10 @@ poorly-behaved workloads that may be harming system health.
standard deviation of seat demand seen during the last concurrency
borrowing adjustment period.
* `apiserver_flowcontrol_demand_seats_smoothed` is a gauge vector
holding, for each priority level, the smoothed enveloped seat demand
determined at the last concurrency adjustment.
* `apiserver_flowcontrol_target_seats` is a gauge vector holding, for
each priority level, the concurrency target going into the borrowing
allocation problem.
@ -701,14 +705,15 @@ serves the following additional paths at its HTTP[S] ports.
The output is similar to this:
```none
PriorityLevelName, ActiveQueues, IsIdle, IsQuiescing, WaitingRequests, ExecutingRequests,
workload-low, 0, true, false, 0, 0,
global-default, 0, true, false, 0, 0,
exempt, <none>, <none>, <none>, <none>, <none>,
catch-all, 0, true, false, 0, 0,
system, 0, true, false, 0, 0,
leader-election, 0, true, false, 0, 0,
workload-high, 0, true, false, 0, 0,
PriorityLevelName, ActiveQueues, IsIdle, IsQuiescing, WaitingRequests, ExecutingRequests, DispatchedRequests, RejectedRequests, TimedoutRequests, CancelledRequests
catch-all, 0, true, false, 0, 0, 1, 0, 0, 0
exempt, <none>, <none>, <none>, <none>, <none>, <none>, <none>, <none>, <none>
global-default, 0, true, false, 0, 0, 46, 0, 0, 0
leader-election, 0, true, false, 0, 0, 4, 0, 0, 0
node-high, 0, true, false, 0, 0, 34, 0, 0, 0
system, 0, true, false, 0, 0, 48, 0, 0, 0
workload-high, 0, true, false, 0, 0, 500, 0, 0, 0
workload-low, 0, true, false, 0, 0, 0, 0, 0, 0
```
- `/debug/api_priority_and_fairness/dump_queues` - a listing of all the
@ -761,7 +766,34 @@ serves the following additional paths at its HTTP[S] ports.
system, system-nodes, 12, 0, system:node:127.0.0.1, 2020-07-23T15:31:03.583823404Z, system:node:127.0.0.1, create, /api/v1/namespaces/scaletest/configmaps,
system, system-nodes, 12, 1, system:node:127.0.0.1, 2020-07-23T15:31:03.594555947Z, system:node:127.0.0.1, create, /api/v1/namespaces/scaletest/configmaps,
```
### Debug logging
At `-v=3` or more verbose the server outputs an httplog line for every
request, and it includes the following attributes.
- `apf_fs`: the name of the flow schema to which the request was classified.
- `apf_pl`: the name of the priority level for that flow schema.
- `apf_iseats`: the number of seats determined for the initial
(normal) stage of execution of the request.
- `apf_fseats`: the number of seats determined for the final stage of
execution (accounting for the associated WATCH notifications) of the
request.
- `apf_additionalLatency`: the duration of the final stage of
execution of the request.
At higher levels of verbosity there will be log lines exposing details
of how APF handled the request, primarily for debug purposes.
### Response headers
APF adds the following two headers to each HTTP response message.
- `X-Kubernetes-PF-FlowSchema-UID` holds the UID of the FlowSchema
object to which the corresponding request was classified.
- `X-Kubernetes-PF-PriorityLevel-UID` holds the UID of the
PriorityLevelConfiguration object associated with that FlowSchema.
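For example, a sketch of one way to observe these headers is to run `kubectl`
with verbose client-side logging and filter the logged response headers (the
verbosity level at which response headers are printed may vary between
`kubectl` versions):

```shell
kubectl get namespaces -v=8 2>&1 | grep -i 'X-Kubernetes-PF'
```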
## {{% heading "whatsnext" %}}

View File

@ -807,4 +807,5 @@ memory limit (and possibly request) for that container.
and its [resource requirements](/docs/reference/kubernetes-api/workload-resources/pod-v1/#resources)
* Read about [project quotas](https://xfs.org/index.php/XFS_FAQ#Q:_Quota:_Do_quotas_work_on_XFS.3F) in XFS
* Read more about the [kube-scheduler configuration reference (v1beta3)](/docs/reference/config-api/kube-scheduler-config.v1beta3/)
* Read more about [Quality of Service classes for Pods](/docs/concepts/workloads/pods/pod-qos/)

View File

@ -165,15 +165,35 @@ for that Pod, including details of the problem fetching the Secret.
#### Optional Secrets {#restriction-secret-must-exist}
When you define a container environment variable based on a Secret,
you can mark it as _optional_. The default is for the Secret to be
required.
When you reference a Secret in a Pod, you can mark the Secret as _optional_,
such as in the following example. If an optional Secret doesn't exist,
Kubernetes ignores it.
None of a Pod's containers will start until all non-optional Secrets are
available.
```yaml
apiVersion: v1
kind: Pod
metadata:
name: mypod
spec:
containers:
- name: mypod
image: redis
volumeMounts:
- name: foo
mountPath: "/etc/foo"
readOnly: true
volumes:
- name: foo
secret:
secretName: mysecret
optional: true
```
If a Pod references a specific key in a Secret and that Secret does exist, but
is missing the named key, the Pod fails during startup.
By default, Secrets are required. None of a Pod's containers will start until
all non-optional Secrets are available.
If a Pod references a specific key in a non-optional Secret and that Secret
does exist, but is missing the named key, the Pod fails during startup.
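The same `optional` marker is also available when you consume a Secret key as an
environment variable. A minimal sketch (the variable name `OPTIONAL_USERNAME` is
illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mypod
spec:
  containers:
    - name: mypod
      image: redis
      env:
        - name: OPTIONAL_USERNAME
          valueFrom:
            secretKeyRef:
              name: mysecret
              key: username
              optional: true # the container can start even if mysecret or its username key is absent
```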
### Using Secrets as files from a Pod {#using-secrets-as-files-from-a-pod}
@ -181,181 +201,8 @@ If you want to access data from a Secret in a Pod, one way to do that is to
have Kubernetes make the value of that Secret be available as a file inside
the filesystem of one or more of the Pod's containers.
To configure that, you:
1. Create a secret or use an existing one. Multiple Pods can reference the same secret.
1. Modify your Pod definition to add a volume under `.spec.volumes[]`. Name the volume anything,
and have a `.spec.volumes[].secret.secretName` field equal to the name of the Secret object.
1. Add a `.spec.containers[].volumeMounts[]` to each container that needs the secret. Specify
`.spec.containers[].volumeMounts[].readOnly = true` and
`.spec.containers[].volumeMounts[].mountPath` to an unused directory name where you would like the
secrets to appear.
1. Modify your image or command line so that the program looks for files in that directory. Each
key in the secret `data` map becomes the filename under `mountPath`.
This is an example of a Pod that mounts a Secret named `mysecret` in a volume:
```yaml
apiVersion: v1
kind: Pod
metadata:
name: mypod
spec:
containers:
- name: mypod
image: redis
volumeMounts:
- name: foo
mountPath: "/etc/foo"
readOnly: true
volumes:
- name: foo
secret:
secretName: mysecret
optional: false # default setting; "mysecret" must exist
```
Each Secret you want to use needs to be referred to in `.spec.volumes`.
If there are multiple containers in the Pod, then each container needs its
own `volumeMounts` block, but only one `.spec.volumes` is needed per Secret.
{{< note >}}
Versions of Kubernetes before v1.22 automatically created credentials for accessing
the Kubernetes API. This older mechanism was based on creating token Secrets that
could then be mounted into running Pods.
In more recent versions, including Kubernetes v{{< skew currentVersion >}}, API credentials
are obtained directly by using the [TokenRequest](/docs/reference/kubernetes-api/authentication-resources/token-request-v1/) API,
and are mounted into Pods using a [projected volume](/docs/reference/access-authn-authz/service-accounts-admin/#bound-service-account-token-volume).
The tokens obtained using this method have bounded lifetimes, and are automatically
invalidated when the Pod they are mounted into is deleted.
You can still [manually create](/docs/tasks/configure-pod-container/configure-service-account/#manually-create-a-service-account-api-token)
a service account token Secret; for example, if you need a token that never expires.
However, using the [TokenRequest](/docs/reference/kubernetes-api/authentication-resources/token-request-v1/)
subresource to obtain a token to access the API is recommended instead.
You can use the [`kubectl create token`](/docs/reference/generated/kubectl/kubectl-commands#-em-token-em-)
command to obtain a token from the `TokenRequest` API.
{{< /note >}}
#### Projection of Secret keys to specific paths
You can also control the paths within the volume where Secret keys are projected.
You can use the `.spec.volumes[].secret.items` field to change the target path of each key:
```yaml
apiVersion: v1
kind: Pod
metadata:
name: mypod
spec:
containers:
- name: mypod
image: redis
volumeMounts:
- name: foo
mountPath: "/etc/foo"
readOnly: true
volumes:
- name: foo
secret:
secretName: mysecret
items:
- key: username
path: my-group/my-username
```
What will happen:
* the `username` key from `mysecret` is available to the container at the path
`/etc/foo/my-group/my-username` instead of at `/etc/foo/username`.
* the `password` key from that Secret object is not projected.
If `.spec.volumes[].secret.items` is used, only keys specified in `items` are projected.
To consume all keys from the Secret, all of them must be listed in the `items` field.
If you list keys explicitly, then all listed keys must exist in the corresponding Secret.
Otherwise, the volume is not created.
#### Secret files permissions
You can set the POSIX file access permission bits for a single Secret key.
If you don't specify any permissions, `0644` is used by default.
You can also set a default mode for the entire Secret volume and override per key if needed.
For example, you can specify a default mode like this:
```yaml
apiVersion: v1
kind: Pod
metadata:
name: mypod
spec:
containers:
- name: mypod
image: redis
volumeMounts:
- name: foo
mountPath: "/etc/foo"
volumes:
- name: foo
secret:
secretName: mysecret
defaultMode: 0400
```
The secret is mounted on `/etc/foo`; all the files created by the
secret volume mount have permission `0400`.
{{< note >}}
If you're defining a Pod or a Pod template using JSON, beware that the JSON
specification doesn't support octal notation. You can use the decimal value
for the `defaultMode` (for example, 0400 in octal is 256 in decimal) instead.
If you're writing YAML, you can write the `defaultMode` in octal.
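For example, a sketch of the equivalent JSON volume entry using the decimal value:

```json
{
  "name": "foo",
  "secret": {
    "secretName": "mysecret",
    "defaultMode": 256
  }
}
```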
{{< /note >}}
#### Consuming Secret values from volumes
Inside the container that mounts a secret volume, the secret keys appear as
files. The secret values are base64 decoded and stored inside these files.
This is the result of commands executed inside the container from the example above:
```shell
ls /etc/foo/
```
The output is similar to:
```
username
password
```
```shell
cat /etc/foo/username
```
The output is similar to:
```
admin
```
```shell
cat /etc/foo/password
```
The output is similar to:
```
1f2d1e2e67df
```
The program in a container is responsible for reading the secret data from these
files, as needed.
#### Mounted Secrets are updated automatically
For instructions, refer to
[Distribute credentials securely using Secrets](/docs/tasks/inject-data-application/distribute-credentials-secure/#create-a-pod-that-has-access-to-the-secret-data-through-a-volume).
When a volume contains data from a Secret, and that Secret is updated, Kubernetes tracks
this and updates the data in the volume, using an eventually-consistent approach.
@ -388,53 +235,23 @@ watch propagation delay, the configured cache TTL, or zero for direct polling).
To use a Secret in an {{< glossary_tooltip text="environment variable" term_id="container-env-variables" >}}
in a Pod:
1. Create a Secret (or use an existing one). Multiple Pods can reference the same Secret.
1. Modify your Pod definition in each container that you wish to consume the value of a secret
key to add an environment variable for each secret key you wish to consume. The environment
variable that consumes the secret key should populate the secret's name and key in `env[].valueFrom.secretKeyRef`.
1. Modify your image and/or command line so that the program looks for values in the specified
environment variables.
This is an example of a Pod that uses a Secret via environment variables:
```yaml
apiVersion: v1
kind: Pod
metadata:
name: secret-env-pod
spec:
containers:
- name: mycontainer
image: redis
env:
- name: SECRET_USERNAME
valueFrom:
secretKeyRef:
name: mysecret
key: username
optional: false # same as default; "mysecret" must exist
# and include a key named "username"
- name: SECRET_PASSWORD
valueFrom:
secretKeyRef:
name: mysecret
key: password
optional: false # same as default; "mysecret" must exist
# and include a key named "password"
restartPolicy: Never
```
1. For each container in your Pod specification, add an environment variable
for each Secret key that you want to use to the
`env[].valueFrom.secretKeyRef` field.
1. Modify your image and/or command line so that the program looks for values
in the specified environment variables.
For instructions, refer to
[Define container environment variables using Secret data](/docs/tasks/inject-data-application/distribute-credentials-secure/#define-container-environment-variables-using-secret-data).
#### Invalid environment variables {#restriction-env-from-invalid}
Secrets used to populate environment variables by the `envFrom` field that have keys
that are considered invalid environment variable names will have those keys
skipped. The Pod is allowed to start.
If your environment variable definitions in your Pod specification are
considered to be invalid environment variable names, those keys aren't made
available to your container. The Pod is allowed to start.
If you define a Pod with an invalid variable name, the failed Pod startup includes
an event with the reason set to `InvalidVariableNames` and a message that lists the
skipped invalid keys. The following example shows a Pod that refers to a Secret
named `mysecret`, where `mysecret` contains 2 invalid keys: `1badkey` and `2alsobad`.
Kubernetes adds an Event with the reason set to `InvalidEnvironmentVariableNames` and a
message that lists the skipped invalid keys. The following example shows a Pod that refers to a Secret named `mysecret`, where `mysecret` contains 2 invalid keys: `1badkey` and `2alsobad`.
```shell
kubectl get events
@ -447,42 +264,6 @@ LASTSEEN FIRSTSEEN COUNT NAME KIND SUBOBJECT
0s 0s 1 dapi-test-pod Pod Warning InvalidEnvironmentVariableNames kubelet, 127.0.0.1 Keys [1badkey, 2alsobad] from the EnvFrom secret default/mysecret were skipped since they are considered invalid environment variable names.
```
#### Consuming Secret values from environment variables
Inside a container that consumes a Secret using environment variables, the secret keys appear
as normal environment variables. The values of those variables are the base64 decoded values
of the secret data.
This is the result of commands executed inside the container from the example above:
```shell
echo "$SECRET_USERNAME"
```
The output is similar to:
```
admin
```
```shell
echo "$SECRET_PASSWORD"
```
The output is similar to:
```
1f2d1e2e67df
```
{{< note >}}
If a container already consumes a Secret in an environment variable,
a Secret update will not be seen by the container unless it is
restarted. There are third party solutions for triggering restarts when
secrets change.
{{< /note >}}
### Container image pull secrets {#using-imagepullsecrets}
If you want to fetch container images from a private repository, you need a way for
@ -518,43 +299,10 @@ You cannot use ConfigMaps or Secrets with {{< glossary_tooltip text="static Pods
## Use cases
### Use case: As container environment variables
### Use case: As container environment variables {#use-case-as-container-environment-variables}
Create a secret
```yaml
apiVersion: v1
kind: Secret
metadata:
name: mysecret
type: Opaque
data:
USER_NAME: YWRtaW4=
PASSWORD: MWYyZDFlMmU2N2Rm
```
Create the Secret:
```shell
kubectl apply -f mysecret.yaml
```
Use `envFrom` to define all of the Secret's data as container environment variables. The key from
the Secret becomes the environment variable name in the Pod.
```yaml
apiVersion: v1
kind: Pod
metadata:
name: secret-test-pod
spec:
containers:
- name: test-container
image: registry.k8s.io/busybox
command: [ "/bin/sh", "-c", "env" ]
envFrom:
- secretRef:
name: mysecret
restartPolicy: Never
```
You can create a Secret and use it to
[set environment variables for a container](/docs/tasks/inject-data-application/distribute-credentials-secure/#define-container-environment-variables-using-secret-data).
### Use case: Pod with SSH keys
@ -873,13 +621,28 @@ A `kubernetes.io/service-account-token` type of Secret is used to store a
token credential that identifies a
{{< glossary_tooltip text="service account" term_id="service-account" >}}.
Since 1.22, this type of Secret is no longer used to mount credentials into Pods,
and obtaining tokens via the [TokenRequest](/docs/reference/kubernetes-api/authentication-resources/token-request-v1/)
API is recommended instead of using service account token Secret objects.
Tokens obtained from the `TokenRequest` API are more secure than ones stored in Secret objects,
because they have a bounded lifetime and are not readable by other API clients.
You can use the [`kubectl create token`](/docs/reference/generated/kubectl/kubectl-commands#-em-token-em-)
{{< note >}}
Versions of Kubernetes before v1.22 automatically created credentials for
accessing the Kubernetes API. This older mechanism was based on creating token
Secrets that could then be mounted into running Pods.
In more recent versions, including Kubernetes v{{< skew currentVersion >}}, API
credentials are obtained directly by using the
[TokenRequest](/docs/reference/kubernetes-api/authentication-resources/token-request-v1/)
API, and are mounted into Pods using a
[projected volume](/docs/reference/access-authn-authz/service-accounts-admin/#bound-service-account-token-volume).
The tokens obtained using this method have bounded lifetimes, and are
automatically invalidated when the Pod they are mounted into is deleted.
You can still
[manually create](/docs/tasks/configure-pod-container/configure-service-account/#manually-create-a-service-account-api-token)
a service account token Secret; for example, if you need a token that never
expires. However, using the
[TokenRequest](/docs/reference/kubernetes-api/authentication-resources/token-request-v1/)
subresource to obtain a token to access the API is recommended instead.
You can use the
[`kubectl create token`](/docs/reference/generated/kubectl/kubectl-commands#-em-token-em-)
command to obtain a token from the `TokenRequest` API.
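For example, to obtain a token for a hypothetical ServiceAccount named
`build-robot` in the current namespace:

```shell
kubectl create token build-robot
```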
{{< /note >}}
You should only create a service account token Secret object
if you can't use the `TokenRequest` API to obtain a token,

View File

@ -88,56 +88,56 @@ spec:
The general workflow of a device plugin includes the following steps:
1. Initialization. During this phase, the device plugin performs vendor-specific
initialization and setup to make sure the devices are in a ready state.
initialization and setup to make sure the devices are in a ready state.
1. The plugin starts a gRPC service, with a Unix socket under the host path
`/var/lib/kubelet/device-plugins/`, that implements the following interfaces:
`/var/lib/kubelet/device-plugins/`, that implements the following interfaces:
```gRPC
service DevicePlugin {
// GetDevicePluginOptions returns options to be communicated with Device Manager.
rpc GetDevicePluginOptions(Empty) returns (DevicePluginOptions) {}
```gRPC
service DevicePlugin {
// GetDevicePluginOptions returns options to be communicated with Device Manager.
rpc GetDevicePluginOptions(Empty) returns (DevicePluginOptions) {}
// ListAndWatch returns a stream of List of Devices
// Whenever a Device state change or a Device disappears, ListAndWatch
// returns the new list
rpc ListAndWatch(Empty) returns (stream ListAndWatchResponse) {}
// ListAndWatch returns a stream of List of Devices
// Whenever a Device state change or a Device disappears, ListAndWatch
// returns the new list
rpc ListAndWatch(Empty) returns (stream ListAndWatchResponse) {}
// Allocate is called during container creation so that the Device
// Plugin can run device specific operations and instruct Kubelet
// of the steps to make the Device available in the container
rpc Allocate(AllocateRequest) returns (AllocateResponse) {}
// Allocate is called during container creation so that the Device
// Plugin can run device specific operations and instruct Kubelet
// of the steps to make the Device available in the container
rpc Allocate(AllocateRequest) returns (AllocateResponse) {}
// GetPreferredAllocation returns a preferred set of devices to allocate
// from a list of available ones. The resulting preferred allocation is not
// guaranteed to be the allocation ultimately performed by the
// devicemanager. It is only designed to help the devicemanager make a more
// informed allocation decision when possible.
rpc GetPreferredAllocation(PreferredAllocationRequest) returns (PreferredAllocationResponse) {}
// GetPreferredAllocation returns a preferred set of devices to allocate
// from a list of available ones. The resulting preferred allocation is not
// guaranteed to be the allocation ultimately performed by the
// devicemanager. It is only designed to help the devicemanager make a more
// informed allocation decision when possible.
rpc GetPreferredAllocation(PreferredAllocationRequest) returns (PreferredAllocationResponse) {}
// PreStartContainer is called, if indicated by Device Plugin during registration phase,
// before each container start. Device plugin can run device specific operations
// such as resetting the device before making devices available to the container.
rpc PreStartContainer(PreStartContainerRequest) returns (PreStartContainerResponse) {}
}
```
// PreStartContainer is called, if indicated by Device Plugin during registration phase,
// before each container start. Device plugin can run device specific operations
// such as resetting the device before making devices available to the container.
rpc PreStartContainer(PreStartContainerRequest) returns (PreStartContainerResponse) {}
}
```
{{< note >}}
Plugins are not required to provide useful implementations for
`GetPreferredAllocation()` or `PreStartContainer()`. Flags indicating
the availability of these calls, if any, should be set in the `DevicePluginOptions`
message sent back by a call to `GetDevicePluginOptions()`. The `kubelet` will
always call `GetDevicePluginOptions()` to see which optional functions are
available, before calling any of them directly.
{{< /note >}}
{{< note >}}
Plugins are not required to provide useful implementations for
`GetPreferredAllocation()` or `PreStartContainer()`. Flags indicating
the availability of these calls, if any, should be set in the `DevicePluginOptions`
message sent back by a call to `GetDevicePluginOptions()`. The `kubelet` will
always call `GetDevicePluginOptions()` to see which optional functions are
available, before calling any of them directly.
{{< /note >}}
1. The plugin registers itself with the kubelet through the Unix socket at host
path `/var/lib/kubelet/device-plugins/kubelet.sock`.
path `/var/lib/kubelet/device-plugins/kubelet.sock`.
{{< note >}}
The ordering of the workflow is important. A plugin MUST start serving gRPC
service before registering itself with kubelet for successful registration.
{{< /note >}}
{{< note >}}
The ordering of the workflow is important. A plugin MUST start serving gRPC
service before registering itself with kubelet for successful registration.
{{< /note >}}
1. After successfully registering itself, the device plugin runs in serving mode, during which it keeps
monitoring device health and reports back to the kubelet upon any device state changes.
@ -297,7 +297,6 @@ However, calling `GetAllocatableResources` endpoint is not sufficient in case of
update and Kubelet needs to be restarted to reflect the correct resource capacity and allocatable.
{{< /note >}}
```gRPC
// AllocatableResourcesResponses contains information about all the devices known by the kubelet
message AllocatableResourcesResponse {
@ -318,14 +317,14 @@ Preceding Kubernetes v1.23, to enable this feature `kubelet` must be started wit
```
`ContainerDevices` do expose the topology information declaring to which NUMA cells the device is
affine. The NUMA cells are identified using an opaque integer ID, whose value is consistent with
affine. The NUMA cells are identified using an opaque integer ID, whose value is consistent with
what device plugins report
[when they register themselves to the kubelet](/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/#device-plugin-integration-with-the-topology-manager).
The gRPC service is served over a unix socket at `/var/lib/kubelet/pod-resources/kubelet.sock`.
Monitoring agents for device plugin resources can be deployed as a daemon, or as a DaemonSet.
The canonical directory `/var/lib/kubelet/pod-resources` requires privileged access, so monitoring
agents must run in a privileged security context. If a device monitoring agent is running as a
agents must run in a privileged security context. If a device monitoring agent is running as a
DaemonSet, `/var/lib/kubelet/pod-resources` must be mounted as a
{{< glossary_tooltip term_id="volume" >}} in the device monitoring agent's
[PodSpec](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#podspec-v1-core).
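As a minimal sketch of such an agent (the image name is a placeholder, not a
real published agent):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: device-metrics-agent
spec:
  selector:
    matchLabels:
      name: device-metrics-agent
  template:
    metadata:
      labels:
        name: device-metrics-agent
    spec:
      containers:
        - name: agent
          image: example.com/device-metrics-agent:latest # placeholder image
          securityContext:
            privileged: true # required to read the pod-resources socket
          volumeMounts:
            - name: pod-resources
              mountPath: /var/lib/kubelet/pod-resources
      volumes:
        - name: pod-resources
          hostPath:
            path: /var/lib/kubelet/pod-resources
```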
@ -360,7 +359,7 @@ resource assignment decisions.
`TopologyInfo` supports setting a `nodes` field to either `nil` or a list of NUMA nodes. This
allows the Device Plugin to advertise a device that spans multiple NUMA nodes.
Setting `TopologyInfo` to `nil` or providing an empty list of NUMA nodes for a given device
Setting `TopologyInfo` to `nil` or providing an empty list of NUMA nodes for a given device
indicates that the Device Plugin does not have a NUMA affinity preference for that device.
An example `TopologyInfo` struct populated for a device by a Device Plugin:
@ -396,4 +395,3 @@ Here are some examples of device plugin implementations:
* Learn about the [Topology Manager](/docs/tasks/administer-cluster/topology-manager/)
* Read about using [hardware acceleration for TLS ingress](/blog/2019/04/24/hardware-accelerated-ssl/tls-termination-in-ingress-controllers-using-kubernetes-device-plugins-and-runtimeclass/)
with Kubernetes

View File

@ -8,7 +8,6 @@ weight: 60
You can use Kubernetes annotations to attach arbitrary non-identifying metadata
to objects. Clients such as tools and libraries can retrieve this metadata.
<!-- body -->
## Attaching metadata to objects
@ -74,10 +73,9 @@ If the prefix is omitted, the annotation Key is presumed to be private to the us
The `kubernetes.io/` and `k8s.io/` prefixes are reserved for Kubernetes core components.
For example, here's the configuration file for a Pod that has the annotation `imageregistry: https://hub.docker.com/` :
For example, here's a manifest for a Pod that has the annotation `imageregistry: https://hub.docker.com/` :
```yaml
apiVersion: v1
kind: Pod
metadata:
@ -90,14 +88,8 @@ spec:
image: nginx:1.14.2
ports:
- containerPort: 80
```
## {{% heading "whatsnext" %}}
Learn more about [Labels and Selectors](/docs/concepts/overview/working-with-objects/labels/).

View File

@ -79,7 +79,7 @@ Valid label value:
* unless empty, must begin and end with an alphanumeric character (`[a-z0-9A-Z]`),
* could contain dashes (`-`), underscores (`_`), dots (`.`), and alphanumerics between.
For example, here's the configuration file for a Pod that has two labels
For example, here's a manifest for a Pod that has two labels
`environment: production` and `app: nginx`:
```yaml
@ -259,7 +259,7 @@ or
```yaml
selector:
component: redis
component: redis
```
This selector (respectively in `json` or `yaml` format) is equivalent to
@ -278,8 +278,8 @@ selector:
matchLabels:
component: redis
matchExpressions:
- {key: tier, operator: In, values: [cache]}
- {key: environment, operator: NotIn, values: [dev]}
- { key: tier, operator: In, values: [cache] }
- { key: environment, operator: NotIn, values: [dev] }
```
`matchLabels` is a map of `{key,value}` pairs. A single `{key,value}` in the

View File

@ -24,6 +24,10 @@ For non-unique user-provided attributes, Kubernetes provides [labels](/docs/conc
{{< glossary_definition term_id="name" length="all" >}}
**Names must be unique across all [API versions](/docs/concepts/overview/kubernetes-api/#api-groups-and-versioning)
of the same resource. API resources are distinguished by their API group, resource type, namespace
(for namespaced resources), and name. In other words, API version is irrelevant in this context.**
{{< note >}}
In cases when objects represent a physical entity, like a Node representing a physical host, if the host is re-created under the same name without deleting and re-creating the Node, Kubernetes treats the new host as the old one, which may lead to inconsistencies.
{{< /note >}}

View File

@ -44,7 +44,7 @@ Kubernetes starts with four initial namespaces:
: Kubernetes includes this namespace so that you can start using your new cluster without first creating a namespace.
`kube-node-lease`
: This namespace holds [Lease](/docs/reference/kubernetes-api/cluster-resources/lease-v1/) objects associated with each node. Node leases allow the kubelet to send [heartbeats](/docs/concepts/architecture/nodes/#heartbeats) so that the control plane can detect node failure.
: This namespace holds [Lease](/docs/concepts/architecture/leases/) objects associated with each node. Node leases allow the kubelet to send [heartbeats](/docs/concepts/architecture/nodes/#heartbeats) so that the control plane can detect node failure.
`kube-public`
: This namespace is readable by *all* clients (including those not authenticated). This namespace is mostly reserved for cluster usage, in case that some resources should be visible and readable publicly throughout the whole cluster. The public aspect of this namespace is only a convention, not a requirement.

View File

@ -8,16 +8,15 @@ content_type: concept
weight: 20
---
<!-- overview -->
You can constrain a {{< glossary_tooltip text="Pod" term_id="pod" >}} so that it is
You can constrain a {{< glossary_tooltip text="Pod" term_id="pod" >}} so that it is
_restricted_ to run on particular {{< glossary_tooltip text="node(s)" term_id="node" >}},
or to _prefer_ to run on particular nodes.
There are several ways to do this and the recommended approaches all use
[label selectors](/docs/concepts/overview/working-with-objects/labels/) to facilitate the selection.
Often, you do not need to set any such constraints; the
{{< glossary_tooltip text="scheduler" term_id="kube-scheduler" >}} will automatically do a reasonable placement
{{< glossary_tooltip text="scheduler" term_id="kube-scheduler" >}} will automatically do a reasonable placement
(for example, spreading your Pods across nodes so as not to place Pods on a node with insufficient free resources).
However, there are some circumstances where you may want to control which node
the Pod deploys to, for example, to ensure that a Pod ends up on a node with an SSD attached to it,
@ -28,10 +27,10 @@ or to co-locate Pods from two different services that communicate a lot into the
You can use any of the following methods to choose where Kubernetes schedules
specific Pods:
* [nodeSelector](#nodeselector) field matching against [node labels](#built-in-node-labels)
* [Affinity and anti-affinity](#affinity-and-anti-affinity)
* [nodeName](#nodename) field
* [Pod topology spread constraints](#pod-topology-spread-constraints)
- [nodeSelector](#nodeselector) field matching against [node labels](#built-in-node-labels)
- [Affinity and anti-affinity](#affinity-and-anti-affinity)
- [nodeName](#nodename) field
- [Pod topology spread constraints](#pod-topology-spread-constraints)
## Node labels {#built-in-node-labels}
@ -51,7 +50,7 @@ and a different value in other environments.
Adding labels to nodes allows you to target Pods for scheduling on specific
nodes or groups of nodes. You can use this functionality to ensure that specific
Pods only run on nodes with certain isolation, security, or regulatory
properties.
properties.
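For example, to add a label to a hypothetical node named `worker-1`:

```shell
kubectl label nodes worker-1 disktype=ssd
```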
If you use labels for node isolation, choose label keys that the {{<glossary_tooltip text="kubelet" term_id="kubelet">}}
cannot modify. This prevents a compromised node from setting those labels on
@ -59,7 +58,7 @@ itself so that the scheduler schedules workloads onto the compromised node.
The [`NodeRestriction` admission plugin](/docs/reference/access-authn-authz/admission-controllers/#noderestriction)
prevents the kubelet from setting or modifying labels with a
`node-restriction.kubernetes.io/` prefix.
`node-restriction.kubernetes.io/` prefix.
To make use of that label prefix for node isolation:
@ -73,7 +72,7 @@ To make use of that label prefix for node isolation:
You can add the `nodeSelector` field to your Pod specification and specify the
[node labels](#built-in-node-labels) you want the target node to have.
Kubernetes only schedules the Pod onto nodes that have each of the labels you
specify.
specify.
See [Assign Pods to Nodes](/docs/tasks/configure-pod-container/assign-pods-nodes) for more
information.
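As a minimal sketch, assuming some nodes carry a `disktype: ssd` label:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
    - name: nginx
      image: nginx
  nodeSelector:
    disktype: ssd
```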
@ -84,20 +83,20 @@ information.
labels. Affinity and anti-affinity expands the types of constraints you can
define. Some of the benefits of affinity and anti-affinity include:
* The affinity/anti-affinity language is more expressive. `nodeSelector` only
- The affinity/anti-affinity language is more expressive. `nodeSelector` only
selects nodes with all the specified labels. Affinity/anti-affinity gives you
more control over the selection logic.
* You can indicate that a rule is *soft* or *preferred*, so that the scheduler
- You can indicate that a rule is *soft* or *preferred*, so that the scheduler
still schedules the Pod even if it can't find a matching node.
* You can constrain a Pod using labels on other Pods running on the node (or other topological domain),
- You can constrain a Pod using labels on other Pods running on the node (or other topological domain),
instead of just node labels, which allows you to define rules for which Pods
can be co-located on a node.
The affinity feature consists of two types of affinity:
* *Node affinity* functions like the `nodeSelector` field but is more expressive and
- *Node affinity* functions like the `nodeSelector` field but is more expressive and
allows you to specify soft rules.
* *Inter-pod affinity/anti-affinity* allows you to constrain Pods against labels
- *Inter-pod affinity/anti-affinity* allows you to constrain Pods against labels
on other Pods.
### Node affinity
@ -106,12 +105,12 @@ Node affinity is conceptually similar to `nodeSelector`, allowing you to constra
Pod can be scheduled on based on node labels. There are two types of node
affinity:
* `requiredDuringSchedulingIgnoredDuringExecution`: The scheduler can't
schedule the Pod unless the rule is met. This functions like `nodeSelector`,
but with a more expressive syntax.
* `preferredDuringSchedulingIgnoredDuringExecution`: The scheduler tries to
find a node that meets the rule. If a matching node is not available, the
scheduler still schedules the Pod.
- `requiredDuringSchedulingIgnoredDuringExecution`: The scheduler can't
schedule the Pod unless the rule is met. This functions like `nodeSelector`,
but with a more expressive syntax.
- `preferredDuringSchedulingIgnoredDuringExecution`: The scheduler tries to
find a node that meets the rule. If a matching node is not available, the
scheduler still schedules the Pod.
{{<note>}}
In the preceding types, `IgnoredDuringExecution` means that if the node labels
@ -127,17 +126,17 @@ For example, consider the following Pod spec:
In this example, the following rules apply:
* The node *must* have a label with the key `topology.kubernetes.io/zone` and
the value of that label *must* be either `antarctica-east1` or `antarctica-west1`.
* The node *preferably* has a label with the key `another-node-label-key` and
the value `another-node-label-value`.
- The node *must* have a label with the key `topology.kubernetes.io/zone` and
the value of that label *must* be either `antarctica-east1` or `antarctica-west1`.
- The node *preferably* has a label with the key `another-node-label-key` and
the value `another-node-label-value`.
You can use the `operator` field to specify a logical operator for Kubernetes to use when
interpreting the rules. You can use `In`, `NotIn`, `Exists`, `DoesNotExist`,
`Gt` and `Lt`.
`NotIn` and `DoesNotExist` allow you to define node anti-affinity behavior.
Alternatively, you can use [node taints](/docs/concepts/scheduling-eviction/taint-and-toleration/)
Alternatively, you can use [node taints](/docs/concepts/scheduling-eviction/taint-and-toleration/)
to repel Pods from specific nodes.
{{<note>}}
@ -168,7 +167,7 @@ The final sum is added to the score of other priority functions for the node.
Nodes with the highest total score are prioritized when the scheduler makes a
scheduling decision for the Pod.
For example, consider the following Pod spec:
For example, consider the following Pod spec:
{{< codenew file="pods/pod-with-affinity-anti-affinity.yaml" >}}
@ -268,8 +267,8 @@ to unintended behavior.
Similar to [node affinity](#node-affinity) are two types of Pod affinity and
anti-affinity as follows:
* `requiredDuringSchedulingIgnoredDuringExecution`
* `preferredDuringSchedulingIgnoredDuringExecution`
- `requiredDuringSchedulingIgnoredDuringExecution`
- `preferredDuringSchedulingIgnoredDuringExecution`
For example, you could use
`requiredDuringSchedulingIgnoredDuringExecution` affinity to tell the scheduler to
@ -297,7 +296,7 @@ The affinity rule says that the scheduler can only schedule a Pod onto a node if
the node is in the same zone as one or more existing Pods with the label
`security=S1`. More precisely, the scheduler must place the Pod on a node that has the
`topology.kubernetes.io/zone=V` label, as long as there is at least one node in
that zone that currently has one or more Pods with the Pod label `security=S1`.
that zone that currently has one or more Pods with the Pod label `security=S1`.
The anti-affinity rule says that the scheduler should try to avoid scheduling
the Pod onto a node that is in the same zone as one or more Pods with the label
@ -314,9 +313,9 @@ You can use the `In`, `NotIn`, `Exists` and `DoesNotExist` values in the
In principle, the `topologyKey` can be any allowed label key with the following
exceptions for performance and security reasons:
* For Pod affinity and anti-affinity, an empty `topologyKey` field is not allowed in both `requiredDuringSchedulingIgnoredDuringExecution`
- For Pod affinity and anti-affinity, an empty `topologyKey` field is not allowed in both `requiredDuringSchedulingIgnoredDuringExecution`
and `preferredDuringSchedulingIgnoredDuringExecution`.
* For `requiredDuringSchedulingIgnoredDuringExecution` Pod anti-affinity rules,
- For `requiredDuringSchedulingIgnoredDuringExecution` Pod anti-affinity rules,
the admission controller `LimitPodHardAntiAffinityTopology` limits
`topologyKey` to `kubernetes.io/hostname`. You can modify or disable the
admission controller if you want to allow custom topologies.
@ -328,17 +327,18 @@ If omitted or empty, `namespaces` defaults to the namespace of the Pod where the
affinity/anti-affinity definition appears.
#### Namespace selector
{{< feature-state for_k8s_version="v1.24" state="stable" >}}
You can also select matching namespaces using `namespaceSelector`, which is a label query over the set of namespaces.
The affinity term is applied to namespaces selected by both `namespaceSelector` and the `namespaces` field.
Note that an empty `namespaceSelector` ({}) matches all namespaces, while a null or empty `namespaces` list and
Note that an empty `namespaceSelector` ({}) matches all namespaces, while a null or empty `namespaces` list and
null `namespaceSelector` matches the namespace of the Pod where the rule is defined.
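As a sketch, a Pod's `.spec.affinity` could combine a `labelSelector` with a
`namespaceSelector` like this (the `security: S1` Pod label and `team: frontend`
namespace label are illustrative):

```yaml
affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            security: S1
        namespaceSelector:
          matchLabels:
            team: frontend
        topologyKey: topology.kubernetes.io/zone
```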
#### More practical use-cases
Inter-pod affinity and anti-affinity can be even more useful when they are used with higher
level collections such as ReplicaSets, StatefulSets, Deployments, etc. These
level collections such as ReplicaSets, StatefulSets, Deployments, etc. These
rules allow you to configure that a set of workloads should
be co-located in the same defined topology; for example, preferring to place two related
Pods onto the same node.
@ -430,10 +430,10 @@ spec:
Creating the two preceding Deployments results in the following cluster layout,
where each web server is co-located with a cache, on three separate nodes.
| node-1 | node-2 | node-3 |
|:--------------------:|:-------------------:|:------------------:|
| *webserver-1* | *webserver-2* | *webserver-3* |
| *cache-1* | *cache-2* | *cache-3* |
| node-1 | node-2 | node-3 |
| :-----------: | :-----------: | :-----------: |
| *webserver-1* | *webserver-2* | *webserver-3* |
| *cache-1* | *cache-2* | *cache-3* |
The overall effect is that each cache instance is likely to be accessed by a single client, that
is running on the same node. This approach aims to minimize both skew (imbalanced load) and latency.
@ -453,13 +453,18 @@ tries to place the Pod on that node. Using `nodeName` overrules using
Some of the limitations of using `nodeName` to select nodes are:
- If the named node does not exist, the Pod will not run, and in
some cases may be automatically deleted.
- If the named node does not have the resources to accommodate the
Pod, the Pod will fail and its reason will indicate why,
for example OutOfmemory or OutOfcpu.
- Node names in cloud environments are not always predictable or
stable.
- If the named node does not exist, the Pod will not run, and in
some cases may be automatically deleted.
- If the named node does not have the resources to accommodate the
Pod, the Pod will fail and its reason will indicate why,
for example OutOfmemory or OutOfcpu.
- Node names in cloud environments are not always predictable or stable.
{{< note >}}
`nodeName` is intended for use by custom schedulers or advanced use cases where
you need to bypass any configured schedulers. Bypassing the schedulers might lead to
failed Pods if the assigned Nodes get oversubscribed. You can use [node affinity](#node-affinity) or the [`nodeSelector` field](#nodeselector) to assign a Pod to a specific Node without bypassing the schedulers.
{{</ note >}}
Here is an example of a Pod spec using the `nodeName` field:
@ -489,12 +494,10 @@ to learn more about how these work.
## {{% heading "whatsnext" %}}
* Read more about [taints and tolerations](/docs/concepts/scheduling-eviction/taint-and-toleration/) .
* Read the design docs for [node affinity](https://git.k8s.io/design-proposals-archive/scheduling/nodeaffinity.md)
- Read more about [taints and tolerations](/docs/concepts/scheduling-eviction/taint-and-toleration/) .
- Read the design docs for [node affinity](https://git.k8s.io/design-proposals-archive/scheduling/nodeaffinity.md)
and for [inter-pod affinity/anti-affinity](https://git.k8s.io/design-proposals-archive/scheduling/podaffinity.md).
* Learn about how the [topology manager](/docs/tasks/administer-cluster/topology-manager/) takes part in node-level
resource allocation decisions.
* Learn how to use [nodeSelector](/docs/tasks/configure-pod-container/assign-pods-nodes/).
* Learn how to use [affinity and anti-affinity](/docs/tasks/configure-pod-container/assign-pods-nodes-using-node-affinity/).
- Learn about how the [topology manager](/docs/tasks/administer-cluster/topology-manager/) takes part in node-level
resource allocation decisions.
- Learn how to use [nodeSelector](/docs/tasks/configure-pod-container/assign-pods-nodes/).
- Learn how to use [affinity and anti-affinity](/docs/tasks/configure-pod-container/assign-pods-nodes-using-node-affinity/).

View File

@ -6,16 +6,16 @@ weight: 100
{{<glossary_definition term_id="node-pressure-eviction" length="short">}}</br>
The {{<glossary_tooltip term_id="kubelet" text="kubelet">}} monitors resources
like memory, disk space, and filesystem inodes on your cluster's nodes.
When one or more of these resources reach specific consumption levels, the
The {{<glossary_tooltip term_id="kubelet" text="kubelet">}} monitors resources
like memory, disk space, and filesystem inodes on your cluster's nodes.
When one or more of these resources reach specific consumption levels, the
kubelet can proactively fail one or more pods on the node to reclaim resources
and prevent starvation.
and prevent starvation.
During a node-pressure eviction, the kubelet sets the `PodPhase` for the
selected pods to `Failed`. This terminates the pods.
selected pods to `Failed`. This terminates the pods.
Node-pressure eviction is not the same as
Node-pressure eviction is not the same as
[API-initiated eviction](/docs/concepts/scheduling-eviction/api-eviction/).
The kubelet does not respect your configured `PodDisruptionBudget` or the pod's
@ -26,7 +26,7 @@ the kubelet respects your configured `eviction-max-pod-grace-period`. If you use
If the pods are managed by a {{< glossary_tooltip text="workload" term_id="workload" >}}
resource (such as {{< glossary_tooltip text="StatefulSet" term_id="statefulset" >}}
or {{< glossary_tooltip text="Deployment" term_id="deployment" >}}) that
replaces failed pods, the control plane or `kube-controller-manager` creates new
replaces failed pods, the control plane or `kube-controller-manager` creates new
pods in place of the evicted pods.
{{<note>}}
@ -37,16 +37,16 @@ images when disk resources are starved.
The kubelet uses various parameters to make eviction decisions, like the following:
* Eviction signals
* Eviction thresholds
* Monitoring intervals
- Eviction signals
- Eviction thresholds
- Monitoring intervals
### Eviction signals {#eviction-signals}
Eviction signals are the current state of a particular resource at a specific
point in time. Kubelet uses eviction signals to make eviction decisions by
comparing the signals to eviction thresholds, which are the minimum amount of
the resource that should be available on the node.
comparing the signals to eviction thresholds, which are the minimum amount of
the resource that should be available on the node.
Kubelet uses the following eviction signals:
@ -60,9 +60,9 @@ Kubelet uses the following eviction signals:
| `pid.available` | `pid.available` := `node.stats.rlimit.maxpid` - `node.stats.rlimit.curproc` |
In this table, the `Description` column shows how kubelet gets the value of the
signal. Each signal supports either a percentage or a literal value. Kubelet
signal. Each signal supports either a percentage or a literal value. Kubelet
calculates the percentage value relative to the total capacity associated with
the signal.
the signal.
The value for `memory.available` is derived from the cgroupfs instead of tools
like `free -m`. This is important because `free -m` does not work in a
@ -78,7 +78,7 @@ memory is reclaimable under pressure.
The kubelet supports the following filesystem partitions:
1. `nodefs`: The node's main filesystem, used for local disk volumes, emptyDir,
log storage, and more. For example, `nodefs` contains `/var/lib/kubelet/`.
log storage, and more. For example, `nodefs` contains `/var/lib/kubelet/`.
1. `imagefs`: An optional filesystem that container runtimes use to store container
images and container writable layers.
@ -102,10 +102,10 @@ eviction decisions.
Eviction thresholds have the form `[eviction-signal][operator][quantity]`, where:
* `eviction-signal` is the [eviction signal](#eviction-signals) to use.
* `operator` is the [relational operator](https://en.wikipedia.org/wiki/Relational_operator#Standard_relational_operators)
- `eviction-signal` is the [eviction signal](#eviction-signals) to use.
- `operator` is the [relational operator](https://en.wikipedia.org/wiki/Relational_operator#Standard_relational_operators)
you want, such as `<` (less than).
* `quantity` is the eviction threshold amount, such as `1Gi`. The value of `quantity`
- `quantity` is the eviction threshold amount, such as `1Gi`. The value of `quantity`
must match the quantity representation used by Kubernetes. You can use either
literal values or percentages (`%`).
@ -120,22 +120,22 @@ You can configure soft and hard eviction thresholds.
A soft eviction threshold pairs an eviction threshold with a required
administrator-specified grace period. The kubelet does not evict pods until the
grace period is exceeded. The kubelet returns an error on startup if there is no
specified grace period.
specified grace period.
You can specify both a soft eviction threshold grace period and a maximum
allowed pod termination grace period for kubelet to use during evictions. If you
specify a maximum allowed grace period and the soft eviction threshold is met,
specify a maximum allowed grace period and the soft eviction threshold is met,
the kubelet uses the lesser of the two grace periods. If you do not specify a
maximum allowed grace period, the kubelet kills evicted pods immediately without
graceful termination.
You can use the following flags to configure soft eviction thresholds:
* `eviction-soft`: A set of eviction thresholds like `memory.available<1.5Gi`
- `eviction-soft`: A set of eviction thresholds like `memory.available<1.5Gi`
that can trigger pod eviction if held over the specified grace period.
* `eviction-soft-grace-period`: A set of eviction grace periods like `memory.available=1m30s`
- `eviction-soft-grace-period`: A set of eviction grace periods like `memory.available=1m30s`
that define how long a soft eviction threshold must hold before triggering a Pod eviction.
* `eviction-max-pod-grace-period`: The maximum allowed grace period (in seconds)
- `eviction-max-pod-grace-period`: The maximum allowed grace period (in seconds)
to use when terminating pods in response to a soft eviction threshold being met.
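For example, a sketch of the equivalent fields in a
[kubelet config file](/docs/tasks/administer-cluster/kubelet-config-file/)
(the values shown are illustrative):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionSoft:
  memory.available: "1.5Gi"
evictionSoftGracePeriod:
  memory.available: "1m30s"
evictionMaxPodGracePeriod: 60
```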
#### Hard eviction thresholds {#hard-eviction-thresholds}
@ -144,20 +144,20 @@ A hard eviction threshold has no grace period. When a hard eviction threshold is
met, the kubelet kills pods immediately without graceful termination to reclaim
the starved resource.
You can use the `eviction-hard` flag to configure a set of hard eviction
thresholds like `memory.available<1Gi`.
You can use the `eviction-hard` flag to configure a set of hard eviction
thresholds like `memory.available<1Gi`.
The kubelet has the following default hard eviction thresholds:
* `memory.available<100Mi`
* `nodefs.available<10%`
* `imagefs.available<15%`
* `nodefs.inodesFree<5%` (Linux nodes)
- `memory.available<100Mi`
- `nodefs.available<10%`
- `imagefs.available<15%`
- `nodefs.inodesFree<5%` (Linux nodes)
These default values of hard eviction thresholds will only be set if none
of the parameters is changed. If you changed the value of any parameter,
then the values of other parameters will not be inherited as the default
values and will be set to zero. In order to provide custom values, you
These default values of hard eviction thresholds will only be set if none
of the parameters is changed. If you changed the value of any parameter,
then the values of other parameters will not be inherited as the default
values and will be set to zero. In order to provide custom values, you
should provide all the thresholds respectively.
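For example, if you customize only the memory threshold, a sketch of restating
the full set in a kubelet config file might look like this (values other than
the defaults are illustrative):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  memory.available: "500Mi"
  nodefs.available: "10%"
  imagefs.available: "15%"
  nodefs.inodesFree: "5%"
```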
### Eviction monitoring interval
@ -169,9 +169,9 @@ which defaults to `10s`.
The kubelet reports node conditions to reflect that the node is under pressure
because hard or soft eviction threshold is met, independent of configured grace
periods.
periods.
The kubelet maps eviction signals to node conditions as follows:
The kubelet maps eviction signals to node conditions as follows:
| Node Condition | Eviction Signal | Description |
|-------------------|---------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------|
@ -179,7 +179,7 @@ The kubelet maps eviction signals to node conditions as follows:
| `DiskPressure` | `nodefs.available`, `nodefs.inodesFree`, `imagefs.available`, or `imagefs.inodesFree` | Available disk space and inodes on either the node's root filesystem or image filesystem has satisfied an eviction threshold |
| `PIDPressure` | `pid.available` | Available processes identifiers on the (Linux) node has fallen below an eviction threshold |
The kubelet updates the node conditions based on the configured
The kubelet updates the node conditions based on the configured
`--node-status-update-frequency`, which defaults to `10s`.
#### Node condition oscillation
@ -197,17 +197,17 @@ condition to a different state. The transition period has a default value of `5m
The kubelet tries to reclaim node-level resources before it evicts end-user pods.
When a `DiskPressure` node condition is reported, the kubelet reclaims node-level
resources based on the filesystems on the node.
resources based on the filesystems on the node.
#### With `imagefs`
If the node has a dedicated `imagefs` filesystem for container runtimes to use,
the kubelet does the following:
* If the `nodefs` filesystem meets the eviction thresholds, the kubelet garbage collects
dead pods and containers.
* If the `imagefs` filesystem meets the eviction thresholds, the kubelet
deletes all unused images.
- If the `nodefs` filesystem meets the eviction thresholds, the kubelet garbage collects
dead pods and containers.
- If the `imagefs` filesystem meets the eviction thresholds, the kubelet
deletes all unused images.
#### Without `imagefs`
@ -220,7 +220,7 @@ the kubelet frees up disk space in the following order:
### Pod selection for kubelet eviction
If the kubelet's attempts to reclaim node-level resources don't bring the eviction
signal below the threshold, the kubelet begins to evict end-user pods.
signal below the threshold, the kubelet begins to evict end-user pods.
The kubelet uses the following parameters to determine the pod eviction order:
@ -238,7 +238,7 @@ As a result, kubelet ranks and evicts pods in the following order:
{{<note>}}
The kubelet does not use the pod's QoS class to determine the eviction order.
You can use the QoS class to estimate the most likely pod eviction order when
You can use the QoS class to estimate the most likely pod eviction order when
reclaiming resources like memory. QoS does not apply to EphemeralStorage requests,
so the above scenario will not apply if the node is, for example, under `DiskPressure`.
{{</note>}}
@ -246,7 +246,7 @@ so the above scenario will not apply if the node is, for example, under `DiskPre
`Guaranteed` pods are guaranteed only when requests and limits are specified for
all the containers and they are equal. These pods will never be evicted because
of another pod's resource consumption. If a system daemon (such as `kubelet`
and `journald`) is consuming more resources than were reserved via
and `journald`) is consuming more resources than were reserved via
`system-reserved` or `kube-reserved` allocations, and the node only has
`Guaranteed` or `Burstable` pods using less resources than requests left on it,
then the kubelet must choose to evict one of these pods to preserve node stability
@ -277,14 +277,14 @@ disk usage (`local volumes + logs & writable layer of all containers`)
In some cases, pod eviction only reclaims a small amount of the starved resource.
This can lead to the kubelet repeatedly hitting the configured eviction thresholds
and triggering multiple evictions.
and triggering multiple evictions.
You can use the `--eviction-minimum-reclaim` flag or a [kubelet config file](/docs/tasks/administer-cluster/kubelet-config-file/)
to configure a minimum reclaim amount for each resource. When the kubelet notices
that a resource is starved, it continues to reclaim that resource until it
reclaims the quantity you specify.
reclaims the quantity you specify.
For example, the following configuration sets minimum reclaim amounts:
For example, the following configuration sets minimum reclaim amounts:
```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
@ -302,10 +302,10 @@ evictionMinimumReclaim:
In this example, if the `nodefs.available` signal meets the eviction threshold,
the kubelet reclaims the resource until the signal reaches the threshold of `1Gi`,
and then continues to reclaim the minimum amount of `500Mi`, until the signal
reaches `1.5Gi`.
reaches `1.5Gi`.
Similarly, the kubelet reclaims the `imagefs` resource until the `imagefs.available`
signal reaches `102Gi`.
signal reaches `102Gi`.
The default `eviction-minimum-reclaim` is `0` for all resources.
@ -336,7 +336,7 @@ for each container. It then kills the container with the highest score.
This means that containers in low QoS pods that consume a large amount of memory
relative to their scheduling requests are killed first.
Unlike pod eviction, if a container is OOM killed, the `kubelet` can restart it
Unlike pod eviction, if a container is OOM killed, the `kubelet` can restart it
based on its `RestartPolicy`.
### Best practices {#node-pressure-eviction-good-practices}
@ -351,9 +351,9 @@ immediately induce memory pressure.
Consider the following scenario:
* Node memory capacity: `10Gi`
* Operator wants to reserve 10% of memory capacity for system daemons (kernel, `kubelet`, etc.)
* Operator wants to evict Pods at 95% memory utilization to reduce incidence of system OOM.
- Node memory capacity: `10Gi`
- Operator wants to reserve 10% of memory capacity for system daemons (kernel, `kubelet`, etc.)
- Operator wants to evict Pods at 95% memory utilization to reduce incidence of system OOM.
For this to work, the kubelet is launched as follows:
@ -363,18 +363,18 @@ For this to work, the kubelet is launched as follows:
```
In this configuration, the `--system-reserved` flag reserves `1.5Gi` of memory
for the system, which is `10% of the total memory + the eviction threshold amount`.
for the system, which is `10% of the total memory + the eviction threshold amount`.
The node can reach the eviction threshold if a pod is using more than its request,
or if the system is using more than `1Gi` of memory, which makes the `memory.available`
signal fall below `500Mi` and triggers the threshold.
signal fall below `500Mi` and triggers the threshold.
#### DaemonSet
Pod Priority is a major factor in making eviction decisions. If you do not want
the kubelet to evict pods that belong to a `DaemonSet`, give those pods a high
enough `priorityClass` in the pod spec. You can also use a lower `priorityClass`
or the default to only allow `DaemonSet` pods to run when there are enough
or the default to only allow `DaemonSet` pods to run when there are enough
resources.
### Known issues
@ -386,7 +386,7 @@ The following sections describe known issues related to out of resource handling
By default, the kubelet polls `cAdvisor` to collect memory usage stats at a
regular interval. If memory usage increases within that window rapidly, the
kubelet may not observe `MemoryPressure` fast enough, and the `OOMKiller`
will still be invoked.
will still be invoked.
You can use the `--kernel-memcg-notification` flag to enable the `memcg`
notification API on the kubelet to get notified immediately when a threshold
@ -394,29 +394,29 @@ is crossed.
If you are not trying to achieve extreme utilization, but a sensible measure of
overcommit, a viable workaround for this issue is to use the `--kube-reserved`
and `--system-reserved` flags to allocate memory for the system.
and `--system-reserved` flags to allocate memory for the system.
#### active_file memory is not considered as available memory
On Linux, the kernel tracks the number of bytes of file-backed memory on active
LRU list as the `active_file` statistic. The kubelet treats `active_file` memory
areas as not reclaimable. For workloads that make intensive use of block-backed
local storage, including ephemeral local storage, kernel-level caches of file
and block data means that many recently accessed cache pages are likely to be
counted as `active_file`. If enough of these kernel block buffers are on the
active LRU list, the kubelet is liable to observe this as high resource use and
taint the node as experiencing memory pressure - triggering pod eviction.
For more details, see [https://github.com/kubernetes/kubernetes/issues/43916](https://github.com/kubernetes/kubernetes/issues/43916)
You can work around that behavior by setting the memory limit and memory request
the same for containers likely to perform intensive I/O activity. You will need
to estimate or measure an optimal memory limit value for that container.
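For instance, a container resources stanza along these lines keeps the request and limit equal (the `2Gi` figure is an assumption you would replace with your measured value):
```yaml
# Sketch: equal memory request and limit for an I/O-intensive container
resources:
  requests:
    memory: "2Gi"
  limits:
    memory: "2Gi"
```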
## {{% heading "whatsnext" %}}
- Learn about [API-initiated Eviction](/docs/concepts/scheduling-eviction/api-eviction/)
- Learn about [Pod Priority and Preemption](/docs/concepts/scheduling-eviction/pod-priority-preemption/)
- Learn about [PodDisruptionBudgets](/docs/tasks/run-application/configure-pdb/)
- Learn about [Quality of Service](/docs/tasks/configure-pod-container/quality-service-pod/) (QoS)
- Check out the [Eviction API](/docs/reference/generated/kubernetes-api/{{<param "version">}}/#create-eviction-pod-v1-core)

View File

@ -10,7 +10,7 @@ weight: 80
<!-- overview -->
In the [scheduling-plugin](/docs/reference/scheduling/config/#scheduling-plugins) `NodeResourcesFit` of kube-scheduler, there are two
scoring strategies that support the bin packing of resources: `MostAllocated` and `RequestedToCapacityRatio`.
<!-- body -->
@ -42,7 +42,7 @@ profiles:
name: NodeResourcesFit
```
To learn more about other parameters and their default configuration, see the API documentation for
[`NodeResourcesFitArgs`](/docs/reference/config-api/kube-scheduler-config.v1beta3/#kubescheduler-config-k8s-io-v1beta3-NodeResourcesFitArgs).
## Enabling bin packing using RequestedToCapacityRatio
@ -55,10 +55,10 @@ configured function of the allocated resources. The behavior of the `RequestedTo
the `NodeResourcesFit` score function can be controlled by the
[scoringStrategy](/docs/reference/config-api/kube-scheduler-config.v1beta3/#kubescheduler-config-k8s-io-v1beta3-ScoringStrategy) field.
Within the `scoringStrategy` field, you can configure two parameters: `requestedToCapacityRatio` and
`resources`. The `shape` in the `requestedToCapacityRatio`
parameter allows the user to tune the function as least requested or most
requested based on `utilization` and `score` values. The `resources` parameter
consists of the `name` of the resource to be considered during scoring and the
`weight`, which specifies the weight of each resource.
Below is an example configuration that sets
@ -87,11 +87,11 @@ profiles:
name: NodeResourcesFit
```
Referencing the `KubeSchedulerConfiguration` file with the kube-scheduler
flag `--config=/path/to/config/file` will pass the configuration to the
scheduler.
To learn more about other parameters and their default configuration, see the API documentation for
[`NodeResourcesFitArgs`](/docs/reference/config-api/kube-scheduler-config.v1beta3/#kubescheduler-config-k8s-io-v1beta3-NodeResourcesFitArgs).
### Tuning the score function
@ -100,10 +100,10 @@ To learn more about other parameters and their default configuration, see the AP
```yaml
shape:
  - utilization: 0
    score: 0
  - utilization: 100
    score: 10
```
The above arguments give the node a `score` of 0 if `utilization` is 0% and 10 for
@ -120,7 +120,7 @@ shape:
`resources` is an optional parameter which defaults to:
```yaml
resources:
- name: cpu
weight: 1
@ -128,7 +128,7 @@ resources:
weight: 1
```
It can be used to add extended resources as follows:
```yaml
resources:
@ -188,8 +188,8 @@ intel.com/foo = resourceScoringFunction((2+1),4)
= (100 - ((4-3)*100/4))
= (100 - 25)
= 75 # requested + used = 75% * available
= rawScoringFunction(75)
= 7 # floor(75/10)
memory = resourceScoringFunction((256+256),1024)
= (100 -((1024-512)*100/1024))
@ -251,4 +251,3 @@ NodeScore = (5 * 5) + (7 * 1) + (10 * 3) / (5 + 1 + 3)
- Read more about the [scheduling framework](/docs/concepts/scheduling-eviction/scheduling-framework/)
- Read more about [scheduler configuration](/docs/reference/scheduling/config/)

View File

@ -8,8 +8,8 @@ weight: 90
<!-- overview -->
The Kubernetes API server is the main point of entry to a cluster for external parties
(users and services) interacting with it.
As part of this role, the API server has several key built-in security controls, such as
audit logging and {{< glossary_tooltip text="admission controllers" term_id="admission-controller" >}}.
@ -48,13 +48,13 @@ API server. However, the Pod still runs on the node. For more information, refer
### Mitigations {#static-pods-mitigations}
- Only [enable the kubelet static Pod manifest functionality](/docs/tasks/configure-pod-container/static-pod/#static-pod-creation)
  if required by the node.
- If a node uses the static Pod functionality, restrict filesystem access to the static Pod manifest directory
  or URL to users who need the access.
- Restrict access to kubelet configuration parameters and files to prevent an attacker setting
  a static Pod path or URL.
- Regularly audit and centrally report all access to directories or web storage locations that host
  static Pod manifests and kubelet configuration files.
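As an illustration of the first mitigation, a kubelet configuration that does not enable any static Pod source might look like this (a sketch; verify the exact field names against the kubelet configuration reference for your version):
```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Leave the static Pod path empty (and do not configure a manifest URL)
# unless this node genuinely needs static Pods
staticPodPath: ""
```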
## The kubelet API {#kubelet-api}
@ -73,7 +73,7 @@ Direct access to the kubelet API is not subject to admission control and is not
by Kubernetes audit logging. An attacker with direct access to this API may be able to
bypass controls that detect or prevent certain actions.
The kubelet API can be configured to authenticate requests in a number of ways.
By default, the kubelet configuration allows anonymous access. Most Kubernetes providers
change the default to use webhook and certificate authentication. This lets the control plane
ensure that the caller is authorized to access the `nodes` API resource or sub-resources.
@ -86,7 +86,7 @@ The default anonymous access doesn't make this assertion with the control plane.
such as by monitoring services.
- Restrict access to the kubelet port. Only allow specified and trusted IP address
ranges to access the port.
- Ensure that [kubelet authentication](/docs/reference/access-authn-authz/kubelet-authn-authz/#kubelet-authentication)
  is set to webhook or certificate mode.
- Ensure that the unauthenticated "read-only" Kubelet port is not enabled on the cluster.
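A sketch of the corresponding KubeletConfiguration settings (illustrative; check the kubelet configuration reference for your version):
```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
authentication:
  anonymous:
    enabled: false   # disable anonymous access
  webhook:
    enabled: true    # authenticate bearer tokens via the TokenReview API
authorization:
  mode: Webhook      # authorize requests via SubjectAccessReview
readOnlyPort: 0      # keep the unauthenticated read-only port disabled
```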
@ -108,7 +108,7 @@ cluster admin rights by accessing cluster secrets or modifying access rules. Eve
elevating their Kubernetes RBAC privileges, an attacker who can modify etcd can retrieve any API object
or create new workloads inside the cluster.
Many Kubernetes providers configure
etcd to use mutual TLS (both client and server verify each other's certificate for authentication).
There is no widely accepted implementation of authorization for the etcd API, although
the feature exists. Since there is no authorization model, any certificate
@ -124,10 +124,9 @@ that are only used for health checking can also grant full read and write access
- Consider restricting access to the etcd port at a network level, to only allow access
from specified and trusted IP address ranges.
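A sketch of the mutual TLS wiring between etcd and the API server (the flag names are standard; all file paths are placeholders):
```shell
# etcd: require and verify client certificates signed by a dedicated CA
etcd --client-cert-auth=true \
     --trusted-ca-file=/etc/etcd/pki/etcd-ca.crt \
     --cert-file=/etc/etcd/pki/server.crt \
     --key-file=/etc/etcd/pki/server.key

# kube-apiserver: present a client certificate when connecting to etcd
kube-apiserver --etcd-cafile=/etc/etcd/pki/etcd-ca.crt \
               --etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt \
               --etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key
```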
## Container runtime socket {#runtime-socket}
On each node in a Kubernetes cluster, access to interact with containers is controlled
by the container runtime (or runtimes, if you have configured more than one). Typically,
the container runtime exposes a Unix socket that the kubelet can access. An attacker with
access to this socket can launch new containers or interact with running containers.
@ -139,12 +138,12 @@ control plane components.
### Mitigations {#runtime-socket-mitigations}
- Ensure that you tightly control filesystem access to container runtime sockets.
  When possible, restrict this access to the `root` user.
- Isolate the kubelet from other components running on the node, using
  mechanisms such as Linux kernel namespaces.
- Ensure that you restrict or forbid the use of [`hostPath` mounts](/docs/concepts/storage/volumes/#hostpath)
  that include the container runtime socket, either directly or by mounting a parent
  directory. Also `hostPath` mounts must be set as read-only to mitigate risks
  of attackers bypassing directory restrictions.
- Restrict user access to nodes, and especially restrict superuser access to nodes.

View File

@ -131,4 +131,7 @@ current policy level:
- [Enforcing Pod Security Standards](/docs/setup/best-practices/enforcing-pod-security-standards)
- [Enforce Pod Security Standards by Configuring the Built-in Admission Controller](/docs/tasks/configure-pod-container/enforce-standards-admission-controller)
- [Enforce Pod Security Standards with Namespace Labels](/docs/tasks/configure-pod-container/enforce-standards-namespace-labels)
- [Migrate from PodSecurityPolicy to the Built-In PodSecurity Admission Controller](/docs/tasks/configure-pod-container/migrate-from-psp)
If you are running an older version of Kubernetes and want to upgrade
to a version of Kubernetes that does not include PodSecurityPolicies,
read [migrate from PodSecurityPolicy to the Built-In PodSecurity Admission Controller](/docs/tasks/configure-pod-container/migrate-from-psp).

View File

@ -0,0 +1,266 @@
---
title: Service Accounts
description: >
Learn about ServiceAccount objects in Kubernetes.
content_type: concept
weight: 10
---
<!-- overview -->
This page introduces the ServiceAccount object in Kubernetes, providing
information about how service accounts work, use cases, limitations,
alternatives, and links to resources for additional guidance.
<!-- body -->
## What are service accounts? {#what-are-service-accounts}
A service account is a type of non-human account that, in Kubernetes, provides
a distinct identity in a Kubernetes cluster. Application Pods, system
components, and entities inside and outside the cluster can use a specific
ServiceAccount's credentials to identify as that ServiceAccount. This identity
is useful in various situations, including authenticating to the API server or
implementing identity-based security policies.
Service accounts exist as ServiceAccount objects in the API server. Service
accounts have the following properties:
* **Namespaced:** Each service account is bound to a Kubernetes
{{<glossary_tooltip text="namespace" term_id="namespace">}}. Every namespace
gets a [`default` ServiceAccount](#default-service-accounts) upon creation.
* **Lightweight:** Service accounts exist in the cluster and are
defined in the Kubernetes API. You can quickly create service accounts to
enable specific tasks.
* **Portable:** A configuration bundle for a complex containerized workload
might include service account definitions for the system's components. The
lightweight nature of service accounts and the namespaced identities make
the configurations portable.
Service accounts are different from user accounts, which are authenticated
human users in the cluster. By default, user accounts don't exist in the Kubernetes
API server; instead, the API server treats user identities as opaque
data. You can authenticate as a user account using multiple methods. Some
Kubernetes distributions might add custom extension APIs to represent user
accounts in the API server.
{{< table caption="Comparison between service accounts and users" >}}
| Description | ServiceAccount | User or group |
| --- | --- | --- |
| Location | Kubernetes API (ServiceAccount object) | External |
| Access control | Kubernetes RBAC or other [authorization mechanisms](/docs/reference/access-authn-authz/authorization/#authorization-modules) | Kubernetes RBAC or other identity and access management mechanisms |
| Intended use | Workloads, automation | People |
{{< /table >}}
### Default service accounts {#default-service-accounts}
When you create a cluster, Kubernetes automatically creates a ServiceAccount
object named `default` for every namespace in your cluster. The `default`
service accounts in each namespace get no permissions by default other than the
[default API discovery permissions](/docs/reference/access-authn-authz/rbac/#default-roles-and-role-bindings)
that Kubernetes grants to all authenticated principals if role-based access control (RBAC) is enabled.
If you delete the `default` ServiceAccount object in a namespace, the
{{< glossary_tooltip text="control plane" term_id="control-plane" >}}
replaces it with a new one.
If you deploy a Pod in a namespace, and you don't
[manually assign a ServiceAccount to the Pod](#assign-to-pod), Kubernetes
assigns the `default` ServiceAccount for that namespace to the Pod.
## Use cases for Kubernetes service accounts {#use-cases}
As a general guideline, you can use service accounts to provide identities in
the following scenarios:
* Your Pods need to communicate with the Kubernetes API server, for example in
situations such as the following:
* Providing read-only access to sensitive information stored in Secrets.
* Granting [cross-namespace access](#cross-namespace), such as allowing a
Pod in namespace `example` to read, list, and watch for Lease objects in
the `kube-node-lease` namespace.
* Your Pods need to communicate with an external service. For example, a
workload Pod requires an identity for a commercially available cloud API,
and the commercial provider allows configuring a suitable trust relationship.
* [Authenticating to a private image registry using an `imagePullSecret`](/docs/tasks/configure-pod-container/configure-service-account/#add-imagepullsecrets-to-a-service-account).
* An external service needs to communicate with the Kubernetes API server. For
example, authenticating to the cluster as part of a CI/CD pipeline.
* You use third-party security software in your cluster that relies on the
ServiceAccount identity of different Pods to group those Pods into different
contexts.
## How to use service accounts {#how-to-use}
To use a Kubernetes service account, you do the following:
1. Create a ServiceAccount object using a Kubernetes
client like `kubectl` or a manifest that defines the object.
1. Grant permissions to the ServiceAccount object using an authorization
mechanism such as
[RBAC](https://kubernetes.io/docs/reference/access-authn-authz/rbac/).
1. Assign the ServiceAccount object to Pods during Pod creation.
If you're using the identity from an external service,
[retrieve the ServiceAccount token](#get-a-token) and use it from that
service instead.
For instructions, refer to
[Configure Service Accounts for Pods](/docs/tasks/configure-pod-container/configure-service-account/).
### Grant permissions to a ServiceAccount {#grant-permissions}
You can use the built-in Kubernetes
[role-based access control (RBAC)](/docs/reference/access-authn-authz/rbac/)
mechanism to grant the minimum permissions required by each service account.
You create a *role*, which grants access, and then *bind* the role to your
ServiceAccount. RBAC lets you define a minimum set of permissions so that the
service account permissions follow the principle of least privilege. Pods that
use that service account don't get more permissions than are required to
function correctly.
For instructions, refer to
[ServiceAccount permissions](/docs/reference/access-authn-authz/rbac/#service-account-permissions).
#### Cross-namespace access using a ServiceAccount {#cross-namespace}
You can use RBAC to allow service accounts in one namespace to perform actions
on resources in a different namespace in the cluster. For example, consider a
scenario where you have a service account and Pod in the `dev` namespace and
you want your Pod to see Jobs running in the `maintenance` namespace. You could
create a Role object that grants permissions to list Job objects. Then,
you'd create a RoleBinding object in the `maintenance` namespace to bind the
Role to the ServiceAccount object. Now, Pods in the `dev` namespace can list
Job objects in the `maintenance` namespace using that service account.
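A sketch of those two objects (all names here are hypothetical):
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: job-reader              # hypothetical name
  namespace: maintenance
rules:
- apiGroups: ["batch"]
  resources: ["jobs"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: job-reader-binding      # hypothetical name
  namespace: maintenance
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: job-reader
subjects:
- kind: ServiceAccount
  name: my-app                  # the ServiceAccount in the dev namespace
  namespace: dev
```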
### Assign a ServiceAccount to a Pod {#assign-to-pod}
To assign a ServiceAccount to a Pod, you set the `spec.serviceAccountName`
field in the Pod specification. Kubernetes then automatically provides the
credentials for that ServiceAccount to the Pod. In v1.22 and later, Kubernetes
gets a short-lived, **automatically rotating** token using the `TokenRequest`
API and mounts the token as a
[projected volume](/docs/concepts/storage/projected-volumes/#serviceaccounttoken).
By default, Kubernetes provides the Pod
with the credentials for an assigned ServiceAccount, whether that is the
`default` ServiceAccount or a custom ServiceAccount that you specify.
To prevent Kubernetes from automatically injecting
credentials for a specified ServiceAccount or the `default` ServiceAccount, set the
`automountServiceAccountToken` field in your Pod specification to `false`.
<!-- OK to remove this historical detail after Kubernetes 1.31 is released -->
In versions earlier than 1.22, Kubernetes provides a long-lived, static token
to the Pod as a Secret.
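A minimal sketch of a Pod that runs as a specific ServiceAccount (the names and image are placeholders):
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-pod                          # placeholder
  namespace: dev
spec:
  serviceAccountName: my-app            # the ServiceAccount to run as
  # automountServiceAccountToken: false # uncomment to opt out of automatic token mounting
  containers:
  - name: app
    image: registry.example/app:1.0     # placeholder image
```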
#### Manually retrieve ServiceAccount credentials {#get-a-token}
If you need the credentials for a ServiceAccount to mount in a non-standard
location, or for an audience that isn't the API server, use one of the
following methods:
* [TokenRequest API](/docs/reference/kubernetes-api/authentication-resources/token-request-v1/)
(recommended): Request a short-lived service account token from within
your own *application code*. The token expires automatically and can rotate
upon expiration.
If you have a legacy application that is not aware of Kubernetes, you
could use a sidecar container within the same pod to fetch these tokens
and make them available to the application workload.
* [Token Volume Projection](/docs/tasks/configure-pod-container/configure-service-account/#service-account-token-volume-projection)
(also recommended): In Kubernetes v1.20 and later, use the Pod specification to
tell the kubelet to add the service account token to the Pod as a
*projected volume*. Projected tokens expire automatically, and the kubelet
rotates the token before it expires.
* [Service Account Token Secrets](/docs/tasks/configure-pod-container/configure-service-account/#manually-create-a-service-account-api-token)
(not recommended): You can mount service account tokens as Kubernetes
Secrets in Pods. These tokens don't expire and don't rotate. This method
is not recommended, especially at scale, because of the risks associated
with static, long-lived credentials. In Kubernetes v1.24 and later, the
[LegacyServiceAccountTokenNoAutoGeneration feature gate](/docs/reference/command-line-tools-reference/feature-gates/#feature-gates-for-graduated-or-deprecated-features)
prevents Kubernetes from automatically creating these tokens for
ServiceAccounts. `LegacyServiceAccountTokenNoAutoGeneration` is enabled
by default; in other words, Kubernetes does not create these tokens.
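For example, a Pod using token volume projection (the second method above) might declare a volume like this; a sketch with assumed names and values:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: token-client                    # placeholder
spec:
  serviceAccountName: my-app            # placeholder
  containers:
  - name: app
    image: registry.example/app:1.0     # placeholder image
    volumeMounts:
    - name: sa-token
      mountPath: /var/run/secrets/tokens
  volumes:
  - name: sa-token
    projected:
      sources:
      - serviceAccountToken:
          path: sa-token
          expirationSeconds: 3600       # assumed lifetime
          audience: my-audience         # assumed audience; omit to default to the API server
```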
## Authenticating service account credentials {#authenticating-credentials}
ServiceAccounts use signed
{{<glossary_tooltip term_id="jwt" text="JSON Web Tokens">}} (JWTs)
to authenticate to the Kubernetes API server, and to any other system where a
trust relationship exists. Depending on how the token was issued
(either time-limited using a `TokenRequest` or using a legacy mechanism with
a Secret), a ServiceAccount token might also have an expiry time, an audience,
and a time after which the token *starts* being valid. When a client that is
acting as a ServiceAccount tries to communicate with the Kubernetes API server,
the client includes an `Authorization: Bearer <token>` header with the HTTP
request. The API server checks the validity of that bearer token as follows:
1. Check the token signature.
1. Check whether the token has expired.
1. Check whether object references in the token claims are currently valid.
1. Check whether the token is currently valid.
1. Check the audience claims.
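For example, you can mint a short-lived token through the TokenRequest API with `kubectl` and present it as a bearer token (a sketch; the ServiceAccount name, namespace, and API server address are assumptions):
```shell
# Request a token for ServiceAccount "my-app" in namespace "dev", valid for one hour
TOKEN=$(kubectl create token my-app --namespace dev --duration=1h)

# Call the API server as that ServiceAccount (server address and CA file are illustrative)
curl --cacert ca.crt \
  -H "Authorization: Bearer ${TOKEN}" \
  https://kubernetes.example:6443/api/v1/namespaces/dev/pods
```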
The TokenRequest API produces _bound tokens_ for a ServiceAccount. This
binding is linked to the lifetime of the client, such as a Pod, that is acting
as that ServiceAccount.
For tokens issued using the `TokenRequest` API, the API server also checks that
the specific object reference that is using the ServiceAccount still exists,
matching by the {{< glossary_tooltip term_id="uid" text="unique ID" >}} of that
object. For legacy tokens that are mounted as Secrets in Pods, the API server
checks the token against the Secret.
For more information about the authentication process, refer to
[Authentication](/docs/reference/access-authn-authz/authentication/#service-account-tokens).
### Authenticating service account credentials in your own code {#authenticating-in-code}
If you have services of your own that need to validate Kubernetes service
account credentials, you can use the following methods:
* [TokenReview API](/docs/reference/kubernetes-api/authentication-resources/token-review-v1/)
(recommended)
* OIDC discovery
The Kubernetes project recommends that you use the TokenReview API, because
this method invalidates tokens that are bound to API objects such as Secrets,
ServiceAccounts, and Pods when those objects are deleted. For example, if you
delete the Pod that contains a projected ServiceAccount token, the cluster
invalidates that token immediately and a TokenReview immediately fails.
If you use OIDC validation instead, your clients continue to treat the token
as valid until the token reaches its expiration timestamp.
Your application should always define the audience that it accepts, and should
check that the token's audiences match the audiences that the application
expects. This helps to minimize the scope of the token so that it can only be
used in your application and nowhere else.
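A sketch of a TokenReview object that such a service could submit to the API server (the token and audience values are placeholders); the response's `status` field reports whether the token authenticated and which ServiceAccount it maps to:
```yaml
apiVersion: authentication.k8s.io/v1
kind: TokenReview
spec:
  token: "<bearer token presented by the client>"   # placeholder
  audiences:
  - my-audience                                     # the audience your application expects
```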
## Alternatives
* Issue your own tokens using another mechanism, and then use
[Webhook Token Authentication](/docs/reference/access-authn-authz/authentication/#webhook-token-authentication)
to validate bearer tokens using your own validation service.
* Provide your own identities to Pods.
* [Use the SPIFFE CSI driver plugin to provide SPIFFE SVIDs as X.509 certificate pairs to Pods](https://cert-manager.io/docs/projects/csi-driver-spiffe/).
{{% thirdparty-content single="true" %}}
* [Use a service mesh such as Istio to provide certificates to Pods](https://istio.io/latest/docs/tasks/security/cert-management/plugin-ca-cert/).
* Authenticate from outside the cluster to the API server without using service account tokens:
* [Configure the API server to accept OpenID Connect (OIDC) tokens from your identity provider](/docs/reference/access-authn-authz/authentication/#openid-connect-tokens).
* Use service accounts or user accounts created using an external Identity
and Access Management (IAM) service, such as from a cloud provider, to
authenticate to your cluster.
* [Use the CertificateSigningRequest API with client certificates](/docs/tasks/tls/managing-tls-in-a-cluster/).
* [Configure the kubelet to retrieve credentials from an image registry](/docs/tasks/administer-cluster/kubelet-credential-provider/).
* Use a Device Plugin to access a virtual Trusted Platform Module (TPM), which
then allows authentication using a private key.
## {{% heading "whatsnext" %}}
* Learn how to [manage your ServiceAccounts as a cluster administrator](/docs/reference/access-authn-authz/service-accounts-admin/).
* Learn how to [assign a ServiceAccount to a Pod](/docs/tasks/configure-pod-container/configure-service-account/).
* Read the [ServiceAccount API reference](/docs/reference/kubernetes-api/authentication-resources/service-account-v1/).

View File

@ -306,7 +306,7 @@ When the Pod above is created, the container `test` gets the following contents
in its `/etc/resolv.conf` file:
```
nameserver 1.2.3.4
nameserver 192.0.2.1
search ns1.svc.cluster-domain.example my.dns.search.suffix
options ndots:2 edns0
```

View File

@ -104,7 +104,7 @@ the pod is also terminating.
{{< note >}}
Although `serving` is almost identical to `ready`, it was added to prevent breaking the existing meaning
of `ready`. It may be unexpected for existing clients if `ready` could be `true` for terminating
endpoints, since historically terminating endpoints were never included in the Endpoints or
EndpointSlice API to begin with. For this reason, `ready` is _always_ `false` for terminating

View File

@ -18,22 +18,23 @@ weight: 10
{{< glossary_definition term_id="service" length="short" >}}
With Kubernetes you don't need to modify your application to use an unfamiliar service discovery mechanism.
Kubernetes gives Pods their own IP addresses and a single DNS name for a set of Pods,
and can load-balance across them.
A key aim of Services in Kubernetes is that you don't need to modify your existing
application to use an unfamiliar service discovery mechanism.
You can run code in Pods, whether this is a code designed for a cloud-native world, or
an older app you've containerized. You use a Service to make that set of Pods available
on the network so that clients can interact with it.
<!-- body -->
## Motivation
Kubernetes {{< glossary_tooltip term_id="pod" text="Pods" >}} are created and destroyed
to match the desired state of your cluster. Pods are nonpermanent resources.
If you use a {{< glossary_tooltip term_id="deployment" >}} to run your app,
it can create and destroy Pods dynamically.
that Deployment can create and destroy Pods dynamically. From one moment to the next,
you don't know how many of those Pods are working and healthy; you might not even know
what those healthy Pods are named.
Kubernetes {{< glossary_tooltip term_id="pod" text="Pods" >}} are created and destroyed
to match the desired state of your cluster. Pods are ephemeral resources (you should not
expect that an individual Pod is reliable and durable).
Each Pod gets its own IP address, however in a Deployment, the set of Pods
running in one moment in time could be different from
the set of Pods running that application a moment later.
Each Pod gets its own IP address (Kubernetes expects network plugins to ensure this).
For a given Deployment in your cluster, the set of Pods running in one moment in
time could be different from the set of Pods running that application a moment later.
This leads to a problem: if some set of Pods (call them "backends") provides
functionality to other Pods (call them "frontends") inside your cluster,
@ -42,14 +43,13 @@ to, so that the frontend can use the backend part of the workload?
Enter _Services_.
## Service resources {#service-resource}
<!-- body -->
In Kubernetes, a Service is an abstraction which defines a logical set of Pods
and a policy by which to access them (sometimes this pattern is called
a micro-service). The set of Pods targeted by a Service is usually determined
by a {{< glossary_tooltip text="selector" term_id="selector" >}}.
To learn about other ways to define Service endpoints,
see [Services _without_ selectors](#services-without-selectors).
## Services in Kubernetes
The Service API, part of Kubernetes, is an abstraction to help you expose groups of
Pods over a network. Each Service object defines a logical set of endpoints (usually
these endpoints are Pods) along with a policy about how to make those pods accessible.
For example, consider a stateless image-processing backend which is running with
3 replicas. Those replicas are fungible&mdash;frontends do not care which backend
@ -59,6 +59,26 @@ track of the set of backends themselves.
The Service abstraction enables this decoupling.
The set of Pods targeted by a Service is usually determined
by a {{< glossary_tooltip text="selector" term_id="selector" >}} that you
define.
To learn about other ways to define Service endpoints,
see [Services _without_ selectors](#services-without-selectors).
If your workload speaks HTTP, you might choose to use an
[Ingress](/docs/concepts/services-networking/ingress/) to control how web traffic
reaches that workload.
Ingress is not a Service type, but it acts as the entry point for your
cluster. An Ingress lets you consolidate your routing rules into a single resource, so
that you can expose multiple components of your workload, running separately in your
cluster, behind a single listener.
The [Gateway](https://gateway-api.sigs.k8s.io/#what-is-the-gateway-api) API for Kubernetes
provides extra capabilities beyond Ingress and Service. You can add Gateway to your cluster -
it is a family of extension APIs, implemented using
{{< glossary_tooltip term_id="CustomResourceDefinition" text="CustomResourceDefinitions" >}} -
and then use these to configure access to network services that are running in your cluster.
### Cloud-native service discovery
If you're able to use Kubernetes APIs for service discovery in your application,
@ -69,16 +89,20 @@ whenever the set of Pods in a Service changes.
For non-native applications, Kubernetes offers ways to place a network port or load
balancer in between your application and the backend Pods.
Either way, your workload can use these [service discovery](#discovering-services)
mechanisms to find the target it wants to connect to.
## Defining a Service
A Service in Kubernetes is a REST object, similar to a Pod. Like all of the
REST objects, you can `POST` a Service definition to the API server to create
a new instance.
The name of a Service object must be a valid
[RFC 1035 label name](/docs/concepts/overview/working-with-objects/names#rfc-1035-label-names).
A Service in Kubernetes is an
{{< glossary_tooltip text="object" term_id="object" >}}
(the same way that a Pod or a ConfigMap is an object). You can create,
view or modify Service definitions using the Kubernetes API. Usually
you use a tool such as `kubectl` to make those API calls for you.
For example, suppose you have a set of Pods where each listens on TCP port 9376
and contains a label `app.kubernetes.io/name=MyApp`:
For example, suppose you have a set of Pods that each listen on TCP port 9376
and are labelled as `app.kubernetes.io/name=MyApp`. You can define a Service to
publish that TCP listener:
```yaml
apiVersion: v1
@ -94,16 +118,20 @@ spec:
targetPort: 9376
```
This specification creates a new Service object named "my-service", which
targets TCP port 9376 on any Pod with the `app.kubernetes.io/name=MyApp` label.
Applying this manifest creates a new Service named "my-service", which
targets TCP port 9376 on any Pod with the `app.kubernetes.io/name: MyApp` label.
Kubernetes assigns this Service an IP address (sometimes called the "cluster IP"),
which is used by the Service proxies
(see [Virtual IP addressing mechanism](#virtual-ip-addressing-mechanism) below).
Kubernetes assigns this Service an IP address (the _cluster IP_),
that is used by the virtual IP address mechanism. For more details on that mechanism,
read [Virtual IPs and Service Proxies](/docs/reference/networking/virtual-ips/).
The controller for that Service continuously scans for Pods that
match its selector, and then makes any necessary updates to the set of
EndpointSlices for the Service.
The name of a Service object must be a valid
[RFC 1035 label name](/docs/concepts/overview/working-with-objects/names#rfc-1035-label-names).
The controller for the Service selector continuously scans for Pods that
match its selector, and then POSTs any updates to an Endpoint object
also named "my-service".
{{< note >}}
A Service can map _any_ incoming `port` to a `targetPort`. By default and
@ -177,8 +205,8 @@ For example:
* You are migrating a workload to Kubernetes. While evaluating the approach,
you run only a portion of your backends in Kubernetes.
In any of these scenarios you can define a Service _without_ a Pod selector.
For example:
In any of these scenarios you can define a Service _without_ specifying a
selector to match Pods. For example:
```yaml
apiVersion: v1
@ -262,9 +290,9 @@ selector will fail due to this constraint. This prevents the Kubernetes API serv
from being used as a proxy to endpoints the caller may not be authorized to access.
{{< /note >}}
An ExternalName Service is a special case of Service that does not have
An `ExternalName` Service is a special case of Service that does not have
selectors and uses DNS names instead. For more information, see the
[ExternalName](#externalname) section later in this document.
[ExternalName](#externalname) section.
### EndpointSlices
@ -436,7 +464,7 @@ the port number for `http`, as well as the IP address.
The Kubernetes DNS server is the only way to access `ExternalName` Services.
You can find more information about `ExternalName` resolution in
[DNS Pods and Services](/docs/concepts/services-networking/dns-pod-service/).
[DNS for Services and Pods](/docs/concepts/services-networking/dns-pod-service/).
## Headless Services
@ -704,7 +732,7 @@ In a split-horizon DNS environment you would need two Services to be able to rou
and internal traffic to your endpoints.
To set an internal load balancer, add one of the following annotations to your Service
depending on the cloud Service provider you're using.
depending on the cloud service provider you're using:
{{< tabs name="service_tabs" >}}
{{% tab name="Default" %}}
@ -1151,9 +1179,9 @@ spec:
- name: http
protocol: TCP
port: 80
targetPort: 9376
targetPort: 49152
externalIPs:
- 80.11.12.10
- 198.51.100.32
```
## Session stickiness
@ -1178,12 +1206,17 @@ mechanism Kubernetes provides to expose a Service with a virtual IP address.
## {{% heading "whatsnext" %}}
* Follow the [Connecting Applications with Services](/docs/tutorials/services/connect-applications-service/) tutorial
* Read about [Ingress](/docs/concepts/services-networking/ingress/)
* Read about [EndpointSlices](/docs/concepts/services-networking/endpoint-slices/)
Learn more about Services and how they fit into Kubernetes:
* Follow the [Connecting Applications with Services](/docs/tutorials/services/connect-applications-service/) tutorial.
* Read about [Ingress](/docs/concepts/services-networking/ingress/), which
exposes HTTP and HTTPS routes from outside the cluster to Services within
your cluster.
* Read about [Gateway](https://gateway-api.sigs.k8s.io/), an extension to
Kubernetes that provides more flexibility than Ingress.
For more context:
* Read [Virtual IPs and Service Proxies](/docs/reference/networking/virtual-ips/)
* Read the [API reference](/docs/reference/kubernetes-api/service-resources/service-v1/) for the Service API
* Read the [API reference](/docs/reference/kubernetes-api/service-resources/endpoints-v1/) for the Endpoints API
* Read the [API reference](/docs/reference/kubernetes-api/service-resources/endpoint-slice-v1/) for the EndpointSlice API
For more context, read the following:
* [Virtual IPs and Service Proxies](/docs/reference/networking/virtual-ips/)
* [EndpointSlices](/docs/concepts/services-networking/endpoint-slices/)
* [Service API reference](/docs/reference/kubernetes-api/service-resources/service-v1/)
* [EndpointSlice API reference](/docs/reference/kubernetes-api/service-resources/endpoint-slice-v1/)
* [Endpoint API reference (legacy)](/docs/reference/kubernetes-api/service-resources/endpoints-v1/)

View File

@ -126,7 +126,7 @@ zone.
5. **A zone is not represented in hints:** If the kube-proxy is unable to find
at least one endpoint with a hint targeting the zone it is running in, it falls
to using endpoints from all zones. This is most likely to happen as you add
back to using endpoints from all zones. This is most likely to happen as you add
a new zone into your existing cluster.
## Constraints

View File

@ -16,25 +16,45 @@ weight: 20
<!-- overview -->
This document describes _persistent volumes_ in Kubernetes. Familiarity with
[volumes](/docs/concepts/storage/volumes/) is suggested.
<!-- body -->
## Introduction
Managing storage is a distinct problem from managing compute instances.
The PersistentVolume subsystem provides an API for users and administrators
that abstracts details of how storage is provided from how it is consumed.
To do this, we introduce two new API resources: PersistentVolume and PersistentVolumeClaim.
A _PersistentVolume_ (PV) is a piece of storage in the cluster that has been
provisioned by an administrator or dynamically provisioned using
[Storage Classes](/docs/concepts/storage/storage-classes/). It is a resource in
the cluster just like a node is a cluster resource. PVs are volume plugins like
Volumes, but have a lifecycle independent of any individual Pod that uses the PV.
This API object captures the details of the implementation of the storage, be that
NFS, iSCSI, or a cloud-provider-specific storage system.
A _PersistentVolumeClaim_ (PVC) is a request for storage by a user. It is similar
to a Pod. Pods consume node resources and PVCs consume PV resources. Pods can
request specific levels of resources (CPU and Memory). Claims can request specific
size and access modes (e.g., they can be mounted ReadWriteOnce, ReadOnlyMany or
ReadWriteMany, see [AccessModes](#access-modes)).
While PersistentVolumeClaims allow a user to consume abstract storage resources,
it is common that users need PersistentVolumes with varying properties, such as
performance, for different problems. Cluster administrators need to be able to
offer a variety of PersistentVolumes that differ in more ways than size and access
modes, without exposing users to the details of how those volumes are implemented.
For these needs, there is the _StorageClass_ resource.
See the [detailed walkthrough with working examples](/docs/tasks/configure-pod-container/configure-persistent-volume-storage/).
## Lifecycle of a volume and claim
PVs are resources in the cluster. PVCs are requests for those resources and also act
as claim checks to the resource. The interaction between PVs and PVCs follows this lifecycle:
### Provisioning
@ -42,7 +62,9 @@ There are two ways PVs may be provisioned: statically or dynamically.
#### Static
A cluster administrator creates a number of PVs. They carry the details of the
real storage, which is available for use by cluster users. They exist in the
Kubernetes API and are available for consumption.
#### Dynamic
@ -55,7 +77,8 @@ provisioning to occur. Claims that request the class `""` effectively disable
dynamic provisioning for themselves.
To enable dynamic storage provisioning based on storage class, the cluster administrator
needs to enable the `DefaultStorageClass`
[admission controller](/docs/reference/access-authn-authz/admission-controllers/#defaultstorageclass)
on the API server. This can be done, for example, by ensuring that `DefaultStorageClass` is
among the comma-delimited, ordered list of values for the `--enable-admission-plugins` flag of
the API server component. For more information on API server command-line flags,
@ -63,26 +86,51 @@ check [kube-apiserver](/docs/admin/kube-apiserver/) documentation.
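As an illustration only (the full plugin list depends on your cluster and version), the flag could look like:
```shell
# Keep whichever admission plugins your cluster already enables, and include DefaultStorageClass
kube-apiserver --enable-admission-plugins=NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass
```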
### Binding
A user creates, or in the case of dynamic provisioning, has already created,
a PersistentVolumeClaim with a specific amount of storage requested and with
certain access modes. A control loop in the master watches for new PVCs, finds
a matching PV (if possible), and binds them together. If a PV was dynamically
provisioned for a new PVC, the loop will always bind that PV to the PVC. Otherwise,
the user will always get at least what they asked for, but the volume may be in
excess of what was requested. Once bound, PersistentVolumeClaim binds are exclusive,
regardless of how they were bound. A PVC to PV binding is a one-to-one mapping,
using a ClaimRef which is a bi-directional binding between the PersistentVolume
and the PersistentVolumeClaim.
Claims will remain unbound indefinitely if a matching volume does not exist.
Claims will be bound as matching volumes become available. For example, a
cluster provisioned with many 50Gi PVs would not match a PVC requesting 100Gi.
The PVC can be bound when a 100Gi PV is added to the cluster.
### Using
Pods use claims as volumes. The cluster inspects the claim to find the bound
volume and mounts that volume for a Pod. For volumes that support multiple
access modes, the user specifies which mode is desired when using their claim
as a volume in a Pod.
Once a user has a claim and that claim is bound, the bound PV belongs to the
user for as long as they need it. Users schedule Pods and access their claimed
PVs by including a `persistentVolumeClaim` section in a Pod's `volumes` block.
See [Claims As Volumes](#claims-as-volumes) for more details on this.
### Storage Object in Use Protection
The purpose of the Storage Object in Use Protection feature is to ensure that
PersistentVolumeClaims (PVCs) in active use by a Pod and PersistentVolume (PVs)
that are bound to PVCs are not removed from the system, as this may result in data loss.
{{< note >}}
PVC is in active use by a Pod when a Pod object exists that is using the PVC.
{{< /note >}}
If a user deletes a PVC in active use by a Pod, the PVC is not removed immediately.
PVC removal is postponed until the PVC is no longer actively used by any Pods. Also,
if an admin deletes a PV that is bound to a PVC, the PV is not removed immediately.
PV removal is postponed until the PV is no longer bound to a PVC.
You can see that a PVC is protected when the PVC's status is `Terminating` and the
`Finalizers` list includes `kubernetes.io/pvc-protection`:
```shell
kubectl describe pvc hostpath
@ -98,7 +146,8 @@ Finalizers: [kubernetes.io/pvc-protection]
...
```
You can see that a PV is protected when the PV's status is `Terminating` and
the `Finalizers` list includes `kubernetes.io/pv-protection` too:
```shell
kubectl describe pv task-pv-volume
@ -122,29 +171,48 @@ Events: <none>
### Reclaiming
When a user is done with their volume, they can delete the PVC objects from the
API that allows reclamation of the resource. The reclaim policy for a PersistentVolume
tells the cluster what to do with the volume after it has been released of its claim.
Currently, volumes can either be Retained, Recycled, or Deleted.
#### Retain
The `Retain` reclaim policy allows for manual reclamation of the resource.
When the PersistentVolumeClaim is deleted, the PersistentVolume still exists
and the volume is considered "released". But it is not yet available for
another claim because the previous claimant's data remains on the volume.
An administrator can manually reclaim the volume with the following steps.
1. Delete the PersistentVolume. The associated storage asset in external infrastructure
   (such as an AWS EBS, GCE PD, Azure Disk, or Cinder volume) still exists after the PV is deleted.
1. Manually clean up the data on the associated storage asset accordingly.
1. Manually delete the associated storage asset.
If you want to reuse the same storage asset, create a new PersistentVolume with
the same storage asset definition.
#### Delete
For volume plugins that support the `Delete` reclaim policy, deletion removes
both the PersistentVolume object from Kubernetes, as well as the associated
storage asset in the external infrastructure, such as an AWS EBS, GCE PD,
Azure Disk, or Cinder volume. Volumes that were dynamically provisioned
inherit the [reclaim policy of their StorageClass](#reclaim-policy), which
defaults to `Delete`. The administrator should configure the StorageClass
according to users' expectations; otherwise, the PV must be edited or
patched after it is created. See
[Change the Reclaim Policy of a PersistentVolume](/docs/tasks/administer-cluster/change-pv-reclaim-policy/).
#### Recycle
{{< warning >}}
The `Recycle` reclaim policy is deprecated. Instead, the recommended approach
is to use dynamic provisioning.
{{< /warning >}}
If supported by the underlying volume plugin, the `Recycle` reclaim policy performs
a basic scrub (`rm -rf /thevolume/*`) on the volume and makes it available again for a new claim.
However, an administrator can configure a custom recycler Pod template using
the Kubernetes controller manager command line arguments as described in the
@ -173,7 +241,8 @@ spec:
mountPath: /scrub
```
However, the particular path specified in the custom recycler Pod template in the
`volumes` part is replaced with the particular path of the volume that is being recycled.
### PersistentVolume deletion protection finalizer
{{< feature-state for_k8s_version="v1.23" state="alpha" >}}
@ -181,10 +250,12 @@ However, the particular path specified in the custom recycler Pod template in th
Finalizers can be added on a PersistentVolume to ensure that PersistentVolumes
having `Delete` reclaim policy are deleted only after the backing storage are deleted.
The newly introduced finalizers `kubernetes.io/pv-controller` and
`external-provisioner.volume.kubernetes.io/finalizer`
are only added to dynamically provisioned volumes.
The finalizer `kubernetes.io/pv-controller` is added to in-tree plugin volumes.
The following is an example:
```shell
kubectl describe pv pvc-74a498d6-3929-47e8-8c02-078c1ece4d78
@ -213,6 +284,7 @@ Events: <none>
The finalizer `external-provisioner.volume.kubernetes.io/finalizer` is added for CSI volumes.
The following is an example:
```shell
Name: pvc-2f0bab97-85a8-4552-8044-eb8be45cf48d
Labels: <none>
@ -244,14 +316,17 @@ the `kubernetes.io/pv-controller` finalizer is replaced by the
### Reserving a PersistentVolume
The control plane can [bind PersistentVolumeClaims to matching PersistentVolumes](#binding)
in the cluster. However, if you want a PVC to bind to a specific PV, you need to pre-bind them.
By specifying a PersistentVolume in a PersistentVolumeClaim, you declare a binding
between that specific PV and PVC. If the PersistentVolume exists and has not reserved
PersistentVolumeClaims through its `claimRef` field, then the PersistentVolume and
PersistentVolumeClaim will be bound.
The binding happens regardless of some volume matching criteria, including node affinity.
The control plane still checks that [storage class](/docs/concepts/storage/storage-classes/),
access modes, and requested storage size are valid.
```yaml
apiVersion: v1
@ -265,7 +340,10 @@ spec:
...
```
This method does not guarantee any binding privileges to the PersistentVolume.
If other PersistentVolumeClaims could use the PV that you specify, you first
need to reserve that storage volume. Specify the relevant PersistentVolumeClaim
in the `claimRef` field of the PV so that other PVCs can not bind to it.
```yaml
apiVersion: v1
@ -334,8 +412,9 @@ increased and that no resize is necessary.
{{< feature-state for_k8s_version="v1.24" state="stable" >}}
Support for expanding CSI volumes is enabled by default but it also requires a
specific CSI driver to support volume expansion. Refer to documentation of the
specific CSI driver for more information.
#### Resizing a volume containing a file system
@ -364,22 +443,33 @@ FlexVolume resize is possible only when the underlying driver supports resize.
{{< /note >}}
{{< note >}}
Expanding EBS volumes is a time-consuming operation.
Also, there is a per-volume quota of one modification every 6 hours.
{{< /note >}}
#### Recovering from Failure when Expanding Volumes
If a user specifies a new size that is too big to be satisfied by the underlying
storage system, expansion of the PVC will be continuously retried until the user or
cluster administrator takes some action. This can be undesirable, and hence
Kubernetes provides the following methods of recovering from such failures.
{{< tabs name="recovery_methods" >}}
{{% tab name="Manually with Cluster Administrator access" %}}
If expanding underlying storage fails, the cluster administrator can manually
recover the Persistent Volume Claim (PVC) state and cancel the resize requests.
Otherwise, the resize requests are continuously retried by the controller without
administrator intervention.
1. Mark the PersistentVolume(PV) that is bound to the PersistentVolumeClaim(PVC)
with `Retain` reclaim policy.
2. Delete the PVC. Since PV has `Retain` reclaim policy - we will not lose any data
when we recreate the PVC.
3. Delete the `claimRef` entry from PV specs, so as new PVC can bind to it.
This should make the PV `Available`.
4. Re-create the PVC with smaller size than PV and set `volumeName` field of the
PVC to the name of the PV. This should bind new PVC to existing PV.
5. Don't forget to restore the reclaim policy of the PV.
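A minimal sketch of these steps with `kubectl`, assuming a hypothetical PV named `pvc-xyz` bound to a PVC named `data-pvc` (both names are illustrative):

```shell
# Step 1: protect the data by switching the PV to the Retain reclaim policy
kubectl patch pv pvc-xyz -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'

# Step 2: delete the PVC; the PV and its data are kept because of Retain
kubectl delete pvc data-pvc

# Step 3: clear the claimRef so the PV becomes Available again
kubectl patch pv pvc-xyz --type json -p '[{"op":"remove","path":"/spec/claimRef"}]'

# Steps 4 and 5: re-create the PVC with a smaller request and
# spec.volumeName: pvc-xyz, then restore the original reclaim policy
```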
{{% /tab %}}
@ -387,7 +477,11 @@ If expanding underlying storage fails, the cluster administrator can manually re
{{% feature-state for_k8s_version="v1.23" state="alpha" %}}
{{< note >}}
Recovery from failing PVC expansion by users is available as an alpha feature
since Kubernetes 1.23. The `RecoverVolumeExpansionFailure` feature must be
enabled for this feature to work. Refer to the
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
documentation for more information.
{{< /note >}}
If the feature gate `RecoverVolumeExpansionFailure` is
@ -397,7 +491,8 @@ smaller proposed size, edit `.spec.resources` for that PVC and choose a value th
value you previously tried.
This is useful if expansion to a higher value did not succeed because of capacity constraint.
If that has happened, or you suspect that it might have, you can retry expansion by specifying a
size that is within the capacity limits of underlying storage provider. You can monitor status of
resize operation by watching `.status.resizeStatus` and events on the PVC.
Note that,
although you can specify a lower amount of storage than what was requested previously,
@ -406,7 +501,6 @@ Kubernetes does not support shrinking a PVC to less than its current size.
{{% /tab %}}
{{% /tabs %}}
## Types of Persistent Volumes
PersistentVolume types are implemented as plugins. Kubernetes currently supports the following plugins:
@ -423,7 +517,8 @@ PersistentVolume types are implemented as plugins. Kubernetes currently supports
* [`nfs`](/docs/concepts/storage/volumes/#nfs) - Network File System (NFS) storage
* [`rbd`](/docs/concepts/storage/volumes/#rbd) - Rados Block Device (RBD) volume
The following types of PersistentVolume are deprecated.
This means that support is still available but will be removed in a future Kubernetes release.
* [`awsElasticBlockStore`](/docs/concepts/storage/volumes/#awselasticblockstore) - AWS Elastic Block Store (EBS)
(**deprecated** in v1.17)
@ -483,14 +578,21 @@ spec:
```
{{< note >}}
Helper programs relating to the volume type may be required for consumption of
a PersistentVolume within a cluster. In this example, the PersistentVolume is
of type NFS and the helper program /sbin/mount.nfs is required to support the
mounting of NFS filesystems.
{{< /note >}}
### Capacity
Generally, a PV will have a specific storage capacity. This is set using the PV's
`capacity` attribute. Read the glossary term
[Quantity](/docs/reference/glossary/?all=true#term-quantity) to understand the units
expected by `capacity`.
Currently, storage size is the only resource that can be set or requested.
Future attributes may include IOPS, throughput, etc.
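For example, a PV that offers 5 gibibytes of storage declares this in `spec.capacity` (a minimal sketch; the `hostPath` backing and the name are illustrative):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-capacity-pv
spec:
  capacity:
    storage: 5Gi        # the size advertised to matching claims
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /mnt/data
```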
### Volume Mode
@ -515,12 +617,18 @@ for an example on how to use a volume with `volumeMode: Block` in a Pod.
### Access Modes
A PersistentVolume can be mounted on a host in any way supported by the resource
provider. As shown in the table below, providers will have different capabilities
and each PV's access modes are set to the specific modes supported by that particular
volume. For example, NFS can support multiple read/write clients, but a specific
NFS PV might be exported on the server as read-only. Each PV gets its own set of
access modes describing that specific PV's capabilities.
The access modes are:
`ReadWriteOnce`
: the volume can be mounted as read-write by a single node. ReadWriteOnce access
mode still can allow multiple pods to access the volume when the pods are running on the same node.
`ReadOnlyMany`
: the volume can be mounted as read-only by many nodes.
@ -529,12 +637,14 @@ The access modes are:
: the volume can be mounted as read-write by many nodes.
`ReadWriteOncePod`
: the volume can be mounted as read-write by a single Pod. Use ReadWriteOncePod
access mode if you want to ensure that only one pod across whole cluster can
read that PVC or write to it. This is only supported for CSI volumes and
Kubernetes version 1.22+.
The blog article
[Introducing Single Pod Access Mode for PersistentVolumes](/blog/2021/09/13/read-write-once-pod-access-mode-alpha/)
covers this in more detail.
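For example, a claim that requests single-Pod access could be written as follows (a sketch; the name, size, and the assumption of a CSI-backed StorageClass are illustrative):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: single-writer-claim
spec:
  accessModes:
    - ReadWriteOncePod   # only one Pod in the cluster may use this claim
  resources:
    requests:
      storage: 1Gi
```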
In the CLI, the access modes are abbreviated to:
@ -547,13 +657,15 @@ In the CLI, the access modes are abbreviated to:
Kubernetes uses volume access modes to match PersistentVolumeClaims and PersistentVolumes.
In some cases, the volume access modes also constrain where the PersistentVolume can be mounted.
Volume access modes do **not** enforce write protection once the storage has been mounted.
Even if the access modes are specified as ReadWriteOnce, ReadOnlyMany, or ReadWriteMany,
they don't set any constraints on the volume. For example, even if a PersistentVolume is
created as ReadOnlyMany, there is no guarantee that it will be read-only. If the access modes
are specified as ReadWriteOncePod, the volume is constrained and can be mounted on only a single Pod.
{{< /note >}}
> __Important!__ A volume can only be mounted using one access mode at a time,
> even if it supports many. For example, a GCEPersistentDisk can be mounted as
> ReadWriteOnce by a single node or ReadOnlyMany by many nodes, but not at the same time.
| Volume Plugin | ReadWriteOnce | ReadOnlyMany | ReadWriteMany | ReadWriteOncePod |
| :--- | :---: | :---: | :---: | - |
@ -593,13 +705,16 @@ Current reclaim policies are:
* Retain -- manual reclamation
* Recycle -- basic scrub (`rm -rf /thevolume/*`)
* Delete -- associated storage asset such as AWS EBS, GCE PD, Azure Disk,
or OpenStack Cinder volume is deleted
Currently, only NFS and HostPath support recycling. AWS EBS, GCE PD, Azure Disk,
and Cinder volumes support deletion.
### Mount Options
A Kubernetes administrator can specify additional mount options for when a
Persistent Volume is mounted on a node.
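For example, an NFS-backed PV might carry mount options like this (a sketch; the server, path, and options are illustrative):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-nfs-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  mountOptions:
    - hard
    - nfsvers=4.1
  nfs:
    server: nfs-server.example.com
    path: /share
```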
{{< note >}}
Not all Persistent Volume types support mount options.
@ -627,10 +742,19 @@ it will become fully deprecated in a future Kubernetes release.
### Node Affinity
{{< note >}}
For most volume types, you do not need to set this field. It is automatically
populated for [AWS EBS](/docs/concepts/storage/volumes/#awselasticblockstore),
[GCE PD](/docs/concepts/storage/volumes/#gcepersistentdisk) and
[Azure Disk](/docs/concepts/storage/volumes/#azuredisk) volume block types. You
need to explicitly set this for [local](/docs/concepts/storage/volumes/#local) volumes.
{{< /note >}}
A PV can specify node affinity to define constraints that limit what nodes this
volume can be accessed from. Pods that use a PV will only be scheduled to nodes
that are selected by the node affinity. To specify node affinity, set
`nodeAffinity` in the `.spec` of a PV. The
[PersistentVolume](/docs/reference/kubernetes-api/config-and-storage-resources/persistent-volume-v1/#PersistentVolumeSpec)
API reference has more details on this field.
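A sketch of a `local` PV that uses `nodeAffinity` to pin the volume to one node (the node name, path, and storage class are illustrative):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-local-pv
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /mnt/disks/ssd1
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - example-node
```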
### Phase
@ -671,24 +795,35 @@ spec:
### Access Modes
Claims use [the same conventions as volumes](#access-modes) when requesting
storage with specific access modes.
### Volume Modes
Claims use [the same convention as volumes](#volume-mode) to indicate the
consumption of the volume as either a filesystem or block device.
### Resources
Claims, like Pods, can request specific quantities of a resource. In this case,
the request is for storage. The same
[resource model](https://git.k8s.io/design-proposals-archive/scheduling/resources.md)
applies to both volumes and claims.
### Selector
Claims can specify a
[label selector](/docs/concepts/overview/working-with-objects/labels/#label-selectors)
to further filter the set of volumes. Only the volumes whose labels match the selector
can be bound to the claim. The selector can consist of two fields:
* `matchLabels` - the volume must have a label with this value
* `matchExpressions` - a list of requirements made by specifying key, list of values,
and operator that relates the key and values. Valid operators include In, NotIn,
Exists, and DoesNotExist.
All of the requirements, from both `matchLabels` and `matchExpressions`, are
ANDed together; they must all be satisfied in order to match.
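For example, a claim that only matches PVs labeled `release: stable` and whose `tier` is either `gold` or `silver` could look like this (a sketch; the labels and size are illustrative):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: selective-claim
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 8Gi
  selector:
    matchLabels:
      release: stable
    matchExpressions:
      - key: tier
        operator: In
        values:
          - gold
          - silver
```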
### Class
@ -738,22 +873,38 @@ In the past, the annotation `volume.beta.kubernetes.io/storage-class` was used i
of `storageClassName` attribute. This annotation is still working; however,
it won't be supported in a future Kubernetes release.
#### Retroactive default StorageClass assignment
{{< feature-state for_k8s_version="v1.26" state="beta" >}}
You can create a PersistentVolumeClaim without specifying a `storageClassName`
for the new PVC, and you can do so even when no default StorageClass exists
in your cluster. In this case, the new PVC is created as you defined it, and the
`storageClassName` of that PVC remains unset until a default becomes available.
When a default StorageClass becomes available, the control plane identifies any
existing PVCs without `storageClassName`. For the PVCs that either have an empty
value for `storageClassName` or do not have this key, the control plane then
updates those PVCs to set `storageClassName` to match the new default StorageClass.
If you have an existing PVC where the `storageClassName` is `""`, and you configure
a default StorageClass, then this PVC will not get updated.
In order to keep binding to PVs with `storageClassName` set to `""` (while a default StorageClass is present), you need to set the `storageClassName` of the associated PVC to `""`.
In order to keep binding to PVs with `storageClassName` set to `""`
(while a default StorageClass is present), you need to set the `storageClassName`
of the associated PVC to `""`.
This behavior helps administrators change default StorageClass by removing the
old one first and then creating or setting another one. This brief window while
there is no default causes PVCs without `storageClassName` created at that time
to not have any default, but due to the retroactive default StorageClass
assignment this way of changing defaults is safe.
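For example, an administrator can mark an existing StorageClass as the default by setting the well-known annotation (a sketch; the class name `standard` is illustrative):

```shell
kubectl patch storageclass standard -p \
  '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
```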
## Claims As Volumes
Pods access storage by using the claim as a volume. Claims must exist in the
same namespace as the Pod using the claim. The cluster finds the claim in the
Pod's namespace and uses it to get the PersistentVolume backing the claim.
The volume is then mounted to the host and into the Pod.
```yaml
apiVersion: v1
@ -775,12 +926,15 @@ spec:
### A Note on Namespaces
PersistentVolume binds are exclusive, and since PersistentVolumeClaims are
namespaced objects, mounting claims with "Many" modes (`ROX`, `RWX`) is only
possible within one namespace.
### PersistentVolumes typed `hostPath`
A `hostPath` PersistentVolume uses a file or directory on the Node to emulate
network-attached storage. See
[an example of `hostPath` typed volume](/docs/tasks/configure-pod-container/configure-persistent-volume-storage/#create-a-persistentvolume).
## Raw Block Volume Support
@ -819,6 +973,7 @@ spec:
lun: 0
readOnly: false
```
### PersistentVolumeClaim requesting a Raw Block Volume {#persistent-volume-claim-requesting-a-raw-block-volume}
```yaml
@ -858,14 +1013,18 @@ spec:
```
{{< note >}}
When adding a raw block device for a Pod, you specify the device path in the
container instead of a mount path.
{{< /note >}}
### Binding Block Volumes
If a user requests a raw block volume by indicating this using the `volumeMode`
field in the PersistentVolumeClaim spec, the binding rules differ slightly from
previous releases that didn't consider this mode as part of the spec.
Listed is a table of possible combinations the user and admin might specify for
requesting a raw block device. The table indicates if the volume will be bound or
not given the combinations: Volume binding matrix for statically provisioned volumes:
| PV volumeMode | PVC volumeMode | Result |
| --------------|:---------------:| ----------------:|
@ -880,15 +1039,19 @@ Volume binding matrix for statically provisioned volumes:
| Filesystem | unspecified | BIND |
{{< note >}}
Only statically provisioned volumes are supported for alpha release. Administrators
should take care to consider these values when working with raw block devices.
{{< /note >}}
## Volume Snapshot and Restore Volume from Snapshot Support
{{< feature-state for_k8s_version="v1.20" state="stable" >}}
Volume snapshots only support the out-of-tree CSI volume plugins.
For details, see [Volume Snapshots](/docs/concepts/storage/volume-snapshots/).
In-tree volume plugins are deprecated. You can read about the deprecated volume
plugins in the
[Volume Plugin FAQ](https://github.com/kubernetes/community/blob/master/sig-storage/volume-plugin-faq.md).
### Create a PersistentVolumeClaim from a Volume Snapshot {#create-persistent-volume-claim-from-volume-snapshot}
@ -912,7 +1075,8 @@ spec:
## Volume Cloning
[Volume Cloning](/docs/concepts/storage/volume-pvc-datasource/) is
only available for CSI volume plugins.
### Create PersistentVolumeClaim from an existing PVC {#create-persistent-volume-claim-from-an-existing-pvc}
@ -949,27 +1113,32 @@ same namespace, except for core objects other than PVCs. For clusters that have
gate enabled, use of the `dataSourceRef` is preferred over `dataSource`.
## Cross namespace data sources
{{< feature-state for_k8s_version="v1.26" state="alpha" >}}
Kubernetes supports cross namespace volume data sources.
To use cross namespace volume data sources, you must enable the `AnyVolumeDataSource`
and `CrossNamespaceVolumeDataSource`
[feature gates](/docs/reference/command-line-tools-reference/feature-gates/) for
the kube-apiserver, kube-controller-manager.
Also, you must enable the `CrossNamespaceVolumeDataSource` feature gate for the csi-provisioner.
Enabling the `CrossNamespaceVolumeDataSource` feature gate allows you to specify
a namespace in the dataSourceRef field.
{{< note >}}
When you specify a namespace for a volume data source, Kubernetes checks for a
ReferenceGrant in the other namespace before accepting the reference.
ReferenceGrant is part of the `gateway.networking.k8s.io` extension APIs.
See [ReferenceGrant](https://gateway-api.sigs.k8s.io/api-types/referencegrant/)
in the Gateway API documentation for details.
This means that you must extend your Kubernetes cluster with at least ReferenceGrant from the
Gateway API before you can use this mechanism.
{{< /note >}}
## Data source references
The `dataSourceRef` field behaves almost the same as the `dataSource` field. If one is
specified while the other is not, the API server will give both fields the same value. Neither
field can be changed after creation, and attempting to specify different values for the two
fields will result in a validation error. Therefore the two fields will always have the same
@ -986,7 +1155,8 @@ users should be aware of:
When the `CrossNamespaceVolumeDataSource` feature is enabled, there are additional differences:
* The `dataSource` field only allows local objects, while the `dataSourceRef` field allows
  objects in any namespace.
* When namespace is specified, `dataSource` and `dataSourceRef` are not synced.
Users should always use `dataSourceRef` on clusters that have the feature gate enabled, and
@ -1030,10 +1200,13 @@ responsibility of that populator controller to report Events that relate to volu
the process.
### Using a cross-namespace volume data source
{{< feature-state for_k8s_version="v1.26" state="alpha" >}}
Create a ReferenceGrant to allow the namespace owner to accept the reference.
You define a populated volume by specifying a cross namespace volume data source
using the `dataSourceRef` field. You must already have a valid ReferenceGrant
in the source namespace:
```yaml
apiVersion: gateway.networking.k8s.io/v1beta1

View File

@ -62,22 +62,22 @@ volumeBindingMode: Immediate
Each StorageClass has a provisioner that determines what volume plugin is used
for provisioning PVs. This field must be specified.
| Volume Plugin | Internal Provisioner | Config Example |
| :------------------- | :------------------: | :-----------------------------------: |
| AWSElasticBlockStore | &#x2713; | [AWS EBS](#aws-ebs) |
| AzureFile | &#x2713; | [Azure File](#azure-file) |
| AzureDisk | &#x2713; | [Azure Disk](#azure-disk) |
| CephFS | - | - |
| Cinder | &#x2713; | [OpenStack Cinder](#openstack-cinder) |
| FC | - | - |
| FlexVolume | - | - |
| GCEPersistentDisk | &#x2713; | [GCE PD](#gce-pd) |
| iSCSI | - | - |
| NFS | - | [NFS](#nfs) |
| RBD | &#x2713; | [Ceph RBD](#ceph-rbd) |
| VsphereVolume | &#x2713; | [vSphere](#vsphere) |
| PortworxVolume | &#x2713; | [Portworx Volume](#portworx-volume) |
| Local | - | [Local](#local) |
You are not restricted to specifying the "internal" provisioners
listed here (whose names are prefixed with "kubernetes.io" and shipped
@ -109,29 +109,28 @@ whatever reclaim policy they were assigned at creation.
{{< feature-state for_k8s_version="v1.11" state="beta" >}}
PersistentVolumes can be configured to be expandable. This feature, when set to `true`,
allows users to resize the volume by editing the corresponding PVC object.
The following types of volumes support volume expansion, when the underlying
StorageClass has the field `allowVolumeExpansion` set to true.
{{< table caption = "Table of Volume types and the version of Kubernetes they require" >}}
| Volume type | Required Kubernetes version |
| :------------------- | :-------------------------- |
| gcePersistentDisk | 1.11 |
| awsElasticBlockStore | 1.11 |
| Cinder | 1.11 |
| rbd | 1.11 |
| Azure File | 1.11 |
| Azure Disk | 1.11 |
| Portworx | 1.11 |
| FlexVolume | 1.13 |
| CSI | 1.14 (alpha), 1.16 (beta) |
{{< /table >}}
{{< note >}}
You can only use the volume expansion feature to grow a Volume, not to shrink it.
{{< /note >}}
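A sketch of a StorageClass that permits expansion; the provisioner and parameters are illustrative:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: expandable-sc
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard
allowVolumeExpansion: true    # PVCs of this class may be resized upward
```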
@ -168,14 +167,14 @@ and [taints and tolerations](/docs/concepts/scheduling-eviction/taint-and-tolera
The following plugins support `WaitForFirstConsumer` with dynamic provisioning:
- [AWSElasticBlockStore](#aws-ebs)
- [GCEPersistentDisk](#gce-pd)
- [AzureDisk](#azure-disk)
The following plugins support `WaitForFirstConsumer` with pre-created PersistentVolume binding:
- All of the above
- [Local](#local)
{{< feature-state state="stable" for_k8s_version="v1.17" >}}
[CSI volumes](/docs/concepts/storage/volumes/#csi) are also supported with dynamic provisioning
@ -183,10 +182,10 @@ and pre-created PVs, but you'll need to look at the documentation for a specific
to see its supported topology keys and examples.
{{< note >}}
If you choose to use `WaitForFirstConsumer`, do not use `nodeName` in the Pod spec
to specify node affinity. If `nodeName` is used in this case, the scheduler will be bypassed and PVC will remain in `pending` state.
Instead, you can use node selector for hostname in this case as shown below.
{{< /note >}}
```yaml
@ -243,7 +242,7 @@ allowedTopologies:
Storage Classes have parameters that describe volumes belonging to the storage
class. Different parameters may be accepted depending on the `provisioner`. For
example, the value `io1`, for the parameter `type`, and the parameter
`iopsPerGB` are specific to EBS. When a parameter is omitted, some default is
used.
@ -265,26 +264,26 @@ parameters:
fsType: ext4
```
- `type`: `io1`, `gp2`, `sc1`, `st1`. See
[AWS docs](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSVolumeTypes.html)
for details. Default: `gp2`.
- `zone` (Deprecated): AWS zone. If neither `zone` nor `zones` is specified, volumes are
generally round-robin-ed across all active zones where Kubernetes cluster
has a node. `zone` and `zones` parameters must not be used at the same time.
- `zones` (Deprecated): A comma separated list of AWS zone(s). If neither `zone` nor `zones`
is specified, volumes are generally round-robin-ed across all active zones
where Kubernetes cluster has a node. `zone` and `zones` parameters must not
be used at the same time.
- `iopsPerGB`: only for `io1` volumes. I/O operations per second per GiB. AWS
volume plugin multiplies this with size of requested volume to compute IOPS
of the volume and caps it at 20 000 IOPS (maximum supported by AWS, see
[AWS docs](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSVolumeTypes.html)).
A string is expected here, i.e. `"10"`, not `10`.
- `fsType`: fsType that is supported by kubernetes. Default: `"ext4"`.
- `encrypted`: denotes whether the EBS volume should be encrypted or not.
Valid values are `"true"` or `"false"`. A string is expected here,
i.e. `"true"`, not `true`.
- `kmsKeyId`: optional. The full Amazon Resource Name of the key to use when
encrypting the volume. If none is supplied but `encrypted` is true, a key is
generated by AWS. See AWS docs for valid ARN value.
@ -307,17 +306,17 @@ parameters:
replication-type: none
```
- `type`: `pd-standard` or `pd-ssd`. Default: `pd-standard`
- `zone` (Deprecated): GCE zone. If neither `zone` nor `zones` is specified, volumes are
generally round-robin-ed across all active zones where Kubernetes cluster has
a node. `zone` and `zones` parameters must not be used at the same time.
- `zones` (Deprecated): A comma separated list of GCE zone(s). If neither `zone` nor `zones`
is specified, volumes are generally round-robin-ed across all active zones
where Kubernetes cluster has a node. `zone` and `zones` parameters must not
be used at the same time.
- `fstype`: `ext4` or `xfs`. Default: `ext4`. The defined filesystem type must be supported by the host operating system.
- `replication-type`: `none` or `regional-pd`. Default: `none`.
If `replication-type` is set to `none`, a regular (zonal) PD will be provisioned.
@ -350,14 +349,15 @@ parameters:
readOnly: "false"
```
- `server`: Server is the hostname or IP address of the NFS server.
- `path`: Path that is exported by the NFS server.
- `readOnly`: A flag indicating whether the storage will be mounted as read only (default false).
Kubernetes doesn't include an internal NFS provisioner. You need to use an external provisioner to create a StorageClass for NFS.
Here are some examples:
- [NFS Ganesha server and external provisioner](https://github.com/kubernetes-sigs/nfs-ganesha-server-and-external-provisioner)
- [NFS subdir external provisioner](https://github.com/kubernetes-sigs/nfs-subdir-external-provisioner)
### OpenStack Cinder
@ -371,7 +371,7 @@ parameters:
availability: nova
```
- `availability`: Availability Zone. If not specified, volumes are generally
round-robin-ed across all active zones where Kubernetes cluster has a node.
{{< note >}}
@ -381,7 +381,7 @@ This internal provisioner of OpenStack is deprecated. Please use [the external c
### vSphere
There are two types of provisioners for vSphere storage classes:
- [CSI provisioner](#vsphere-provisioner-csi): `csi.vsphere.vmware.com`
- [vCP provisioner](#vcp-provisioner): `kubernetes.io/vsphere-volume`
@ -392,73 +392,73 @@ In-tree provisioners are [deprecated](/blog/2019/12/09/kubernetes-1-17-feature-c
The vSphere CSI StorageClass provisioner works with Tanzu Kubernetes clusters. For an example, refer to the [vSphere CSI repository](https://github.com/kubernetes-sigs/vsphere-csi-driver/blob/master/example/vanilla-k8s-RWM-filesystem-volumes/example-sc.yaml).
#### vCP Provisioner
The following examples use the VMware Cloud Provider (vCP) StorageClass provisioner.
1. Create a StorageClass with a user specified disk format.
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: fast
provisioner: kubernetes.io/vsphere-volume
parameters:
diskformat: zeroedthick
```
`diskformat`: `thin`, `zeroedthick` and `eagerzeroedthick`. Default: `"thin"`.
2. Create a StorageClass with a disk format on a user specified datastore.
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: fast
provisioner: kubernetes.io/vsphere-volume
parameters:
diskformat: zeroedthick
datastore: VSANDatastore
```
`datastore`: The user can also specify the datastore in the StorageClass.
The volume will be created on the datastore specified in the StorageClass,
which in this case is `VSANDatastore`. This field is optional. If the
datastore is not specified, then the volume will be created on the datastore
specified in the vSphere config file used to initialize the vSphere Cloud
Provider.
3. Storage Policy Management inside kubernetes
- Using existing vCenter SPBM policy
One of the most important features of vSphere for Storage Management is
policy based Management. Storage Policy Based Management (SPBM) is a
storage policy framework that provides a single unified control plane
across a broad range of data services and storage solutions. SPBM enables
vSphere administrators to overcome upfront storage provisioning challenges,
such as capacity planning, differentiated service levels and managing
capacity headroom.
The SPBM policies can be specified in the StorageClass using the
`storagePolicyName` parameter.
- Virtual SAN policy support inside Kubernetes
Vsphere Infrastructure (VI) Admins will have the ability to specify custom
Virtual SAN Storage Capabilities during dynamic volume provisioning. You
can now define storage requirements, such as performance and availability,
in the form of storage capabilities during dynamic volume provisioning.
The storage capability requirements are converted into a Virtual SAN
policy which are then pushed down to the Virtual SAN layer when a
persistent volume (virtual disk) is being created. The virtual disk is
distributed across the Virtual SAN datastore to meet the requirements.
You can see [Storage Policy Based Management for dynamic provisioning of volumes](https://github.com/vmware-archive/vsphere-storage-for-kubernetes/blob/fa4c8b8ad46a85b6555d715dd9d27ff69839df53/documentation/policy-based-mgmt.md)
for more details on how to use storage policies for persistent volumes
management.
There are few
[vSphere examples](https://github.com/kubernetes/examples/tree/master/staging/volumes/vsphere)
@ -486,29 +486,30 @@ parameters:
imageFeatures: "layering"
```
- `monitors`: Ceph monitors, comma delimited. This parameter is required.
- `adminId`: Ceph client ID that is capable of creating images in the pool.
Default is "admin".
- `adminSecretName`: Secret Name for `adminId`. This parameter is required.
The provided secret must have type "kubernetes.io/rbd".
- `adminSecretNamespace`: The namespace for `adminSecretName`. Default is "default".
- `pool`: Ceph RBD pool. Default is "rbd".
- `userId`: Ceph client ID that is used to map the RBD image. Default is the
same as `adminId`.
- `userSecretName`: The name of Ceph Secret for `userId` to map RBD image. It
must exist in the same namespace as PVCs. This parameter is required.
The provided secret must have type "kubernetes.io/rbd", for example created in this
way:
```shell
kubectl create secret generic ceph-secret --type="kubernetes.io/rbd" \
--from-literal=key='QVFEQ1pMdFhPUnQrSmhBQUFYaERWNHJsZ3BsMmNjcDR6RFZST0E9PQ==' \
--namespace=kube-system
```
- `userSecretNamespace`: The namespace for `userSecretName`.
- `fsType`: fsType that is supported by kubernetes. Default: `"ext4"`.
- `imageFormat`: Ceph RBD image format, "1" or "2". Default is "2".
- `imageFeatures`: This parameter is optional and should only be used if you
set `imageFormat` to "2". Currently supported features are `layering` only.
Default is "", and no features are turned on.
@ -528,9 +529,9 @@ parameters:
storageAccount: azure_storage_account_name
```
- `skuName`: Azure storage account Sku tier. Default is empty.
- `location`: Azure storage account location. Default is empty.
- `storageAccount`: Azure storage account name. If a storage account is provided,
it must reside in the same resource group as the cluster, and `location` is
ignored. If a storage account is not provided, a new storage account will be
created in the same resource group as the cluster.
@ -548,21 +549,21 @@ parameters:
kind: managed
```
- `storageaccounttype`: Azure storage account Sku tier. Default is empty.
- `kind`: Possible values are `shared`, `dedicated`, and `managed` (default).
When `kind` is `shared`, all unmanaged disks are created in a few shared
storage accounts in the same resource group as the cluster. When `kind` is
`dedicated`, a new dedicated storage account will be created for the new
unmanaged disk in the same resource group as the cluster. When `kind` is
`managed`, all managed disks are created in the same resource group as
the cluster.
- `resourceGroup`: Specify the resource group in which the Azure disk will be created.
It must be an existing resource group name. If it is unspecified, the disk will be
placed in the same resource group as the current Kubernetes cluster.
- Premium VM can attach both Standard_LRS and Premium_LRS disks, while Standard
VM can only attach Standard_LRS disks.
- Managed VM can only attach managed disks and unmanaged VM can only attach
unmanaged disks.
### Azure File
@ -579,29 +580,29 @@ parameters:
storageAccount: azure_storage_account_name
```
- `skuName`: Azure storage account Sku tier. Default is empty.
- `location`: Azure storage account location. Default is empty.
- `storageAccount`: Azure storage account name. Default is empty. If a storage
account is not provided, all storage accounts associated with the resource
group are searched to find one that matches `skuName` and `location`. If a
storage account is provided, it must reside in the same resource group as the
cluster, and `skuName` and `location` are ignored.
- `secretNamespace`: the namespace of the secret that contains the Azure Storage
Account Name and Key. Default is the same as the Pod.
- `secretName`: the name of the secret that contains the Azure Storage Account Name and
Key. Default is `azure-storage-account-<accountName>-secret`
- `readOnly`: a flag indicating whether the storage will be mounted as read only.
Defaults to false which means a read/write mount. This setting will impact the
`ReadOnly` setting in VolumeMounts as well.
During storage provisioning, a secret named by `secretName` is created for the
mounting credentials. If the cluster has enabled both
[RBAC](/docs/reference/access-authn-authz/rbac/) and
[Controller Roles](/docs/reference/access-authn-authz/rbac/#controller-roles),
add the `create` permission of resource `secret` for clusterrole
`system:controller:persistent-volume-binder`.
In a multi-tenancy context, it is strongly recommended to set the value for
`secretNamespace` explicitly, otherwise the storage account credentials may
be read by other users.
@ -615,26 +616,25 @@ metadata:
provisioner: kubernetes.io/portworx-volume
parameters:
repl: "1"
snap_interval: "70"
priority_io: "high"
```
- `fs`: filesystem to be laid out: `none/xfs/ext4` (default: `ext4`).
- `block_size`: block size in Kbytes (default: `32`).
- `repl`: number of synchronous replicas to be provided in the form of
replication factor `1..3` (default: `1`) A string is expected here i.e.
`"1"` and not `1`.
- `priority_io`: determines whether the volume will be created from higher
performance or a lower priority storage `high/medium/low` (default: `low`).
- `snap_interval`: clock/time interval in minutes for when to trigger snapshots.
Snapshots are incremental based on difference with the prior snapshot, 0
disables snaps (default: `0`). A string is expected here i.e.
`"70"` and not `70`.
- `aggregation_level`: specifies the number of chunks the volume would be
distributed into, 0 indicates a non-aggregated volume (default: `0`). A string
is expected here i.e. `"0"` and not `0`
- `ephemeral`: specifies whether the volume should be cleaned-up after unmount
or should be persistent. `emptyDir` use case can set this value to true and
`persistent volumes` use case such as for databases like Cassandra should set
to false, `true/false` (default `false`). A string is expected here i.e.
@ -660,4 +660,3 @@ specified by the `WaitForFirstConsumer` volume binding mode.
Delaying volume binding allows the scheduler to consider all of a Pod's
scheduling constraints when choosing an appropriate PersistentVolume for a
PersistentVolumeClaim.

View File

@ -227,7 +227,7 @@ $ kubectl get crd volumesnapshotcontent -o yaml
If you want to allow users to create a `PersistentVolumeClaim` from an existing
`VolumeSnapshot`, but with a different volume mode than the source, the annotation
`snapshot.storage.kubernetes.io/allow-volume-mode-change: "true"` needs to be added to
the `VolumeSnapshotContent` that corresponds to the `VolumeSnapshot`.
For pre-provisioned snapshots, `spec.sourceVolumeMode` needs to be populated
@ -241,7 +241,7 @@ kind: VolumeSnapshotContent
metadata:
name: new-snapshot-content-test
annotations:
- snapshot.storage.kubernetes.io/allow-volume-mode-change: "true"
spec:
deletionPolicy: Delete
driver: hostpath.csi.k8s.io

View File

@ -1282,8 +1282,13 @@ in `Container.volumeMounts`. Its values are:
In similar fashion, no mounts created by the container will be visible on
the host. This is the default mode.
This mode is equal to `rprivate` mount propagation as described in
[`mount(8)`](https://man7.org/linux/man-pages/man8/mount.8.html)
However, the CRI runtime may choose `rslave` mount propagation (i.e.,
`HostToContainer`) instead, when `rprivate` propagation is not applicable.
cri-dockerd (Docker) is known to choose `rslave` mount propagation when the
mount source contains the Docker daemon's root directory (`/var/lib/docker`).
* `HostToContainer` - This volume mount will receive all subsequent mounts
that are mounted to this volume or any of its subdirectories.
@ -1296,7 +1301,7 @@ in `Container.volumeMounts`. Its values are:
propagation will see it.
This mode is equal to `rslave` mount propagation as described in the
[`mount(8)`](https://man7.org/linux/man-pages/man8/mount.8.html)
* `Bidirectional` - This volume mount behaves the same as the `HostToContainer` mount.
In addition, all volume mounts created by the container will be propagated
@ -1306,7 +1311,7 @@ in `Container.volumeMounts`. Its values are:
a Pod that needs to mount something on the host using a `hostPath` volume.
This mode is equal to `rshared` mount propagation as described in the
[`mount(8)`](https://man7.org/linux/man-pages/man8/mount.8.html)
{{< warning >}}
`Bidirectional` mount propagation can be dangerous. It can damage

View File

@ -274,7 +274,8 @@ This functionality requires a container runtime that supports this functionality
#### Field compatibility for Pod security context {#compatibility-v1-pod-spec-containers-securitycontext}
Only the `securityContext.runAsNonRoot` and `securityContext.windowsOptions` from the Pod
[`securityContext`](/docs/reference/kubernetes-api/workload-resources/pod-v1/#security-context) fields work on Windows.
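A sketch of the two supported fields in use (the image and username are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: windows-securitycontext-example
spec:
  securityContext:
    runAsNonRoot: true
    windowsOptions:
      runAsUserName: "ContainerUser"
  containers:
    - name: app
      image: mcr.microsoft.com/windows/servercore:ltsc2022
```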
## Node problem detector

View File

@ -105,30 +105,24 @@ If you do not specify either, then the DaemonSet controller will create Pods on
## How Daemon Pods are scheduled
A DaemonSet ensures that all eligible nodes run a copy of a Pod. The DaemonSet
controller creates a Pod for each eligible node and adds the
`spec.affinity.nodeAffinity` field of the Pod to match the target host. After
the Pod is created, the default scheduler typically takes over and then binds
the Pod to the target host by setting the `.spec.nodeName` field. If the new
Pod cannot fit on the node, the default scheduler may preempt (evict) some of
the existing Pods based on the
[priority](/docs/concepts/scheduling-eviction/pod-priority-preemption/#pod-priority)
of the new Pod.
{{< feature-state for_k8s_version="1.17" state="stable" >}}
The user can specify a different scheduler for the Pods of the DaemonSet, by
setting the `.spec.template.spec.schedulerName` field of the DaemonSet.
A DaemonSet ensures that all eligible nodes run a copy of a Pod. Normally, the
node that a Pod runs on is selected by the Kubernetes scheduler. However,
DaemonSet pods are created and scheduled by the DaemonSet controller instead.
That introduces the following issues:
* Inconsistent Pod behavior: Normal Pods waiting to be scheduled are created
and in `Pending` state, but DaemonSet pods are not created in `Pending`
state. This is confusing to the user.
* [Pod preemption](/docs/concepts/scheduling-eviction/pod-priority-preemption/)
is handled by default scheduler. When preemption is enabled, the DaemonSet controller
will make scheduling decisions without considering pod priority and preemption.
`ScheduleDaemonSetPods` allows you to schedule DaemonSets using the default
scheduler instead of the DaemonSet controller, by adding the `NodeAffinity` term
to the DaemonSet pods, instead of the `.spec.nodeName` term. The default
scheduler is then used to bind the pod to the target host. If node affinity of
the DaemonSet pod already exists, it is replaced (the original node affinity was
taken into account before selecting the target host). The DaemonSet controller only
performs these operations when creating or modifying DaemonSet pods, and no
changes are made to the `spec.template` of the DaemonSet.
The original node affinity specified at the
`.spec.template.spec.affinity.nodeAffinity` field (if specified) is taken into
consideration by the DaemonSet controller when evaluating the eligible nodes,
but is replaced on the created Pod with the node affinity that matches the name
of the eligible node.
```yaml
nodeAffinity:
@ -141,25 +135,40 @@ nodeAffinity:
- target-host-name
```
In addition, `node.kubernetes.io/unschedulable:NoSchedule` toleration is added
automatically to DaemonSet Pods. The default scheduler ignores
`unschedulable` Nodes when scheduling DaemonSet Pods.
### Taints and Tolerations
### Taints and tolerations
Although Daemon Pods respect
[taints and tolerations](/docs/concepts/scheduling-eviction/taint-and-toleration/),
the following tolerations are added to DaemonSet Pods automatically according to
the related features.
The DaemonSet controller automatically adds a set of {{< glossary_tooltip
text="tolerations" term_id="toleration" >}} to DaemonSet Pods:
| Toleration Key | Effect | Version | Description |
| ---------------------------------------- | ---------- | ------- | ----------- |
| `node.kubernetes.io/not-ready` | NoExecute | 1.13+ | DaemonSet pods will not be evicted when there are node problems such as a network partition. |
| `node.kubernetes.io/unreachable` | NoExecute | 1.13+ | DaemonSet pods will not be evicted when there are node problems such as a network partition. |
| `node.kubernetes.io/disk-pressure` | NoSchedule | 1.8+ | DaemonSet pods tolerate disk-pressure attributes by default scheduler. |
| `node.kubernetes.io/memory-pressure` | NoSchedule | 1.8+ | DaemonSet pods tolerate memory-pressure attributes by default scheduler. |
| `node.kubernetes.io/unschedulable` | NoSchedule | 1.12+ | DaemonSet pods tolerate unschedulable attributes by default scheduler. |
| `node.kubernetes.io/network-unavailable` | NoSchedule | 1.12+ | DaemonSet pods that use host networking tolerate network-unavailable attributes by default scheduler. |
{{< table caption="Tolerations for DaemonSet pods" >}}
| Toleration key | Effect | Details |
| --------------------------------------------------------------------------------------------------------------------- | ------------ | --------------------------------------------------------------------------------------------------------------------------------------------- |
| [`node.kubernetes.io/not-ready`](/docs/reference/labels-annotations-taints/#node-kubernetes-io-not-ready) | `NoExecute` | DaemonSet Pods can be scheduled onto nodes that are not healthy or ready to accept Pods. Any DaemonSet Pods running on such nodes will not be evicted. |
| [`node.kubernetes.io/unreachable`](/docs/reference/labels-annotations-taints/#node-kubernetes-io-unreachable) | `NoExecute` | DaemonSet Pods can be scheduled onto nodes that are unreachable from the node controller. Any DaemonSet Pods running on such nodes will not be evicted. |
| [`node.kubernetes.io/disk-pressure`](/docs/reference/labels-annotations-taints/#node-kubernetes-io-disk-pressure) | `NoSchedule` | DaemonSet Pods can be scheduled onto nodes with disk pressure issues. |
| [`node.kubernetes.io/memory-pressure`](/docs/reference/labels-annotations-taints/#node-kubernetes-io-memory-pressure) | `NoSchedule` | DaemonSet Pods can be scheduled onto nodes with memory pressure issues. |
| [`node.kubernetes.io/pid-pressure`](/docs/reference/labels-annotations-taints/#node-kubernetes-io-pid-pressure) | `NoSchedule` | DaemonSet Pods can be scheduled onto nodes with process pressure issues. |
| [`node.kubernetes.io/unschedulable`](/docs/reference/labels-annotations-taints/#node-kubernetes-io-unschedulable) | `NoSchedule` | DaemonSet Pods can be scheduled onto nodes that are unschedulable. |
| [`node.kubernetes.io/network-unavailable`](/docs/reference/labels-annotations-taints/#node-kubernetes-io-network-unavailable) | `NoSchedule` | **Only added for DaemonSet Pods that request host networking**, i.e., Pods having `spec.hostNetwork: true`. Such DaemonSet Pods can be scheduled onto nodes with unavailable network.|
{{< /table >}}
You can add your own tolerations to the Pods of a Daemonset as well, by
defining these in the Pod template of the DaemonSet.
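
For example, a DaemonSet pod template with an extra, user-supplied toleration might look like the following sketch; the names and image are placeholders:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-agent                 # placeholder name
spec:
  selector:
    matchLabels:
      app: node-agent
  template:
    metadata:
      labels:
        app: node-agent
    spec:
      tolerations:
      # User-supplied toleration, added alongside the ones the
      # DaemonSet controller injects automatically.
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
        effect: NoSchedule
      containers:
      - name: agent
        image: registry.k8s.io/pause:3.9   # placeholder image
```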
Because the DaemonSet controller sets the
`node.kubernetes.io/unschedulable:NoSchedule` toleration automatically,
Kubernetes can run DaemonSet Pods on nodes that are marked as _unschedulable_.
If you use a DaemonSet to provide an important node-level function, such as
[cluster networking](/docs/concepts/cluster-administration/networking/), it is
helpful that Kubernetes places DaemonSet Pods on nodes before they are ready.
For example, without that special toleration, you could end up in a deadlock
situation where the node is not marked as ready because the network plugin is
not running there, and at the same time the network plugin is not running on
that node because the node is not yet ready.
## Communicating with Daemon Pods

View File

@ -794,7 +794,7 @@ These are some requirements and semantics of the API:
are evaluated in order. Once a rule matches a Pod failure, the remaining rules
are ignored. When no rule matches the Pod failure, the default
handling applies.
- you may want to restrict a rule to a specific container by specifing its name
- you may want to restrict a rule to a specific container by specifying its name
in `spec.podFailurePolicy.rules[*].containerName`. When not specified the rule
applies to all containers. When specified, it should match one of the container
or `initContainer` names in the Pod template.
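
As a sketch of how `containerName` narrows a rule, the following hypothetical Job fails immediately only when the container named `main` exits with code 42; failures of other containers fall through to the default handling:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: job-pod-failure-policy-example   # placeholder name
spec:
  backoffLimit: 6
  podFailurePolicy:
    rules:
    - action: FailJob
      onExitCodes:
        containerName: main     # rule only applies to this container
        operator: In
        values: [42]
  template:
    spec:
      restartPolicy: Never      # required when using a Pod failure policy
      containers:
      - name: main
        image: docker.io/library/busybox:1.36   # placeholder image
        command: ["sh", "-c", "exit 42"]
```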

View File

@ -69,7 +69,7 @@ kubectl get rs
And see the frontend one you created:
```shell
```
NAME DESIRED CURRENT READY AGE
frontend 3 3 3 6s
```
@ -118,7 +118,7 @@ kubectl get pods
You should see Pod information similar to:
```shell
```
NAME READY STATUS RESTARTS AGE
frontend-b2zdv 1/1 Running 0 6m36s
frontend-vcmts 1/1 Running 0 6m36s
@ -160,7 +160,7 @@ While you can create bare Pods with no problems, it is strongly recommended to m
labels which match the selector of one of your ReplicaSets. The reason for this is because a ReplicaSet is not limited
to owning Pods specified by its template-- it can acquire other Pods in the manner specified in the previous sections.
Take the previous frontend ReplicaSet example, and the Pods specified in the following manifest:
Take the previous frontend ReplicaSet example, and the Pods specified in the following manifest:
{{< codenew file="pods/pod-rs.yaml" >}}
@ -229,9 +229,9 @@ As with all other Kubernetes API objects, a ReplicaSet needs the `apiVersion`, `
For ReplicaSets, the `kind` is always a ReplicaSet.
When the control plane creates new Pods for a ReplicaSet, the `.metadata.name` of the
ReplicaSet is part of the basis for naming those Pods. The name of a ReplicaSet must be a valid
ReplicaSet is part of the basis for naming those Pods. The name of a ReplicaSet must be a valid
[DNS subdomain](/docs/concepts/overview/working-with-objects/names#dns-subdomain-names)
value, but this can produce unexpected results for the Pod hostnames. For best compatibility,
value, but this can produce unexpected results for the Pod hostnames. For best compatibility,
the name should follow the more restrictive rules for a
[DNS label](/docs/concepts/overview/working-with-objects/names#dns-label-names).
@ -288,8 +288,8 @@ When using the REST API or the `client-go` library, you must set `propagationPol
```shell
kubectl proxy --port=8080
curl -X DELETE 'localhost:8080/apis/apps/v1/namespaces/default/replicasets/frontend' \
> -d '{"kind":"DeleteOptions","apiVersion":"v1","propagationPolicy":"Foreground"}' \
> -H "Content-Type: application/json"
-d '{"kind":"DeleteOptions","apiVersion":"v1","propagationPolicy":"Foreground"}' \
-H "Content-Type: application/json"
```
### Deleting just a ReplicaSet
@ -303,11 +303,11 @@ For example:
```shell
kubectl proxy --port=8080
curl -X DELETE 'localhost:8080/apis/apps/v1/namespaces/default/replicasets/frontend' \
> -d '{"kind":"DeleteOptions","apiVersion":"v1","propagationPolicy":"Orphan"}' \
> -H "Content-Type: application/json"
-d '{"kind":"DeleteOptions","apiVersion":"v1","propagationPolicy":"Orphan"}' \
-H "Content-Type: application/json"
```
Once the original is deleted, you can create a new ReplicaSet to replace it. As long
Once the original is deleted, you can create a new ReplicaSet to replace it. As long
as the old and new `.spec.selector` are the same, then the new one will adopt the old Pods.
However, it will not make any effort to make existing Pods match a new, different pod template.
To update Pods to a new spec in a controlled way, use a
@ -335,19 +335,19 @@ prioritize scaling down pods based on the following general algorithm:
1. If the pods' creation times differ, the pod that was created more recently
comes before the older pod (the creation times are bucketed on an integer log scale
when the `LogarithmicScaleDown` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) is enabled)
If all of the above match, then selection is random.
### Pod deletion cost
### Pod deletion cost
{{< feature-state for_k8s_version="v1.22" state="beta" >}}
Using the [`controller.kubernetes.io/pod-deletion-cost`](/docs/reference/labels-annotations-taints/#pod-deletion-cost)
Using the [`controller.kubernetes.io/pod-deletion-cost`](/docs/reference/labels-annotations-taints/#pod-deletion-cost)
annotation, users can set a preference regarding which pods to remove first when downscaling a ReplicaSet.
The annotation should be set on the pod, the range is [-2147483647, 2147483647]. It represents the cost of
deleting a pod compared to other pods belonging to the same ReplicaSet. Pods with lower deletion
cost are preferred to be deleted before pods with higher deletion cost.
cost are preferred to be deleted before pods with higher deletion cost.
The implicit value for this annotation for pods that don't set it is 0; negative values are permitted.
Invalid values will be rejected by the API server.
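
For illustration, the annotation is a string holding an integer and is set in the Pod metadata; the name and image below are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: frontend-low-value-replica     # placeholder name
  annotations:
    # A lower (here negative) cost makes this Pod a preferred victim
    # when its ReplicaSet is scaled down.
    controller.kubernetes.io/pod-deletion-cost: "-100"
  labels:
    tier: frontend
spec:
  containers:
  - name: app
    image: docker.io/library/nginx:1.25   # placeholder image
```

In practice the annotation is usually patched onto existing Pods (for example with `kubectl annotate`) rather than written into a template.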
@ -360,13 +360,13 @@ This feature is beta and enabled by default. You can disable it using the
- This is honored on a best-effort basis, so it does not offer any guarantees on pod deletion order.
- Users should avoid updating the annotation frequently, such as updating it based on a metric value,
because doing so will generate a significant number of pod updates on the apiserver.
{{< /note >}}
{{< /note >}}
#### Example Use Case
The different pods of an application could have different utilization levels. On scale down, the application
The different pods of an application could have different utilization levels. On scale down, the application
may prefer to remove the pods with lower utilization. To avoid frequently updating the pods, the application
should update `controller.kubernetes.io/pod-deletion-cost` once before issuing a scale down (setting the
should update `controller.kubernetes.io/pod-deletion-cost` once before issuing a scale down (setting the
annotation to a value proportional to pod utilization level). This works if the application itself controls
the down scaling; for example, the driver pod of a Spark deployment.
@ -400,7 +400,7 @@ kubectl autoscale rs frontend --max=10 --min=3 --cpu-percent=50
[`Deployment`](/docs/concepts/workloads/controllers/deployment/) is an object which can own ReplicaSets and update
them and their Pods via declarative, server-side rolling updates.
While ReplicaSets can be used independently, today they're mainly used by Deployments as a mechanism to orchestrate Pod
While ReplicaSets can be used independently, today they're mainly used by Deployments as a mechanism to orchestrate Pod
creation, deletion and updates. When you use Deployments you don't have to worry about managing the ReplicaSets that
they create. Deployments own and manage their ReplicaSets.
As such, it is recommended to use Deployments when you want ReplicaSets.
@ -422,7 +422,7 @@ expected to terminate on their own (that is, batch jobs).
### DaemonSet
Use a [`DaemonSet`](/docs/concepts/workloads/controllers/daemonset/) instead of a ReplicaSet for Pods that provide a
machine-level function, such as machine monitoring or machine logging. These Pods have a lifetime that is tied
machine-level function, such as machine monitoring or machine logging. These Pods have a lifetime that is tied
to a machine lifetime: the Pod needs to be running on the machine before other Pods start, and are
safe to terminate when the machine is otherwise ready to be rebooted/shutdown.
@ -444,4 +444,3 @@ As such, ReplicaSets are preferred over ReplicationControllers
object definition to understand the API for replica sets.
* Read about [PodDisruptionBudget](/docs/concepts/workloads/pods/disruptions/) and how
you can use it to manage application availability during disruptions.

View File

@ -289,14 +289,31 @@ section.
## Privileged mode for containers
In Linux, any container in a Pod can enable privileged mode using the `privileged` (Linux) flag on the [security context](/docs/tasks/configure-pod-container/security-context/) of the container spec. This is useful for containers that want to use operating system administrative capabilities such as manipulating the network stack or accessing hardware devices.
If your cluster has the `WindowsHostProcessContainers` feature enabled, you can create a [Windows HostProcess pod](/docs/tasks/configure-pod-container/create-hostprocess-pod) by setting the `windowsOptions.hostProcess` flag on the security context of the pod spec. All containers in these pods must run as Windows HostProcess containers. HostProcess pods run directly on the host and can also be used to perform administrative tasks as is done with Linux privileged containers.
{{< note >}}
Your {{< glossary_tooltip text="container runtime" term_id="container-runtime" >}} must support the concept of a privileged container for this setting to be relevant.
{{< /note >}}
Any container in a pod can run in privileged mode to use operating system administrative capabilities
that would otherwise be inaccessible. This is available for both Windows and Linux.
### Linux privileged containers
In Linux, any container in a Pod can enable privileged mode using the `privileged` (Linux) flag
on the [security context](/docs/tasks/configure-pod-container/security-context/) of the
container spec. This is useful for containers that want to use operating system administrative
capabilities such as manipulating the network stack or accessing hardware devices.
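
A minimal sketch of a Linux Pod that runs one privileged container follows; the name and image are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: privileged-demo               # placeholder name
spec:
  containers:
  - name: shell
    image: docker.io/library/busybox:1.36   # placeholder image
    command: ["sleep", "3600"]
    securityContext:
      # Grants the container broad, host-level capabilities; use sparingly.
      privileged: true
```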
### Windows privileged containers
{{< feature-state for_k8s_version="v1.26" state="stable" >}}
In Windows, you can create a [Windows HostProcess pod](/docs/tasks/configure-pod-container/create-hostprocess-pod)
by setting the `windowsOptions.hostProcess` flag on the security context of the pod spec. All containers in these
pods must run as Windows HostProcess containers. HostProcess pods run directly on the host and can also be used
to perform administrative tasks as is done with Linux privileged containers. In order to use this feature, the
`WindowsHostProcessContainers` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) must be enabled.
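
A sketch of a HostProcess Pod is shown below; the name, image, and user account are placeholders and assume a Windows node with a supported container runtime:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hostprocess-demo               # placeholder name
spec:
  securityContext:
    windowsOptions:
      hostProcess: true
      runAsUserName: "NT AUTHORITY\\SYSTEM"   # placeholder account
  hostNetwork: true                    # HostProcess pods must use the host network
  nodeSelector:
    kubernetes.io/os: windows
  containers:
  - name: hostprocess
    image: mcr.microsoft.com/windows/servercore:ltsc2022   # placeholder image
    command: ["powershell.exe", "-Command", "Start-Sleep -Seconds 3600"]
```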
## Static Pods
_Static Pods_ are managed directly by the kubelet daemon on a specific node,

View File

@ -267,6 +267,11 @@ after successful sandbox creation and network configuration by the runtime
plugin). For a Pod without init containers, the kubelet sets the `Initialized`
condition to `True` before sandbox creation and network configuration starts.
### Pod scheduling readiness {#pod-scheduling-readiness-gate}
{{< feature-state for_k8s_version="v1.26" state="alpha" >}}
See [Pod Scheduling Readiness](/docs/concepts/scheduling-eviction/pod-scheduling-readiness/) for more information.
## Container probes

View File

@ -0,0 +1,117 @@
---
title: Pod Quality of Service Classes
content_type: concept
weight: 85
---
<!-- overview -->
This page introduces _Quality of Service (QoS) classes_ in Kubernetes, and explains
how Kubernetes assigns a QoS class to each Pod as a consequence of the resource
constraints that you specify for the containers in that Pod. Kubernetes relies on this
classification to make decisions about which Pods to evict when there are not enough
available resources on a Node.
<!-- body -->
## Quality of Service classes
Kubernetes classifies the Pods that you run and allocates each Pod into a specific
_quality of service (QoS) class_. Kubernetes uses that classification to influence how different
pods are handled. Kubernetes does this classification based on the
[resource requests](/docs/concepts/configuration/manage-resources-containers/)
of the {{< glossary_tooltip text="Containers" term_id="container" >}} in that Pod, along with
how those requests relate to resource limits.
This is known as {{< glossary_tooltip text="Quality of Service" term_id="qos-class" >}}
(QoS) class. Kubernetes assigns every Pod a QoS class based on the resource requests
and limits of its component Containers. QoS classes are used by Kubernetes to decide
which Pods to evict from a Node experiencing
[Node Pressure](/docs/concepts/scheduling-eviction/node-pressure-eviction/). The possible
QoS classes are `Guaranteed`, `Burstable`, and `BestEffort`. When a Node runs out of resources,
Kubernetes will first evict `BestEffort` Pods running on that Node, followed by `Burstable` and
finally `Guaranteed` Pods. When this eviction is due to resource pressure, only Pods exceeding
resource requests are candidates for eviction.
### Guaranteed
Pods that are `Guaranteed` have the strictest resource limits and are least likely
to face eviction. They are guaranteed not to be killed until they exceed their limits
or there are no lower-priority Pods that can be preempted from the Node. They may
not acquire resources beyond their specified limits. These Pods can also make
use of exclusive CPUs using the
[`static`](/docs/tasks/administer-cluster/cpu-management-policies/#static-policy) CPU management policy.
#### Criteria
For a Pod to be given a QoS class of `Guaranteed`:
* Every Container in the Pod must have a memory limit and a memory request.
* For every Container in the Pod, the memory limit must equal the memory request.
* Every Container in the Pod must have a CPU limit and a CPU request.
* For every Container in the Pod, the CPU limit must equal the CPU request.
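
Putting those criteria together, a Pod like the following sketch (placeholder name and image) is classified as `Guaranteed` because its only container sets requests equal to limits for both CPU and memory:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: qos-guaranteed-example        # placeholder name
spec:
  containers:
  - name: app
    image: docker.io/library/nginx:1.25   # placeholder image
    resources:
      requests:
        cpu: "500m"
        memory: "256Mi"
      limits:
        cpu: "500m"        # equal to the CPU request
        memory: "256Mi"    # equal to the memory request
```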
### Burstable
Pods that are `Burstable` have some lower-bound resource guarantees based on the request, but
do not require a specific limit. If a limit is not specified, it defaults to a
limit equivalent to the capacity of the Node, which allows the Pods to flexibly increase
their resources if resources are available. In the event of Pod eviction due to Node
resource pressure, these Pods are evicted only after all `BestEffort` Pods are evicted.
Because a `Burstable` Pod can include a Container that has no resource limits or requests, a Pod
that is `Burstable` can try to use any amount of node resources.
#### Criteria
A Pod is given a QoS class of `Burstable` if:
* The Pod does not meet the criteria for QoS class `Guaranteed`.
* At least one Container in the Pod has a memory or CPU request or limit.
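
For example, a Pod whose only container sets a memory request but no limit (placeholder name and image) is classified as `Burstable`:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: qos-burstable-example         # placeholder name
spec:
  containers:
  - name: app
    image: docker.io/library/nginx:1.25   # placeholder image
    resources:
      requests:
        memory: "128Mi"    # a request without a matching limit
```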
### BestEffort
Pods in the `BestEffort` QoS class can use node resources that aren't specifically assigned
to Pods in other QoS classes. For example, if you have a node with 16 CPU cores available to the
kubelet, and you assign 4 CPU cores to a `Guaranteed` Pod, then a Pod in the `BestEffort`
QoS class can try to use any amount of the remaining 12 CPU cores.
The kubelet prefers to evict `BestEffort` Pods if the node comes under resource pressure.
#### Criteria
A Pod has a QoS class of `BestEffort` if it doesn't meet the criteria for either `Guaranteed`
or `Burstable`. In other words, a Pod is `BestEffort` only if none of the Containers in the Pod have a
memory limit or a memory request, and none of the Containers in the Pod have a
CPU limit or a CPU request.
Containers in a Pod can request other resources (not CPU or memory) and still be classified as
`BestEffort`.
## Some behavior is independent of QoS class {#class-independent-behavior}
Certain behavior is independent of the QoS class assigned by Kubernetes. For example:
* Any Container exceeding a resource limit will be killed and restarted by the kubelet without
affecting other Containers in that Pod.
* If a Container exceeds its resource request and the node it runs on faces
resource pressure, the Pod it is in becomes a candidate for [eviction](/docs/concepts/scheduling-eviction/node-pressure-eviction/).
If this occurs, all Containers in the Pod will be terminated. Kubernetes may create a
replacement Pod, usually on a different node.
* The resource request of a Pod is equal to the sum of the resource requests of
its component Containers, and the resource limit of a Pod is equal to the sum of
the resource limits of its component Containers.
* The kube-scheduler does not consider QoS class when selecting which Pods to
[preempt](/docs/concepts/scheduling-eviction/pod-priority-preemption/#preemption).
Preemption can occur when a cluster does not have enough resources to run all the Pods
you defined.
## {{% heading "whatsnext" %}}
* Learn about [resource management for Pods and Containers](/docs/concepts/configuration/manage-resources-containers/).
* Learn about [Node-pressure eviction](/docs/concepts/scheduling-eviction/node-pressure-eviction/).
* Learn about [Pod priority and preemption](/docs/concepts/scheduling-eviction/pod-priority-preemption/).
* Learn about [Pod disruptions](/docs/concepts/workloads/pods/disruptions/).
* Learn how to [assign memory resources to containers and pods](/docs/tasks/configure-pod-container/assign-memory-resource/).
* Learn how to [assign CPU resources to containers and pods](/docs/tasks/configure-pod-container/assign-cpu-resource/).
* Learn how to [configure Quality of Service for Pods](/docs/tasks/configure-pod-container/quality-service-pod/).

View File

@ -663,23 +663,15 @@ admission plugin, which allows preventing pods from running on specifically tain
{{< feature-state for_k8s_version="v1.25" state="stable" >}}
This is the replacement for the deprecated [PodSecurityPolicy](#podsecuritypolicy) admission controller
defined in the next section. This admission controller acts on creation and modification of the pod and
determines if it should be admitted based on the requested security context and the
[Pod Security Standards](/docs/concepts/security/pod-security-standards/).
The PodSecurity admission controller checks new Pods before they are
admitted, determines if it should be admitted based on the requested security context and the restrictions on permitted
[Pod Security Standards](/docs/concepts/security/pod-security-standards/)
for the namespace that the Pod would be in.
See the [Pod Security Admission documentation](/docs/concepts/security/pod-security-admission/)
for more information.
See the [Pod Security Admission](/docs/concepts/security/pod-security-admission/)
documentation for more information.
### PodSecurityPolicy {#podsecuritypolicy}
{{< feature-state for_k8s_version="v1.21" state="deprecated" >}}
This admission controller acts on creation and modification of the pod and determines if it should be admitted
based on the requested security context and the available Pod Security Policies.
See also the [PodSecurityPolicy](/docs/concepts/security/pod-security-policy/) documentation
for more information.
PodSecurity replaced an older admission controller named PodSecurityPolicy.
### PodTolerationRestriction {#podtolerationrestriction}

View File

@ -591,7 +591,7 @@ is not considered to match.
Use the object selector only if the webhook is opt-in, because end users may skip
the admission webhook by setting the labels.
This example shows a mutating webhook that would match a `CREATE` of any resource with the label `foo: bar`:
This example shows a mutating webhook that would match a `CREATE` of any resource (but not subresources) with the label `foo: bar`:
```yaml
apiVersion: admissionregistration.k8s.io/v1

View File

@ -383,7 +383,7 @@ Each feature gate is designed for enabling/disabling a specific feature:
to see the requesting subject's authentication information.
See [API access to authentication information for a client](/docs/reference/access-authn-authz/authentication/#self-subject-review)
for more details.
- `APIServerIdentity`: Assign each API server an ID in a cluster.
- `APIServerIdentity`: Assign each API server an ID in a cluster, using a [Lease](/docs/concepts/architecture/leases).
- `APIServerTracing`: Add support for distributed tracing in the API server.
See [Traces for Kubernetes System Components](/docs/concepts/cluster-administration/system-traces) for more details.
- `AdvancedAuditing`: Enable [advanced auditing](/docs/tasks/debug/debug-cluster/audit/#advanced-audit)

View File

@ -0,0 +1,20 @@
---
title: JSON Web Token (JWT)
id: jwt
date: 2023-01-17
full_link: https://www.rfc-editor.org/rfc/rfc7519
short_description: >
A means of representing claims to be transferred between two parties.
aka:
tags:
- security
- architecture
---
A means of representing claims to be transferred between two parties.
<!--more-->
JWTs can be digitally signed and encrypted. Kubernetes uses JWTs as
authentication tokens to verify the identity of entities that want to perform
actions in a cluster.

View File

@ -2,7 +2,7 @@
title: QoS Class
id: qos-class
date: 2019-04-15
full_link:
full_link: /docs/concepts/workloads/pods/pod-qos/
short_description: >
QoS Class (Quality of Service Class) provides a way for Kubernetes to classify pods within the cluster into several classes and make decisions about scheduling and eviction.

View File

@ -844,9 +844,9 @@ you through the steps you follow to apply a seccomp profile to a Pod or to one o
its containers. That tutorial covers the supported mechanism for configuring seccomp in Kubernetes,
based on setting `securityContext` within the Pod's `.spec`.
### snapshot.storage.kubernetes.io/allowVolumeModeChange
### snapshot.storage.kubernetes.io/allow-volume-mode-change
Example: `snapshot.storage.kubernetes.io/allowVolumeModeChange: "true"`
Example: `snapshot.storage.kubernetes.io/allow-volume-mode-change: "true"`
Used on: VolumeSnapshotContent

View File

@ -6,7 +6,8 @@ weight: 50
<!-- overview -->
Every {{< glossary_tooltip term_id="node" text="node" >}} in a Kubernetes
cluster runs a [kube-proxy](/docs/reference/command-line-tools-reference/kube-proxy/)
{{< glossary_tooltip term_id="cluster" text="cluster" >}} runs a
[kube-proxy](/docs/reference/command-line-tools-reference/kube-proxy/)
(unless you have deployed your own alternative component in place of `kube-proxy`).
The `kube-proxy` component is responsible for implementing a _virtual IP_
@ -39,8 +40,10 @@ network proxying service on a computer. Although the `kube-proxy` executable su
to use as-is.
<a id="example"></a>
Some of the details in this reference refer to an example: the backend Pods for a stateless
image-processing workload, running with three replicas. Those replicas are
Some of the details in this reference refer to an example: the backend
{{< glossary_tooltip term_id="pod" text="Pods" >}} for a stateless
image-processing workload, running with
three replicas. Those replicas are
fungible&mdash;frontends do not care which backend they use. While the actual Pods that
compose the backend set may change, the frontend clients should not need to be aware of that,
nor should they need to keep track of the set of backends themselves.
@ -61,8 +64,10 @@ Note that the kube-proxy starts up in different modes, which are determined by i
### `iptables` proxy mode {#proxy-mode-iptables}
In this mode, kube-proxy watches the Kubernetes control plane for the addition and
removal of Service and EndpointSlice objects. For each Service, it installs
In this mode, kube-proxy watches the Kubernetes
{{< glossary_tooltip term_id="control-plane" text="control plane" >}} for the addition and
removal of Service and EndpointSlice {{< glossary_tooltip term_id="object" text="objects." >}}
For each Service, it installs
iptables rules, which capture traffic to the Service's `clusterIP` and `port`,
and redirect that traffic to one of the Service's
backend sets. For each endpoint, it installs iptables rules which
@ -134,11 +139,13 @@ attempts to resynchronize iptables rules with the kernel. If it is
every time any Service or Endpoint changes. This works fine in very
small clusters, but it results in a lot of redundant work when lots of
things change in a small time period. For example, if you have a
Service backed by a Deployment with 100 pods, and you delete the
Service backed by a {{< glossary_tooltip term_id="deployment" text="Deployment" >}}
with 100 pods, and you delete the
Deployment, then with `minSyncPeriod: 0s`, kube-proxy would end up
removing the Service's Endpoints from the iptables rules one by one,
for a total of 100 updates. With a larger `minSyncPeriod`, multiple
Pod deletion events would get aggregated together, so kube-proxy might
Pod deletion events would get aggregated
together, so kube-proxy might
instead end up making, say, 5 updates, each removing 20 endpoints,
which will be much more efficient in terms of CPU, and result in the
full set of changes being synchronized faster.
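
As a sketch, `minSyncPeriod` lives in the kube-proxy configuration file under the `iptables` section; the values here are only examples:

```yaml
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "iptables"
iptables:
  # Wait at least this long between resyncs, so bursts of Service and
  # EndpointSlice changes are aggregated into fewer iptables updates.
  minSyncPeriod: 1s
  # Periodic full resync, even when nothing has changed.
  syncPeriod: 30s
```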
@ -182,7 +189,8 @@ enable the `MinimizeIPTablesRestore` [feature
gate](/docs/reference/command-line-tools-reference/feature-gates/) for
kube-proxy with `--feature-gates=MinimizeIPTablesRestore=true,…`.
If you enable that feature gate and you were previously overriding
If you enable that feature gate and
you were previously overriding
`minSyncPeriod`, you should try removing that override and letting
kube-proxy use the default value (`1s`) or at least a smaller value
than you were using before.
@ -274,7 +282,7 @@ someone else's choice. That is an isolation failure.
In order to allow you to choose a port number for your Services, we must
ensure that no two Services can collide. Kubernetes does that by allocating each
Service its own IP address from within the `service-cluster-ip-range`
CIDR range that is configured for the API server.
CIDR range that is configured for the {{< glossary_tooltip term_id="kube-apiserver" text="API Server" >}}.
To ensure each Service receives a unique IP, an internal allocator atomically
updates a global allocation map in {{< glossary_tooltip term_id="etcd" >}}
@ -353,7 +361,8 @@ N to 0 replicas of that deployment. In some cases, external load balancers can s
a node with 0 replicas in between health check probes. Routing traffic to terminating endpoints
ensures that Node's that are scaling down Pods can gracefully receive and drain traffic to
those terminating Pods. By the time the Pod completes termination, the external load balancer
should have seen the node's health check failing and fully removed the node from the backend pool.
should have seen the node's health check failing and fully removed the node from the backend
pool.
## {{% heading "whatsnext" %}}

View File

@ -273,22 +273,24 @@ on Kubernetes dual-stack support see [Dual-stack support with kubeadm](/docs/set
1. Optional: Check the cluster health.
```sh
docker run --rm -it \
--net host \
-v /etc/kubernetes:/etc/kubernetes registry.k8s.io/etcd:${ETCD_TAG} etcdctl \
--cert /etc/kubernetes/pki/etcd/peer.crt \
--key /etc/kubernetes/pki/etcd/peer.key \
--cacert /etc/kubernetes/pki/etcd/ca.crt \
--endpoints https://${HOST0}:2379 endpoint health --cluster
...
https://[HOST0 IP]:2379 is healthy: successfully committed proposal: took = 16.283339ms
https://[HOST1 IP]:2379 is healthy: successfully committed proposal: took = 19.44402ms
https://[HOST2 IP]:2379 is healthy: successfully committed proposal: took = 35.926451ms
```
If `etcdctl` isn't available, you can run this tool inside a container image.
You would do that directly with your container runtime using a tool such as
`crictl run` and not through Kubernetes
```sh
ETCDCTL_API=3 etcdctl \
--cert /etc/kubernetes/pki/etcd/peer.crt \
--key /etc/kubernetes/pki/etcd/peer.key \
--cacert /etc/kubernetes/pki/etcd/ca.crt \
--endpoints https://${HOST0}:2379 endpoint health
...
https://[HOST0 IP]:2379 is healthy: successfully committed proposal: took = 16.283339ms
https://[HOST1 IP]:2379 is healthy: successfully committed proposal: took = 19.44402ms
https://[HOST2 IP]:2379 is healthy: successfully committed proposal: took = 35.926451ms
```
- Set `${HOST0}` to the IP address of the host you are testing.
- Set `${ETCD_TAG}` to the version tag of your etcd image. For example `3.4.3-0`. To see the etcd image and tag that kubeadm uses execute `kubeadm config images list --kubernetes-version ${K8S_VERSION}`, where `${K8S_VERSION}` is for example `v1.17.0`.
- Set `${HOST0}` to the IP address of the host you are testing.
## {{% heading "whatsnext" %}}

View File

@ -7,20 +7,18 @@ min-kubernetes-server-version: 1.19
<!-- overview -->
An [Ingress](/docs/concepts/services-networking/ingress/) is an API object that defines rules which allow external access
to services in a cluster. An [Ingress controller](/docs/concepts/services-networking/ingress-controllers/) fulfills the rules set in the Ingress.
This page shows you how to set up a simple Ingress which routes requests to Service web or web2 depending on the HTTP URI.
An [Ingress](/docs/concepts/services-networking/ingress/) is an API object that defines rules
which allow external access to services in a cluster. An
[Ingress controller](/docs/concepts/services-networking/ingress-controllers/)
fulfills the rules set in the Ingress.
This page shows you how to set up a simple Ingress which routes requests to Service 'web' or
'web2' depending on the HTTP URI.
## {{% heading "prerequisites" %}}
{{< include "task-tutorial-prereqs.md" >}} {{< version-check >}}
If you are using an older Kubernetes version, switch to the documentation
for that version.
If you are using an older Kubernetes version, switch to the documentation for that version.
### Create a Minikube cluster
@ -37,49 +35,60 @@ Locally
1. To enable the NGINX Ingress controller, run the following command:
```shell
minikube addons enable ingress
```
```shell
minikube addons enable ingress
```
1. Verify that the NGINX Ingress controller is running
{{< tabs name="tab_with_md" >}}
{{% tab name="minikube v1.19 or later" %}}
```shell
kubectl get pods -n ingress-nginx
```
{{< note >}}It can take up to a minute before you see these pods running OK.{{< /note >}}
```shell
kubectl get pods -n ingress-nginx
```
{{< note >}}
It can take up to a minute before you see these pods running OK.
{{< /note >}}
The output is similar to:
```
NAME READY STATUS RESTARTS AGE
ingress-nginx-admission-create-g9g49 0/1 Completed 0 11m
ingress-nginx-admission-patch-rqp78 0/1 Completed 1 11m
ingress-nginx-controller-59b45fb494-26npt 1/1 Running 0 11m
```
```none
NAME READY STATUS RESTARTS AGE
ingress-nginx-admission-create-g9g49 0/1 Completed 0 11m
ingress-nginx-admission-patch-rqp78 0/1 Completed 1 11m
ingress-nginx-controller-59b45fb494-26npt 1/1 Running 0 11m
```
{{% /tab %}}
{{% tab name="minikube v1.18.1 or earlier" %}}
```shell
kubectl get pods -n kube-system
```
{{< note >}}It can take up to a minute before you see these pods running OK.{{< /note >}}
```shell
kubectl get pods -n kube-system
```
{{< note >}}
It can take up to a minute before you see these pods running OK.
{{< /note >}}
The output is similar to:
```
NAME READY STATUS RESTARTS AGE
default-http-backend-59868b7dd6-xb8tq 1/1 Running 0 1m
kube-addon-manager-minikube 1/1 Running 0 3m
kube-dns-6dcb57bcc8-n4xd4 3/3 Running 0 2m
kubernetes-dashboard-5498ccf677-b8p5h 1/1 Running 0 2m
nginx-ingress-controller-5984b97644-rnkrg 1/1 Running 0 1m
storage-provisioner 1/1 Running 0 2m
```
```none
NAME READY STATUS RESTARTS AGE
default-http-backend-59868b7dd6-xb8tq 1/1 Running 0 1m
kube-addon-manager-minikube 1/1 Running 0 3m
kube-dns-6dcb57bcc8-n4xd4 3/3 Running 0 2m
kubernetes-dashboard-5498ccf677-b8p5h 1/1 Running 0 2m
nginx-ingress-controller-5984b97644-rnkrg 1/1 Running 0 1m
storage-provisioner 1/1 Running 0 2m
```
Make sure that you see a Pod with a name that starts with `nginx-ingress-controller-`.
Make sure that you see a Pod with a name that starts with `nginx-ingress-controller-`.
{{% /tab %}}
{{< /tabs >}}
## Deploy a hello, world app
@ -92,7 +101,7 @@ storage-provisioner 1/1 Running 0 2m
The output should be:
```
```none
deployment.apps/web created
```
@ -104,19 +113,19 @@ storage-provisioner 1/1 Running 0 2m
The output should be:
```
```none
service/web exposed
```
1. Verify the Service is created and is available on a node port:
```shell
```shell
kubectl get service web
```
The output is similar to:
```
```none
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
web NodePort 10.104.133.249 <none> 8080:31637/TCP 12m
```
@ -129,26 +138,31 @@ storage-provisioner 1/1 Running 0 2m
The output is similar to:
```
```none
http://172.17.0.15:31637
```
{{< note >}}Katacoda environment only: at the top of the terminal panel, click the plus sign, and then click **Select port to view on Host 1**. Enter the NodePort, in this case `31637`, and then click **Display Port**.{{< /note >}}
{{< note >}}
Katacoda environment only: at the top of the terminal panel, click the plus sign,
and then click **Select port to view on Host 1**. Enter the NodePort value,
in this case `31637`, and then click **Display Port**.
{{< /note >}}
The output is similar to:
```
```none
Hello, world!
Version: 1.0.0
Hostname: web-55b8c6998d-8k564
```
You can now access the sample app via the Minikube IP address and NodePort. The next step lets you access
the app using the Ingress resource.
You can now access the sample application via the Minikube IP address and NodePort.
The next step lets you access the application using the Ingress resource.
## Create an Ingress
The following manifest defines an Ingress that sends traffic to your Service via hello-world.info.
The following manifest defines an Ingress that sends traffic to your Service via
`hello-world.info`.
1. Create `example-ingress.yaml` from the following file:
@ -162,7 +176,7 @@ The following manifest defines an Ingress that sends traffic to your Service via
The output should be:
```
```none
ingress.networking.k8s.io/example-ingress created
```
@ -172,11 +186,13 @@ The following manifest defines an Ingress that sends traffic to your Service via
kubectl get ingress
```
{{< note >}}This can take a couple of minutes.{{< /note >}}
{{< note >}}
This can take a couple of minutes.
{{< /note >}}
You should see an IPv4 address in the ADDRESS column; for example:
You should see an IPv4 address in the `ADDRESS` column; for example:
```
```none
NAME CLASS HOSTS ADDRESS PORTS AGE
example-ingress <none> hello-world.info 172.17.0.15 80 38s
```
@ -184,30 +200,35 @@ The following manifest defines an Ingress that sends traffic to your Service via
1. Add the following line to the bottom of the `/etc/hosts` file on
your computer (you will need administrator access):
```
```none
172.17.0.15 hello-world.info
```
{{< note >}}If you are running Minikube locally, use `minikube ip` to get the external IP. The IP address displayed within the ingress list will be the internal IP.{{< /note >}}
{{< note >}}
If you are running Minikube locally, use `minikube ip` to get the external IP.
The IP address displayed within the ingress list will be the internal IP.
{{< /note >}}
After you make this change, your web browser sends requests for
hello-world.info URLs to Minikube.
After you make this change, your web browser sends requests for
`hello-world.info` URLs to Minikube.
1. Verify that the Ingress controller is directing traffic:
```shell
curl hello-world.info
```
```shell
curl hello-world.info
```
You should see:
You should see:
```
Hello, world!
Version: 1.0.0
Hostname: web-55b8c6998d-8k564
```
```none
Hello, world!
Version: 1.0.0
Hostname: web-55b8c6998d-8k564
```
{{< note >}}If you are running Minikube locally, you can visit hello-world.info from your browser.{{< /note >}}
{{< note >}}
If you are running Minikube locally, you can visit `hello-world.info` from your browser.
{{< /note >}}
## Create a second Deployment
@ -216,9 +237,10 @@ The following manifest defines an Ingress that sends traffic to your Service via
```shell
kubectl create deployment web2 --image=gcr.io/google-samples/hello-app:2.0
```
The output should be:
```
```none
deployment.apps/web2 created
```
@ -230,7 +252,7 @@ The following manifest defines an Ingress that sends traffic to your Service via
The output should be:
```
```none
service/web2 exposed
```
@ -240,13 +262,13 @@ The following manifest defines an Ingress that sends traffic to your Service via
following lines at the end:
```yaml
- path: /v2
pathType: Prefix
backend:
service:
name: web2
port:
number: 8080
- path: /v2
pathType: Prefix
backend:
service:
name: web2
port:
number: 8080
```
1. Apply the changes:
@ -257,7 +279,7 @@ The following manifest defines an Ingress that sends traffic to your Service via
You should see:
```
```none
ingress.networking/example-ingress configured
```
@ -271,7 +293,7 @@ The following manifest defines an Ingress that sends traffic to your Service via
The output is similar to:
```
```none
Hello, world!
Version: 1.0.0
Hostname: web-55b8c6998d-8k564
@ -285,16 +307,16 @@ The following manifest defines an Ingress that sends traffic to your Service via
The output is similar to:
```
```none
Hello, world!
Version: 2.0.0
Hostname: web2-75cd47646f-t8cjk
```
{{< note >}}If you are running Minikube locally, you can visit hello-world.info and hello-world.info/v2 from your browser.{{< /note >}}
{{< note >}}
If you are running Minikube locally, you can visit `hello-world.info` and
`hello-world.info/v2` from your browser.
{{< /note >}}
## {{% heading "whatsnext" %}}

View File

@ -111,11 +111,11 @@ This is the default policy and does not affect the memory allocation in any way.
It acts the same as if the Memory Manager is not present at all.
The `None` policy returns default topology hint. This special hint denotes that Hint Provider
(Memory Manger in this case) has no preference for NUMA affinity with any resource.
(Memory Manager in this case) has no preference for NUMA affinity with any resource.
#### Static policy {#policy-static}
In the case of the `Guaranteed` pod, the `Static` Memory Manger policy returns topology hints
In the case of the `Guaranteed` pod, the `Static` Memory Manager policy returns topology hints
relating to the set of NUMA nodes where the memory can be guaranteed,
and reserves the memory through updating the internal [NodeMap][2] object.

View File

@ -43,7 +43,7 @@ Decide whether you want to deploy a [cloud](#creating-a-calico-cluster-with-goog
## Creating a local Calico cluster with kubeadm
To get a local single-host Calico cluster in fifteen minutes using kubeadm, refer to the
[Calico Quickstart](https://docs.projectcalico.org/latest/getting-started/kubernetes/).
[Calico Quickstart](https://projectcalico.docs.tigera.io/getting-started/kubernetes/).

View File

@ -10,7 +10,7 @@ weight: 30
<!-- overview -->
This page shows how to use Cilium for NetworkPolicy.
For background on Cilium, read the [Introduction to Cilium](https://docs.cilium.io/en/stable/intro).
For background on Cilium, read the [Introduction to Cilium](https://docs.cilium.io/en/stable/overview/intro).
## {{% heading "prerequisites" %}}

View File

@ -53,9 +53,9 @@ To get a list of all parameters, you can run
sudo sysctl -a
```
## Enabling Unsafe Sysctls
## Safe and Unsafe Sysctls
Sysctls are grouped into _safe_ and _unsafe_ sysctls. In addition to proper
Kubernetes classes sysctls as either _safe_ or _unsafe_. In addition to proper
namespacing, a _safe_ sysctl must be properly _isolated_ between pods on the
same node. This means that setting a _safe_ sysctl for one pod
@ -80,6 +80,8 @@ The example `net.ipv4.tcp_syncookies` is not namespaced on Linux kernel version
This list will be extended in future Kubernetes versions when the kubelet
supports better isolation mechanisms.
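
As an illustration, a safe sysctl can be set per Pod through the Pod's `securityContext`; the Pod name and image below are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sysctl-example                # placeholder name
spec:
  securityContext:
    sysctls:
    # kernel.shm_rmid_forced is in the default safe set,
    # so no extra kubelet configuration is needed for it.
    - name: kernel.shm_rmid_forced
      value: "1"
  containers:
  - name: app
    image: docker.io/library/busybox:1.36   # placeholder image
    command: ["sleep", "3600"]
```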
### Enabling Unsafe Sysctls
All _safe_ sysctls are enabled by default.
All _unsafe_ sysctls are disabled by default and must be allowed manually by the

View File

@ -47,6 +47,9 @@ Then verify the blob by using `cosign`:
cosign verify-blob "$BINARY" --signature "$BINARY".sig --certificate "$BINARY".cert
```
cosign v1.9.0 is required to be able to use the `--certificate` flag. Please use
`--cert` for older versions of cosign.
{{< note >}}
To learn more about keyless signing, please refer to [Keyless
Signatures](https://github.com/sigstore/cosign/blob/main/KEYLESS.md#keyless-signatures).

View File

@ -4,23 +4,35 @@ reviewers:
- tallclair
- liggitt
content_type: task
min-kubernetes-server-version: v1.22
---
As of v1.22, Kubernetes provides a built-in [admission controller](/docs/reference/access-authn-authz/admission-controllers/#podsecurity)
Kubernetes provides a built-in [admission controller](/docs/reference/access-authn-authz/admission-controllers/#podsecurity)
to enforce the [Pod Security Standards](/docs/concepts/security/pod-security-standards).
You can configure this admission controller to set cluster-wide defaults and [exemptions](/docs/concepts/security/pod-security-admission/#exemptions).
## {{% heading "prerequisites" %}}
Following an alpha release in Kubernetes v1.22,
Pod Security Admission became available by default in Kubernetes v1.23, as
a beta. From version 1.25 onwards, Pod Security Admission is generally
available.
{{% version-check %}}
- Ensure the `PodSecurity` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/#feature-gates-for-alpha-or-beta-features) is enabled.
If you are not running Kubernetes {{< skew currentVersion >}}, you can switch
to viewing this page in the documentation for the Kubernetes version that you
are running.
## Configure the Admission Controller
{{< note >}}
`pod-security.admission.config.k8s.io/v1` configuration requires v1.25+.
For v1.23 and v1.24, use [v1beta1](https://v1-24.docs.kubernetes.io/docs/tasks/configure-pod-container/enforce-standards-admission-controller/).
For v1.22, use [v1alpha1](https://v1-22.docs.kubernetes.io/docs/tasks/configure-pod-container/enforce-standards-admission-controller/).
{{< /note >}}
```yaml
apiVersion: apiserver.config.k8s.io/v1
apiVersion: apiserver.config.k8s.io/v1 # see compatibility note
kind: AdmissionConfiguration
plugins:
- name: PodSecurity
@ -52,12 +64,8 @@ plugins:
# Array of namespaces to exempt.
namespaces: []
```
{{< note >}}
The above manifest needs to be specified via the `--admission-control-config-file` to kube-apiserver.
{{< /note >}}
{{< note >}}
`pod-security.admission.config.k8s.io/v1` configuration requires v1.25+.
For v1.23 and v1.24, use [v1beta1](https://v1-24.docs.kubernetes.io/docs/tasks/configure-pod-container/enforce-standards-admission-controller/).
For v1.22, use [v1alpha1](https://v1-22.docs.kubernetes.io/docs/tasks/configure-pod-container/enforce-standards-admission-controller/).
{{< /note >}}

View File

@ -4,7 +4,6 @@ reviewers:
- tallclair
- liggitt
content_type: task
min-kubernetes-server-version: v1.22
---
Namespaces can be labeled to enforce the [Pod Security Standards](/docs/concepts/security/pod-security-standards). The three policies
@ -15,9 +14,11 @@ text="admission controller" term_id="admission-controller" >}}.
## {{% heading "prerequisites" %}}
{{% version-check %}}
Pod Security Admission was available by default in Kubernetes v1.23, as
a beta. From version 1.25 onwards, Pod Security Admission is generally
available.
- Ensure the `PodSecurity` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/#feature-gates-for-alpha-or-beta-features) is enabled.
{{% version-check %}}
## Requiring the `baseline` Pod Security Standard with namespace labels
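
As a sketch, enforcement is driven entirely by labels on the Namespace object; the namespace name below is a placeholder:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: my-baseline-namespace          # placeholder name
  labels:
    # Reject Pods that violate the baseline standard.
    pod-security.kubernetes.io/enforce: baseline
    pod-security.kubernetes.io/enforce-version: latest
    # Additionally warn and audit against the stricter restricted standard.
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/warn-version: latest
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/audit-version: latest
```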

View File

@ -18,7 +18,7 @@ admission controller. This can be done effectively using a combination of dry-ru
{{% version-check %}}
If you are currently running a version of Kubernetes other than
{{ skew currentVersion }}, you may want to switch to viewing this
{{< skew currentVersion >}}, you may want to switch to viewing this
page in the documentation for the version of Kubernetes that you
are actually running.

View File

@ -8,29 +8,28 @@ weight: 30
<!-- overview -->
This page shows how to configure Pods so that they will be assigned particular
Quality of Service (QoS) classes. Kubernetes uses QoS classes to make decisions about
scheduling and evicting Pods.
{{< glossary_tooltip text="Quality of Service (QoS) classes" term_id="qos-class" >}}.
Kubernetes uses QoS classes to make decisions about evicting Pods when Node resources are exceeded.
When Kubernetes creates a Pod it assigns one of these QoS classes to the Pod:
* [Guaranteed](/docs/concepts/workloads/pods/pod-qos/#guaranteed)
* [Burstable](/docs/concepts/workloads/pods/pod-qos/#burstable)
* [BestEffort](/docs/concepts/workloads/pods/pod-qos/#besteffort)
## {{% heading "prerequisites" %}}
{{< include "task-tutorial-prereqs.md" >}} {{< version-check >}}
{{< include "task-tutorial-prereqs.md" >}}
You also need to be able to create and delete namespaces.
<!-- steps -->
## QoS classes
When Kubernetes creates a Pod it assigns one of these QoS classes to the Pod:
* Guaranteed
* Burstable
* BestEffort
## Create a namespace
@ -43,7 +42,7 @@ kubectl create namespace qos-example
## Create a Pod that gets assigned a QoS class of Guaranteed
For a Pod to be given a QoS class of Guaranteed:
For a Pod to be given a QoS class of `Guaranteed`:
* Every Container in the Pod must have a memory limit and a memory request.
* For every Container in the Pod, the memory limit must equal the memory request.
@ -54,7 +53,7 @@ These restrictions apply to init containers and app containers equally.
[Ephemeral containers](/docs/concepts/workloads/pods/ephemeral-containers/)
cannot define resources so these restrictions do not apply.
Here is the configuration file for a Pod that has one Container. The Container has a memory limit and a
Here is a manifest for a Pod that has one Container. The Container has a memory limit and a
memory request, both equal to 200 MiB. The Container has a CPU limit and a CPU request, both equal to 700 milliCPU:
{{< codenew file="pods/qos/qos-pod.yaml" >}}
@ -71,7 +70,7 @@ View detailed information about the Pod:
kubectl get pod qos-demo --namespace=qos-example --output=yaml
```
The output shows that Kubernetes gave the Pod a QoS class of Guaranteed. The output also
The output shows that Kubernetes gave the Pod a QoS class of `Guaranteed`. The output also
verifies that the Pod Container has a memory request that matches its memory limit, and it has
a CPU request that matches its CPU limit.
@ -98,6 +97,9 @@ CPU limit, but does not specify a CPU request, Kubernetes automatically assigns
the limit.
{{< /note >}}
<!-- 4th level heading to suppress entry in nav -->
#### Clean up {#clean-up-guaranteed}
Delete your Pod:
```shell
@ -106,12 +108,12 @@ kubectl delete pod qos-demo --namespace=qos-example
## Create a Pod that gets assigned a QoS class of Burstable
A Pod is given a QoS class of Burstable if:
A Pod is given a QoS class of `Burstable` if:
* The Pod does not meet the criteria for QoS class Guaranteed.
* The Pod does not meet the criteria for QoS class `Guaranteed`.
* At least one Container in the Pod has a memory or CPU request or limit.
Here is the configuration file for a Pod that has one Container. The Container has a memory limit of 200 MiB
Here is a manifest for a Pod that has one Container. The Container has a memory limit of 200 MiB
and a memory request of 100 MiB.
{{< codenew file="pods/qos/qos-pod-2.yaml" >}}
@ -128,7 +130,7 @@ View detailed information about the Pod:
kubectl get pod qos-demo-2 --namespace=qos-example --output=yaml
```
The output shows that Kubernetes gave the Pod a QoS class of Burstable.
The output shows that Kubernetes gave the Pod a QoS class of `Burstable`:
```yaml
spec:
@ -146,6 +148,9 @@ status:
qosClass: Burstable
```
<!-- 4th level heading to suppress entry in nav -->
#### Clean up {#clean-up-burstable}
Delete your Pod:
```shell
@ -154,10 +159,10 @@ kubectl delete pod qos-demo-2 --namespace=qos-example
## Create a Pod that gets assigned a QoS class of BestEffort
For a Pod to be given a QoS class of BestEffort, the Containers in the Pod must not
For a Pod to be given a QoS class of `BestEffort`, the Containers in the Pod must not
have any memory or CPU limits or requests.
Here is the configuration file for a Pod that has one Container. The Container has no memory or CPU
Here is a manifest for a Pod that has one Container. The Container has no memory or CPU
limits or requests:
{{< codenew file="pods/qos/qos-pod-3.yaml" >}}
@ -174,7 +179,7 @@ View detailed information about the Pod:
kubectl get pod qos-demo-3 --namespace=qos-example --output=yaml
```
The output shows that Kubernetes gave the Pod a QoS class of BestEffort.
The output shows that Kubernetes gave the Pod a QoS class of `BestEffort`:
```yaml
spec:
@ -186,6 +191,9 @@ status:
qosClass: BestEffort
```
<!-- 4th level heading to suppress entry in nav -->
#### Clean up {#clean-up-besteffort}
Delete your Pod:
```shell
@ -194,13 +202,13 @@ kubectl delete pod qos-demo-3 --namespace=qos-example
## Create a Pod that has two Containers
Here is the configuration file for a Pod that has two Containers. One container specifies a memory
Here is a manifest for a Pod that has two Containers. One container specifies a memory
request of 200 MiB. The other Container does not specify any requests or limits.
{{< codenew file="pods/qos/qos-pod-4.yaml" >}}
Notice that this Pod meets the criteria for QoS class Burstable. That is, it does not meet the
criteria for QoS class Guaranteed, and one of its Containers has a memory request.
Notice that this Pod meets the criteria for QoS class `Burstable`. That is, it does not meet the
criteria for QoS class `Guaranteed`, and one of its Containers has a memory request.
Create the Pod:
@ -214,7 +222,7 @@ View detailed information about the Pod:
kubectl get pod qos-demo-4 --namespace=qos-example --output=yaml
```
The output shows that Kubernetes gave the Pod a QoS class of Burstable:
The output shows that Kubernetes gave the Pod a QoS class of `Burstable`:
```yaml
spec:
@ -232,12 +240,19 @@ status:
qosClass: Burstable
```
Delete your Pod:
## Retrieve the QoS class for a Pod
```shell
kubectl delete pod qos-demo-4 --namespace=qos-example
Rather than seeing all the fields, you can view just the field you need:
```bash
kubectl --namespace=qos-example get pod qos-demo-4 -o jsonpath='{ .status.qosClass}{"\n"}'
```
```none
Burstable
```
## Clean up
Delete your namespace:

View File

@ -129,34 +129,22 @@ you need is an existing `docker-compose.yml` file.
The output is similar to:
```none
INFO Kubernetes file "frontend-service.yaml" created
INFO Kubernetes file "frontend-service.yaml" created
INFO Kubernetes file "frontend-service.yaml" created
INFO Kubernetes file "redis-master-service.yaml" created
INFO Kubernetes file "redis-master-service.yaml" created
INFO Kubernetes file "redis-master-service.yaml" created
INFO Kubernetes file "redis-slave-service.yaml" created
INFO Kubernetes file "redis-slave-service.yaml" created
INFO Kubernetes file "redis-slave-service.yaml" created
INFO Kubernetes file "frontend-deployment.yaml" created
INFO Kubernetes file "frontend-deployment.yaml" created
INFO Kubernetes file "frontend-deployment.yaml" created
INFO Kubernetes file "redis-master-deployment.yaml" created
INFO Kubernetes file "redis-master-deployment.yaml" created
INFO Kubernetes file "redis-master-deployment.yaml" created
INFO Kubernetes file "redis-slave-deployment.yaml" created
INFO Kubernetes file "redis-slave-deployment.yaml" created
INFO Kubernetes file "frontend-tcp-service.yaml" created
INFO Kubernetes file "redis-master-service.yaml" created
INFO Kubernetes file "redis-slave-service.yaml" created
INFO Kubernetes file "frontend-deployment.yaml" created
INFO Kubernetes file "redis-master-deployment.yaml" created
INFO Kubernetes file "redis-slave-deployment.yaml" created
```
```bash
kubectl apply -f frontend-service.yaml,redis-master-service.yaml,redis-slave-service.yaml,frontend-deployment.yaml,redis-master-deployment.yaml,redis-slave-deployment.yaml
kubectl apply -f frontend-tcp-service.yaml,redis-master-service.yaml,redis-slave-service.yaml,frontend-deployment.yaml,redis-master-deployment.yaml,redis-slave-deployment.yaml
```
The output is similar to:
```none
service/frontend created
service/frontend-tcp created
service/redis-master created
service/redis-slave created
deployment.apps/frontend created
@ -181,18 +169,29 @@ you need is an existing `docker-compose.yml` file.
```
```none
Name: frontend
Namespace: default
Labels: service=frontend
Selector: service=frontend
Type: LoadBalancer
IP: 10.0.0.183
LoadBalancer Ingress: 192.0.2.89
Port: 80 80/TCP
NodePort: 80 31144/TCP
Endpoints: 172.17.0.4:80
Session Affinity: None
No events.
Name: frontend-tcp
Namespace: default
Labels: io.kompose.service=frontend-tcp
Annotations: kompose.cmd: kompose convert
kompose.service.type: LoadBalancer
kompose.version: 1.26.0 (40646f47)
Selector: io.kompose.service=frontend
Type: LoadBalancer
IP Family Policy: SingleStack
IP Families: IPv4
IP: 10.43.67.174
IPs: 10.43.67.174
Port: 80 80/TCP
TargetPort: 80/TCP
NodePort: 80 31254/TCP
Endpoints: 10.42.0.25:80
Session Affinity: None
External Traffic Policy: Cluster
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal EnsuringLoadBalancer 62s service-controller Ensuring load balancer
Normal AppliedDaemonSet 62s service-controller Applied LoadBalancer DaemonSet kube-system/svclb-frontend-tcp-9362d276
```
If you're using a cloud provider, your IP will be listed next to `LoadBalancer Ingress`.
@ -200,6 +199,15 @@ you need is an existing `docker-compose.yml` file.
```sh
curl http://192.0.2.89
```
4. Clean-up.
After you finish testing the example application deployment, run the following command in your shell to delete the
resources used.
```sh
kubectl delete -f frontend-tcp-service.yaml,redis-master-service.yaml,redis-slave-service.yaml,frontend-deployment.yaml,redis-master-deployment.yaml,redis-slave-deployment.yaml
```
<!-- discussion -->

View File

@ -127,7 +127,7 @@ Restart Count tells you how many times the container has been restarted; this in
Currently the only Condition associated with a Pod is the binary Ready condition, which indicates that the pod is able to service requests and should be added to the load balancing pools of all matching services.
Lastly, you see a log of recent events related to your Pod. The system compresses multiple identical events by indicating the first and last time it was seen and the number of times it was seen. "From" indicates the component that is logging the event, "SubobjectPath" tells you which object (e.g. container within the pod) is being referred to, and "Reason" and "Message" tell you what happened.
Lastly, you see a log of recent events related to your Pod. "From" indicates the component that is logging the event. "Reason" and "Message" tell you what happened.
## Example: debugging Pending Pods

View File

@ -57,10 +57,17 @@ respond to these metrics by automatically scaling or adapting the cluster
based on its current state, using mechanisms such as the Horizontal Pod
Autoscaler. The monitoring pipeline fetches metrics from the kubelet and
then exposes them to Kubernetes via an adapter by implementing either the
`custom.metrics.k8s.io` or `external.metrics.k8s.io` API.
[Prometheus](https://prometheus.io), a CNCF project, can natively monitor Kubernetes, nodes, and Prometheus itself.
Full metrics pipeline projects that are not part of the CNCF are outside the scope of Kubernetes documentation.
Integration of a full metrics pipeline into your Kubernetes implementation is outside
the scope of Kubernetes documentation because of the very wide scope of possible
solutions.
The choice of monitoring platform depends heavily on your needs, budget, and technical resources.
Kubernetes does not recommend any specific metrics pipeline; [many options](https://landscape.cncf.io/card-mode?category=monitoring&project=graduated,incubating,member,no&grouping=category&sort=stars) are available.
Your monitoring system should be capable of handling the [OpenMetrics](https://openmetrics.io/) metrics
transmission standard and should be chosen to best fit into the overall design and deployment of
your infrastructure platform.
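For illustration only (not an endorsement of any particular tool), a minimal Prometheus scrape configuration that discovers every node in a cluster could look like this sketch; the job name is an arbitrary assumption:

```yaml
# prometheus.yml (sketch): discover all cluster nodes through the Kubernetes API
scrape_configs:
  - job_name: "kubernetes-nodes"   # arbitrary job name
    kubernetes_sd_configs:
      - role: node                 # one scrape target per node
```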
## {{% heading "whatsnext" %}}

View File

@ -149,8 +149,101 @@ Here is a configuration file you can use to create a Pod:
39528$vdg7Jb
```
Modify your image or command line so that the program looks for files in the
`mountPath` directory. Each key in the Secret `data` map becomes a file name
in this directory.
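For example, if the Secret has `username` and `password` keys and the volume is mounted at `/etc/secret-volume` (an assumed mount path for illustration), listing that directory from inside the container shows one file per key:

```shell
ls /etc/secret-volume
```

```none
password  username
```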
### Project Secret keys to specific file paths
You can also control the paths within the volume where Secret keys are projected. Use the `.spec.volumes[].secret.items` field to change the target
path of each key:
```yaml
apiVersion: v1
kind: Pod
metadata:
name: mypod
spec:
containers:
- name: mypod
image: redis
volumeMounts:
- name: foo
mountPath: "/etc/foo"
readOnly: true
volumes:
- name: foo
secret:
secretName: mysecret
items:
- key: username
path: my-group/my-username
```
When you deploy this Pod, the following happens:
* The `username` key from `mysecret` is available to the container at the path
`/etc/foo/my-group/my-username` instead of at `/etc/foo/username`.
* The `password` key from that Secret object is not projected.
If you list keys explicitly using `.spec.volumes[].secret.items`, consider the
following:
* Only keys specified in `items` are projected.
* To consume all keys from the Secret, all of them must be listed in the
`items` field.
* All listed keys must exist in the corresponding Secret. Otherwise, the volume
is not created.
### Set POSIX permissions for Secret keys
You can set the POSIX file access permission bits for a single Secret key.
If you don't specify any permissions, `0644` is used by default.
You can also set a default POSIX file mode for the entire Secret volume, and
you can override per key if needed.
For example, you can specify a default mode like this:
```yaml
apiVersion: v1
kind: Pod
metadata:
name: mypod
spec:
containers:
- name: mypod
image: redis
volumeMounts:
- name: foo
mountPath: "/etc/foo"
volumes:
- name: foo
secret:
secretName: mysecret
defaultMode: 0400
```
The Secret is mounted on `/etc/foo`; all the files created by the
secret volume mount have permission `0400`.
{{< note >}}
If you're defining a Pod or a Pod template using JSON, beware that the JSON
specification doesn't support octal literals for numbers because JSON considers
`0400` to be the _decimal_ value `400`. In JSON, use decimal values for the
`defaultMode` instead. If you're writing YAML, you can write the `defaultMode`
in octal.
{{< /note >}}
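You can also override the mode for an individual key with the `mode` field of `items`. A minimal sketch, reusing the assumed `mysecret` name and `username` key from the earlier examples:

```yaml
volumes:
  - name: foo
    secret:
      secretName: mysecret
      defaultMode: 0400
      items:
        - key: username
          path: my-group/my-username
          mode: 0777   # overrides defaultMode for this key only
```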
## Define container environment variables using Secret data
You can consume the data in Secrets as environment variables in your
containers.
If a container already consumes a Secret in an environment variable,
a Secret update will not be seen by the container unless it is
restarted. There are third party solutions for triggering restarts when
secrets change.
### Define a container environment variable with data from a single Secret
* Define an environment variable as a key-value pair in a Secret:
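A minimal sketch of that step; the Secret name `backend-user` and the key/value shown are illustrative assumptions:

```shell
kubectl create secret generic backend-user --from-literal=backend-username='backend-admin'
```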

View File

@ -21,7 +21,7 @@ Here is an overview of the steps in this example:
1. **Create a queue, and fill it with messages.** Each message represents one task to be done. In
this example, a message is an integer that we will do a lengthy computation on.
1. **Start a Job that works on tasks from the queue**. The Job starts several pods. Each pod takes
one task from the message queue, processes it, and repeats until the end of the queue is reached.
one task from the message queue, processes it, and exits.
## {{% heading "prerequisites" %}}

View File

@ -261,7 +261,7 @@ kubectl get deployment patch-demo --output yaml
The `containers` list that you specified in the patch has only one Container.
The output shows that your list of one Container replaced the existing `containers` list.
```shell
```yaml
spec:
containers:
- image: gcr.io/google-samples/node-hello:1.0

View File

@ -10,28 +10,17 @@ This page shows you how to run a single-instance stateful application
in Kubernetes using a PersistentVolume and a Deployment. The
application is MySQL.
## {{% heading "objectives" %}}
* Create a PersistentVolume referencing a disk in your environment.
* Create a MySQL Deployment.
* Expose MySQL to other pods in the cluster at a known DNS name.
- Create a PersistentVolume referencing a disk in your environment.
- Create a MySQL Deployment.
- Expose MySQL to other pods in the cluster at a known DNS name.
## {{% heading "prerequisites" %}}
- {{< include "task-tutorial-prereqs.md" >}} {{< version-check >}}
* {{< include "task-tutorial-prereqs.md" >}} {{< version-check >}}
* {{< include "default-storage-class-prereqs.md" >}}
- {{< include "default-storage-class-prereqs.md" >}}
<!-- lessoncontent -->
@ -39,7 +28,7 @@ application is MySQL.
You can run a stateful application by creating a Kubernetes Deployment
and connecting it to an existing PersistentVolume using a
PersistentVolumeClaim. For example, this YAML file describes a
Deployment that runs MySQL and references the PersistentVolumeClaim. The file
defines a volume mount for /var/lib/mysql, and then creates a
PersistentVolumeClaim that looks for a 20G volume. This claim is
@ -55,80 +44,96 @@ for a secure solution.
1. Deploy the PV and PVC of the YAML file:
kubectl apply -f https://k8s.io/examples/application/mysql/mysql-pv.yaml
```shell
kubectl apply -f https://k8s.io/examples/application/mysql/mysql-pv.yaml
```
1. Deploy the contents of the YAML file:
kubectl apply -f https://k8s.io/examples/application/mysql/mysql-deployment.yaml
```shell
kubectl apply -f https://k8s.io/examples/application/mysql/mysql-deployment.yaml
```
1. Display information about the Deployment:
kubectl describe deployment mysql
```shell
kubectl describe deployment mysql
```
The output is similar to this:
```
Name: mysql
Namespace: default
CreationTimestamp: Tue, 01 Nov 2016 11:18:45 -0700
Labels: app=mysql
Annotations: deployment.kubernetes.io/revision=1
Selector: app=mysql
Replicas: 1 desired | 1 updated | 1 total | 0 available | 1 unavailable
StrategyType: Recreate
MinReadySeconds: 0
Pod Template:
Labels: app=mysql
Containers:
mysql:
Image: mysql:5.6
Port: 3306/TCP
Environment:
MYSQL_ROOT_PASSWORD: password
Mounts:
/var/lib/mysql from mysql-persistent-storage (rw)
Volumes:
mysql-persistent-storage:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: mysql-pv-claim
ReadOnly: false
Conditions:
Type Status Reason
---- ------ ------
Available False MinimumReplicasUnavailable
Progressing True ReplicaSetUpdated
OldReplicaSets: <none>
NewReplicaSet: mysql-63082529 (1/1 replicas created)
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
33s 33s 1 {deployment-controller } Normal ScalingReplicaSet Scaled up replica set mysql-63082529 to 1
```
1. List the pods created by the Deployment:
kubectl get pods -l app=mysql
```shell
kubectl get pods -l app=mysql
```
The output is similar to this:
```
NAME READY STATUS RESTARTS AGE
mysql-63082529-2z3ki 1/1 Running 0 3m
```
1. Inspect the PersistentVolumeClaim:
kubectl describe pvc mysql-pv-claim
```shell
kubectl describe pvc mysql-pv-claim
```
The output is similar to this:
```
Name: mysql-pv-claim
Namespace: default
StorageClass:
Status: Bound
Volume: mysql-pv-volume
Labels: <none>
Annotations: pv.kubernetes.io/bind-completed=yes
pv.kubernetes.io/bound-by-controller=yes
Capacity: 20Gi
Access Modes: RWO
Events: <none>
```
## Accessing the MySQL instance
@ -140,7 +145,7 @@ behind a Service and you don't intend to increase the number of Pods.
Run a MySQL client to connect to the server:
```
```shell
kubectl run -it --rm --image=mysql:5.6 --restart=Never mysql-client -- mysql -h mysql -ppassword
```
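If the connection succeeds you get a `mysql>` prompt; any statement that returns output confirms the database is reachable. `SHOW DATABASES;` below is only an illustrative query:

```sql
SHOW DATABASES;
```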
@ -161,11 +166,11 @@ The image or any other part of the Deployment can be updated as usual
with the `kubectl apply` command. Here are some precautions that are
specific to stateful apps:
* Don't scale the app. This setup is for single-instance apps
- Don't scale the app. This setup is for single-instance apps
only. The underlying PersistentVolume can only be mounted to one
Pod. For clustered stateful apps, see the
[StatefulSet documentation](/docs/concepts/workloads/controllers/statefulset/).
* Use `strategy:` `type: Recreate` in the Deployment configuration
- Use `strategy:` `type: Recreate` in the Deployment configuration
YAML file. This instructs Kubernetes to _not_ use rolling
updates. Rolling updates will not work, as you cannot have more than
one Pod running at a time. The `Recreate` strategy will stop the
@ -175,7 +180,7 @@ specific to stateful apps:
Delete the deployed objects by name:
```
```shell
kubectl delete deployment,svc mysql
kubectl delete pvc mysql-pv-claim
kubectl delete pv mysql-pv-volume
@ -188,20 +193,12 @@ PersistentVolume when it sees that you deleted the PersistentVolumeClaim.
Some dynamic provisioners (such as those for EBS and PD) also release the
underlying resource upon deleting the PersistentVolume.
## {{% heading "whatsnext" %}}
- Learn more about [Deployment objects](/docs/concepts/workloads/controllers/deployment/).
* Learn more about [Deployment objects](/docs/concepts/workloads/controllers/deployment/).
* Learn more about [Deploying applications](/docs/tasks/run-application/run-stateless-application-deployment/)
* [kubectl run documentation](/docs/reference/generated/kubectl/kubectl-commands/#run)
* [Volumes](/docs/concepts/storage/volumes/) and [Persistent Volumes](/docs/concepts/storage/persistent-volumes/)
- Learn more about [Deploying applications](/docs/tasks/run-application/run-stateless-application-deployment/)
- [kubectl run documentation](/docs/reference/generated/kubectl/kubectl-commands/#run)
- [Volumes](/docs/concepts/storage/volumes/) and [Persistent Volumes](/docs/concepts/storage/persistent-volumes/)

View File

@ -9,27 +9,16 @@ weight: 10
This page shows how to run an application using a Kubernetes Deployment object.
## {{% heading "objectives" %}}
* Create an nginx deployment.
* Use kubectl to list information about the deployment.
* Update the deployment.
- Create an nginx deployment.
- Use kubectl to list information about the deployment.
- Update the deployment.
## {{% heading "prerequisites" %}}
{{< include "task-tutorial-prereqs.md" >}} {{< version-check >}}
<!-- lessoncontent -->
## Creating and exploring an nginx deployment
@ -40,60 +29,71 @@ a Deployment that runs the nginx:1.14.2 Docker image:
{{< codenew file="application/deployment.yaml" >}}
1. Create a Deployment based on the YAML file:
kubectl apply -f https://k8s.io/examples/application/deployment.yaml
```shell
kubectl apply -f https://k8s.io/examples/application/deployment.yaml
```
1. Display information about the Deployment:
kubectl describe deployment nginx-deployment
```shell
kubectl describe deployment nginx-deployment
```
The output is similar to this:
```
Name: nginx-deployment
Namespace: default
CreationTimestamp: Tue, 30 Aug 2016 18:11:37 -0700
Labels: app=nginx
Annotations: deployment.kubernetes.io/revision=1
Selector: app=nginx
Replicas: 2 desired | 2 updated | 2 total | 2 available | 0 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 1 max unavailable, 1 max surge
Pod Template:
Labels: app=nginx
Containers:
nginx:
Image: nginx:1.14.2
Port: 80/TCP
Environment: <none>
Mounts: <none>
Volumes: <none>
Conditions:
Type Status Reason
---- ------ ------
Available True MinimumReplicasAvailable
Progressing True NewReplicaSetAvailable
OldReplicaSets: <none>
NewReplicaSet: nginx-deployment-1771418926 (2/2 replicas created)
No events.
```
1. List the Pods created by the deployment:
kubectl get pods -l app=nginx
```shell
kubectl get pods -l app=nginx
```
The output is similar to this:
```
NAME READY STATUS RESTARTS AGE
nginx-deployment-1771418926-7o5ns 1/1 Running 0 16h
nginx-deployment-1771418926-r18az 1/1 Running 0 16h
```
1. Display information about a Pod:
kubectl describe pod <pod-name>
```shell
kubectl describe pod <pod-name>
```
where `<pod-name>` is the name of one of your Pods.
## Updating the deployment
@ -104,11 +104,15 @@ specifies that the deployment should be updated to use nginx 1.16.1.
1. Apply the new YAML file:
kubectl apply -f https://k8s.io/examples/application/deployment-update.yaml
```shell
kubectl apply -f https://k8s.io/examples/application/deployment-update.yaml
```
1. Watch the deployment create pods with new names and delete the old pods:
kubectl get pods -l app=nginx
```shell
kubectl get pods -l app=nginx
```
## Scaling the application by increasing the replica count
@ -120,25 +124,33 @@ should have four Pods:
1. Apply the new YAML file:
kubectl apply -f https://k8s.io/examples/application/deployment-scale.yaml
```shell
kubectl apply -f https://k8s.io/examples/application/deployment-scale.yaml
```
1. Verify that the Deployment has four Pods:
kubectl get pods -l app=nginx
```shell
kubectl get pods -l app=nginx
```
The output is similar to this:
```
NAME READY STATUS RESTARTS AGE
nginx-deployment-148880595-4zdqq 1/1 Running 0 25s
nginx-deployment-148880595-6zgi1 1/1 Running 0 25s
nginx-deployment-148880595-fxcez 1/1 Running 0 2m
nginx-deployment-148880595-rwovn 1/1 Running 0 2m
```
## Deleting a deployment
Delete the deployment by name:
kubectl delete deployment nginx-deployment
```shell
kubectl delete deployment nginx-deployment
```
## ReplicationControllers -- the Old Way
@ -147,14 +159,6 @@ which in turn uses a ReplicaSet. Before the Deployment and ReplicaSet were
added to Kubernetes, replicated applications were configured using a
[ReplicationController](/docs/concepts/workloads/controllers/replicationcontroller/).
## {{% heading "whatsnext" %}}
* Learn more about [Deployment objects](/docs/concepts/workloads/controllers/deployment/).
- Learn more about [Deployment objects](/docs/concepts/workloads/controllers/deployment/).

View File

@ -55,7 +55,7 @@ bash-completion sources all completion scripts in `/etc/bash_completion.d`.
{{< /note >}}
Both approaches are equivalent. After reloading your shell, kubectl autocompletion should be working.
To enable bash autocompletion in the current shell session, run `exec bash`:
To enable bash autocompletion in the current shell session, source the `~/.bashrc` file:
```bash
exec bash
source ~/.bashrc
```

View File

@ -95,6 +95,16 @@ For example, to download version {{< param "fullversion" >}} on Linux, type:
```bash
kubectl version --client
```
{{< note >}}
The above command will generate a warning:
```
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short.
```
You can ignore this warning. You are only checking the version of `kubectl` that you
have installed.
{{< /note >}}
Or use this for a detailed view of the version:
```cmd

View File

@ -27,7 +27,6 @@ The following methods exist for installing kubectl on macOS:
- [Optional kubectl configurations and plugins](#optional-kubectl-configurations-and-plugins)
- [Enable shell autocompletion](#enable-shell-autocompletion)
- [Install `kubectl convert` plugin](#install-kubectl-convert-plugin)
- [{{% heading "whatsnext" %}}](#-heading-whatsnext-)
### Install kubectl binary with curl on macOS
@ -117,6 +116,17 @@ The following methods exist for installing kubectl on macOS:
```bash
kubectl version --client
```
{{< note >}}
The above command will generate a warning:
```
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short.
```
You can ignore this warning. You are only checking the version of `kubectl` that you
have installed.
{{< /note >}}
Or use this for a detailed view of the version:
```cmd

View File

@ -66,12 +66,28 @@ The following methods exist for installing kubectl on Windows:
```cmd
kubectl version --client
```
{{< note >}}
The above command will generate a warning:
```
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short.
```
You can ignore this warning. You are only checking the version of `kubectl` that you
have installed.
{{< /note >}}
Or use this for a detailed view of the version:
```cmd
kubectl version --client --output=yaml
```
1. After installing the plugin, clean up the installation files:
```powershell
del kubectl.exe kubectl.exe.sha256
```
{{< note >}}
[Docker Desktop for Windows](https://docs.docker.com/docker-for-windows/#kubernetes) adds its own version of `kubectl` to `PATH`.
If you have installed Docker Desktop before, you may need to place your `PATH` entry before the one added by the Docker Desktop installer or remove the Docker Desktop's `kubectl`.
@ -191,6 +207,12 @@ Below are the procedures to set up autocompletion for PowerShell.
If you do not see an error, it means the plugin is successfully installed.
1. After installing the plugin, clean up the installation files:
```powershell
del kubectl-convert.exe kubectl-convert.exe.sha256
```
## {{% heading "whatsnext" %}}
{{< include "included/kubectl-whats-next.md" >}}

View File

@ -41,7 +41,7 @@ This tutorial provides a container image that uses NGINX to echo back all the re
## Create a minikube cluster
1. Click **Launch Terminal**
1. Click **Launch Terminal**.
{{< kat-button >}}
@ -90,6 +90,8 @@ tutorial has only one Container. A Kubernetes
Pod and restarts the Pod's Container if it terminates. Deployments are the
recommended way to manage the creation and scaling of Pods.
1. Katacoda environment only: At the top of the terminal pane, click the plus sign, and then click **Open New Terminal**.
1. Use the `kubectl create` command to create a Deployment that manages a Pod. The
Pod runs a Container based on the provided Docker image.
@ -97,7 +99,7 @@ Pod runs a Container based on the provided Docker image.
kubectl create deployment hello-node --image=registry.k8s.io/e2e-test-images/agnhost:2.39 -- /agnhost netexec --http-port=8080
```
2. View the Deployment:
1. View the Deployment:
```shell
kubectl get deployments
@ -110,7 +112,7 @@ Pod runs a Container based on the provided Docker image.
hello-node 1/1 1 1 1m
```
3. View the Pod:
1. View the Pod:
```shell
kubectl get pods
@ -123,13 +125,13 @@ Pod runs a Container based on the provided Docker image.
hello-node-5f76cf6ccf-br9b5 1/1 Running 0 1m
```
4. View cluster events:
1. View cluster events:
```shell
kubectl get events
```
5. View the `kubectl` configuration:
1. View the `kubectl` configuration:
```shell
kubectl config view

View File

@ -74,7 +74,7 @@ deployment.apps/source-ip-app created
Packets sent to ClusterIP from within the cluster are never source NAT'd if
you're running kube-proxy in
[iptables mode](/docs/concepts/services-networking/service/#proxy-mode-iptables),
[iptables mode](/docs/reference/networking/virtual-ips/#proxy-mode-iptables),
(the default). You can query the kube-proxy mode by fetching
`http://localhost:10249/proxyMode` on the node where kube-proxy is running.
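A quick check based on that sentence, run on the node itself (a sketch; the reported value depends on your configuration):

```shell
curl http://localhost:10249/proxyMode
```

The output is the configured mode, for example:

```none
iptables
```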

View File

@ -320,6 +320,8 @@ func validateObject(obj runtime.Object) (errors field.ErrorList) {
case *rbac.ClusterRoleBinding:
// clusterrolebinding does not accept namespace
errors = rbac_validation.ValidateClusterRoleBinding(t)
case *rbac.RoleBinding:
errors = rbac_validation.ValidateRoleBinding(t)
case *storage.StorageClass:
// storageclass does not accept namespace
errors = storage_validation.ValidateStorageClass(t)
@ -454,7 +456,7 @@ func TestExampleObjectSchemas(t *testing.T) {
},
"admin/sched": {
"clusterrole": {&rbac.ClusterRole{}},
"my-scheduler": {&api.ServiceAccount{}, &rbac.ClusterRoleBinding{}, &rbac.ClusterRoleBinding{}, &api.ConfigMap{}, &apps.Deployment{}},
"my-scheduler": {&api.ServiceAccount{}, &rbac.ClusterRoleBinding{}, &rbac.ClusterRoleBinding{}, &rbac.RoleBinding{}, &api.ConfigMap{}, &apps.Deployment{}},
"pod1": {&api.Pod{}},
"pod2": {&api.Pod{}},
"pod3": {&api.Pod{}},

View File

@ -10,6 +10,7 @@ spec:
# name must match the volume name below
- name: secret-volume
mountPath: /etc/secret-volume
readOnly: true
# The secret data is exposed to Containers in the Pod through a Volume.
volumes:
- name: secret-volume

View File

@ -10,7 +10,7 @@ spec:
dnsPolicy: "None"
dnsConfig:
nameservers:
- 1.2.3.4
- 192.0.2.1 # this is an example
searches:
- ns1.svc.cluster-domain.example
- my.dns.search.suffix

View File

@ -23,48 +23,17 @@ signatures:
| Container Image | Supported Architectures |
| ------------------------------------------------------------------- | ---------------------------------------------------------------------------------------- |
| [registry.k8s.io/kube-apiserver:{{< param "fullversion" >}}][0] | [amd64][0-amd64], [arm][0-arm], [arm64][0-arm64], [ppc64le][0-ppc64le], [s390x][0-s390x] |
| [registry.k8s.io/kube-controller-manager:{{< param "fullversion" >}}][1] | [amd64][1-amd64], [arm][1-arm], [arm64][1-arm64], [ppc64le][1-ppc64le], [s390x][1-s390x] |
| [registry.k8s.io/kube-proxy:{{< param "fullversion" >}}][2] | [amd64][2-amd64], [arm][2-arm], [arm64][2-arm64], [ppc64le][2-ppc64le], [s390x][2-s390x] |
| [registry.k8s.io/kube-scheduler:{{< param "fullversion" >}}][3] | [amd64][3-amd64], [arm][3-arm], [arm64][3-arm64], [ppc64le][3-ppc64le], [s390x][3-s390x] |
| [registry.k8s.io/conformance:{{< param "fullversion" >}}][4] | [amd64][4-amd64], [arm][4-arm], [arm64][4-arm64], [ppc64le][4-ppc64le], [s390x][4-s390x] |
[0]: https://console.cloud.google.com/gcr/images/k8s-artifacts-prod/us/kube-apiserver
[0-amd64]: https://console.cloud.google.com/gcr/images/k8s-artifacts-prod/us/kube-apiserver-amd64
[0-arm]: https://console.cloud.google.com/gcr/images/k8s-artifacts-prod/us/kube-apiserver-arm
[0-arm64]: https://console.cloud.google.com/gcr/images/k8s-artifacts-prod/us/kube-apiserver-arm64
[0-ppc64le]: https://console.cloud.google.com/gcr/images/k8s-artifacts-prod/us/kube-apiserver-ppc64le
[0-s390x]: https://console.cloud.google.com/gcr/images/k8s-artifacts-prod/us/kube-apiserver-s390x
[1]: https://console.cloud.google.com/gcr/images/k8s-artifacts-prod/us/kube-controller-manager
[1-amd64]: https://console.cloud.google.com/gcr/images/k8s-artifacts-prod/us/kube-controller-manager-amd64
[1-arm]: https://console.cloud.google.com/gcr/images/k8s-artifacts-prod/us/kube-controller-manager-arm
[1-arm64]: https://console.cloud.google.com/gcr/images/k8s-artifacts-prod/us/kube-controller-manager-arm64
[1-ppc64le]: https://console.cloud.google.com/gcr/images/k8s-artifacts-prod/us/kube-controller-manager-ppc64le
[1-s390x]: https://console.cloud.google.com/gcr/images/k8s-artifacts-prod/us/kube-controller-manager-s390x
[2]: https://console.cloud.google.com/gcr/images/k8s-artifacts-prod/us/kube-proxy
[2-amd64]: https://console.cloud.google.com/gcr/images/k8s-artifacts-prod/us/kube-proxy-amd64
[2-arm]: https://console.cloud.google.com/gcr/images/k8s-artifacts-prod/us/kube-proxy-arm
[2-arm64]: https://console.cloud.google.com/gcr/images/k8s-artifacts-prod/us/kube-proxy-arm64
[2-ppc64le]: https://console.cloud.google.com/gcr/images/k8s-artifacts-prod/us/kube-proxy-ppc64le
[2-s390x]: https://console.cloud.google.com/gcr/images/k8s-artifacts-prod/us/kube-proxy-s390x
[3]: https://console.cloud.google.com/gcr/images/k8s-artifacts-prod/us/kube-scheduler
[3-amd64]: https://console.cloud.google.com/gcr/images/k8s-artifacts-prod/us/kube-scheduler-amd64
[3-arm]: https://console.cloud.google.com/gcr/images/k8s-artifacts-prod/us/kube-scheduler-arm
[3-arm64]: https://console.cloud.google.com/gcr/images/k8s-artifacts-prod/us/kube-scheduler-arm64
[3-ppc64le]: https://console.cloud.google.com/gcr/images/k8s-artifacts-prod/us/kube-scheduler-ppc64le
[3-s390x]: https://console.cloud.google.com/gcr/images/k8s-artifacts-prod/us/kube-scheduler-s390x
[4]: https://console.cloud.google.com/gcr/images/k8s-artifacts-prod/us/conformance
[4-amd64]: https://console.cloud.google.com/gcr/images/k8s-artifacts-prod/us/conformance-amd64
[4-arm]: https://console.cloud.google.com/gcr/images/k8s-artifacts-prod/us/conformance-arm
[4-arm64]: https://console.cloud.google.com/gcr/images/k8s-artifacts-prod/us/conformance-arm64
[4-ppc64le]: https://console.cloud.google.com/gcr/images/k8s-artifacts-prod/us/conformance-ppc64le
[4-s390x]: https://console.cloud.google.com/gcr/images/k8s-artifacts-prod/us/conformance-s390x
| registry.k8s.io/kube-apiserver:{{< param "fullversion" >}} | amd64, arm, arm64, ppc64le, s390x |
| registry.k8s.io/kube-controller-manager:{{< param "fullversion" >}} | amd64, arm, arm64, ppc64le, s390x |
| registry.k8s.io/kube-proxy:{{< param "fullversion" >}} | amd64, arm, arm64, ppc64le, s390x |
| registry.k8s.io/kube-scheduler:{{< param "fullversion" >}} | amd64, arm, arm64, ppc64le, s390x |
| registry.k8s.io/conformance:{{< param "fullversion" >}} | amd64, arm, arm64, ppc64le, s390x |
All container images are available for multiple architectures, and the
container runtime should choose the correct one based on the underlying
platform. It is also possible to pull a dedicated architecture by suffixing the
container image name, for example
[`registry.k8s.io/kube-apiserver-arm64:{{< param "fullversion" >}}`][0-arm64]. All
`registry.k8s.io/kube-apiserver-arm64:{{< param "fullversion" >}}`. All
those derivations are signed in the same way as the multi-architecture manifest lists.
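As a concrete sketch of pulling such an architecture-specific image (assuming a Docker-compatible CLI with access to the registry):

```shell
docker pull registry.k8s.io/kube-apiserver-arm64:{{< param "fullversion" >}}
```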
The Kubernetes project publishes a list of signed Kubernetes container images

View File

@ -0,0 +1,27 @@
---
title: Pod Security Policies
content_type: concept
weight: 30
---

<!-- overview -->

{{% alert title="Removed feature" color="warning" %}}
PodSecurityPolicy was [deprecated](/blog/2021/04/08/kubernetes-1-21-release-announcement/#podsecuritypolicy-deprecation)
in Kubernetes v1.21 and removed from Kubernetes in v1.25.
{{% /alert %}}

Instead of using PodSecurityPolicy, you can enforce similar restrictions on Pods using
either or both of:

- [Pod Security Admission](/docs/concepts/security/pod-security-admission/)
- a third-party admission plugin that you deploy and configure yourself

For a migration guide, see
[Migrate from PodSecurityPolicy to the Built-In PodSecurity Admission Controller](/docs/tasks/configure-pod-container/migrate-from-psp/).
For more information on the removal of this API, see
[PodSecurityPolicy Deprecation: Past, Present, and Future](/blog/2021/04/06/podsecuritypolicy-deprecation-past-present-and-future/).

If you are not running Kubernetes v{{< skew currentVersion >}}, check the documentation for
your version of Kubernetes.

View File

@ -38,7 +38,7 @@ The servers must be installed while ensuring the following:
* The target servers must have **Internet access** in order to pull the Docker images. Otherwise, additional configuration is required (see [Offline Environment](https://github.com/kubernetes-sigs/kubespray/blob/master/docs/downloads.md#offline-environment))
* The target servers must be configured to allow **IPv4 forwarding**
* **Your ssh key must be copied** to all the servers that are part of your Ansible inventory.
* The **firewall configuration is not managed**. You will have to handle it in your usual way. To avoid any problem during the installation, we advise you to disable it.
* If Kubespray is run with a user other than "root", an appropriate privilege escalation method must be configured on the target servers (for example: sudo). You will also need to use the `ansible_become` parameter or add `--become` or `-b` to the command line.

To help you prepare your environment, Kubespray provides the following tools:

View File

@ -0,0 +1,252 @@
---
title: Configure a Pod to Use a PersistentVolume for Storage
content_type: task
weight: 60
---

<!-- overview -->

This page shows you how to configure a Pod to use a {{< glossary_tooltip text="PersistentVolumeClaim" term_id="persistent-volume-claim" >}} for storage.

Here is a summary of the steps:

1. As a cluster administrator, you create a PersistentVolume that points to a physical storage system. You do not associate the volume with any Pod yet.
1. As a cluster user or developer, you create a PersistentVolumeClaim that is automatically bound to a suitable PersistentVolume.
1. You create a Pod that uses the PersistentVolumeClaim created previously as storage.

## {{% heading "prerequisites" %}}

* You need to have a cluster with only one node, and the {{< glossary_tooltip text="kubectl" term_id="kubectl" >}} command-line tool must be configured to communicate with your cluster. If you do not already have a cluster, you can create one by using [Minikube](https://minikube.sigs.k8s.io/docs/).

* Familiarize yourself with the documentation for [Persistent Volumes](/docs/concepts/storage/persistent-volumes/).

<!-- steps -->

## Create an index.html file on your node

Open a shell session on the node in your cluster. How you open the session depends on how your cluster is set up. If you are using Minikube, you can open a session with the `minikube ssh` command.

In the session on that node, create a `/mnt/data` directory:

```shell
# Assuming your node uses `sudo` for privileged access
sudo mkdir /mnt/data
```

In the `/mnt/data` directory, create an `index.html` file:

```shell
# Still assuming your node uses `sudo` for privileged access
sudo sh -c "echo 'Hello from Kubernetes storage' > /mnt/data/index.html"
```

{{< note >}}
If your node uses a tool for privileged access other than `sudo`, the commands shown here should work if you replace `sudo` with the name of that tool.
{{< /note >}}

Check that the `index.html` file exists:

```shell
cat /mnt/data/index.html
```

The output of the command should be:

```
Hello from Kubernetes storage
```

You can now close the shell session to your node.

## Create a PersistentVolume

In this exercise, you create a *hostPath* PersistentVolume. Kubernetes supports the hostPath type for development and testing on a single-node cluster. A hostPath PersistentVolume uses a file or directory on the node to emulate network-attached storage.

In a production cluster, you would not use *hostPath*. Instead, a cluster administrator would provision a network resource such as a Google Compute Engine persistent disk, an NFS share, or an Amazon Elastic Block Store volume. Cluster administrators can also use [StorageClasses](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#storageclass-v1-storage-k8s-io) to set up [dynamic provisioning](/docs/concepts/storage/dynamic-provisioning/).

Here is the configuration file for the hostPath PersistentVolume:

{{< codenew file="pods/storage/pv-volume.yaml" >}}

The configuration file specifies that the path of the volume on the node is `/mnt/data`. It also specifies a size of 10 gibibytes and an access mode of `ReadWriteOnce`, which means the volume can be mounted as read-write by a single node. The file defines a [StorageClass name](/docs/concepts/storage/persistent-volumes/#class) of `manual`, which will be used to bind a PersistentVolumeClaim to this PersistentVolume.

Create the PersistentVolume:

```shell
kubectl apply -f https://k8s.io/examples/pods/storage/pv-volume.yaml
```

View information about the PersistentVolume:

```shell
kubectl get pv task-pv-volume
```

The output shows that the PersistentVolume has a `STATUS` of `Available`.
This means it has not yet been bound to a PersistentVolumeClaim.

    NAME             CAPACITY   ACCESSMODES   RECLAIMPOLICY   STATUS      CLAIM     STORAGECLASS   REASON    AGE
    task-pv-volume   10Gi       RWO           Retain          Available             manual                   4s

## Create a PersistentVolumeClaim

The next step is to create a PersistentVolumeClaim (a request for storage). Pods use PersistentVolumeClaims to request access to physical storage.

In this exercise, you create a PersistentVolumeClaim that requests a volume of at least 3 GB that can be mounted as read-write by at least one node.

Here is the configuration file for the PersistentVolumeClaim:

{{< codenew file="pods/storage/pv-claim.yaml" >}}

Create the PersistentVolumeClaim:

    kubectl apply -f https://k8s.io/examples/pods/storage/pv-claim.yaml

After you create the PersistentVolumeClaim, the Kubernetes control plane looks for a PersistentVolume that satisfies the claim's requirements. If the control plane finds a suitable PersistentVolume with the same StorageClass, it binds the claim to the volume.

Look again at the PersistentVolume:

```shell
kubectl get pv task-pv-volume
```

Now the output shows a `STATUS` of `Bound`.

    NAME             CAPACITY   ACCESSMODES   RECLAIMPOLICY   STATUS    CLAIM                   STORAGECLASS   REASON    AGE
    task-pv-volume   10Gi       RWO           Retain          Bound     default/task-pv-claim   manual                   2m

Look at the PersistentVolumeClaim:

```shell
kubectl get pvc task-pv-claim
```

The output shows that the PersistentVolumeClaim is bound to the PersistentVolume `task-pv-volume`.

    NAME            STATUS    VOLUME           CAPACITY   ACCESSMODES   STORAGECLASS   AGE
    task-pv-claim   Bound     task-pv-volume   10Gi       RWO           manual         30s

## Create a Pod

The next step is to create a Pod that uses the PersistentVolumeClaim as a storage volume.

Here is the configuration file for the Pod:

{{< codenew file="pods/storage/pv-pod.yaml" >}}

Notice that the Pod's configuration file specifies a PersistentVolumeClaim, not a PersistentVolume. From the Pod's point of view, the claim is a storage volume.

Create the Pod:

```shell
kubectl apply -f https://k8s.io/examples/pods/storage/pv-pod.yaml
```

Verify that the container in the Pod is running:

```shell
kubectl get pod task-pv-pod
```

Get a shell in the container running in the Pod:

```shell
kubectl exec -it task-pv-pod -- /bin/bash
```

In the shell, verify that nginx is serving the `index.html` file from the hostPath volume:

```shell
# Be sure to run these 3 commands inside the shell that comes from
# the "kubectl exec" command you ran previously
apt update
apt install curl
curl http://localhost/
```

The output shows the text that you wrote earlier to the `index.html` file on the hostPath volume:

    Hello from Kubernetes storage

If you see that message, you have successfully configured a Pod to use storage from a PersistentVolumeClaim.

## Clean up

Delete the Pod, the PersistentVolumeClaim, and the PersistentVolume:

```shell
kubectl delete pod task-pv-pod
kubectl delete pvc task-pv-claim
kubectl delete pv task-pv-volume
```

If you do not already have a shell session open to the node in your cluster, open one the same way you did earlier.

In that shell session, delete the file and directory that you created:

```shell
# Assuming your node uses "sudo" for privileged access
sudo rm /mnt/data/index.html
sudo rmdir /mnt/data
```

You can now close the shell session to your node.

## Mounting the same PersistentVolume in two places

You can mount a single PersistentVolume several times, at different paths, in your nginx container:

{{< codenew file="pods/storage/pv-duplicate.yaml" >}}

- `/usr/share/nginx/html` for the static website
- `/etc/nginx/nginx.conf` for the default configuration

<!-- discussion -->

## Access control

Storage configured with a group ID (GID) allows writing only by Pods that use the same GID.
Mismatched or missing GIDs cause permission denied errors. To reduce the need for coordination with users, an administrator can annotate a PersistentVolume
with a GID. Then the GID is automatically added to any Pod that uses the PersistentVolume.

Use the `pv.beta.kubernetes.io/gid` annotation as follows:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv1
  annotations:
    pv.beta.kubernetes.io/gid: "1234"
```

When a Pod consumes a PersistentVolume that has a GID annotation, the annotated GID is applied to all containers in the Pod in the same way that GIDs specified in the Pod's security context are. Every GID, whether it originates from a PersistentVolume annotation or the Pod's specification, is applied to the first process run in each container.

{{< note >}}
When a Pod consumes a PersistentVolume, the GIDs associated with the PersistentVolume are not present on the Pod resource itself.
{{< /note >}}

## {{% heading "whatsnext" %}}

* Learn more about [PersistentVolumes](/docs/concepts/storage/persistent-volumes/).
* Read the [Persistent Storage design document](https://git.k8s.io/design-proposals-archive/storage/persistent-storage.md).

### Reference

* [PersistentVolume](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#persistentvolume-v1-core)
* [PersistentVolumeSpec](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#persistentvolumespec-v1-core)
* [PersistentVolumeClaim](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#persistentvolumeclaim-v1-core)
* [PersistentVolumeClaimSpec](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#persistentvolumeclaimspec-v1-core)
View File

@ -1,4 +1,5 @@
---
title: Injecting data into applications
title: "Inject data into applications"
description: Specify the configuration and parameters for the Pods that run your workloads.
weight: 30
---

Some files were not shown because too many files have changed in this diff