Merge branch 'main' into fixes-31657

pull/32933/head
Abigail McCarthy 2022-06-06 08:06:13 -04:00 committed by GitHub
commit 3272b3577c
1035 changed files with 98689 additions and 26205 deletions

.gitmodules

@ -1,6 +1,7 @@
[submodule "themes/docsy"]
path = themes/docsy
url = https://github.com/google/docsy.git
branch = v0.2.0
[submodule "api-ref-generator"]
path = api-ref-generator
url = https://github.com/kubernetes-sigs/reference-docs


@ -27,16 +27,18 @@ RUN mkdir $HOME/src && \
FROM golang:1.16-alpine
RUN apk add --no-cache \
runuser \
git \
openssh-client \
rsync \
npm && \
npm install -D autoprefixer postcss-cli
RUN mkdir -p /usr/local/src && \
cd /usr/local/src && \
RUN mkdir -p /var/hugo && \
addgroup -Sg 1000 hugo && \
adduser -Sg hugo -u 1000 -h /src hugo
adduser -Sg hugo -u 1000 -h /var/hugo hugo && \
chown -R hugo: /var/hugo && \
runuser -u hugo -- git config --global --add safe.directory /src
COPY --from=0 /go/bin/hugo /usr/local/bin/hugo


@ -71,6 +71,9 @@ container-image: ## Build a container image for the preview of the website
--tag $(CONTAINER_IMAGE) \
--build-arg HUGO_VERSION=$(HUGO_VERSION)
container-push: container-image ## Push container image for the preview of the website
$(CONTAINER_ENGINE) push $(CONTAINER_IMAGE)
container-build: module-check
$(CONTAINER_RUN) --read-only --mount type=tmpfs,destination=/tmp,tmpfs-mode=01777 $(CONTAINER_IMAGE) sh -c "npm ci && hugo --minify --environment development"
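For context, a hypothetical invocation of the targets in this hunk; `CONTAINER_IMAGE` and `CONTAINER_ENGINE` are Makefile variables defined elsewhere in the file, and the registry tag below is only an illustration:

```bash
# Build the preview image, then push it under an overridden tag (example value).
make container-image
make container-push CONTAINER_IMAGE=registry.example.com/me/k8s-website:dev
```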


@ -127,6 +127,7 @@ aliases:
# MasayaAoyama
- nasa9084
# oke-py
- ptux
sig-docs-ko-owners: # Admins for Korean content
- ClaudiaJKang
- gochist
@ -178,6 +179,7 @@ aliases:
- tanjunchen
- tengqm
- xichengliudui
- ydFu
# zhangxiaoyu-zidif
sig-docs-pt-owners: # Admins for Portuguese content
- edsoncelio
@ -248,7 +250,6 @@ aliases:
- cpanato # SIG Technical Lead
- jeremyrickard # SIG Technical Lead
- justaugustus # SIG Chair
- LappleApple # SIG Program Manager
- puerco # SIG Technical Lead
- saschagrunert # SIG Chair
release-engineering-approvers:


@ -4,6 +4,9 @@
このリポジトリには、[KubernetesのWebサイトとドキュメント](https://kubernetes.io/)をビルドするために必要な全アセットが格納されています。貢献に興味を持っていただきありがとうございます!
- [ドキュメントに貢献する](#contributing-to-the-docs)
- [翻訳された`README.md`一覧](#localization-readmemds)
# リポジトリの使い方
Hugo(Extended version)を使用してWebサイトをローカルで実行することも、コンテナランタイムで実行することもできます。コンテナランタイムを使用することを強くお勧めします。これにより、本番Webサイトとのデプロイメントの一貫性が得られます。
@ -56,6 +59,43 @@ make serve
これで、Hugoのサーバーが1313番ポートを使って開始します。お使いのブラウザにて http://localhost:1313 にアクセスしてください。リポジトリ内のソースファイルに変更を加えると、HugoがWebサイトの内容を更新してブラウザに反映します。
## API reference pagesをビルドする
`content/en/docs/reference/kubernetes-api`に配置されているAPIリファレンスページは<https://github.com/kubernetes-sigs/reference-docs/tree/master/gen-resourcesdocs>を使ってSwagger仕様書からビルドされています。
新しいKubernetesリリースのためにリファレンスページをアップデートするには、次の手順を実行します:
1. `api-ref-generator`サブモジュールをプルする:
```bash
git submodule update --init --recursive --depth 1
```
2. Swagger仕様書を更新する:
```bash
curl 'https://raw.githubusercontent.com/kubernetes/kubernetes/master/api/openapi-spec/swagger.json' > api-ref-assets/api/swagger.json
```
3. 新しいリリースの変更を反映するため、`api-ref-assets/config/`で`toc.yaml`と`fields.yaml`を適用する。
4. 次に、ページをビルドする:
```bash
make api-reference
```
コンテナイメージからサイトを作成・サーブする事でローカルで結果をテストすることができます:
```bash
make container-image
make container-serve
```
APIリファレンスを見るために、ブラウザで<http://localhost:1313/docs/reference/kubernetes-api/>を開いてください。
5. 新しいコントラクトのすべての変更が設定ファイル`toc.yaml`と`fields.yaml`に反映されたら、新しく生成されたAPIリファレンスページとともにPull Requestを作成します。
## トラブルシューティング
### error: failed to transform resource: TOCSS: failed to transform "scss/main.scss" (text/x-scss): this feature is not available in your current Hugo version
@ -107,7 +147,7 @@ sudo launchctl load -w /Library/LaunchDaemons/limit.maxfiles.plist
- [Slack](https://kubernetes.slack.com/messages/kubernetes-docs-ja)
- [メーリングリスト](https://groups.google.com/forum/#!forum/kubernetes-sig-docs)
## ドキュメントに貢献する
## ドキュメントに貢献する {#contributing-to-the-docs}
GitHubの画面右上にある**Fork**ボタンをクリックすると、お使いのGitHubアカウントに紐付いた本リポジトリのコピーが作成され、このコピーのことを*フォーク*と呼びます。フォークリポジトリの中ではお好きなように変更を加えていただいて構いません。加えた変更をこのリポジトリに追加したい任意のタイミングにて、フォークリポジトリからPull Requestを作成してください。
@ -124,7 +164,15 @@ Kubernetesのドキュメントへの貢献に関する詳細については以
* [ドキュメントのスタイルガイド](https://kubernetes.io/docs/contribute/style/style-guide/)
* [Kubernetesドキュメントの翻訳方法](https://kubernetes.io/docs/contribute/localization/)
## 翻訳された`README.md`一覧
### New Contributor Ambassadors
コントリビュートする時に何か助けが必要なら、[New Contributor Ambassadors](https://kubernetes.io/docs/contribute/advanced/#serve-as-a-new-contributor-ambassador)に聞いてみると良いでしょう。彼らはSIG Docsのapproverで、最初の数回のPull Requestを通して新しいコントリビューターを指導し助けることを責務としています。New Contributors Ambassadorsにコンタクトするには、[Kubernetes Slack](https://slack.k8s.io)が最適な場所です。現在のSIG DocsのNew Contributor Ambassadorは次の通りです:
| 名前 | Slack | GitHub |
| -------------------------- | -------------------------- | -------------------------- |
| Arsh Sharma | @arsh | @RinkiyaKeDad |
## 翻訳された`README.md`一覧 {#localization-readmemds}
| Language | Language |
|---|---|


@ -13,7 +13,14 @@ This repository contains the assets required to build the [Kubernetes website an
我们非常高兴您想要参与贡献!
<!--
# Using this repository
- [Contributing to the docs](#contributing-to-the-docs)
- [Localization ReadMes](#localization-readmemds)
-->
- [为文档做贡献](#为文档做贡献)
- [README.md 本地化](#readmemd-本地化)
<!--
## Using this repository
You can run the website locally using Hugo (Extended version), or you can run it in a container runtime. We strongly recommend using the container runtime, as it gives deployment consistency with the live website.
-->
@ -46,7 +53,7 @@ Before you start, install the dependencies. Clone the repository and navigate to
-->
开始前,先安装这些依赖。克隆本仓库并进入对应目录:
```
```bash
git clone https://github.com/kubernetes/website.git
cd website
```
@ -57,7 +64,7 @@ The Kubernetes website uses the [Docsy Hugo theme](https://github.com/google/doc
Kubernetes 网站使用的是 [Docsy Hugo 主题](https://github.com/google/docsy#readme)。 即使你打算在容器中运行网站,我们也强烈建议你通过运行以下命令来引入子模块和其他开发依赖项:
```
```bash
# pull in the Docsy submodule
git submodule update --init --recursive --depth 1
```
@ -72,15 +79,23 @@ To build the site in a container, run the following to build the container image
要在容器中构建网站,请通过以下命令来构建容器镜像并运行:
```
```bash
make container-image
make container-serve
```
<!--
Open up your browser to http://localhost:1313 to view the website. As you make changes to the source files, Hugo updates the website and forces a browser refresh.
If you see errors, it probably means that the hugo container did not have enough computing resources available. To solve it, increase the amount of allowed CPU and memory usage for Docker on your machine ([MacOSX](https://docs.docker.com/docker-for-mac/#resources) and [Windows](https://docs.docker.com/docker-for-windows/#resources)).
-->
启动浏览器,打开 http://localhost:1313 来查看网站。
如果您看到错误,这可能意味着 hugo 容器没有足够的可用计算资源。
要解决这个问题,请增加机器([MacOSX](https://docs.docker.com/docker-for-mac/#resources)
和 [Windows](https://docs.docker.com/docker-for-windows/#resources))上
Docker 允许的 CPU 和内存使用量。
<!--
Open up your browser to <http://localhost:1313> to view the website. As you make changes to the source files, Hugo updates the website and forces a browser refresh.
-->
启动浏览器,打开 <http://localhost:1313> 来查看网站。
当你对源文件作出修改时,Hugo 会更新网站并强制浏览器执行刷新操作。
<!--
@ -104,18 +119,84 @@ make serve
```
<!--
This will start the local Hugo server on port 1313. Open up your browser to http://localhost:1313 to view the website. As you make changes to the source files, Hugo updates the website and forces a browser refresh.
This will start the local Hugo server on port 1313. Open up your browser to <http://localhost:1313> to view the website. As you make changes to the source files, Hugo updates the website and forces a browser refresh.
-->
上述命令会在端口 1313 上启动本地 Hugo 服务器。
启动浏览器,打开 http://localhost:1313 来查看网站。
启动浏览器,打开 <http://localhost:1313> 来查看网站。
当你对源文件作出修改时,Hugo 会更新网站并强制浏览器执行刷新操作。
<!--
## Building the API reference pages
-->
## 构建 API 参考页面
<!--
The API reference pages located in `content/en/docs/reference/kubernetes-api` are built from the Swagger specification, using <https://github.com/kubernetes-sigs/reference-docs/tree/master/gen-resourcesdocs>.
To update the reference pages for a new Kubernetes release follow these steps:
-->
位于 `content/en/docs/reference/kubernetes-api` 的 API 参考页面是根据 Swagger 规范构建的,使用 <https://github.com/kubernetes-sigs/reference-docs/tree/master/gen-resourcesdocs>
要更新新 Kubernetes 版本的参考页面,请执行以下步骤:
<!--
1. Pull in the `api-ref-generator` submodule:
-->
1. 拉取 `api-ref-generator` 子模块:
```bash
git submodule update --init --recursive --depth 1
```
<!--
2. Update the Swagger specification:
-->
2. 更新 Swagger 规范:
```bash
curl 'https://raw.githubusercontent.com/kubernetes/kubernetes/master/api/openapi-spec/swagger.json' > api-ref-assets/api/swagger.json
```
<!--
3. In `api-ref-assets/config/`, adapt the files `toc.yaml` and `fields.yaml` to reflect the changes of the new release.
-->
3. 在 `api-ref-assets/config/` 中,调整文件 `toc.yaml``fields.yaml` 以反映新版本的变化。
<!--
4. Next, build the pages:
-->
4. 接下来,构建页面:
```bash
make api-reference
```
<!--
You can test the results locally by making and serving the site from a container image:
-->
您可以通过从容器映像创建和提供站点来在本地测试结果:
```bash
make container-image
make container-serve
```
<!--
In a web browser, go to <http://localhost:1313/docs/reference/kubernetes-api/> to view the API reference.
-->
在 Web 浏览器中,打开 <http://localhost:1313/docs/reference/kubernetes-api/> 查看 API 参考。
<!--
5. When all changes of the new contract are reflected into the configuration files `toc.yaml` and `fields.yaml`, create a Pull Request with the newly generated API reference pages.
-->
5. 当所有新的更改都反映到配置文件 `toc.yaml``fields.yaml` 中时,使用新生成的 API 参考页面创建一个 Pull Request。
<!--
## Troubleshooting
### error: failed to transform resource: TOCSS: failed to transform "scss/main.scss" (text/x-scss): this feature is not available in your current Hugo version
Hugo is shipped in two set of binaries for technical reasons. The current website runs based on the **Hugo Extended** version only. In the [release page](https://github.com/gohugoio/hugo/releases) look for archives with `extended` in the name. To confirm, run `hugo version` and look for the word `extended`.
-->
## 故障排除
@ -131,22 +212,28 @@ Hugo is shipped in two set of binaries for technical reasons. The current websit
If you run `make serve` on macOS and receive the following error:
-->
### 对 macOs 上打开太多文件的故障排除
### 对 macOS 上打开太多文件的故障排除
如果在 macOS 上运行 `make serve` 收到以下错误:
```
```bash
ERROR 2020/08/01 19:09:18 Error: listen tcp 127.0.0.1:1313: socket: too many open files
make: *** [serve] Error 1
```
<!--
Try checking the current limit for open files:
-->
试着查看一下当前打开文件数的限制:
`launchctl limit maxfiles`
然后运行以下命令参考https://gist.github.com/tombigel/d503800a282fcadbee14b537735d202c
<!--
Then run the following commands (adapted from <https://gist.github.com/tombigel/d503800a282fcadbee14b537735d202c>):
-->
然后运行以下命令(参考 <https://gist.github.com/tombigel/d503800a282fcadbee14b537735d202c>
```
```shell
#!/bin/sh
# These are the original gist links, linking to my gists now.
@ -165,6 +252,9 @@ sudo chown root:wheel /Library/LaunchDaemons/limit.maxproc.plist
sudo launchctl load -w /Library/LaunchDaemons/limit.maxfiles.plist
```
<!--
This works for Catalina as well as Mojave macOS.
-->
这适用于 Catalina 和 Mojave macOS。
<!--
@ -174,7 +264,8 @@ Learn more about SIG Docs Kubernetes community and meetings on the [community pa
You can also reach the maintainers of this project at:
- [Slack](https://kubernetes.slack.com/messages/sig-docs) [Get an invite for this Slack](https://slack.k8s.io/)
- [Slack](https://kubernetes.slack.com/messages/sig-docs)
- [Get an invite for this Slack](https://slack.k8s.io/)
- [Mailing List](https://groups.google.com/forum/#!forum/kubernetes-sig-docs)
-->
# 参与 SIG Docs 工作
@ -184,20 +275,21 @@ You can also reach the maintainers of this project at:
你也可以通过以下渠道联系本项目的维护人员:
- [Slack](https://kubernetes.slack.com/messages/sig-docs) [加入Slack](https://slack.k8s.io/)
- [Slack](https://kubernetes.slack.com/messages/sig-docs)
- [获得此 Slack 的邀请](https://slack.k8s.io/)
- [邮件列表](https://groups.google.com/forum/#!forum/kubernetes-sig-docs)
<!--
## Contributing to the docs
You can click the **Fork** button in the upper-right area of the screen to create a copy of this repository in your GitHub account. This copy is called a *fork*. Make any changes you want in your fork, and when you are ready to send those changes to us, go to your fork and create a new pull request to let us know about it.
You can click the **Fork** button in the upper-right area of the screen to create a copy of this repository in your GitHub account. This copy is called a _fork_. Make any changes you want in your fork, and when you are ready to send those changes to us, go to your fork and create a new pull request to let us know about it.
Once your pull request is created, a Kubernetes reviewer will take responsibility for providing clear, actionable feedback. As the owner of the pull request, **it is your responsibility to modify your pull request to address the feedback that has been provided to you by the Kubernetes reviewer.**
Once your pull request is created, a Kubernetes reviewer will take responsibility for providing clear, actionable feedback. As the owner of the pull request, **it is your responsibility to modify your pull request to address the feedback that has been provided to you by the Kubernetes reviewer.**
-->
# 为文档做贡献
你也可以点击屏幕右上方区域的 **Fork** 按钮,在你自己的 GitHub
账号下创建本仓库的拷贝。此拷贝被称作 *fork*
账号下创建本仓库的拷贝。此拷贝被称作 _fork_
你可以在自己的拷贝中任意地修改文档,并在你已准备好将所作修改提交给我们时,
在你自己的拷贝下创建一个拉取请求Pull Request以便让我们知道。
@ -208,7 +300,7 @@ Once your pull request is created, a Kubernetes reviewer will take responsibilit
<!--
Also, note that you may end up having more than one Kubernetes reviewer provide you feedback or you may end up getting feedback from a Kubernetes reviewer that is different than the one initially assigned to provide you feedback.
Furthermore, in some cases, one of your reviewers might ask for a technical review from a Kubernetes tech reviewer when needed. Reviewers will do their best to provide feedback in a timely fashion but response time can vary based on circumstances.
Furthermore, in some cases, one of your reviewers might ask for a technical review from a Kubernetes tech reviewer when needed. Reviewers will do their best to provide feedback in a timely fashion but response time can vary based on circumstances.
-->
还要提醒的一点,有时可能会有不止一个 Kubernetes 评审人为你提供反馈意见。
有时候,某个评审人的意见和另一个最初被指派的评审人的意见不同。
@ -220,17 +312,65 @@ Furthermore, in some cases, one of your reviewers might ask for a technical revi
<!--
For more information about contributing to the Kubernetes documentation, see:
* [Contribute to Kubernetes docs](https://kubernetes.io/docs/contribute/)
* [Page Content Types](https://kubernetes.io/docs/contribute/style/page-content-types/)
* [Documentation Style Guide](https://kubernetes.io/docs/contribute/style/style-guide/)
* [Localizing Kubernetes Documentation](https://kubernetes.io/docs/contribute/localization/)
- [Contribute to Kubernetes docs](https://kubernetes.io/docs/contribute/)
- [Page Content Types](https://kubernetes.io/docs/contribute/style/page-content-types/)
- [Documentation Style Guide](https://kubernetes.io/docs/contribute/style/style-guide/)
- [Localizing Kubernetes Documentation](https://kubernetes.io/docs/contribute/localization/)
-->
有关为 Kubernetes 文档做出贡献的更多信息,请参阅:
* [贡献 Kubernetes 文档](https://kubernetes.io/docs/contribute/)
* [页面内容类型](https://kubernetes.io/docs/contribute/style/page-content-types/)
* [文档风格指南](https://kubernetes.io/docs/contribute/style/style-guide/)
* [本地化 Kubernetes 文档](https://kubernetes.io/docs/contribute/localization/)
- [贡献 Kubernetes 文档](https://kubernetes.io/docs/contribute/)
- [页面内容类型](https://kubernetes.io/docs/contribute/style/page-content-types/)
- [文档风格指南](https://kubernetes.io/docs/contribute/style/style-guide/)
- [本地化 Kubernetes 文档](https://kubernetes.io/docs/contribute/localization/)
<!--
### New contributor ambassadors
-->
### 新贡献者大使
<!--
If you need help at any point when contributing, the [New Contributor Ambassadors](https://kubernetes.io/docs/contribute/advanced/#serve-as-a-new-contributor-ambassador) are a good point of contact. These are SIG Docs approvers whose responsibilities include mentoring new contributors and helping them through their first few pull requests. The best place to contact the New Contributors Ambassadors would be on the [Kubernetes Slack](https://slack.k8s.io/). Current New Contributors Ambassadors for SIG Docs:
-->
如果您在贡献时需要帮助,[新贡献者大使](https://kubernetes.io/docs/contribute/advanced/#serve-as-a-new-contributor-ambassador)是一个很好的联系人。
这些是 SIG Docs 批准者,其职责包括指导新贡献者并帮助他们完成最初的几个拉取请求。
联系新贡献者大使的最佳地点是 [Kubernetes Slack](https://slack.k8s.io/)。
SIG Docs 的当前新贡献者大使:
<!--
| Name | Slack | GitHub |
| -------------------------- | -------------------------- | -------------------------- |
| Arsh Sharma | @arsh | @RinkiyaKeDad |
-->
| 姓名 | Slack | GitHub |
| -------------------------- | -------------------------- | -------------------------- |
| Arsh Sharma | @arsh | @RinkiyaKeDad |
<!--
## Localization `README.md`'s
-->
## `README.md` 本地化
<!--
| Language | Language |
| -------------------------- | -------------------------- |
| [Chinese](README-zh.md) | [Korean](README-ko.md) |
| [French](README-fr.md) | [Polish](README-pl.md) |
| [German](README-de.md) | [Portuguese](README-pt.md) |
| [Hindi](README-hi.md) | [Russian](README-ru.md) |
| [Indonesian](README-id.md) | [Spanish](README-es.md) |
| [Italian](README-it.md) | [Ukrainian](README-uk.md) |
| [Japanese](README-ja.md) | [Vietnamese](README-vi.md) |
-->
| 语言 | 语言 |
| -------------------------- | -------------------------- |
| [中文](README-zh.md) | [韩语](README-ko.md) |
| [法语](README-fr.md) | [波兰语](README-pl.md) |
| [德语](README-de.md) | [葡萄牙语](README-pt.md) |
| [印地语](README-hi.md) | [俄语](README-ru.md) |
| [印尼语](README-id.md) | [西班牙语](README-es.md) |
| [意大利语](README-it.md) | [乌克兰语](README-uk.md) |
| [日语](README-ja.md) | [越南语](README-vi.md) |
# 中文本地化
@ -241,19 +381,19 @@ For more information about contributing to the Kubernetes documentation, see:
* [Slack channel](https://kubernetes.slack.com/messages/kubernetes-docs-zh)
<!--
### Code of conduct
## Code of conduct
Participation in the Kubernetes community is governed by the [CNCF Code of Conduct](https://github.com/cncf/foundation/blob/master/code-of-conduct.md).
-->
# 行为准则
## 行为准则
参与 Kubernetes 社区受 [CNCF 行为准则](https://github.com/cncf/foundation/blob/master/code-of-conduct.md) 约束。
<!--
## Thank you!
## Thank you
Kubernetes thrives on community participation, and we appreciate your contributions to our website and our documentation!
-->
# 感谢!
## 感谢你
Kubernetes 因为社区的参与而蓬勃发展,感谢您对我们网站和文档的贡献!

File diff suppressed because it is too large


@ -405,6 +405,7 @@
- fields:
- jobTemplate
- schedule
- timeZone
- concurrencyPolicy
- startingDeadlineSeconds
- suspend


@ -124,7 +124,7 @@ parts:
version: v1
- name: CSIStorageCapacity
group: storage.k8s.io
version: v1beta1
version: v1
- name: Authentication Resources
chapters:
- name: ServiceAccount


@ -634,12 +634,12 @@ body.td-documentation {
a {
color: inherit;
border-bottom: 1px solid #fff;
text-decoration: underline;
}
a:hover {
color: inherit;
border-bottom: none;
text-decoration: initial;
}
}
@ -648,6 +648,9 @@ body.td-documentation {
}
#announcement {
// default background is blue; overrides are possible
color: #fff;
.announcement-main {
margin-left: auto;
margin-right: auto;
@ -660,9 +663,8 @@ body.td-documentation {
}
/* always white */
h1, h2, h3, h4, h5, h6, p * {
color: #ffffff;
color: inherit; /* defaults to white */
background: transparent;
img.event-logo {


@ -9,17 +9,20 @@ options:
steps:
# It's fine to bump the tag to a recent version, as needed
- name: "gcr.io/k8s-staging-test-infra/gcb-docker-gcloud:v20210917-12df099d55"
entrypoint: make
entrypoint: 'bash'
env:
- DOCKER_CLI_EXPERIMENTAL=enabled
- TAG=$_GIT_TAG
- BASE_REF=$_PULL_BASE_REF
args:
- container-image
- -c
- |
gcloud auth configure-docker \
&& make container-push
substitutions:
# _GIT_TAG will be filled with a git-based tag for the image, of the form vYYYYMMDD-hash, and
# can be used as a substitution
_GIT_TAG: "12345"
# _PULL_BASE_REF will contain the ref that was pushed to to trigger this build -
# a branch like 'master' or 'release-0.2', or a tag like 'v0.2'.
_PULL_BASE_REF: "master"
# a branch like 'main' or 'release-0.2', or a tag like 'v0.2'.
_PULL_BASE_REF: "main"
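A manual Cloud Build submission equivalent to this config might look like the following; the substitution values are placeholders, since in practice `_GIT_TAG` and `_PULL_BASE_REF` are filled in by the build trigger:

```bash
# Submit the current checkout to Google Cloud Build using this config (illustrative values).
gcloud builds submit . --config cloudbuild.yaml \
  --substitutions=_GIT_TAG=v20220606-3272b35,_PULL_BASE_REF=main
```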


@ -122,7 +122,7 @@ id = "UA-00000000-0"
[params]
copyright_k8s = "The Kubernetes Authors"
copyright_linux = "Copyright © 2020 The Linux Foundation ®."
copyright_linux = "Copyright © 2020 The Linux Foundation ®."
# privacy_policy = "https://policies.google.com/privacy"
@ -139,10 +139,10 @@ time_format_default = "January 02, 2006 at 3:04 PM PST"
description = "Production-Grade Container Orchestration"
showedit = true
latest = "v1.23"
latest = "v1.24"
fullversion = "v1.23.0"
version = "v1.23"
fullversion = "v1.24.0"
version = "v1.24"
githubbranch = "main"
docsbranch = "main"
deprecated = false
@ -155,10 +155,10 @@ githubWebsiteRaw = "raw.githubusercontent.com/kubernetes/website"
# GitHub repository link for editing a page and opening issues.
github_repo = "https://github.com/kubernetes/website"
#Searching
# Searching
k8s_search = true
#The following search parameters are specific to Docsy's implementation. Kubernetes implementes its own search-related partials and scripts.
# The following search parameters are specific to Docsy's implementation. Kubernetes implements its own search-related partials and scripts.
# Google Custom Search Engine ID. Remove or comment out to disable search.
#gcs_engine_id = "011737558837375720776:fsdu1nryfng"
@ -179,51 +179,53 @@ js = [
]
[[params.versions]]
fullversion = "v1.23.0"
version = "v1.23"
githubbranch = "v1.23.0"
fullversion = "v1.24.0"
version = "v1.24"
githubbranch = "v1.24.0"
docsbranch = "main"
url = "https://kubernetes.io"
[[params.versions]]
fullversion = "v1.22.4"
fullversion = "v1.23.6"
version = "v1.23"
githubbranch = "v1.23.6"
docsbranch = "release-1.23"
url = "https://v1-23.docs.kubernetes.io"
[[params.versions]]
fullversion = "v1.22.9"
version = "v1.22"
githubbranch = "v1.22.4"
githubbranch = "v1.22.9"
docsbranch = "release-1.22"
url = "https://v1-22.docs.kubernetes.io"
[[params.versions]]
fullversion = "v1.21.7"
fullversion = "v1.21.12"
version = "v1.21"
githubbranch = "v1.21.7"
githubbranch = "v1.21.12"
docsbranch = "release-1.21"
url = "https://v1-21.docs.kubernetes.io"
[[params.versions]]
fullversion = "v1.20.13"
fullversion = "v1.20.15"
version = "v1.20"
githubbranch = "v1.20.13"
githubbranch = "v1.20.15"
docsbranch = "release-1.20"
url = "https://v1-20.docs.kubernetes.io"
[[params.versions]]
fullversion = "v1.19.16"
version = "v1.19"
githubbranch = "v1.19.16"
docsbranch = "release-1.19"
url = "https://v1-19.docs.kubernetes.io"
# User interface configuration
[params.ui]
# Enable to show the side bar menu in its compact state.
sidebar_menu_compact = false
# Show expand/collapse icon for sidebar sections.
sidebar_menu_foldable = true
# https://github.com/gohugoio/hugo/issues/8918#issuecomment-903314696
sidebar_cache_limit = 1
# Set to true to disable breadcrumb navigation.
# Set to true to disable breadcrumb navigation.
breadcrumb_disable = false
# Set to true to hide the sidebar search box (the top nav search box will still be displayed if search is enabled)
# Set to true to hide the sidebar search box (the top nav search box will still be displayed if search is enabled)
sidebar_search_disable = false
# Set to false if you don't want to display a logo (/assets/icons/logo.svg) in the top nav bar
# Set to false if you don't want to display a logo (/assets/icons/logo.svg) in the top nav bar
navbar_logo = true
# Set to true to disable the About link in the site footer
footer_about_disable = false
@ -244,50 +246,50 @@ no = 'Sorry to hear that. Please <a href="https://github.com/USERNAME/REPOSITORY
name = "User mailing list"
url = "https://discuss.kubernetes.io"
icon = "fa fa-envelope"
desc = "Discussion and help from your fellow users"
desc = "Discussion and help from your fellow users"
[[params.links.user]]
name = "Twitter"
url = "https://twitter.com/kubernetesio"
icon = "fab fa-twitter"
desc = "Follow us on Twitter to get the latest news!"
desc = "Follow us on Twitter to get the latest news!"
[[params.links.user]]
name = "Calendar"
url = "https://calendar.google.com/calendar/embed?src=calendar%40kubernetes.io"
icon = "fas fa-calendar-alt"
desc = "Google Calendar for Kubernetes"
desc = "Google Calendar for Kubernetes"
[[params.links.user]]
name = "Youtube"
url = "https://youtube.com/kubernetescommunity"
icon = "fab fa-youtube"
desc = "Youtube community videos"
desc = "Youtube community videos"
# Developer relevant links. These will show up on right side of footer and in the community page if you have one.
[[params.links.developer]]
name = "GitHub"
url = "https://github.com/kubernetes/kubernetes"
icon = "fab fa-github"
desc = "Development takes place here!"
desc = "Development takes place here!"
[[params.links.developer]]
name = "Slack"
url = "https://slack.k8s.io"
icon = "fab fa-slack"
desc = "Chat with other project developers"
desc = "Chat with other project developers"
[[params.links.developer]]
name = "Contribute"
url = "https://git.k8s.io/community/contributors/guide"
icon = "fas fa-edit"
desc = "Contribute to the Kubernetes website"
desc = "Contribute to the Kubernetes website"
[[params.links.developer]]
name = "Stack Overflow"
url = "https://stackoverflow.com/questions/tagged/kubernetes"
icon = "fab fa-stack-overflow"
desc = "Practical questions and curated answers"
desc = "Practical questions and curated answers"
# Language definitions.
@ -295,7 +297,7 @@ no = 'Sorry to hear that. Please <a href="https://github.com/USERNAME/REPOSITORY
[languages.en]
title = "Kubernetes"
description = "Production-Grade Container Orchestration"
languageName ="English"
languageName = "English"
# Weight used for sorting.
weight = 1
languagedirection = "ltr"
@ -342,7 +344,7 @@ language_alternatives = ["en"]
[languages.fr]
title = "Kubernetes"
description = "Solution professionnelle dorchestration de conteneurs"
languageName ="Français (French)"
languageName = "Français (French)"
languageNameLatinScript = "Français"
weight = 5
contentDir = "content/fr"
@ -370,7 +372,7 @@ language_alternatives = ["en"]
[languages.no]
title = "Kubernetes"
description = "Production-Grade Container Orchestration"
languageName ="Norsk (Norwegian)"
languageName = "Norsk (Norwegian)"
languageNameLatinScript = "Norsk"
weight = 7
contentDir = "content/no"
@ -384,7 +386,7 @@ language_alternatives = ["en"]
[languages.de]
title = "Kubernetes"
description = "Produktionsreife Container-Orchestrierung"
languageName ="Deutsch (German)"
languageName = "Deutsch (German)"
languageNameLatinScript = "Deutsch"
weight = 8
contentDir = "content/de"
@ -398,7 +400,7 @@ language_alternatives = ["en"]
[languages.es]
title = "Kubernetes"
description = "Orquestación de contenedores para producción"
languageName ="Español (Spanish)"
languageName = "Español (Spanish)"
languageNameLatinScript = "Español"
weight = 9
contentDir = "content/es"
@ -412,9 +414,10 @@ language_alternatives = ["en"]
[languages.pt-br]
title = "Kubernetes"
description = "Orquestração de contêineres em nível de produção"
languageName ="Português (Portuguese)"
languageName = "Português (Portuguese)"
languageNameLatinScript = "Português"
weight = 9
weight = 10
contentDir = "content/pt-br"
languagedirection = "ltr"
@ -428,7 +431,7 @@ title = "Kubernetes"
description = "Orkestrasi Kontainer dengan Skala Produksi"
languageName ="Bahasa Indonesia"
languageNameLatinScript = "Bahasa Indonesia"
weight = 10
weight = 11
contentDir = "content/id"
languagedirection = "ltr"
@ -442,7 +445,7 @@ title = "Kubernetes"
description = "Production-Grade Container Orchestration"
languageName = "हिन्दी (Hindi)"
languageNameLatinScript = "Hindi"
weight = 11
weight = 12
contentDir = "content/hi"
languagedirection = "ltr"
@ -456,7 +459,7 @@ description = "Giải pháp điều phối container trong môi trường produc
languageName = "Tiếng Việt (Vietnamese)"
languageNameLatinScript = "Tiếng Việt"
contentDir = "content/vi"
weight = 12
weight = 13
languagedirection = "ltr"
[languages.ru]
@ -464,7 +467,7 @@ title = "Kubernetes"
description = "Первоклассная оркестрация контейнеров"
languageName = "Русский (Russian)"
languageNameLatinScript = "Russian"
weight = 12
weight = 14
contentDir = "content/ru"
languagedirection = "ltr"
@ -478,7 +481,7 @@ title = "Kubernetes"
description = "Produkcyjny system zarządzania kontenerami"
languageName = "Polski (Polish)"
languageNameLatinScript = "Polski"
weight = 13
weight = 15
contentDir = "content/pl"
languagedirection = "ltr"
@ -492,7 +495,7 @@ title = "Kubernetes"
description = "Довершена система оркестрації контейнерів"
languageName = "Українська (Ukrainian)"
languageNameLatinScript = "Ukrainian"
weight = 14
weight = 16
contentDir = "content/uk"
languagedirection = "ltr"


@ -1,4 +1,4 @@
---
title: "Kubernets erweitern"
title: "Kubernetes erweitern"
weight: 110
---


@ -248,7 +248,7 @@ einige Einschränkungen:
Eintrag zur Liste `metadata.finalizers` hinzugefügt werden.
- Pod-Updates dürfen keine Felder ändern, die Ausnahmen sind
`spec.containers[*].image`,
`spec.initContainers[*].image`,` spec.activeDeadlineSeconds` oder
`spec.initContainers[*].image`, `spec.activeDeadlineSeconds` oder
`spec.tolerations`. Für `spec.tolerations` kannst du nur neue Einträge
hinzufügen.
- Für `spec.activeDeadlineSeconds` sind nur zwei Änderungen erlaubt:


@ -1,6 +1,6 @@
---
content_type: concept
title: Zur Kubernets-Dokumentation beitragen
title: Zur Kubernetes-Dokumentation beitragen
linktitle: Mitmachen
main_menu: true
weight: 80


@ -76,6 +76,6 @@ Um eine Pull-Anfrage zu schließen, hinterlasse einen `/close`-Kommentar zu dem
{{< note >}}
Der [`fejta-bot`](https://github.com/fejta-bot) Bot markiert Themen nach 90 Tagen Inaktivität als veraltet. Nach weiteren 30 Tagen markiert er Issues als faul und schließt sie. PR-Beauftragte sollten Themen nach 14-30 Tagen Inaktivität schließen.
Der [`k8s-ci-robot`](https://github.com/k8s-ci-robot) Bot markiert Themen nach 90 Tagen Inaktivität als veraltet. Nach weiteren 30 Tagen markiert er Issues als faul und schließt sie. PR-Beauftragte sollten Themen nach 14-30 Tagen Inaktivität schließen.
{{< /note >}}


@ -322,7 +322,7 @@ Ausgabeformat | Beschreibung
### Kubectl Ausgabe Ausführlichkeit und Debugging
Die Ausführlichkeit von Kubectl wird mit den Flags `-v` oder `--v ` gesteuert, gefolgt von einer Ganzzahl, die die Protokollebene darstellt. Allgemeine Protokollierungskonventionen für Kubernetes und die zugehörigen Protokollebenen werden [hier](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-instrumentation/logging.md) beschrieben.
Die Ausführlichkeit von Kubectl wird mit den Flags `-v` oder `--v` gesteuert, gefolgt von einer Ganzzahl, die die Protokollebene darstellt. Allgemeine Protokollierungskonventionen für Kubernetes und die zugehörigen Protokollebenen werden [hier](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-instrumentation/logging.md) beschrieben.
Ausführlichkeit | Beschreibung
--------------| -----------


@ -1,5 +1,6 @@
---
title: "Jobs ausführen"
description: Führen Sie Jobs mit paralleler Verarbeitung aus.
weight: 50
---


@ -46,7 +46,7 @@ card:
</div>
<div id="basics-modules" class="content__modules">
<h2>Kubernets Grundlagen Module</h2>
<h2>Kubernetes Grundlagen Module</h2>
<div class="row">
<div class="col-md-12">
<div class="row">


@ -1 +1 @@
Sie benötigen entweder einen dynamischen PersistentVolume-Anbieter mit einer [Standard-Speicherklasse](/docs/concepts/storage/storage-classes/), oder Sie selbst stellen statische [PersistentVolumes](/docs/user-guide/persistent-volumes/#provisioning) bereit, um die [PersistentVolumeClaims](/docs/user-guide/persistent-volumes/#persistentvolumeclaims) zu erfüllen, die hier verwendet werden.
Sie benötigen entweder einen dynamischen PersistentVolume-Anbieter mit einer [Standard-Speicherklasse](/docs/concepts/storage/storage-classes/), oder Sie selbst stellen statische [PersistentVolumes](/docs/concepts/storage/persistent-volumes/#provisioning) bereit, um die [PersistentVolumeClaims](/docs/concepts/storage/persistent-volumes/#persistentvolumeclaims) zu erfüllen, die hier verwendet werden.


@ -19,7 +19,7 @@ Security:
- [Node authorizer](/docs/reference/access-authn-authz/node/) and admission control plugin are new additions that restrict kubelets access to secrets, pods and other objects based on its node.
- [Encryption for Secrets](/docs/tasks/administer-cluster/encrypt-data/), and other resources in etcd, is now available as alpha.&nbsp;
- [Kubelet TLS bootstrapping](/docs/admin/kubelet-tls-bootstrapping/) now supports client and server certificate rotation.
- [Audit logs](/docs/tasks/debug-application-cluster/audit/) stored by the API server are now more customizable and extensible with support for event filtering and webhooks. They also provide richer data for system audit.
- [Audit logs](/docs/tasks/debug/debug-cluster/audit/) stored by the API server are now more customizable and extensible with support for event filtering and webhooks. They also provide richer data for system audit.
Stateful workloads:


@ -117,7 +117,7 @@ To achieve the best possible isolation, each function call would have to happen
By using Landlock, we could isolate function calls from each other within the same container, making a temporary file created by one function call inaccessible to the next function call, for example. Integration between Landlock and technologies like Kubernetes-based serverless frameworks would be a ripe area for further exploration.
## Auditing kubectl-exec with eBPF
In Kubernetes 1.7 the [audit proposal](/docs/tasks/debug-application-cluster/audit/) started making its way in. It's currently pre-stable with plans to be stable in the 1.10 release. As the name implies, it allows administrators to log and audit events that take place in a Kubernetes cluster.
In Kubernetes 1.7 the [audit proposal](/docs/tasks/debug/debug-cluster/audit/) started making its way in. It's currently pre-stable with plans to be stable in the 1.10 release. As the name implies, it allows administrators to log and audit events that take place in a Kubernetes cluster.
While these events log Kubernetes events, they don't currently provide the level of visibility that some may require. For example, while we can see that someone has used `kubectl exec` to enter a container, we are not able to see what commands were executed in that session. With eBPF one can attach a BPF program that would record any commands executed in the `kubectl exec` session and pass those commands to a user-space program that logs those events. We could then play that session back and know the exact sequence of events that took place.
## Learn more about eBPF


@ -6,6 +6,8 @@ date: 2018-07-11
**Author**: Michael Taufen (Google)
**Editor's note: The feature has been removed in version 1.24 after deprecation in 1.22.**
**Editor's note: this post is part of a [series of in-depth articles](https://kubernetes.io/blog/2018/06/27/kubernetes-1.11-release-announcement/) on what's new in Kubernetes 1.11**
## Why Dynamic Kubelet Configuration?


@ -66,7 +66,7 @@ There are plenty of [good examples](https://docs.bitnami.com/kubernetes/how-to/c
Incorrect or excessively permissive RBAC policies are a security threat in case of a compromised pod. Maintaining least privilege, and continuously reviewing and improving RBAC rules, should be considered part of the "technical debt hygiene" that teams build into their development lifecycle.
[Audit Logging](/docs/tasks/debug-application-cluster/audit/) (beta in 1.10) provides customisable API logging at the payload (e.g. request and response), and also metadata levels. Log levels can be tuned to your organisation&#39;s security policy - [GKE](https://cloud.google.com/kubernetes-engine/docs/how-to/audit-logging#audit_policy) provides sane defaults to get you started.
[Audit Logging](/docs/tasks/debug/debug-cluster/audit/) (beta in 1.10) provides customisable API logging at the payload (e.g. request and response), and also metadata levels. Log levels can be tuned to your organisation&#39;s security policy - [GKE](https://cloud.google.com/kubernetes-engine/docs/how-to/audit-logging#audit_policy) provides sane defaults to get you started.
For read requests such as get, list, and watch, only the request object is saved in the audit logs; the response object is not. For requests involving sensitive data such as Secret and ConfigMap, only the metadata is exported. For all other requests, both request and response objects are saved in audit logs.


@ -174,7 +174,7 @@ Cluster-distributed stateful services (e.g., Cassandra) can benefit from splitti
## Other considerations
[Logs](/docs/concepts/cluster-administration/logging/) and [metrics](/docs/tasks/debug-application-cluster/resource-usage-monitoring/) (if collected and persistently retained) are valuable to diagnose outages, but given the variety of technologies available it will not be addressed in this blog. If Internet connectivity is available, it may be desirable to retain logs and metrics externally at a central location.
[Logs](/docs/concepts/cluster-administration/logging/) and [metrics](/docs/tasks/debug/debug-cluster/resource-usage-monitoring/) (if collected and persistently retained) are valuable to diagnose outages, but given the variety of technologies available it will not be addressed in this blog. If Internet connectivity is available, it may be desirable to retain logs and metrics externally at a central location.
Your production deployment should utilize an automated installation, configuration and update tool (e.g., [Ansible](https://github.com/kubernetes-incubator/kubespray), [BOSH](https://github.com/cloudfoundry-incubator/kubo-deployment), [Chef](https://github.com/chef-cookbooks/kubernetes), [Juju](/docs/getting-started-guides/ubuntu/installation/), [kubeadm](/docs/reference/setup-tools/kubeadm/), [Puppet](https://forge.puppet.com/puppetlabs/kubernetes), etc.). A manual process will have repeatability issues, be labor intensive, error prone, and difficult to scale. [Certified distributions](https://www.cncf.io/certification/software-conformance/#logos) are likely to include a facility for retaining configuration settings across updates, but if you implement your own install and config toolchain, then retention, backup and recovery of the configuration artifacts is essential. Consider keeping your deployment components and settings under a version control system such as Git.


@ -360,7 +360,7 @@ So let's fix the issue by installing the missing package:
sudo apt install -y conntrack
```
![minikube-install-conntrack](/images/blog/2020-05-21-wsl2-dockerdesktop-k8s/wsl2-minikube-install conntrack.png)
![minikube-install-conntrack](/images/blog/2020-05-21-wsl2-dockerdesktop-k8s/wsl2-minikube-install-conntrack.png)
Let's try to launch it again:


@ -177,7 +177,7 @@ group_right() apiserver_request_total
Metrics are a fast way to check whether deprecated APIs are being used, and at what rate,
but they don't include enough information to identify particular clients or API objects.
Starting in Kubernetes v1.19, [audit events](/docs/tasks/debug-application-cluster/audit/)
Starting in Kubernetes v1.19, [audit events](/docs/tasks/debug/debug-cluster/audit/)
for requests to deprecated APIs include an audit annotation of `"k8s.io/deprecated":"true"`.
Administrators can use those audit events to identify specific clients or objects that need to be updated.
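As a sketch of how those audit events could be filtered, assuming JSON-lines audit output and a log path configured on your API server (the path below is illustrative):

```bash
# List the user, verb, and request URI for every audited request to a deprecated API.
jq -r 'select(.annotations["k8s.io/deprecated"] == "true")
       | [.user.username, .verb, .requestURI] | @tsv' /var/log/kube-apiserver/audit.log
```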


@ -20,7 +20,7 @@ The paper attempts to _not_ focus on any specific [cloud native project](https:/
When using Kubernetes as a workload orchestrator, some of the security controls this version of the whitepaper recommends are:
* [Pod Security Policies](/docs/concepts/security/pod-security-policy/): Implement a single source of truth for “least privilege” workloads across the entire cluster
* [Resource requests and limits](/docs/concepts/configuration/manage-resources-containers/#requests-and-limits): Apply requests (soft constraint) and limits (hard constraint) for shared resources such as memory and CPU
* [Audit log analysis](/docs/tasks/debug-application-cluster/audit/): Enable Kubernetes API auditing and filtering for security relevant events
* [Audit log analysis](/docs/tasks/debug/debug-cluster/audit/): Enable Kubernetes API auditing and filtering for security relevant events
* [Control plane authentication and certificate root of trust](/docs/concepts/architecture/control-plane-node-communication/): Enable mutual TLS authentication with a trusted CA for communication within the cluster
* [Secrets management](/docs/concepts/configuration/secret/): Integrate with a built-in or external secrets store


@ -14,7 +14,7 @@ on the deprecation of Docker as a container runtime for Kubernetes kubelets, and
what that means, check out the blog post
[Don't Panic: Kubernetes and Docker](/blog/2020/12/02/dont-panic-kubernetes-and-docker/).
Also, you can read [check whether Dockershim deprecation affects you](/docs/tasks/administer-cluster/migrating-from-dockershim/check-if-dockershim-deprecation-affects-you/) to check whether it does.
Also, you can read [check whether Dockershim removal affects you](/docs/tasks/administer-cluster/migrating-from-dockershim/check-if-dockershim-removal-affects-you/) to check whether it does.
### Why is dockershim being deprecated?
@ -155,7 +155,7 @@ runtime where possible.
Another thing to look out for is anything expecting to run for system maintenance
or nested inside a container when building images will no longer work. For the
former, you can use the [`crictl`][cr] tool as a drop-in replacement (see [mapping from docker cli to crictl](https://kubernetes.io/docs/tasks/debug-application-cluster/crictl/#mapping-from-docker-cli-to-crictl)) and for the
former, you can use the [`crictl`][cr] tool as a drop-in replacement (see [mapping from dockercli to crictl](/docs/reference/tools/map-crictl-dockercli/)) and for the
latter you can use newer container build options like [img], [buildah],
[kaniko], or [buildkit-cli-for-kubectl] that don't require Docker.
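To make the drop-in point concrete, a few common equivalents; `crictl` needs a CRI endpoint configured (for example via `/etc/crictl.yaml`), and the container ID is a placeholder:

```bash
crictl ps                     # roughly: docker ps
crictl images                 # roughly: docker images
crictl logs <container-id>    # roughly: docker logs <container-id>
```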


@ -3,13 +3,17 @@ layout: blog
title: "Don't Panic: Kubernetes and Docker"
date: 2020-12-02
slug: dont-panic-kubernetes-and-docker
evergreen: true
---
**Update:** _Kubernetes support for Docker via `dockershim` is now removed.
For more information, read the [removal FAQ](/dockershim).
You can also discuss the deprecation via a dedicated [GitHub issue](https://github.com/kubernetes/kubernetes/issues/106917)._
---
**Authors:** Jorge Castro, Duffie Cooley, Kat Cosgrove, Justin Garrison, Noah Kantrowitz, Bob Killen, Rey Lejano, Dan “POP” Papandrea, Jeffrey Sica, Davanum “Dims” Srinivas
_Update: Kubernetes support for Docker via `dockershim` is now deprecated.
For more information, read the [deprecation notice](/blog/2020/12/08/kubernetes-1-20-release-announcement/#dockershim-deprecation).
You can also discuss the deprecation via a dedicated [GitHub issue](https://github.com/kubernetes/kubernetes/issues/106917)._
Kubernetes is [deprecating
Docker](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.20.md#deprecation)
shouldn't, use Docker as a development tool anymore. Docker is still a useful
tool for building containers, and the images that result from running `docker
build` can still run in your Kubernetes cluster.
If you're using a managed Kubernetes service like GKE, EKS, or AKS (which [defaults to containerd](https://github.com/Azure/AKS/releases/tag/2020-11-16)) you will need to
If you're using a managed Kubernetes service like AKS, EKS, or GKE, you will need to
make sure your worker nodes are using a supported container runtime before
Docker support is removed in a future version of Kubernetes. If you have node
customizations you may need to update them based on your environment and runtime
@ -37,8 +41,8 @@ testing and planning.
If you're rolling your own clusters, you will also need to make changes to avoid
your clusters breaking. At v1.20, you will get a deprecation warning for Docker.
When Docker runtime support is removed in a future release (currently planned
for the 1.22 release in late 2021) of Kubernetes it will no longer be supported
When Docker runtime support is removed in a future release (<del>currently planned
for the 1.22 release in late 2021</del>) of Kubernetes it will no longer be supported
and you will need to switch to one of the other compliant container runtimes,
like containerd or CRI-O. Just make sure that the runtime you choose supports
the docker daemon configurations you currently use (e.g. logging).


@ -32,7 +32,7 @@ The `kubectl alpha debug` features graduates to beta in 1.20, becoming `kubectl
Note that as a new built-in command, `kubectl debug` takes priority over any kubectl plugin named “debug”. You must rename the affected plugin.
Invocations using `kubectl alpha debug` are now deprecated and will be removed in a subsequent release. Update your scripts to use `kubectl debug`. For more information about `kubectl debug`, see [Debugging Running Pods](https://kubernetes.io/docs/tasks/debug-application-cluster/debug-running-pod/).
Invocations using `kubectl alpha debug` are now deprecated and will be removed in a subsequent release. Update your scripts to use `kubectl debug`. For more information about `kubectl debug`, see [Debugging Running Pods](https://kubernetes.io/docs/tasks/debug/debug-application/debug-running-pod/).
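For illustration, a typical `kubectl debug` invocation; the pod name, target container name, and debug image here are placeholders:

```bash
# Attach an interactive ephemeral debug container to an existing pod.
kubectl debug -it mypod --image=busybox:1.36 --target=app
```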
### Beta: API Priority and Fairness


@ -62,7 +62,7 @@ spec:
Note that completion mode is an alpha feature in the 1.21 release. To be able to
use it in your cluster, make sure to enable the `IndexedJob` [feature
gate](/docs/reference/command-line-tools-reference/feature-gates/) on the
[API server](docs/reference/command-line-tools-reference/kube-apiserver/) and
[API server](/docs/reference/command-line-tools-reference/kube-apiserver/) and
the [controller manager](/docs/reference/command-line-tools-reference/kube-controller-manager/).
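A sketch of enabling that gate on a kubeadm-managed control plane; the file name is hypothetical and the kubeadm config API version depends on your kubeadm release:

```bash
# Write a kubeadm ClusterConfiguration fragment that turns the gate on for both components.
cat <<'EOF' > kubeadm-indexedjob.yaml
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
apiServer:
  extraArgs:
    feature-gates: "IndexedJob=true"
controllerManager:
  extraArgs:
    feature-gates: "IndexedJob=true"
EOF
```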
When you run the example, you will see that each of the three created Pods gets a


@ -108,7 +108,7 @@ metadata:
uid: 93a37fed-23e3-45e8-b6ee-b2521db81638
```
In short, what's happened is that the object was updated, not deleted. That's because Kubernetes saw that the object contained finalizers and put it into a read-only state. The deletion timestamp signals that the object can only be read, with the exception of removing the finalizer key updates. In other words, the deletion will not be complete until we edit the object and remove the finalizer.
In short, what's happened is that the object was updated, not deleted. That's because Kubernetes saw that the object contained finalizers and blocked removal of the object from etcd. The deletion timestamp signals that deletion was requested, but the deletion will not be complete until we edit the object and remove the finalizer.
Here's a demonstration of using the `patch` command to remove finalizers. If we want to delete an object, we can simply patch it on the command line to remove the finalizers. In this way, the deletion that was running in the background will complete and the object will be deleted. When we attempt to `get` that configmap, it will be gone.
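A sketch of that patch, assuming a ConfigMap named `mymap` whose only deletion blocker is its finalizer list:

```bash
# Drop the finalizers list with a JSON patch so the pending deletion can complete.
kubectl patch configmap/mymap --type json \
  --patch='[ { "op": "remove", "path": "/metadata/finalizers" } ]'
```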


@ -255,7 +255,7 @@ The minimum required versions are:
## What's next?
As part of the beta graduation for this feature, SIG Storage plans to update the Kubenetes scheduler to support pod preemption in relation to ReadWriteOncePod storage.
As part of the beta graduation for this feature, SIG Storage plans to update the Kubernetes scheduler to support pod preemption in relation to ReadWriteOncePod storage.
This means if two pods request a PersistentVolumeClaim with ReadWriteOncePod, the pod with highest priority will gain access to the PersistentVolumeClaim and any pod with lower priority will be preempted from the node and be unable to access the PersistentVolumeClaim.
## How can I learn more?


@ -317,7 +317,7 @@ RequestResponse's including metadata and request / response bodies. While helpfu
Each organization needs to evaluate their
own threat model and build an audit policy that complements or helps troubleshooting incident response. Think
about how someone would attack your organization and what audit trail could identify it. Review more advanced options for tuning audit logs in the official [audit logging documentation](/docs/tasks/debug-application-cluster/audit/#audit-policy).
about how someone would attack your organization and what audit trail could identify it. Review more advanced options for tuning audit logs in the official [audit logging documentation](/docs/tasks/debug/debug-cluster/audit/#audit-policy).
It's crucial to tune your audit logs to only include events that meet your threat model. A minimal audit policy that logs everything at `metadata` level can also be a good starting point.
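Such a minimal policy could look like the following; the file name is arbitrary, and it is wired in through the API server's `--audit-policy-file` flag:

```bash
# Write a catch-all audit policy that records every request at the Metadata level.
cat <<'EOF' > audit-policy-minimal.yaml
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  - level: Metadata
EOF
```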
Audit logging configurations can also be tested with


@ -8,8 +8,8 @@ slug: kubernetes-1-23-statefulset-pvc-auto-deletion
**Author:** Matthew Cary (Google)
Kubernetes v1.23 introduced a new, alpha-level policy for
[StatefulSets](docs/concepts/workloads/controllers/statefulset/) that controls the lifetime of
[PersistentVolumeClaims](docs/concepts/storage/persistent-volumes/) (PVCs) generated from the
[StatefulSets](/docs/concepts/workloads/controllers/statefulset/) that controls the lifetime of
[PersistentVolumeClaims](/docs/concepts/storage/persistent-volumes/) (PVCs) generated from the
StatefulSet spec template for cases when they should be deleted automatically when the StatefulSet
is deleted or pods in the StatefulSet are scaled down.
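As an illustration, the policy lives in the StatefulSet's `persistentVolumeClaimRetentionPolicy` field; the patch below assumes a StatefulSet named `web` and a cluster with the alpha `StatefulSetAutoDeletePVC` feature gate enabled:

```bash
# Delete PVCs when the StatefulSet is deleted, but keep them when it is merely scaled down.
kubectl patch statefulset web --type merge \
  -p '{"spec":{"persistentVolumeClaimRetentionPolicy":{"whenDeleted":"Delete","whenScaled":"Retain"}}}'
```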
@ -82,7 +82,7 @@ This policy forms a matrix with four cases. I'll walk through and give an exam
new replicas will automatically use them.
Visit the
[documentation](docs/concepts/workloads/controllers/statefulset/#persistentvolumeclaim-policies) to
[documentation](/docs/concepts/workloads/controllers/statefulset/#persistentvolumeclaim-policies) to
see all the details.
## What's next?


@ -12,7 +12,7 @@ to reaffirm our community values by supporting open source container runtimes,
enabling a smaller kubelet, and increasing engineering velocity for teams using
Kubernetes. If you [use Docker Engine as a container runtime](/docs/tasks/administer-cluster/migrating-from-dockershim/find-out-runtime-you-use/)
for your Kubernetes cluster, get ready to migrate in 1.24! To check if you're
affected, refer to [Check whether dockershim deprecation affects you](/docs/tasks/administer-cluster/migrating-from-dockershim/check-if-dockershim-deprecation-affects-you/).
affected, refer to [Check whether dockershim removal affects you](/docs/tasks/administer-cluster/migrating-from-dockershim/check-if-dockershim-removal-affects-you/).
## Why were moving away from dockershim


@ -7,31 +7,37 @@ slug: dockershim-faq
aliases: [ '/dockershim' ]
---
**This is an update to the original [Dockershim Deprecation FAQ](/blog/2020/12/02/dockershim-faq/) article,
published in late 2020.**
**This supersedes the original
[Dockershim Deprecation FAQ](/blog/2020/12/02/dockershim-faq/) article,
published in late 2020. The article includes updates from the v1.24
release of Kubernetes.**
---
This document goes over some frequently asked questions regarding the
deprecation and removal of _dockershim_, that was
removal of _dockershim_ from Kubernetes. The removal was originally
[announced](/blog/2020/12/08/kubernetes-1-20-release-announcement/)
as a part of the Kubernetes v1.20 release. For more detail
on what that means, check out the blog post
as a part of the Kubernetes v1.20 release. The Kubernetes
[v1.24 release](/releases/#release-v1-24) actually removed the dockershim
from Kubernetes.
For more on what that means, check out the blog post
[Don't Panic: Kubernetes and Docker](/blog/2020/12/02/dont-panic-kubernetes-and-docker/).
Also, you can read [check whether dockershim removal affects you](/docs/tasks/administer-cluster/migrating-from-dockershim/check-if-dockershim-deprecation-affects-you/)
to determine how much impact the removal of dockershim would have for you
or for your organization.
To determine the impact that the removal of dockershim would have for you or your organization,
you can read [Check whether dockershim removal affects you](/docs/tasks/administer-cluster/migrating-from-dockershim/check-if-dockershim-removal-affects-you/).
As the Kubernetes 1.24 release has become imminent, we've been working hard to try to make this a smooth transition.
In the months and days leading up to the Kubernetes 1.24 release, Kubernetes contributors worked hard to try to make this a smooth transition.
- We've written a blog post detailing our [commitment and next steps](/blog/2022/01/07/kubernetes-is-moving-on-from-dockershim/).
- We believe there are no major blockers to migration to [other container runtimes](/docs/setup/production-environment/container-runtimes/#container-runtimes).
- There is also a [Migrating from dockershim](/docs/tasks/administer-cluster/migrating-from-dockershim/) guide available.
- We've also created a page to list
- A blog post detailing our [commitment and next steps](/blog/2022/01/07/kubernetes-is-moving-on-from-dockershim/).
- Checking if there were major blockers to migration to [other container runtimes](/docs/setup/production-environment/container-runtimes/#container-runtimes).
- Adding a [migrating from dockershim](/docs/tasks/administer-cluster/migrating-from-dockershim/) guide.
- Creating a list of
[articles on dockershim removal and on using CRI-compatible runtimes](/docs/reference/node/topics-on-dockershim-and-cri-compatible-runtimes/).
That list includes some of the already mentioned docs, and also covers selected external sources
(including vendor guides).
### Why is the dockershim being removed from Kubernetes?
### Why was the dockershim removed from Kubernetes?
Early versions of Kubernetes only worked with a specific container runtime:
Docker Engine. Later, Kubernetes added support for working with other container runtimes.
@ -49,36 +55,18 @@ In fact, maintaining dockershim had become a heavy burden on the Kubernetes main
Additionally, features that were largely incompatible with the dockershim, such
as cgroups v2 and user namespaces are being implemented in these newer CRI
runtimes. Removing support for the dockershim will allow further development in
those areas.
runtimes. Removing the dockershim from Kubernetes allows further development in those areas.
[drkep]: https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2221-remove-dockershim
### Can I still use Docker Engine in Kubernetes 1.23?
### Are Docker and containers the same thing?
Yes, the only thing changed in 1.20 is a single warning log printed at [kubelet]
startup if using Docker Engine as the runtime. You'll see this warning in all versions up to 1.23. The dockershim removal occurs in Kubernetes 1.24.
[kubelet]: /docs/reference/command-line-tools-reference/kubelet/
### When will dockershim be removed?
Given the impact of this change, we are using an extended deprecation timeline.
Removal of dockershim is scheduled for Kubernetes v1.24, see [Dockershim Removal Kubernetes Enhancement Proposal][drkep].
The Kubernetes project will be working closely with vendors and other ecosystem groups to ensure
a smooth transition and will evaluate things as the situation evolves.
### Can I still use Docker Engine as my container runtime?
First off, if you use Docker on your own PC to develop or test containers: nothing changes.
You can still use Docker locally no matter what container runtime(s) you use for your
Kubernetes clusters. Containers make this kind of interoperability possible.
Mirantis and Docker have [committed][mirantis] to maintaining a replacement adapter for
Docker Engine, and to maintain that adapter even after the in-tree dockershim is removed
from Kubernetes. The replacement adapter is named [`cri-dockerd`](https://github.com/Mirantis/cri-dockerd).
[mirantis]: https://www.mirantis.com/blog/mirantis-to-take-over-support-of-kubernetes-dockershim-2/
Docker popularized the Linux containers pattern and has been instrumental in
developing the underlying technology, however containers in Linux have existed
for a long time. The container ecosystem has grown to be much broader than just
Docker. Standards like OCI and CRI have helped many tools grow and thrive in our
ecosystem, some replacing aspects of Docker while others enhance existing
functionality.
### Will my existing container images still work?
@ -90,14 +78,41 @@ All your existing images will still work exactly the same.
Yes. All CRI runtimes support the same pull secrets configuration used in
Kubernetes, either via the PodSpec or ServiceAccount.
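For example, here's a minimal sketch (the Secret name `regcred` and the image reference are hypothetical) of a pull secret referenced from a PodSpec; it works the same way on any CRI runtime:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: private-image-demo        # hypothetical Pod name
spec:
  containers:
    - name: app
      image: registry.example.com/team/app:1.0   # hypothetical image in a private registry
  imagePullSecrets:
    - name: regcred               # an existing kubernetes.io/dockerconfigjson Secret
```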
### Are Docker and containers the same thing?
### Can I still use Docker Engine in Kubernetes 1.23?
Docker popularized the Linux containers pattern and has been instrumental in
developing the underlying technology, however containers in Linux have existed
for a long time. The container ecosystem has grown to be much broader than just
Docker. Standards like OCI and CRI have helped many tools grow and thrive in our
ecosystem, some replacing aspects of Docker while others enhance existing
functionality.
Yes, the only thing changed in 1.20 is a single warning log printed at [kubelet]
startup if using Docker Engine as the runtime. You'll see this warning in all versions up to 1.23. The dockershim removal occurred
in Kubernetes 1.24.
If you're running Kubernetes v1.24 or later, see [Can I still use Docker Engine as my container runtime?](#can-i-still-use-docker-engine-as-my-container-runtime).
(Remember, you can switch away from the dockershim if you're using any supported Kubernetes release; from release v1.24, you
**must** switch as Kubernetes no longer includes the dockershim).
[kubelet]: /docs/reference/command-line-tools-reference/kubelet/
### Which CRI implementation should I use?
That's a complex question and it depends on a lot of factors. If Docker Engine is
working for you, moving to containerd should be a relatively easy swap and
will have strictly better performance and less overhead. However, we encourage you
to explore all the options from the [CNCF landscape] in case another would be an
even better fit for your environment.
[CNCF landscape]: https://landscape.cncf.io/card-mode?category=container-runtime&grouping=category
#### Can I still use Docker Engine as my container runtime?
First off, if you use Docker on your own PC to develop or test containers: nothing changes.
You can still use Docker locally no matter what container runtime(s) you use for your
Kubernetes clusters. Containers make this kind of interoperability possible.
Mirantis and Docker have [committed][mirantis] to maintaining a replacement adapter for
Docker Engine, and to maintain that adapter even after the in-tree dockershim is removed
from Kubernetes. The replacement adapter is named [`cri-dockerd`](https://github.com/Mirantis/cri-dockerd).
You can install `cri-dockerd` and use it to connect the kubelet to Docker Engine. Read [Migrate Docker Engine nodes from dockershim to cri-dockerd](/docs/tasks/administer-cluster/migrating-from-dockershim/migrate-dockershim-dockerd/) to learn more.
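As a rough sketch (socket paths and flag wiring can differ between distributions and `cri-dockerd` versions), the key step is pointing the kubelet at the CRI socket that `cri-dockerd` exposes:
```bash
# cri-dockerd listens on a CRI Unix socket (default location shown; verify on your nodes):
#   unix:///var/run/cri-dockerd.sock
# Tell the kubelet to use that endpoint instead of the removed dockershim:
kubelet --container-runtime-endpoint=unix:///var/run/cri-dockerd.sock ...
```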
[mirantis]: https://www.mirantis.com/blog/mirantis-to-take-over-support-of-kubernetes-dockershim-2/
### Are there examples of folks using other runtimes in production today?
@ -135,16 +150,6 @@ provide an end-to-end standard for managing containers.
[runc]: https://github.com/opencontainers/runc
[containerd]: https://containerd.io/
### Which CRI implementation should I use?
That's a complex question and it depends on a lot of factors. If Docker is
working for you, moving to containerd should be a relatively easy swap and
will have strictly better performance and less overhead. However, we encourage you
to explore all the options from the [CNCF landscape] in case another would be an
even better fit for your environment.
[CNCF landscape]: https://landscape.cncf.io/card-mode?category=container-runtime&grouping=category
### What should I look out for when changing CRI implementations?
While the underlying containerization code is the same between Docker and most
@ -153,24 +158,25 @@ common things to consider when migrating are:
- Logging configuration
- Runtime resource limitations
- Node provisioning scripts that call docker or use docker via it's control socket
- Kubectl plugins that require docker CLI or the control socket
- Node provisioning scripts that call docker or use Docker Engine via its control socket
- Plugins for `kubectl` that require the `docker` CLI or the Docker Engine control socket
- Tools from the Kubernetes project that require direct access to Docker Engine
(for example: the deprecated `kube-imagepuller` tool)
- Configuration of functionality like `registry-mirrors` and insecure registries
- Configuration of functionality like `registry-mirrors` and insecure registries
- Other support scripts or daemons that expect Docker Engine to be available and are run
outside of Kubernetes (for example, monitoring or security agents)
- GPUs or special hardware and how they integrate with your runtime and Kubernetes
If you use Kubernetes resource requests/limits or file-based log collection
DaemonSets then they will continue to work the same, but if youve customized
DaemonSets then they will continue to work the same, but if you've customized
your `dockerd` configuration, you'll need to adapt that for your new container
runtime where possible.
Another thing to look out for is that anything expecting to run for system maintenance
or nested inside a container when building images will no longer work. For the
former, you can use the [`crictl`][cr] tool as a drop-in replacement (see [mapping from docker cli to crictl](https://kubernetes.io/docs/tasks/debug-application-cluster/crictl/#mapping-from-docker-cli-to-crictl)) and for the
latter you can use newer container build options like [img], [buildah],
former, you can use the [`crictl`][cr] tool as a drop-in replacement (see
[mapping from docker cli to crictl](https://kubernetes.io/docs/tasks/debug/debug-cluster/crictl/#mapping-from-docker-cli-to-crictl))
and for the latter you can use newer container build options like [img], [buildah],
[kaniko], or [buildkit-cli-for-kubectl] that don't require Docker.
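As a hedged illustration (exact subcommand coverage depends on your `crictl` version), many everyday `docker` invocations have near drop-in `crictl` equivalents when run against the node's CRI runtime:
```bash
crictl ps                        # roughly: docker ps
crictl images                    # roughly: docker images
crictl logs <CONTAINER-ID>       # roughly: docker logs <container>
crictl exec -it <CONTAINER-ID> sh   # roughly: docker exec -it <container> sh
```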
[cr]: https://github.com/kubernetes-sigs/cri-tools
@ -204,7 +210,7 @@ discussion of the changes.
[dep]: https://dev.to/inductor/wait-docker-is-deprecated-in-kubernetes-now-what-do-i-do-e4m
### Is there any tooling that can help me find dockershim in use
### Is there any tooling that can help me find dockershim in use?
Yes! The [Detector for Docker Socket (DDS)][dds] is a kubectl plugin that you can
install and then use to check your cluster. DDS can detect if active Kubernetes workloads

View File

@ -12,7 +12,7 @@ Way back in December of 2020, Kubernetes announced the [deprecation of Dockershi
## First, does this even affect you?
If you are rolling your own cluster or are otherwise unsure whether or not this removal affects you, stay on the safe side and [check to see if you have any dependencies on Docker Engine](/docs/tasks/administer-cluster/migrating-from-dockershim/check-if-dockershim-deprecation-affects-you/). Please note that using Docker Desktop to build your application containers is not a Docker dependency for your cluster. Container images created by Docker are compliant with the [Open Container Initiative (OCI)](https://opencontainers.org/), a Linux Foundation governance structure that defines industry standards around container formats and runtimes. They will work just fine on any container runtime supported by Kubernetes.
If you are rolling your own cluster or are otherwise unsure whether or not this removal affects you, stay on the safe side and [check to see if you have any dependencies on Docker Engine](/docs/tasks/administer-cluster/migrating-from-dockershim/check-if-dockershim-removal-affects-you/). Please note that using Docker Desktop to build your application containers is not a Docker dependency for your cluster. Container images created by Docker are compliant with the [Open Container Initiative (OCI)](https://opencontainers.org/), a Linux Foundation governance structure that defines industry standards around container formats and runtimes. They will work just fine on any container runtime supported by Kubernetes.
If you are using a managed Kubernetes service from a cloud provider, and you haven't explicitly changed the container runtime, there may be nothing else for you to do. Amazon EKS, Azure AKS, and Google GKE all default to containerd now, though you should make sure they do not need updating if you have any node customizations. To check the runtime of your nodes, follow [Find Out What Container Runtime is Used on a Node](/docs/tasks/administer-cluster/migrating-from-dockershim/find-out-runtime-you-use/).

View File

@ -0,0 +1,155 @@
---
layout: blog
title: 'Increasing the security bar in Ingress-NGINX v1.2.0'
date: 2022-04-28
slug: ingress-nginx-1-2-0
---
**Authors:** Ricardo Katz (VMware), James Strong (Chainguard)
The [Ingress](/docs/concepts/services-networking/ingress/) may be one of the most targeted components
of Kubernetes. An Ingress typically defines an HTTP reverse proxy, exposed to the Internet, containing
multiple websites, and with some privileged access to Kubernetes API (such as to read Secrets relating to
TLS certificates and their private keys).
While it is a risky component in your architecture, it is still the most popular way to properly expose your services.
Ingress-NGINX has been the subject of security assessments that identified a big problem: we don't
do all of the proper sanitization before turning the configuration into an `nginx.conf` file, which may lead to information
disclosure risks.
While we understand this risk and the real need to fix it, that's not an easy process, so we took another approach to reduce (but not remove!) this risk in the current (v1.2.0) release.
## Meet Ingress NGINX v1.2.0 and the chrooted NGINX process
One of the main challenges is that Ingress-NGINX runs the web proxy server (NGINX) alongside the Ingress
controller (the component that has access to the Kubernetes API and that creates the `nginx.conf` file).
So, NGINX has the same access as the controller to the filesystem (and the Kubernetes service account token, and other configuration inside the container). While splitting those components is our end goal, the project needed a fast response; that led us to the idea of using `chroot()`.
Let's take a look into what an Ingress-NGINX container looked like before this change:
![Ingress NGINX pre chroot](ingress-pre-chroot.png)
As we can see, the same container (not the Pod, the container!) that provides the HTTP proxy is the one that watches Ingress objects and writes to the container volume.
Now, meet the new architecture:
![Ingress NGINX post chroot](ingress-post-chroot.png)
What does all of this mean? A basic summary is that we are isolating the NGINX service as a container inside the
controller container.
While this is not strictly true, to understand what was done here, it's good to understand how
Linux containers (and underlying mechanisms such as kernel namespaces) work.
You can read about cgroups in the Kubernetes glossary: [`cgroup`](https://kubernetes.io/docs/reference/glossary/?fundamental=true#term-cgroup) and learn more about how cgroups interact with namespaces in the NGINX project article
[What Are Namespaces and cgroups, and How Do They Work?](https://www.nginx.com/blog/what-are-namespaces-cgroups-how-do-they-work/).
(As you read that, bear in mind that Linux kernel namespaces are a different thing from
[Kubernetes namespaces](/docs/concepts/overview/working-with-objects/namespaces/)).
## Skip the talk, what do I need to use this new approach?
While this increases security, we made this feature opt-in in this release so you can have
time to make the right adjustments in your environment(s). This new feature is only available from
release v1.2.0 of the Ingress-NGINX controller.
There are two required changes in your deployments to use this feature:
* Append the suffix "-chroot" to the container image name. For example: `gcr.io/k8s-staging-ingress-nginx/controller-chroot:v1.2.0`
* In your Pod template for the Ingress controller, find where you add the capability `NET_BIND_SERVICE` and add the capability `SYS_CHROOT`. After you edit the manifest, you'll see a snippet like:
```yaml
capabilities:
drop:
- ALL
add:
- NET_BIND_SERVICE
- SYS_CHROOT
```
If you deploy the controller using the official Helm chart then change the following setting in
`values.yaml`:
```yaml
controller:
image:
chroot: true
```
Ingress controllers are normally set up cluster-wide (the IngressClass API is cluster scoped). If you manage the
Ingress-NGINX controller but you're not the overall cluster operator, then check with your cluster admin about
whether you can use the `SYS_CHROOT` capability, **before** you enable it in your deployment.
## OK, but how does this increase the security of my Ingress controller?
Take the following configuration snippet and imagine that, for some reason, it was added to your `nginx.conf`:
```
location /randomthing/ {
alias /;
autoindex on;
}
```
If you deploy this configuration, someone can call `http://website.example/randomthing` and get a listing of (and access to) the whole filesystem of the Ingress controller.
Now, can you spot the difference between the chrooted and non-chrooted NGINX in the listings below?
| Without extra `chroot()` | With extra `chroot()` |
|----------------------------|--------|
| `bin` | `bin` |
| `dev` | `dev` |
| `etc` | `etc` |
| `home` | |
| `lib` | `lib` |
| `media` | |
| `mnt` | |
| `opt` | `opt` |
| `proc` | `proc` |
| `root` | |
| `run` | `run` |
| `sbin` | |
| `srv` | |
| `sys` | |
| `tmp` | `tmp` |
| `usr` | `usr` |
| `var` | `var` |
| `dbg` | |
| `nginx-ingress-controller` | |
| `wait-shutdown` | |
The one on the left side is not chrooted, so NGINX has full access to the filesystem. The one on the right side is chrooted, so a new filesystem with only the files required to make NGINX work is created.
## What about other security improvements in this release?
We know that the new `chroot()` mechanism helps address some portion of the risk, but still, someone
can try to inject commands to read, for example, the `nginx.conf` file and extract sensitive information.
So, another change in this release (this is opt-out!) is the _deep inspector_.
We know that some directives or regular expressions may be dangerous to NGINX, so the deep inspector
checks all fields from an Ingress object (during its reconciliation, and also with a
[validating admission webhook](/docs/reference/access-authn-authz/admission-controllers/#validatingadmissionwebhook))
to verify whether any field contains these dangerous directives.
The ingress controller already does this for annotations, and our goal is to move this existing validation to happen inside
deep inspection as part of a future release.
You can take a look into the existing rules in [https://github.com/kubernetes/ingress-nginx/blob/main/internal/ingress/inspector/rules.go](https://github.com/kubernetes/ingress-nginx/blob/main/internal/ingress/inspector/rules.go).
Due to the nature of inspecting and matching all strings within relevant Ingress objects, this new feature may consume a bit more CPU. You can disable it by running the ingress controller with the command line argument `--deep-inspect=false`.
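For example (a sketch only; adapt it to however you deploy the controller, and note the container name shown is just illustrative), in a Deployment manifest that means adding the flag to the controller container's arguments:
```yaml
# fragment of the controller Deployment's Pod template (spec.template.spec)
containers:
  - name: controller
    image: gcr.io/k8s-staging-ingress-nginx/controller-chroot:v1.2.0
    args:
      - /nginx-ingress-controller
      - --deep-inspect=false   # opt out of the new deep inspection
```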
## What's next?
This is not our final goal. Our final goal is to split the control plane and the data plane processes.
In fact, doing so will help us also achieve a [Gateway](https://gateway-api.sigs.k8s.io/) API implementation,
as we may have a different controller as soon as it "knows" what to provide to the data plane
(we need some help here!!)
Some other projects in Kubernetes already take this approach
(like [KPNG](https://github.com/kubernetes-sigs/kpng), the proposed replacement for `kube-proxy`),
and we plan to align with them and get the same experience for Ingress-NGINX.
## Further reading
If you want to see how chrooting was done in Ingress NGINX, take a look
at [https://github.com/kubernetes/ingress-nginx/pull/8337](https://github.com/kubernetes/ingress-nginx/pull/8337).
The release v1.2.0 containing all the changes can be found at
[https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v1.2.0](https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v1.2.0)


View File

@ -0,0 +1,319 @@
---
layout: blog
title: "Frontiers, fsGroups and frogs: the Kubernetes 1.23 release interview"
date: 2022-04-29
---
**Author**: Craig Box (Google)
One of the highlights of hosting the weekly [Kubernetes Podcast from Google](https://kubernetespodcast.com/) is talking to the release managers for each new Kubernetes version. The release team is constantly refreshing. Many contributors work their way up from small documentation fixes, step up to shadow roles, and then eventually lead a release.
As we prepare for the 1.24 release next week, [in accordance with long-standing tradition](https://www.google.com/search?q=%22release+interview%22+site%3Akubernetes.io%2Fblog), I'm pleased to bring you a look back at the story of 1.23. The release was led by [Rey Lejano](https://twitter.com/reylejano), a Field Engineer at SUSE. [I spoke to Rey](https://kubernetespodcast.com/episode/167-kubernetes-1.23/) in December, as he was awaiting the birth of his first child.
Make sure you [subscribe, wherever you get your podcasts](https://kubernetespodcast.com/subscribe/), so you hear all our stories from the Cloud Native community, including the story of 1.24 next week.
*This transcript has been lightly edited and condensed for clarity.*
---
**CRAIG BOX: I'd like to start with what is, of course, on top of everyone's mind at the moment. Let's talk African clawed frogs!**
REY LEJANO: [CHUCKLES] Oh, you mean [Xenopus lavis](https://en.wikipedia.org/wiki/African_clawed_frog), the scientific name for the African clawed frog?
**CRAIG BOX: Of course.**
REY LEJANO: Not many people know, but my background and my degree is actually in microbiology, from the University of California Davis. I did some research for about four years in biochemistry, in a biochemistry lab, and I [do have a research paper published](https://www.sciencedirect.com/science/article/pii/). It's actually on glycoproteins, particularly something called "cortical granule lectin". We used frogs, because they generate lots and lots of eggs, from which we can extract the protein. That protein prevents polyspermy. When the sperm goes into the egg, the egg releases a glycoprotein, cortical granule lectin, to the membrane, and prevents any other sperm from going inside the egg.
**CRAIG BOX: Were you able to take anything from the testing that we did on frogs and generalize that to higher-order mammals, perhaps?**
REY LEJANO: Yes. Since mammals also have cortical granule lectin, we were able to analyze both the convergence and the evolutionary pattern, not just from multiple species of frogs, but also into mammals as well.
**CRAIG BOX: Now, there's a couple of different threads to unravel here. When you were young, what led you into the fields of biology, and perhaps more the technical side of it?**
REY LEJANO: I think it was mostly from family, since I do have a family history in the medical field that goes back generations. So I kind of felt like that was the natural path going into college.
**CRAIG BOX: Now, of course, you're working in a more abstract tech field. What led you out of microbiology?**
REY LEJANO: [CHUCKLES] Well, I've always been interested in tech. Taught myself a little programming when I was younger, before high school, did some web dev stuff. Just kind of got burnt out being in a lab. I was literally in the basement. I had a great opportunity to join a consultancy that specialized in [ITIL](https://www.axelos.com/certifications/itil-service-management/what-is-itil). I actually started off with application performance management, went into monitoring, went into operation management and also ITIL, which is aligning your IT asset management and service managements with business services. Did that for a good number of years, actually.
**CRAIG BOX: It's very interesting, as people describe the things that they went through and perhaps the technologies that they worked on, you can pretty much pinpoint how old they might be. There's a lot of people who come into tech these days that have never heard of ITIL. They have no idea what it is. It's basically just SRE with more process.**
REY LEJANO: Yes, absolutely. It's not very cloud native. [CHUCKLES]
**CRAIG BOX: Not at all.**
REY LEJANO: You don't really hear about it in the cloud native landscape. Definitely, you can tell someone's been in the field for a little bit, if they specialize or have worked with ITIL before.
**CRAIG BOX: You mentioned that you wanted to get out of the basement. That is quite often where people put the programmers. Did they just give you a bit of light in the new basement?**
REY LEJANO: [LAUGHS] They did give us much better lighting. Able to get some vitamin D sometimes, as well.
**CRAIG BOX: To wrap up the discussion about your previous career — over the course of the last year, with all of the things that have happened in the world, I could imagine that microbiology skills may be more in demand than perhaps they were when you studied them?**
REY LEJANO: Oh, absolutely. I could definitely see a big increase of numbers of people going into the field. Also, reading what's going on with the world currently kind of brings back all the education I've learned in the past, as well.
**CRAIG BOX: Do you keep in touch with people you went through school with?**
REY LEJANO: Just some close friends, but not in the microbiology field.
**CRAIG BOX: One thing that I think will probably happen as a result of the pandemic is a renewed interest in some of these STEM fields. It will be interesting to see what impact that has on society at large.**
REY LEJANO: Yeah. I think that'll be great.
**CRAIG BOX: You mentioned working at a consultancy doing IT management, application performance monitoring, and so on. When did Kubernetes come into your professional life?**
REY LEJANO: One of my good friends at the company I worked at, left in mid-2015. He went on to a company that was pretty heavily into Docker. He taught me a little bit. I did my first "docker run" around 2015, maybe 2016. Then, one of the applications we were using for the ITIL framework was containerized around 2018 or so, also in Kubernetes. At that time, it was pretty buggy. That was my initial introduction to Kubernetes and containerised applications.
Then I left that company, and I actually joined my friend over at [RX-M](https://rx-m.com/), which is a cloud native consultancy and training firm. They specialize in Docker and Kubernetes. I was able to get my feet wet. I got my CKD, got my CKA as well. And they were really, really great at encouraging us to learn more about Kubernetes and also to be involved in the community.
**CRAIG BOX: You will have seen, then, the life cycle of people adopting Kubernetes and containerization at large, through your own initial journey and then through helping customers. How would you characterize how that journey has changed from the early days to perhaps today?**
REY LEJANO: I think the early days, there was a lot of questions of, why do I have to containerize? Why can't I just stay with virtual machines?
**CRAIG BOX: It's a line item on your CV.**
REY LEJANO: [CHUCKLES] It is. And nowadays, I think people know the value of using containers, of orchestrating containers with Kubernetes. I don't want to say "jumping on the bandwagon", but it's become the de-facto standard to orchestrate containers.
**CRAIG BOX: It's not something that a consultancy needs to go out and pitch to customers that they should be doing. They're just taking it as, that will happen, and starting a bit further down the path, perhaps.**
REY LEJANO: Absolutely.
**CRAIG BOX: Working at a consultancy like that, how much time do you get to work on improving process, perhaps for multiple customers, and then looking at how you can upstream that work, versus paid work that you do for just an individual customer at a time?**
REY LEJANO: Back then, it would vary. They helped me introduce myself, and I learned a lot about the cloud native landscape and Kubernetes itself. They helped educate me as to how the cloud native landscape, and the tools around it, can be used together. My boss at that company, Randy, he actually encouraged us to start contributing upstream, and encouraged me to join the release team. He just said, this is a great opportunity. Definitely helped me with starting with the contributions early on.
**CRAIG BOX: Was the release team the way that you got involved with upstream Kubernetes contribution?**
REY LEJANO: Actually, no. My first contribution was with SIG Docs. I met Taylor Dolezal — he was the release team lead for 1.19, but he is involved with SIG Docs as well. I met him at KubeCon 2019, I sat at his table during a luncheon. I remember Paris Pittman was hosting this luncheon at the Marriott. Taylor says he was involved with SIG Docs. He encouraged me to join. I started joining into meetings, started doing a few drive-by PRs. That's what we call them — drive-by — little typo fixes. Then did a little bit more, started to send better or higher quality pull requests, and also reviewing PRs.
**CRAIG BOX: When did you first formally take your release team role?**
REY LEJANO: That was in [1.18](https://github.com/kubernetes/sig-release/blob/master/releases/release-1.18/release_team.md), in December. My boss at the time encouraged me to apply. I did, was lucky enough to get accepted for the release notes shadow. Then from there, stayed in with release notes for a few cycles, then went into Docs, naturally then led Docs, then went to Enhancements, and now I'm the release lead for 1.23.
**CRAIG BOX: I don't know that a lot of people think about what goes into a good release note. What would you say does?**
REY LEJANO: [CHUCKLES] You have to tell the end user what has changed or what effect that they might see in the release notes. It doesn't have to be highly technical. It could just be a few lines, and just saying what has changed, what they have to do if they have to do anything as well.
**CRAIG BOX: As you moved through the process of shadowing, how did you learn from the people who were leading those roles?**
REY LEJANO: I said this a few times when I was the release lead for this cycle. You get out of the release team as much as you put in, or it directly aligns to how much you put in. I learned a lot. I went into the release team having that mindset of learning from the role leads, learning from the other shadows, as well. That's actually a saying that my first role lead told me. I still carry it to heart, and that was back in 1.18. That was Eddie, in the very first meeting we had, and I still carry it to heart.
**CRAIG BOX: You, of course, were [the release lead for 1.23](https://github.com/kubernetes/sig-release/tree/master/releases/release-1.23). First of all, congratulations on the release.**
REY LEJANO: Thank you very much.
**CRAIG BOX: The theme for this release is [The Next Frontier](https://kubernetes.io/blog/2021/12/07/kubernetes-1-23-release-announcement/). Tell me the story of how we came to the theme and then the logo.**
REY LEJANO: The Next Frontier represents a few things. It not only represents the next enhancements in this release, but Kubernetes itself also has a history of Star Trek references. The original codename for Kubernetes was Project Seven, a reference to Seven of Nine, originally from Star Trek Voyager. Also the seven spokes in the helm in the logo of Kubernetes as well. And, of course, Borg, the predecessor to Kubernetes.
The Next Frontier continues that Star Trek reference. It's a fusion of two titles in the Star Trek universe. One is [Star Trek V, the Final Frontier](https://en.wikipedia.org/wiki/Star_Trek_V:_The_Final_Frontier), and the Star Trek: The Next Generation.
**CRAIG BOX: Do you have any opinion on the fact that Star Trek V was an odd-numbered movie, and they are [canonically referred to as being lesser than the even-numbered ones](https://screenrant.com/star-trek-movies-odd-number-curse-explained/)?**
REY LEJANO: I can't say, because I am such a sci-fi nerd that I love all of them even though they're bad. Even the post-Next Generation movies, after the series, I still liked all of them, even though I know some weren't that great.
**CRAIG BOX: Am I right in remembering that Star Trek V was the one directed by William Shatner?**
REY LEJANO: Yes, that is correct.
**CRAIG BOX: I think that says it all.**
REY LEJANO: [CHUCKLES] Yes.
**CRAIG BOX: Now, I understand that the theme comes from a part of the [SIG Release charter](https://github.com/kubernetes/community/blob/master/sig-release/charter.md)?**
REY LEJANO: Yes. There's a line in the SIG Release charter, "ensure there is a consistent group of community members in place to support the release process across time." With the release team, we have new shadows that join every single release cycle. With this, we're growing with this community. We're growing the release team members. We're growing SIG Release. We're growing the Kubernetes community itself. For a lot of people, this is their first time contributing to open source, so that's why I say it's their new open source frontier.
**CRAIG BOX: And the logo is obviously very Star Trek-inspired. It sort of surprised me that it took that long for someone to go this route.**
REY LEJANO: I was very surprised as well. I had to relearn Adobe Illustrator to create the logo.
**CRAIG BOX: This your own work, is it?**
REY LEJANO: This is my own work.
**CRAIG BOX: It's very nice.**
REY LEJANO: Thank you very much. Funny, the galaxy actually took me the longest time versus the ship. Took me a few days to get that correct. I'm always fine-tuning it, so there might be a final change when this is actually released.
**CRAIG BOX: No frontier is ever truly final.**
REY LEJANO: True, very true.
**CRAIG BOX: Moving now from the theme of the release to the substance, perhaps, what is new in 1.23?**
REY LEJANO: We have 47 enhancements. I'm going to run through most of the stable ones, if not all of them, some of the key Beta ones, and a few of the Alpha enhancements for 1.23.
One of the key enhancements is [dual-stack IPv4/IPv6](https://github.com/kubernetes/enhancements/issues/563), which went GA in 1.23.
Some background info: dual-stack was introduced as Alpha in 1.15. You probably saw a keynote at KubeCon 2019. Back then, the way dual-stack worked was that you needed two services — you needed a service per IP family. You would need a service for IPv4 and a service for IPv6. It was refactored in 1.20. In 1.21, it was in Beta; clusters were enabled to be dual-stack by default.
And then in 1.23 we did remove the IPv6 dual-stack feature flag. It's not mandatory to use dual-stack. It's actually not "default" still. The pods, the services still default to single-stack. There are some requirements to be able to use dual-stack. The nodes have to be routable on IPv4 and IPv6 network interfaces. You need a CNI plugin that supports dual-stack. The pods themselves have to be configured to be dual-stack. And the services need the ipFamilyPolicy field to specify prefer dual-stack, or require dual-stack.
**CRAIG BOX: This sounds like there's an implication in this that v4 is still required. Do you see a world where we can actually move to v6-only clusters?**
REY LEJANO: I think we'll be talking about IPv4 and IPv6 for many, many years to come. I remember a long time ago, they kept saying "it's going to be all IPv6", and that was decades ago.
**CRAIG BOX: I think I may have mentioned on the show before, but there was [a meeting in London that Vint Cerf attended](https://www.youtube.com/watch?v=AEaJtZVimqs), and he gave a public presentation at the time to say, now is the time of v6. And that was 10 years ago at least. It's still not the time of v6, and my desktop still doesn't have Linux on it. One day.**
REY LEJANO: [LAUGHS] In my opinion, that's one of the big key features that went stable for 1.23.
One of the other highlights of 1.23 is [pod security admission going to Beta](/blog/2021/12/09/pod-security-admission-beta/). I know this feature is going to Beta, but I highlight this because as some people might know, PodSecurityPolicy, which was deprecated in 1.21, is targeted to be removed in 1.25. Pod security admission replaces pod security policy. It's an admission controller. It evaluates the pods against a predefined set of pod security standards to either admit or deny the pod for running.
There's three levels of pod security standards. Privileged, that's totally open. Baseline, known privileges escalations are minimized. Or Restricted, which is hardened. And you could set pod security standards either to run in three modes, which is enforce: reject any pods that are in violation; to audit: pods are allowed to be created, but the violations are recorded; or warn: it will send a warning message to the user, and the pod is allowed.
**CRAIG BOX: You mentioned there that PodSecurityPolicy is due to be deprecated in two releases' time. Are we lining up these features so that pod security admission will be GA at that time?**
REY LEJANO: Yes. Absolutely. I'll talk about that for another feature in a little bit as well. There's also another feature that went to GA. It was an API that went to GA, and therefore the Beta API is now deprecated. I'll talk about that a little bit.
**CRAIG BOX: All right. Let's talk about what's next on the list.**
REY LEJANO: Let's move on to more stable enhancements. One is the [TTL controller](https://github.com/kubernetes/enhancements/issues/592). This cleans up jobs and pods after the jobs are finished. There is a TTL timer that starts when the job or pod is finished. This TTL controller watches all the jobs, and ttlSecondsAfterFinished needs to be set. The controller will see if the ttlSecondsAfterFinished, combined with the last transition time, if it's greater than now. If it is, then it will delete the job and the pods of that job.
**CRAIG BOX: Loosely, it could be called a garbage collector?**
REY LEJANO: Yes. Garbage collector for pods and jobs, or jobs and pods.
**CRAIG BOX: If Kubernetes is truly becoming a programming language, it of course has to have a garbage collector implemented.**
REY LEJANO: Yeah. There's another one, too, coming in Alpha. [CHUCKLES]
**CRAIG BOX: Tell me about that.**
REY LEJANO: That one is coming in Alpha. It's actually one of my favorite features, because there's only a few that I'm going to highlight today. [PVCs for StatefulSet will be cleaned up](https://github.com/kubernetes/enhancements/issues/1847). It will auto-delete PVCs created by StatefulSets when you delete that StatefulSet.
**CRAIG BOX: What's next on our tour of stable features?**
REY LEJANO: Next one is, [skip volume ownership change goes to stable](https://github.com/kubernetes/enhancements/issues/695). This is from SIG Storage. There are times when you're running a stateful application, like many databases, they're sensitive to permission bits changing underneath. Currently, when a volume is bind mounted inside the container, the permissions of that volume will change recursively. It might take a really long time.
Now, there's a field, the fsGroupChangePolicy, which allows you, as a user, to tell Kubernetes how you want the permission and ownership change for that volume to happen. You can set it to always, to always change permissions, or just on mismatch, to only do it when the permission ownership changes at the top level is different from what is expected.
**CRAIG BOX: It does feel like a lot of these enhancements came from a very particular use case where someone said, "hey, this didn't work for me and I've plumbed in a feature that works with exactly the thing I need to have".**
REY LEJANO: Absolutely. People create issues for these, then create Kubernetes enhancement proposals, and then get targeted for releases.
**CRAIG BOX: Another GA feature in this release — ephemeral volumes.**
REY LEJANO: We've always been able to use empty dir for ephemeral volumes, but now we could actually have [ephemeral inline volumes](https://github.com/kubernetes/enhancements/issues/1698), meaning that you could take your standard CSI driver and be able to use ephemeral volumes with it.
**CRAIG BOX: And, a long time coming, [CronJobs](https://github.com/kubernetes/enhancements/issues/19).**
REY LEJANO: CronJobs is a funny one, because it was stable before 1.23. For 1.23, it was still tracked, but it was just cleaning up some of the old controller. With CronJobs, there's a v2 controller. What was cleaned up in 1.23 is just the old v1 controller.
**CRAIG BOX: Were there any other duplications or major cleanups of note in this release?**
REY LEJANO: Yeah. There were a few you might see in the major themes. One's a little tricky, around FlexVolumes. This is one of the efforts from SIG Storage. They have an effort to migrate in-tree plugins to CSI drivers. This is a little tricky, because FlexVolumes were actually deprecated in November 2020. We're [formally announcing it in 1.23](https://github.com/kubernetes/community/blob/master/sig-storage/volume-plugin-faq.md#kubernetes-volume-plugin-faq-for-storage-vendors).
**CRAIG BOX: FlexVolumes, in my mind, predate CSI as a concept. So it's about time to get rid of them.**
REY LEJANO: Yes, it is. There's another deprecation, just some [klog specific flags](https://kubernetes.io/docs/concepts/cluster-administration/system-logs/#klog), but other than that, there are no other big deprecations in 1.23.
**CRAIG BOX: The buzzword of the last KubeCon, and in some ways the theme of the last 12 months, has been secure software supply chain. What work is Kubernetes doing to improve in this area?**
REY LEJANO: For 1.23, Kubernetes is now SLSA compliant at Level 1, which means that provenance attestation files that describe the staging and release phases of the release process are satisfactory for the SLSA framework.
**CRAIG BOX: What needs to happen to step up to further levels?**
REY LEJANO: Level 1 means a few things — that the build is scripted; that the provenance is available, meaning that the artifacts are verified and they're handed over from one phase to the next; and describes how the artifact is produced. Level 2 means that the source is version-controlled, which it is, provenance is authenticated, provenance is service-generated, and there is a build service. There are four levels of SLSA compliance.
**CRAIG BOX: It does seem like the levels were largely influenced by what it takes to build a big, secure project like this. It doesn't seem like it will take a lot of extra work to move up to verifiable provenance, for example. There's probably just a few lines of script required to meet many of those requirements.**
REY LEJANO: Absolutely. I feel like we're almost there; we'll see what will come out of 1.24. And I do want to give a big shout-out to SIG Release and Release Engineering, primarily to Adolfo García Veytia, who is aka Puerco on GitHub and on Slack. He's been driving this forward.
**CRAIG BOX: You've mentioned some APIs that are being graduated in time to replace their deprecated version. Tell me about the new HPA API.**
REY LEJANO: The [horizontal pod autoscaler v2 API](https://github.com/kubernetes/enhancements/issues/2702), is now stable, which means that the v2beta2 API is deprecated. Just for everyone's knowledge, the v1 API is not being deprecated. The difference is that v2 adds support for multiple and custom metrics to be used for HPA.
**CRAIG BOX: There's also now a facility to validate my CRDs with an expression language.**
REY LEJANO: Yeah. You can use the [Common Expression Language, or CEL](https://github.com/google/cel-spec), to validate your CRDs, so you no longer need to use webhooks. This also makes the CRDs more self-contained and declarative, because the rules are now kept within the CRD object definition.
**CRAIG BOX: What new features, perhaps coming in Alpha or Beta, have taken your interest?**
REY LEJANO: Aside from pod security policies, I really love [ephemeral containers](https://github.com/kubernetes/enhancements/issues/277) supporting kubectl debug. It launches an ephemeral container and a running pod, shares those pod namespaces, and you can do all your troubleshooting with just running kubectl debug.
**CRAIG BOX: There's also been some interesting changes in the way that events are handled with kubectl.**
REY LEJANO: Yeah. kubectl events has always had some issues, like how things weren't sorted. [kubectl events improved](https://github.com/kubernetes/enhancements/issues/1440) that so now you can do `--watch`, and it will also sort with the `--watch` option as well. That is something new. You can actually combine fields and custom columns. And also, you can list events in the timeline with doing the last N number of minutes. And you can also sort events using other criteria as well.
**CRAIG BOX: You are a field engineer at SUSE. Are there any things that are coming in that your individual customers that you deal with are looking out for?**
REY LEJANO: More of what I look out for to help the customers.
**CRAIG BOX: Right.**
REY LEJANO: I really love kubectl events. Really love the PVCs being cleaned up with StatefulSets. Most of it's for selfish reasons that it will improve troubleshooting efforts. [CHUCKLES]
**CRAIG BOX: I have always hoped that a release team lead would say to me, "yes, I have selfish reasons. And I finally got something I wanted in."**
REY LEJANO: [LAUGHS]
**CRAIG BOX: Perhaps I should run to be release team lead, just so I can finally get init containers fixed once and for all.**
REY LEJANO: Oh, init containers, I've been looking for that for a while. I've actually created animated GIFs on how init containers will be run with that Kubernetes enhancement proposal, but it's halted currently.
**CRAIG BOX: One day.**
REY LEJANO: One day. Maybe it shouldn't stay halted.
**CRAIG BOX: You mentioned there are obviously the things you look out for. Are there any things that are coming down the line, perhaps Alpha features or maybe even just proposals you've seen lately, that you're personally really looking forward to seeing which way they go?**
REY LEJANO: Yeah. One is a very interesting one; it affects the whole community, so it's not just for personal reasons. As you may have known, Dockershim is deprecated. And we did release a blog that it will be removed in 1.24.
**CRAIG BOX: Scared a bunch of people.**
REY LEJANO: Scared a bunch of people. From a survey, we saw that a lot of people are still using Docker and Dockershim. One of the enhancements for 1.23 is, [kubelet CRI goes to Beta](https://github.com/kubernetes/enhancements/issues/2040). This promotes the CRI API, which is required. This had to be in Beta for Dockershim to be removed in 1.24.
**CRAIG BOX: Now, in the last release team lead interview, [we spoke with Savitha Raghunathan](https://kubernetespodcast.com/episode/157-kubernetes-1.22/), and she talked about what she would advise you as her successor. It was to look out for the mental health of the team members. How were you able to take that advice on board?**
REY LEJANO: That was great advice from Savitha. A few things I've made note of with each release team meeting. After each release team meeting, I stop the recording, because we do record all the meetings and post them on YouTube. And I open up the floor to anyone who wants to say anything that's not recorded, that's not going to be on the agenda. Also, I tell people not to work on weekends. I broke this rule once, but other than that, I told people it could wait. Just be mindful of your mental health.
**CRAIG BOX: It's just been announced that [James Laverack from Jetstack](https://twitter.com/JamesLaverack/status/1466834312993644551) will be the release team lead for 1.24. James and I shared an interesting Mexican dinner at the last KubeCon in San Diego.**
REY LEJANO: Oh, nice. I didn't know you knew James.
**CRAIG BOX: The British tech scene. We're a very small world. What will your advice to James be?**
REY LEJANO: What I would tell James for 1.24 is use teachable moments in the release team meetings. When you're a shadow for the first time, it's very daunting. It's very difficult, because you don't know the repos. You don't know the release process. Everyone around you seems like they know the release process, and very familiar with what the release process is. But as a first-time shadow, you don't know all the vernacular for the community. I just advise to use teachable moments. Take a few minutes in the release team meetings to make it a little easier for new shadows to ramp up and to be familiar with the release process.
**CRAIG BOX: Has there been major evolution in the process in the time that you've been involved? Or do you think that it's effectively doing what it needs to do?**
REY LEJANO: It's always evolving. I remember my first time in release notes, 1.18, we said that our goal was to automate and program our way out so that we don't have a release notes team anymore. That's changed [CHUCKLES] quite a bit. Although there's been significant advancements in the release notes process by Adolfo and also James, they've created a subcommand in krel to generate release notes.
But nowadays, all their release notes are richer. Still not there at the automation process yet. Every release cycle, there is something a little bit different. For this release cycle, we had a production readiness review deadline. It was a soft deadline. A production readiness review is a review by several people in the community. It's actually been required since 1.21, and it ensures that the enhancements are observable, scalable, supportable, and it's safe to operate in production, and could also be disabled or rolled back. In 1.23, we had a deadline to have the production readiness review completed by a specific date.
**CRAIG BOX: How have you found the change of schedule to three releases per year rather than four?**
REY LEJANO: Moving to three releases a year from four, in my opinion, has been an improvement, because we support the last three releases, and now we can actually support the last releases in a calendar year instead of having 9 months out of 12 months of the year.
**CRAIG BOX: The next event on the calendar is a [Kubernetes contributor celebration](https://www.kubernetes.dev/events/kcc2021/) starting next Monday. What can we expect from that event?**
REY LEJANO: This is our second time running this virtual event. It's a virtual celebration to recognize the whole community and all of our accomplishments of the year, and also contributors. There's a number of events during this week of celebration. It starts the week of December 13.
There's events like the Kubernetes Contributor Awards, where SIGs honor and recognize the hard work of the community and contributors. There's also a DevOps party game as well. There is a cloud native bake-off. I do highly suggest people to go to [kubernetes.dev/celebration](https://www.kubernetes.dev/events/past-events/2021/kcc2021/) to learn more.
**CRAIG BOX: How exactly does one judge a virtual bake-off?**
REY LEJANO: That I don't know. [CHUCKLES]
**CRAIG BOX: I tasted my scones. I think they're the best. I rate them 10 out of 10.**
REY LEJANO: Yeah. That is very difficult to do virtually. I would have to say, probably what the dish is, how closely it is tied with Kubernetes or open source or to CNCF. There's a few judges. I know Josh Berkus and Rin Oliver are a few of the judges running the bake-off.
**CRAIG BOX: Yes. We spoke with Josh about his love of the kitchen, and so he seems like a perfect fit for that role.**
REY LEJANO: He is.
**CRAIG BOX: Finally, your wife and yourself are expecting your first child in January. Have you had a production readiness review for that?**
REY LEJANO: I think we failed that review. [CHUCKLES]
**CRAIG BOX: There's still time.**
REY LEJANO: We are working on refactoring. We're going to refactor a little bit in December, and `--apply` again.
---
_[Rey Lejano](https://twitter.com/reylejano) is a field engineer at SUSE, by way of Rancher Labs, and was the release team lead for Kubernetes 1.23. He is now also a co-chair for SIG Docs. His son Liam is now 3 and a half months old._
_You can find the [Kubernetes Podcast from Google](http://www.kubernetespodcast.com/) at [@KubernetesPod](https://twitter.com/KubernetesPod) on Twitter, and you can [subscribe](https://kubernetespodcast.com/subscribe/) so you never miss an episode._

View File

@ -0,0 +1,25 @@
---
layout: blog
title: "Dockershim: The Historical Context"
date: 2022-05-03
slug: dockershim-historical-context
---
**Author:** Kat Cosgrove
Dockershim has been removed as of Kubernetes v1.24, and this is a positive move for the project. However, context is important for fully understanding something, be it socially or in software development, and this deserves a more in-depth review. Alongside the dockershim removal in Kubernetes v1.24, we've seen some confusion (sometimes at a panic level) and dissatisfaction with this decision in the community, largely due to a lack of context around this removal. The decision to deprecate and eventually remove dockershim from Kubernetes was not made quickly or lightly. Still, it's been in the works for so long that many of today's users are newer than that decision, and certainly newer than the choices that led to the dockershim being necessary in the first place.
So what is the dockershim, and why is it going away?
In the early days of Kubernetes, we only supported one container runtime. That runtime was Docker Engine. Back then, there weren't really a lot of other options out there and Docker was the dominant tool for working with containers, so this was not a controversial choice. Eventually, we started adding more container runtimes, like rkt and hypernetes, and it became clear that Kubernetes users want a choice of runtimes that work best for them. So Kubernetes needed a way to allow cluster operators the flexibility to use whatever runtime they choose.
The [Container Runtime Interface](/blog/2016/12/container-runtime-interface-cri-in-kubernetes/) (CRI) was released to allow that flexibility. The introduction of CRI was great for the project and users alike, but it did introduce a problem: Docker Engine's use as a container runtime predates CRI, and Docker Engine is not CRI-compatible. To solve this issue, a small software shim (dockershim) was introduced as part of the kubelet component specifically to fill in the gaps between Docker Engine and CRI, allowing cluster operators to continue using Docker Engine as their container runtime largely uninterrupted.
However, this little software shim was never intended to be a permanent solution. Over the course of years, its existence has introduced a lot of unnecessary complexity to the kubelet itself. Some integrations are inconsistently implemented for Docker because of this shim, resulting in an increased burden on maintainers, and maintaining vendor-specific code is not in line with our open source philosophy. To reduce this maintenance burden and move towards a more collaborative community in support of open standards, [KEP-2221 was introduced](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2221-remove-dockershim), proposing the removal of the dockershim. With the release of Kubernetes v1.20, the deprecation was official.
We didn't do a great job communicating this, and unfortunately, the deprecation announcement led to some panic within the community. Confusion around what this meant for Docker as a company, if container images built by Docker would still run, and what Docker Engine actually is led to a conflagration on social media. This was our fault; we should have more clearly communicated what was happening and why at the time. To combat this, we released [a blog](/blog/2020/12/02/dont-panic-kubernetes-and-docker/) and [accompanying FAQ](/blog/2020/12/02/dockershim-faq/) to allay the community's fears and correct some misconceptions about what Docker is and how containers work within Kubernetes. As a result of the community's concerns, Docker and Mirantis jointly agreed to continue supporting the dockershim code in the form of [cri-dockerd](https://www.mirantis.com/blog/the-future-of-dockershim-is-cri-dockerd/), allowing you to continue using Docker Engine as your container runtime if need be. In the interest of users who want to try other runtimes, like containerd or cri-o, [migration documentation was written](/docs/tasks/administer-cluster/migrating-from-dockershim/change-runtime-containerd/).
We later [surveyed the community](https://kubernetes.io/blog/2021/11/12/are-you-ready-for-dockershim-removal/) and [discovered that there are still many users with questions and concerns](/blog/2022/01/07/kubernetes-is-moving-on-from-dockershim). In response, Kubernetes maintainers and the CNCF committed to addressing these concerns by extending documentation and other programs. In fact, this blog post is a part of this program. With so many end users successfully migrated to other runtimes, and improved documentation, we believe that everyone has a paved way to migration now.
Docker is not going away, either as a tool or as a company. It's an important part of the cloud native community and the history of the Kubernetes project. We wouldn't be where we are without them. That said, removing dockershim from kubelet is ultimately good for the community, the ecosystem, the project, and open source at large. This is an opportunity for all of us to come together to support open standards, and we're glad to be doing so with the help of Docker and the community.

View File

@ -0,0 +1,242 @@
---
layout: blog
title: "Kubernetes 1.24: Stargazer"
date: 2022-05-03
slug: kubernetes-1-24-release-announcement
---
**Authors**: [Kubernetes 1.24 Release Team](https://github.com/kubernetes/sig-release/blob/master/releases/release-1.24/release-team.md)
We are excited to announce the release of Kubernetes 1.24, the first release of 2022!
This release consists of 46 enhancements: fourteen enhancements have graduated to stable,
fifteen enhancements are moving to beta, and thirteen enhancements are entering alpha.
Also, two features have been deprecated, and two features have been removed.
## Major Themes
### Dockershim Removed from kubelet
After its deprecation in v1.20, the dockershim component has been removed from the kubelet in Kubernetes v1.24.
From v1.24 onwards, you will need to either use one of the other [supported runtimes](/docs/setup/production-environment/container-runtimes/) (such as containerd or CRI-O)
or use cri-dockerd if you are relying on Docker Engine as your container runtime.
For more information about ensuring your cluster is ready for this removal, please
see [this guide](/blog/2022/03/31/ready-for-dockershim-removal/).
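One quick way to see what each node is using today (a small sketch; column layout can vary between kubectl versions):
```bash
# The CONTAINER-RUNTIME column shows the runtime and version per node,
# e.g. containerd://1.6.4, cri-o://1.24.0, or docker://20.10.x (dockershim / cri-dockerd)
kubectl get nodes -o wide
```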
### Beta APIs Off by Default
[New beta APIs will not be enabled in clusters by default](https://github.com/kubernetes/enhancements/issues/3136).
Existing beta APIs and new versions of existing beta APIs will continue to be enabled by default.
### Signing Release Artifacts
Release artifacts are [signed](https://github.com/kubernetes/enhancements/issues/3031) using [cosign](https://github.com/sigstore/cosign)
signatures,
and there is experimental support for [verifying image signatures](/docs/tasks/administer-cluster/verify-signed-images/).
Signing and verification of release artifacts is part of [increasing software supply chain security for the Kubernetes release process](https://github.com/kubernetes/enhancements/issues/3027).
### OpenAPI v3
Kubernetes 1.24 offers beta support for publishing its APIs in the [OpenAPI v3 format](https://github.com/kubernetes/enhancements/issues/2896).
### Storage Capacity and Volume Expansion Are Generally Available
[Storage capacity tracking](https://github.com/kubernetes/enhancements/issues/1472)
supports exposing currently available storage capacity via [CSIStorageCapacity objects](/docs/concepts/storage/storage-capacity/#api)
and enhances scheduling of pods that use CSI volumes with late binding.
[Volume expansion](https://github.com/kubernetes/enhancements/issues/284) adds support
for resizing existing persistent volumes.
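As a minimal sketch (assuming the claim's StorageClass sets `allowVolumeExpansion: true`; the claim name and sizes are hypothetical), expansion is requested by raising the size on an existing PersistentVolumeClaim:
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-volume            # an existing, hypothetical claim
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi            # increased from the original request (for example 10Gi)
```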
### NonPreemptingPriority to Stable
This feature adds [a new option to PriorityClasses](https://github.com/kubernetes/enhancements/issues/902),
which can enable or disable pod preemption.
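For illustration, a non-preempting PriorityClass could look like the following sketch (the name and value are arbitrary):
```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority-nonpreempting
value: 1000000
preemptionPolicy: Never
globalDefault: false
description: "High priority, but does not preempt other running Pods."
```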
### Storage Plugin Migration
Work is underway to [migrate the internals of in-tree storage plugins](https://github.com/kubernetes/enhancements/issues/625) to call out to CSI Plugins
while maintaining the original API.
The [Azure Disk](https://github.com/kubernetes/enhancements/issues/1490)
and [OpenStack Cinder](https://github.com/kubernetes/enhancements/issues/1489) plugins
have both been migrated.
### gRPC Probes Graduate to Beta
With Kubernetes 1.24, the [gRPC probes functionality](https://github.com/kubernetes/enhancements/issues/2727)
has entered beta and is available by default. You can now [configure startup, liveness, and readiness probes](/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#configure-probes) for your gRPC app
natively within Kubernetes without exposing an HTTP endpoint or
using an extra executable.
### Kubelet Credential Provider Graduates to Beta
Originally released as Alpha in Kubernetes 1.20, the kubelet's support for
[image credential providers](/docs/tasks/kubelet-credential-provider/kubelet-credential-provider/)
has now graduated to Beta.
This allows the kubelet to dynamically retrieve credentials for a container image registry
using exec plugins rather than storing credentials on the node's filesystem.
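As a sketch of what this looks like in practice, the kubelet reads a configuration file (passed via `--image-credential-provider-config`) that maps image patterns to provider plugins; the provider name and registry pattern below are placeholders:
```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: CredentialProviderConfig
providers:
  - name: example-credential-provider     # placeholder plugin binary name
    matchImages:
      - "*.registry.example.com"          # placeholder registry pattern
    defaultCacheDuration: "12h"
    apiVersion: credentialprovider.kubelet.k8s.io/v1beta1
```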
### Contextual Logging in Alpha
Kubernetes 1.24 has introduced [contextual logging](https://github.com/kubernetes/enhancements/issues/3077)
that enables the caller of a function to control all aspects of logging (output formatting, verbosity, additional values, and names).
### Avoiding Collisions in IP allocation to Services
Kubernetes 1.24 introduces a new opt-in feature that allows you to
[soft-reserve a range for static IP address assignments](/docs/concepts/services-networking/service/#service-ip-static-sub-range)
to Services.
With the manual enablement of this feature, the cluster will prefer automatic assignment from
the pool of Service IP addresses, thereby reducing the risk of collision.
A Service `ClusterIP` can be assigned:
* dynamically, which means the cluster will automatically pick a free IP within the configured Service IP range.
* statically, which means the user will set one IP within the configured Service IP range.
Service `ClusterIP` addresses are unique; hence, trying to create a Service with a `ClusterIP` that has already been allocated will return an error.
### Dynamic Kubelet Configuration is Removed from the Kubelet
After being deprecated in Kubernetes 1.22, Dynamic Kubelet Configuration has been removed from the kubelet. The feature will be removed from the API server in Kubernetes 1.26.
## CNI Version-Related Breaking Change
Before you upgrade to Kubernetes 1.24, please verify that you are using/upgrading to a container
runtime that has been tested to work correctly with this release.
For example, the following container runtimes are being prepared, or have already been prepared, for Kubernetes:
* containerd v1.6.4 and later, v1.5.11 and later
* CRI-O 1.24 and later
Service issues exist for pod CNI network setup and tear down in containerd
v1.6.0–v1.6.3 when the CNI plugins have not been upgraded and/or the CNI config
version is not declared in the CNI config files. The containerd team reports, "these issues are resolved in containerd v1.6.4."
With containerd v1.6.0–v1.6.3, if you do not upgrade the CNI plugins and/or
declare the CNI config version, you might encounter the following "Incompatible
CNI versions" or "Failed to destroy network for sandbox" error conditions.
## CSI Snapshot
_This information was added after initial publication._
[VolumeSnapshot v1beta1 CRD has been removed](https://github.com/kubernetes/enhancements/issues/177).
Volume snapshot and restore functionality for Kubernetes and the Container Storage Interface (CSI), which provides standardized APIs design (CRDs) and adds PV snapshot/restore support for CSI volume drivers, moved to GA in v1.20. VolumeSnapshot v1beta1 was deprecated in v1.20 and is now unsupported. Refer to [KEP-177: CSI Snapshot](https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/177-volume-snapshot#kep-177-csi-snapshot) and [Volume Snapshot GA blog](https://kubernetes.io/blog/2020/12/10/kubernetes-1.20-volume-snapshot-moves-to-ga/) for more information.
## Other Updates
### Graduations to Stable
This release saw fourteen enhancements promoted to stable:
* [Container Storage Interface (CSI) Volume Expansion](https://github.com/kubernetes/enhancements/issues/284)
* [Pod Overhead](https://github.com/kubernetes/enhancements/issues/688): Account for resources tied to the pod sandbox but not specific containers.
* [Add non-preempting option to PriorityClasses](https://github.com/kubernetes/enhancements/issues/902)
* [Storage Capacity Tracking](https://github.com/kubernetes/enhancements/issues/1472)
* [OpenStack Cinder In-Tree to CSI Driver Migration](https://github.com/kubernetes/enhancements/issues/1489)
* [Azure Disk In-Tree to CSI Driver Migration](https://github.com/kubernetes/enhancements/issues/1490)
* [Efficient Watch Resumption](https://github.com/kubernetes/enhancements/issues/1904): Watch can be efficiently resumed after kube-apiserver reboot.
* [Service Type=LoadBalancer Class Field](https://github.com/kubernetes/enhancements/issues/1959): Introduce a new Service annotation `service.kubernetes.io/load-balancer-class` that allows multiple implementations of `type: LoadBalancer` Services in the same cluster.
* [Indexed Job](https://github.com/kubernetes/enhancements/issues/2214): Add a completion index to Pods of Jobs with a fixed completion count.
* [Add Suspend Field to Jobs API](https://github.com/kubernetes/enhancements/issues/2232): Add a suspend field to the Jobs API to allow orchestrators to create jobs with more control over when pods are created.
* [Pod Affinity NamespaceSelector](https://github.com/kubernetes/enhancements/issues/2249): Add a `namespaceSelector` field to the pod affinity/anti-affinity spec.
* [Leader Migration for Controller Managers](https://github.com/kubernetes/enhancements/issues/2436): kube-controller-manager and cloud-controller-manager can apply new controller-to-controller-manager assignment in HA control plane without downtime.
* [CSR Duration](https://github.com/kubernetes/enhancements/issues/2784): Extend the CertificateSigningRequest API with a mechanism to allow clients to request a specific duration for the issued certificate.
### Major Changes
This release saw two major changes:
* [Dockershim Removal](https://github.com/kubernetes/enhancements/issues/2221)
* [Beta APIs are off by Default](https://github.com/kubernetes/enhancements/issues/3136)
### Release Notes
Check out the full details of the Kubernetes 1.24 release in our [release notes](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.24.md).
### Availability
Kubernetes 1.24 is available for download on [GitHub](https://github.com/kubernetes/kubernetes/releases/tag/v1.24.0).
To get started with Kubernetes, check out these [interactive tutorials](/docs/tutorials/) or run local
Kubernetes clusters using containers as “nodes”, with [kind](https://kind.sigs.k8s.io/).
You can also easily install 1.24 using [kubeadm](/docs/setup/independent/create-cluster-kubeadm/).
### Release Team
This release would not have been possible without the combined efforts of committed individuals
comprising the Kubernetes 1.24 release team. This team came together to deliver all of the components
that go into each Kubernetes release, including code, documentation, release notes, and more.
Special thanks to James Laverack, our release lead, for guiding us through a successful release cycle,
and to all of the release team members for the time and effort they put in to deliver the v1.24
release for the Kubernetes community.
### Release Theme and Logo
**Kubernetes 1.24: Stargazer**
{{< figure src="/images/blog/2022-05-03-kubernetes-release-1.24/kubernetes-1.24.png" alt="" class="release-logo" >}}
The theme for Kubernetes 1.24 is _Stargazer_.
Generations of people have looked to the stars in awe and wonder, from ancient astronomers to the
scientists who built the James Webb Space Telescope. The stars have inspired us, set our imagination
alight, and guided us through long nights on difficult seas.
With this release we gaze upwards, to what is possible when our community comes together. Kubernetes
is the work of hundreds of contributors across the globe and thousands of end-users supporting
applications that serve millions. Every one is a star in our sky, helping us chart our course.
The release logo is made by [Britnee Laverack](https://www.instagram.com/artsyfie/), and depicts a telescope set upon starry skies and the
[Pleiades](https://en.wikipedia.org/wiki/Pleiades), often known in mythology as the “Seven Sisters”. The number seven is especially auspicious
for the Kubernetes project, and is a reference back to our original “Project Seven” name.
This release of Kubernetes is named for those that would look towards the night sky and wonder — for
all the stargazers out there. ✨
### User Highlights
* Check out how leading retail e-commerce company [La Redoute used Kubernetes, alongside other CNCF projects, to transform and streamline its software delivery lifecycle](https://www.cncf.io/case-studies/la-redoute/) - from development to operations.
* Trying to ensure no change to an API call would cause any breaks, [Salt Security built its microservices entirely on Kubernetes, and it communicates via gRPC while Linkerd ensures messages are encrypted](https://www.cncf.io/case-studies/salt-security/).
* In their effort to migrate from private to public cloud, [Allianz Direct engineers redesigned their CI/CD pipeline in just three months while managing to condense 200 workflows down to 10-15](https://www.cncf.io/case-studies/allianz/).
* Check out how [Bink, a UK based fintech company, updated its in-house Kubernetes distribution with Linkerd to build a cloud-agnostic platform that scales as needed whilst allowing them to keep a close eye on performance and stability](https://www.cncf.io/case-studies/bink/).
* Using Kubernetes, the Dutch organization [Stichting Open Nederland](http://www.stichtingopennederland.nl/) created a testing portal in just one-and-a-half months to help safely reopen events in the Netherlands. The [Testing for Entry (Testen voor Toegang)](https://www.testenvoortoegang.org/) platform [leveraged the performance and scalability of Kubernetes to help individuals book over 400,000 COVID-19 testing appointments per day. ](https://www.cncf.io/case-studies/true/)
* Working alongside SparkFabrik and utilizing Backstage, [Santagostino created the developer platform Samaritan to centralize services and documentation, manage the entire lifecycle of services, and simplify the work of Santagostino developers](https://www.cncf.io/case-studies/santagostino/).
### Ecosystem Updates
* KubeCon + CloudNativeCon Europe 2022 will take place in Valencia, Spain, from 16 – 20 May 2022! You can find more information about the conference and registration on the [event site](https://events.linuxfoundation.org/kubecon-cloudnativecon-europe/).
* In the [2021 Cloud Native Survey](https://www.cncf.io/announcements/2022/02/10/cncf-sees-record-kubernetes-and-container-adoption-in-2021-cloud-native-survey/), the CNCF saw record Kubernetes and container adoption. Take a look at the [results of the survey](https://www.cncf.io/reports/cncf-annual-survey-2021/).
* The [Linux Foundation](https://www.linuxfoundation.org/) and [The Cloud Native Computing Foundation](https://www.cncf.io/) (CNCF) announced the availability of a new [Cloud Native Developer Bootcamp](https://training.linuxfoundation.org/training/cloudnativedev-bootcamp/?utm_source=lftraining&utm_medium=pr&utm_campaign=clouddevbc0322) to provide participants with the knowledge and skills to design, build, and deploy cloud native applications. Check out the [announcement](https://www.cncf.io/announcements/2022/03/15/new-cloud-native-developer-bootcamp-provides-a-clear-path-to-cloud-native-careers/) to learn more.
### Project Velocity
The [CNCF K8s DevStats](https://k8s.devstats.cncf.io/d/12/dashboards?orgId=1&refresh=15m) project
aggregates a number of interesting data points related to the velocity of Kubernetes and various
sub-projects. This includes everything from individual contributions to the number of companies that
are contributing, and is an illustration of the depth and breadth of effort that goes into evolving this ecosystem.
In the v1.24 release cycle, which [ran for 17 weeks](https://github.com/kubernetes/sig-release/tree/master/releases/release-1.24) (January 10 to May 3), we saw contributions from [1029 companies](https://k8s.devstats.cncf.io/d/9/companies-table?orgId=1&var-period_name=v1.23.0%20-%20v1.24.0&var-metric=contributions) and [1179 individuals](https://k8s.devstats.cncf.io/d/66/developer-activity-counts-by-companies?orgId=1&var-period_name=v1.23.0%20-%20v1.24.0&var-metric=contributions&var-repogroup_name=Kubernetes&var-country_name=All&var-companies=All&var-repo_name=kubernetes%2Fkubernetes).
## Upcoming Release Webinar
Join members of the Kubernetes 1.24 release team on Tue May 24, 2022, 9:45am – 11am PT to learn about
the major features of this release, as well as deprecations and removals to help plan for upgrades.
For more information and registration, visit the [event page](https://community.cncf.io/e/mck3kd/)
on the CNCF Online Programs site.
## Get Involved
The simplest way to get involved with Kubernetes is by joining one of the many [Special Interest Groups](https://github.com/kubernetes/community/blob/master/sig-list.md) (SIGs) that align with your interests.
Have something youd like to broadcast to the Kubernetes community? Share your voice at our weekly [community meeting](https://github.com/kubernetes/community/tree/master/communication), and through the channels below:
* Find out more about contributing to Kubernetes at the [Kubernetes Contributors](https://www.kubernetes.dev/) website
* Follow us on Twitter [@Kubernetesio](https://twitter.com/kubernetesio) for the latest updates
* Join the community discussion on [Discuss](https://discuss.kubernetes.io/)
* Join the community on [Slack](http://slack.k8s.io/)
* Post questions (or answer questions) on [Server Fault](https://serverfault.com/questions/tagged/kubernetes).
* Share your Kubernetes [story](https://docs.google.com/a/linuxfoundation.org/forms/d/e/1FAIpQLScuI7Ye3VQHQTwBASrgkjQDSS5TP0g3AXfFhwSM9YpHgxRKFA/viewform)
* Read more about whats happening with Kubernetes on the [blog](https://kubernetes.io/blog/)
* Learn more about the [Kubernetes Release Team](https://github.com/kubernetes/sig-release/tree/master/release-team)

View File

@ -0,0 +1,103 @@
---
layout: blog
title: "Kubernetes 1.24: Volume Expansion Now A Stable Feature"
date: 2022-05-05
slug: volume-expansion-ga
---
**Author:** Hemant Kumar (Red Hat)
Volume expansion was introduced as an alpha feature in Kubernetes 1.8, graduated to beta in 1.11, and with Kubernetes 1.24 we are excited to announce general availability (GA)
of volume expansion.
This feature allows Kubernetes users to simply edit their `PersistentVolumeClaim` objects and specify a new size in the PVC spec; Kubernetes will automatically expand the volume
using the storage backend, and also expand the underlying file system in use by the Pod, without requiring any downtime at all if possible.
### How to use volume expansion
You can trigger expansion for a PersistentVolume by editing the `spec` field of a PVC, specifying a different
(and larger) storage request. For example, given the following PVC:
```yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: myclaim
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi # specify new size here
```
You can request expansion of the underlying PersistentVolume by specifying a new value instead of the old `1Gi` size.
Once you've changed the requested size, watch the `status.conditions` field of the PVC to see if the
resize has completed.
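For example, to request 2Gi for the claim above and then inspect the resize conditions, something like the following would work (commands are illustrative):
```shell
kubectl patch pvc myclaim --type merge -p '{"spec":{"resources":{"requests":{"storage":"2Gi"}}}}'
kubectl get pvc myclaim -o jsonpath='{.status.conditions}'
```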
When Kubernetes starts expanding the volume, it adds a `Resizing` condition to the PVC, which is removed once the expansion completes. More information about the progress of the
expansion operation can also be obtained by monitoring events associated with the PVC:
```shell
kubectl describe pvc <pvc>
```
### Storage driver support
Not every volume type, however, is expandable by default. Some volume types, such as in-tree hostpath volumes, are not expandable at all. For CSI volumes, the CSI driver
must have the capability `EXPAND_VOLUME` in its controller or node service (or both, if appropriate). Please refer to the documentation of your CSI driver to find out
whether it supports volume expansion.
Please refer to the volume expansion documentation for the in-tree volume types that support volume expansion: [Expanding Persistent Volumes](/docs/concepts/storage/persistent-volumes/#expanding-persistent-volumes-claims).
In general, to provide some degree of control over which volumes can be expanded, only dynamically provisioned PVCs whose storage class has the `allowVolumeExpansion` parameter set to `true` are expandable.
A Kubernetes cluster administrator must edit the appropriate StorageClass object and set
the `allowVolumeExpansion` field to `true`. For example:
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp2-default
provisioner: kubernetes.io/aws-ebs
parameters:
  secretNamespace: ""
  secretName: ""
allowVolumeExpansion: true
```
### Online expansion compared to offline expansion
By default, Kubernetes attempts to expand volumes immediately after a user requests a resize.
If one or more Pods are using the volume, Kubernetes tries to expand the volume using an online resize;
as a result, volume expansion usually requires no application downtime.
Filesystem expansion on the node is also performed online and hence does not require shutting
down any Pod that was using the PVC.
If you expand a PersistentVolume that is not in use, Kubernetes does an offline resize (and,
because the volume isn't in use, there is again no workload disruption).
In some cases though, if the underlying storage driver can only support offline expansion, users of the PVC must take down their Pod before expansion can succeed. Please refer to the documentation of your storage
provider to find out which mode of volume expansion it supports.
When volume expansion was introduced as an alpha feature, Kubernetes only supported offline filesystem
expansion on the node and hence required users to restart their pods for file system resizing to finish.
This behaviour has changed, and Kubernetes tries its best to fulfil any resize request regardless
of whether the underlying PersistentVolume volume is online or offline. If your storage provider supports
online expansion then no Pod restart should be necessary for volume expansion to finish.
## Next steps
Although volume expansion is now stable as part of the recent v1.24 release,
SIG Storage are working to make it even simpler for users of Kubernetes to expand their persistent storage.
Kubernetes 1.23 introduced features for triggering recovery from failed volume expansion, allowing users
to attempt self-service healing after a failed resize.
See [Recovering from volume expansion failure](/docs/concepts/storage/persistent-volumes/#recovering-from-failure-when-expanding-volumes) for more details.
The Kubernetes contributor community is also discussing the potential for StatefulSet-driven storage expansion. This proposed
feature would let you trigger expansion for all underlying PVs that are providing storage to a StatefulSet,
by directly editing the StatefulSet object.
See the [Support Volume Expansion Through StatefulSets](https://github.com/kubernetes/enhancements/issues/661) enhancement proposal for more details.

View File

@ -0,0 +1,79 @@
---
layout: blog
title: "Storage Capacity Tracking reaches GA in Kubernetes 1.24"
date: 2022-05-06
slug: storage-capacity-ga
---
**Author:** Patrick Ohly (Intel)
The v1.24 release of Kubernetes brings [storage capacity](/docs/concepts/storage/storage-capacity/)
tracking as a generally available feature.
## Problems we have solved
As explained in more detail in the [previous blog post about this
feature](/blog/2021/04/14/local-storage-features-go-beta/), storage capacity
tracking allows a CSI driver to publish information about remaining
capacity. The kube-scheduler then uses that information to pick suitable nodes
for a Pod when that Pod has volumes that still need to be provisioned.
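A published capacity object looks roughly like the following sketch; the storage class name, topology labels, and capacity value here are placeholders:
```yaml
apiVersion: storage.k8s.io/v1
kind: CSIStorageCapacity
metadata:
  name: example-capacity
  namespace: kube-system
storageClassName: example-storage-class
nodeTopology:
  matchLabels:
    topology.kubernetes.io/zone: zone-a
capacity: 100Gi
```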
Without this information, a Pod may get stuck without ever being scheduled onto
a suitable node because kube-scheduler has to choose blindly and always ends up
picking a node for which the volume cannot be provisioned because the
underlying storage system managed by the CSI driver does not have sufficient
capacity left.
Because CSI drivers publish storage capacity information that gets used at a
later time when it might not be up-to-date anymore, it can still happen that a
node is picked that doesn't work out after all. Volume provisioning recovers
from that by informing the scheduler that it needs to try again with a
different node.
[Load
tests](https://github.com/kubernetes-csi/csi-driver-host-path/blob/master/docs/storage-capacity-tracking.md)
that were done again for promotion to GA confirmed that all storage in a
cluster can be consumed by Pods with storage capacity tracking whereas Pods got
stuck without it.
## Problems we have *not* solved
Recovery from a failed volume provisioning attempt has one known limitation: if a Pod
uses two volumes and only one of them could be provisioned, then all future
scheduling decisions are limited by the already provisioned volume. If that
volume is local to a node and the other volume cannot be provisioned there, the
Pod is stuck. This problem pre-dates storage capacity tracking and while the
additional information makes it less likely to occur, it cannot be avoided in
all cases, except of course by only using one volume per Pod.
An idea for solving this was proposed in a [KEP
draft](https://github.com/kubernetes/enhancements/pull/1703): volumes that were
provisioned and haven't been used yet cannot have any valuable data and
therefore could be freed and provisioned again elsewhere. SIG Storage is
looking for interested developers who want to continue working on this.
Also not solved is support in Cluster Autoscaler for Pods with volumes. For CSI
drivers with storage capacity tracking, a prototype was developed and discussed
in [a PR](https://github.com/kubernetes/autoscaler/pull/3887). It was meant to
work with arbitrary CSI drivers, but that flexibility made it hard to configure
and slowed down scale up operations: because autoscaler was unable to simulate
volume provisioning, it only scaled the cluster by one node at a time, which
was seen as insufficient.
Therefore that PR was not merged and a different approach with tighter coupling
between autoscaler and CSI driver will be needed. For this a better
understanding is needed about which local storage CSI drivers are used in
combination with cluster autoscaling. Should this lead to a new KEP, then users
will have to try out an implementation in practice before it can move to beta
or GA. So please reach out to SIG Storage if you have an interest in this
topic.
## Acknowledgements
Thanks a lot to the members of the community who have contributed to this
feature or given feedback including members of [SIG
Scheduling](https://github.com/kubernetes/community/tree/master/sig-scheduling),
[SIG
Autoscaling](https://github.com/kubernetes/community/tree/master/sig-autoscaling),
and of course [SIG
Storage](https://github.com/kubernetes/community/tree/master/sig-storage)!

View File

@ -0,0 +1,209 @@
---
layout: blog
title: "Kubernetes 1.24: gRPC container probes in beta"
date: 2022-05-13
slug: grpc-probes-now-in-beta
---
**Author**: Sergey Kanzhelev (Google)
With Kubernetes 1.24 the gRPC probes functionality entered beta and is available by default.
Now you can configure startup, liveness, and readiness probes for your gRPC app
without exposing any HTTP endpoint, nor do you need an executable. Kubernetes can natively connect to your workload via gRPC and query its status.
## Some history
It's useful to let the system managing your workload check that the app is
healthy, has started OK, and whether the app considers itself good to accept
traffic. Before the gRPC support was added, Kubernetes already allowed you to
check for health based on running an executable from inside the container image,
by making an HTTP request, or by checking whether a TCP connection succeeded.
For most apps, those checks are enough. If your app provides a gRPC endpoint
for a health (or readiness) check, it is easy
to repurpose the `exec` probe to use it for gRPC health checking.
In the blog article [Health checking gRPC servers on Kubernetes](/blog/2018/10/01/health-checking-grpc-servers-on-kubernetes/),
Ahmet Alp Balkan described how you can do that — a mechanism that still works today.
There is a commonly used tool to enable this that was [created](https://github.com/grpc-ecosystem/grpc-health-probe/commit/2df4478982e95c9a57d5fe3f555667f4365c025d)
on August 21, 2018, with its
first release on [September 19, 2018](https://github.com/grpc-ecosystem/grpc-health-probe/releases/tag/v0.1.0-alpha.1).
This approach for gRPC apps health checking is very popular. There are [3,626 Dockerfiles](https://github.com/search?l=Dockerfile&q=grpc_health_probe&type=code)
with `grpc_health_probe` and [6,621 YAML](https://github.com/search?l=YAML&q=grpc_health_probe&type=Code) files discovered by a
basic search on GitHub (at the time of writing). This is a good indication of the tool's popularity
and of the need to support this natively.
Kubernetes v1.23 introduced an alpha-quality implementation of native support for
querying a workload status using gRPC. Because it was an alpha feature,
this was disabled by default for the v1.23 release.
## Using the feature
We built gRPC health checking in a similar way to the other probes and believe
it will be [easy to use](/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-a-grpc-liveness-probe)
if you are familiar with other probe types in Kubernetes.
The natively supported health probe has many benefits over the workaround involving `grpc_health_probe` executable.
With the native gRPC support you don't need to download and carry a `10MB` additional executable in your image.
Exec probes are generally slower than a gRPC call, as they require instantiating a new process to run an executable.
That also makes the checks less reliable in edge cases, for example when the pod is running at its resource limits and has trouble
instantiating new processes.
There are a few limitations though. Since configuring a client certificate for probes is hard,
services that require client authentication are not supported. The built-in probes also
do not check the server certificates and ignore related problems.
Built-in checks also cannot be configured to ignore certain types of errors
(`grpc_health_probe` returns different exit codes for different errors),
and cannot be "chained" to run the health check on multiple services in a single probe.
But all these limitations are quite standard for gRPC and there are easy workarounds
for them.
## Try it for yourself
### Cluster-level setup
You can try this feature today. To try native gRPC probes, you can spin up a Kubernetes cluster
yourself with the `GRPCContainerProbe` feature gate enabled; there are many [tools available](/docs/tasks/tools/).
Since the feature gate `GRPCContainerProbe` is enabled by default in 1.24,
many vendors will have this functionality working out of the box.
So you may simply create a 1.24 cluster on the platform of your choice. Some vendors
allow you to enable alpha features on 1.23 clusters.
For example, at the time of writing, you can spin up a test cluster on GKE for a quick test.
Other vendors may also have similar capabilities, especially if you
are reading this blog post long after the Kubernetes 1.24 release.
On GKE, use the following command (note that the version is `1.23` and that `enable-kubernetes-alpha` is specified):
```shell
gcloud container clusters create test-grpc \
--enable-kubernetes-alpha \
--no-enable-autorepair \
--no-enable-autoupgrade \
--release-channel=rapid \
--cluster-version=1.23
```
You will also need to configure `kubectl` to access the cluster:
```shell
gcloud container clusters get-credentials test-grpc
```
### Trying the feature out
Let's create a pod to test how gRPC probes work. For this test we will use the `agnhost` image.
This is a Kubernetes-maintained image that can be used for all sorts of workload testing.
For example, it has a useful [grpc-health-checking](https://github.com/kubernetes/kubernetes/blob/b2c5bd2a278288b5ef19e25bf7413ecb872577a4/test/images/agnhost/README.md#grpc-health-checking) module
that exposes two ports: one serves the health checking service,
and the other is an HTTP port that reacts to the commands `make-serving` and `make-not-serving`.
Here is an example pod definition. It starts the `grpc-health-checking` module,
exposes ports `5000` and `8080`, and configures a gRPC readiness probe:
```yaml
---
apiVersion: v1
kind: Pod
metadata:
  name: test-grpc
spec:
  containers:
    - name: agnhost
      image: k8s.gcr.io/e2e-test-images/agnhost:2.35
      command: ["/agnhost", "grpc-health-checking"]
      ports:
        - containerPort: 5000
        - containerPort: 8080
      readinessProbe:
        grpc:
          port: 5000
```
If the file is called `test.yaml`, you can create the pod and check its status.
The pod will be in the ready state, as indicated by the snippet of the output below.
```shell
kubectl apply -f test.yaml
kubectl describe pod test-grpc
```
The output will contain something like this:
```
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
```
Now let's change the health checking endpoint status to NOT_SERVING.
In order to call the HTTP port of the Pod, let's create a port forward:
```shell
kubectl port-forward test-grpc 8080:8080
```
You can use `curl` to call the command...
```shell
curl http://localhost:8080/make-not-serving
```
... and in a few seconds the pod status will switch to not ready.
```shell
kubectl describe pod test-grpc
```
The output now will have:
```
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
...
  Warning  Unhealthy  2s (x6 over 42s)  kubelet  Readiness probe failed: service unhealthy (responded with "NOT_SERVING")
```
Once it is switched back, in about one second the Pod will get back to ready status:
```shell
curl http://localhost:8080/make-serving
kubectl describe pod test-grpc
```
The output indicates that the Pod went back to being `Ready`:
```
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
```
This new built-in gRPC health probing on Kubernetes makes implementing a health-check via gRPC
much easier than the older approach that relied on using a separate `exec` probe. Read through
the official
[documentation](/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-a-grpc-liveness-probe)
to learn more and provide feedback before the feature is promoted to GA.
## Summary
Kubernetes is a popular workload orchestration platform and we add features based on feedback and demand.
Features like gRPC probe support are a minor improvement that will make the lives of many app developers
easier and their apps more resilient. Try it today and give feedback before the feature goes GA.

View File

@ -0,0 +1,162 @@
---
layout: blog
title: "Kubernetes 1.24: Volume Populators Graduate to Beta"
date: 2022-05-16
slug: volume-populators-beta
---
**Author:**
Ben Swartzlander (NetApp)
The volume populators feature is now two releases old and entering beta! The `AnyVolumeDataSource` feature
gate defaults to enabled in Kubernetes v1.24, which means that users can specify any custom resource
as the data source of a PVC.
An [earlier blog article](/blog/2021/08/30/volume-populators-redesigned/) detailed how the
volume populators feature works. In short, a cluster administrator can install a CRD and
associated populator controller in the cluster, and any user who can create instances of
the CR can create pre-populated volumes by taking advantage of the populator.
Multiple populators can be installed side by side for different purposes. The SIG storage
community is already seeing some implementations in public, and more prototypes should
appear soon.
Cluster administrators are **strongly encouraged** to install the
volume-data-source-validator controller and associated `VolumePopulator` CRD before installing
any populators so that users can get feedback about invalid PVC data sources.
## New Features
The [lib-volume-populator](https://github.com/kubernetes-csi/lib-volume-populator) library
on which populators are built now includes metrics to help operators monitor and detect
problems. This library is now beta and its latest release is v1.0.1.
The [volume data source validator](https://github.com/kubernetes-csi/volume-data-source-validator)
controller also has metrics support added, and is in beta. The `VolumePopulator` CRD is
beta and the latest release is v1.0.1.
## Trying it out
To see how this works, you can install the sample "hello" populator and try it
out.
First install the volume-data-source-validator controller.
```shell
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/volume-data-source-validator/v1.0.1/client/config/crd/populator.storage.k8s.io_volumepopulators.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/volume-data-source-validator/v1.0.1/deploy/kubernetes/rbac-data-source-validator.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/volume-data-source-validator/v1.0.1/deploy/kubernetes/setup-data-source-validator.yaml
```
Next install the example populator.
```shell
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/lib-volume-populator/v1.0.1/example/hello-populator/crd.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/lib-volume-populator/87a47467b86052819e9ad13d15036d65b9a32fbb/example/hello-populator/deploy.yaml
```
Your cluster now has a new CustomResourceDefinition that provides a test API named Hello.
Create an instance of the `Hello` custom resource, with some text:
```yaml
apiVersion: hello.example.com/v1alpha1
kind: Hello
metadata:
  name: example-hello
spec:
  fileName: example.txt
  fileContents: Hello, world!
```
Create a PVC that refers to that CR as its data source.
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Mi
  dataSourceRef:
    apiGroup: hello.example.com
    kind: Hello
    name: example-hello
  volumeMode: Filesystem
```
Next, run a Job that reads the file in the PVC.
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: example-job
spec:
  template:
    spec:
      containers:
        - name: example-container
          image: busybox:latest
          command:
            - cat
            - /mnt/example.txt
          volumeMounts:
            - name: vol
              mountPath: /mnt
      restartPolicy: Never
      volumes:
        - name: vol
          persistentVolumeClaim:
            claimName: example-pvc
```
Wait for the job to complete (including all of its dependencies).
```shell
kubectl wait --for=condition=Complete job/example-job
```
And lastly, examine the log from the job.
```shell
kubectl logs job/example-job
```
The output should be:
```terminal
Hello, world!
```
Note that the volume already contained a text file with the string contents from
the CR. This is only the simplest example. Actual populators can set up the volume
to contain arbitrary contents.
## How to write your own volume populator
Developers interested in writing new populators are encouraged to use the
[lib-volume-populator](https://github.com/kubernetes-csi/lib-volume-populator) library
and to only supply a small controller wrapper around the library, and a pod image
capable of attaching to volumes and writing the appropriate data to the volume.
Individual populators can be extremely generic such that they work with every type
of PVC, or they can do vendor specific things to rapidly fill a volume with data
if the volume was provisioned by a specific CSI driver from the same vendor, for
example, by communicating directly with the storage for that volume.
## How can I learn more?
The enhancement proposal,
[Volume Populators](https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/1495-volume-populators), includes lots of detail about the history and technical implementation
of this feature.
[Volume populators and data sources](/docs/concepts/storage/persistent-volumes/#volume-populators-and-data-sources), within the documentation topic about persistent volumes,
explains how to use this feature in your cluster.
Please get involved by joining the Kubernetes storage SIG to help us enhance this
feature. There are a lot of good ideas already and we'd be thrilled to have more!

View File

@ -0,0 +1,117 @@
---
layout: blog
title: 'Kubernetes 1.24: Prevent unauthorised volume mode conversion'
date: 2022-05-18
slug: prevent-unauthorised-volume-mode-conversion-alpha
---
**Author:** Raunak Pradip Shah (Mirantis)
Kubernetes v1.24 introduces a new alpha-level feature that prevents unauthorised users
from modifying the volume mode of a [`PersistentVolumeClaim`](/docs/concepts/storage/persistent-volumes/) created from an
existing [`VolumeSnapshot`](/docs/concepts/storage/volume-snapshots/) in the Kubernetes cluster.
### The problem
The [Volume Mode](/docs/concepts/storage/persistent-volumes/#volume-mode) determines whether a volume
is formatted into a filesystem or presented as a raw block device.
Users can leverage the `VolumeSnapshot` feature, which has been stable since Kubernetes v1.20,
to create a `PersistentVolumeClaim` (shortened as PVC) from an existing `VolumeSnapshot` in
the Kubernetes cluster. The PVC spec includes a `dataSource` field, which can point to an
existing `VolumeSnapshot` instance.
Visit [Create a PersistentVolumeClaim from a Volume Snapshot](/docs/concepts/storage/persistent-volumes/#create-persistent-volume-claim-from-volume-snapshot) for more details.
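For example, a PVC restored from a snapshot, with the volume mode stated explicitly, might look like this sketch (the names are placeholders):
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: restored-pvc
spec:
  dataSource:
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: existing-snapshot
  volumeMode: Filesystem   # or Block
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```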
When leveraging the above capability, there is no logic that validates whether the mode of the
original volume, whose snapshot was taken, matches the mode of the newly created volume.
This presents a security gap that allows malicious users to potentially exploit an
as-yet-unknown vulnerability in the host operating system.
Many popular storage backup vendors convert the volume mode during the course of a
backup operation, for efficiency purposes, which prevents Kubernetes from blocking
the operation completely and presents a challenge in distinguishing trusted
users from malicious ones.
### Preventing unauthorised users from converting the volume mode
In this context, an authorised user is one who has access rights to perform `Update`
or `Patch` operations on `VolumeSnapshotContents`, which is a cluster-level resource.
It is up to the cluster administrator to provide these rights only to trusted users
or applications, like backup vendors.
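For example, a cluster administrator might grant those rights to a backup service account with a ClusterRole along these lines (the role name is illustrative):
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: volumesnapshotcontent-modifier
rules:
  - apiGroups: ["snapshot.storage.k8s.io"]
    resources: ["volumesnapshotcontents"]
    verbs: ["get", "list", "update", "patch"]
```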
If the alpha feature is [enabled](https://kubernetes-csi.github.io/docs/) in
`snapshot-controller`, `snapshot-validation-webhook` and `external-provisioner`,
then unauthorised users will not be allowed to modify the volume mode of a PVC
when it is being created from a `VolumeSnapshot`.
To convert the volume mode, an authorised user must do the following:
1. Identify the `VolumeSnapshot` that is to be used as the data source for a newly
created PVC in the given namespace.
2. Identify the `VolumeSnapshotContent` bound to the above `VolumeSnapshot` (a command-line sketch for steps 2 and 3 follows this list).
```shell
kubectl get volumesnapshot -n <namespace>
```
3. Add the annotation [`snapshot.storage.kubernetes.io/allowVolumeModeChange`](/docs/reference/labels-annotations-taints/#snapshot-storage-kubernetes-io-allowvolumemodechange)
to the `VolumeSnapshotContent`.
4. This annotation can be added either via software or manually by the authorised
user. The `VolumeSnapshotContent` annotation must look like following manifest fragment:
```yaml
kind: VolumeSnapshotContent
metadata:
  annotations:
    snapshot.storage.kubernetes.io/allowVolumeModeChange: "true"
...
```
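For steps 2 and 3 above, one possible command-line approach is sketched below; the resource names are placeholders, and the jsonpath field comes from the `VolumeSnapshot` status:
```shell
# step 2: find the VolumeSnapshotContent bound to the VolumeSnapshot
kubectl get volumesnapshot <volume-snapshot-name> -n <namespace> \
  -o jsonpath='{.status.boundVolumeSnapshotContentName}'
# step 3: add the annotation to that VolumeSnapshotContent
kubectl annotate volumesnapshotcontent <volume-snapshot-content-name> \
  snapshot.storage.kubernetes.io/allowVolumeModeChange="true"
```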
**Note**: For pre-provisioned `VolumeSnapshotContents`, you must take an extra
step of setting the `spec.sourceVolumeMode` field to either `Filesystem` or `Block`,
depending on the mode of the volume from which this snapshot was taken.
An example is shown below:
```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotContent
metadata:
  annotations:
    snapshot.storage.kubernetes.io/allowVolumeModeChange: "true"
  name: new-snapshot-content-test
spec:
  deletionPolicy: Delete
  driver: hostpath.csi.k8s.io
  source:
    snapshotHandle: 7bdd0de3-aaeb-11e8-9aae-0242ac110002
  sourceVolumeMode: Filesystem
  volumeSnapshotRef:
    name: new-snapshot-test
    namespace: default
```
Repeat steps 1 to 3 for all `VolumeSnapshotContents` whose volume mode needs to be
converted during a backup or restore operation.
If the annotation shown in step 4 above is present on a `VolumeSnapshotContent`
object, Kubernetes will not prevent the volume mode from being converted.
Users should keep this in mind before they attempt to add the annotation
to any `VolumeSnapshotContent`.
### What's next
[Enable this feature](https://kubernetes-csi.github.io/docs/) and let us know
what you think!
We hope this feature causes no disruption to existing workflows while preventing
malicious users from exploiting security vulnerabilities in their clusters.
For any queries or issues, join [Kubernetes on Slack](https://slack.k8s.io/) and
create a thread in the #sig-storage channel. Alternately, create an issue in the
CSI external-snapshotter [repository](https://github.com/kubernetes-csi/external-snapshotter).

View File

@ -0,0 +1,96 @@
---
layout: blog
title: "Kubernetes 1.24: Introducing Non-Graceful Node Shutdown Alpha"
date: 2022-05-20
slug: kubernetes-1-24-non-graceful-node-shutdown-alpha
---
**Authors:** Xing Yang and Yassine Tijani (VMware)
Kubernetes v1.24 introduces alpha support for [Non-Graceful Node Shutdown](https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/2268-non-graceful-shutdown). This feature allows stateful workloads to fail over to a different node after the original node is shut down or in a non-recoverable state such as hardware failure or broken OS.
## How is this different from Graceful Node Shutdown
You might have heard about the [Graceful Node Shutdown](/docs/concepts/architecture/nodes/#graceful-node-shutdown) capability of Kubernetes,
and are wondering how the Non-Graceful Node Shutdown feature is different from that. Graceful Node Shutdown
allows Kubernetes to detect when a node is shutting down cleanly, and handles that situation appropriately.
A Node Shutdown can be "graceful" only if the node shutdown action can be detected by the kubelet ahead
of the actual shutdown. However, there are cases where a node shutdown action may not be detected by
the kubelet. This could happen either because the shutdown command does not trigger the systemd inhibitor
locks mechanism that kubelet relies upon, or because of a configuration error
(the `ShutdownGracePeriod` and `ShutdownGracePeriodCriticalPods` are not configured properly).
Graceful node shutdown relies on Linux-specific support. The kubelet does not watch for upcoming
shutdowns on Windows nodes (this may change in a future Kubernetes release).
When a node is shutdown but without the kubelet detecting it, pods on that node
also shut down ungracefully. For stateless apps, that's often not a problem (a ReplicaSet adds a new pod once
the cluster detects that the affected node or pod has failed). For stateful apps, the story is more complicated.
If you use a StatefulSet and have a pod from that StatefulSet on a node that fails uncleanly, that affected pod
will be marked as terminating; the StatefulSet cannot create a replacement pod because the pod
still exists in the cluster.
As a result, the application running on the StatefulSet may be degraded or even offline. If the original, shut
down node comes up again, the kubelet on that original node reports in, deletes the existing pods, and
the control plane makes a replacement pod for that StatefulSet on a different running node.
If the original node has failed and does not come up, those stateful pods would be stuck in a
terminating status on that failed node indefinitely.
```
$ kubectl get pod -o wide
NAME    READY   STATUS        RESTARTS   AGE    IP           NODE                      NOMINATED NODE   READINESS GATES
web-0   1/1     Running       0          100m   10.244.2.4   k8s-node-876-1639279816   <none>           <none>
web-1   1/1     Terminating   0          100m   10.244.1.3   k8s-node-433-1639279804   <none>           <none>
```
## Try out the new non-graceful shutdown handling
To use the non-graceful node shutdown handling, you must enable the `NodeOutOfServiceVolumeDetach`
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/) for the `kube-controller-manager`
component.
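For example, the gate can be turned on via a command-line flag; how exactly the flag is passed depends on how your control plane is deployed:
```shell
kube-controller-manager --feature-gates=NodeOutOfServiceVolumeDetach=true
```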
In the case of a node shutdown, you can manually taint that node as out of service. You should make certain that
the node is truly shutdown (not in the middle of restarting) before you add that taint. You could add that
taint following a shutdown that the kubelet did not detect and handle in advance; another case where you
can use that taint is when the node is in a non-recoverable state due to a hardware failure or a broken OS.
The values you set for that taint can be `node.kubernetes.io/out-of-service=nodeshutdown:NoExecute`
or `node.kubernetes.io/out-of-service=nodeshutdown:NoSchedule`.
Provided you have enabled the feature gate mentioned earlier, setting the out-of-service taint on a Node
means that pods on the node will be deleted unless there are matching tolerations on the pods.
Persistent volumes attached to the shutdown node will be detached, and for StatefulSets, replacement pods will
be created successfully on a different running node.
```
$ kubectl taint nodes <node-name> node.kubernetes.io/out-of-service=nodeshutdown:NoExecute
$ kubectl get pod -o wide
NAME    READY   STATUS    RESTARTS   AGE    IP           NODE                      NOMINATED NODE   READINESS GATES
web-0   1/1     Running   0          150m   10.244.2.4   k8s-node-876-1639279816   <none>           <none>
web-1   1/1     Running   0          10m    10.244.1.7   k8s-node-433-1639279804   <none>           <none>
```
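Conversely, a pod that should survive the out-of-service taint would need a matching toleration, roughly like this fragment of a Pod spec (shown for illustration only):
```yaml
tolerations:
  - key: node.kubernetes.io/out-of-service
    operator: Equal
    value: nodeshutdown
    effect: NoExecute
```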
Note: Before applying the out-of-service taint, you **must** verify that a node is already in shutdown or power off state (not in the middle of restarting), either because the user intentionally shut it down or the node is down due to hardware failures, OS issues, etc.
Once all the workload pods that are linked to the out-of-service node have been moved to a new running node, and the shutdown node has been recovered, you should remove
that taint from the affected node.
If you know that the node will not return to service, you could instead delete the node from the cluster.
## Whats next?
Depending on feedback and adoption, the Kubernetes team plans to push the Non-Graceful Node Shutdown implementation to Beta in either 1.25 or 1.26.
This feature requires a user to manually add a taint to the node to trigger the failover of workloads and to remove the taint after the node is recovered. In the future, we plan to find ways to automatically detect and fence nodes that are shut down or failed and automatically fail over workloads to another node.
## How can I learn more?
Check out the [documentation](/docs/concepts/architecture/nodes/#non-graceful-node-shutdown)
for non-graceful node shutdown.
## How to get involved?
This feature has a long story. Yassine Tijani ([yastij](https://github.com/yastij)) started the KEP more than two years ago. Xing Yang ([xing-yang](https://github.com/xing-yang)) continued to drive the effort. There were many discussions among SIG Storage, SIG Node, and API reviewers to nail down the design details. Ashutosh Kumar ([sonasingh46](https://github.com/sonasingh46)) did most of the implementation and brought it to Alpha in Kubernetes 1.24.
We want to thank the following people for their insightful reviews: Tim Hockin ([thockin](https://github.com/thockin)) for his guidance on the design, Jing Xu ([jingxu97](https://github.com/jingxu97)), Hemant Kumar ([gnufied](https://github.com/gnufied)), and Michelle Au ([msau42](https://github.com/msau42)) for reviews from SIG Storage side, and Mrunal Patel ([mrunalp](https://github.com/mrunalp)), David Porter ([bobbypage](https://github.com/bobbypage)), Derek Carr ([derekwaynecarr](https://github.com/derekwaynecarr)), and Danielle Endocrimes ([endocrimes](https://github.com/endocrimes)) for reviews from SIG Node side.
There are many people who have helped review the design and implementation along the way. We want to thank everyone who has contributed to this effort including the about 30 people who have reviewed the [KEP](https://github.com/kubernetes/enhancements/pull/1116) and implementation over the last couple of years.
This feature is a collaboration between SIG Storage and SIG Node. For those interested in getting involved with the design and development of any part of the Kubernetes Storage system, join the [Kubernetes Storage Special Interest Group](https://github.com/kubernetes/community/tree/master/sig-storage) (SIG). For those interested in getting involved with the design and development of the components that support the controlled interactions between pods and host resources, join the [Kubernetes Node SIG](https://github.com/kubernetes/community/tree/master/sig-node).

View File

@ -0,0 +1,137 @@
---
layout: blog
title: "Kubernetes 1.24: Avoid Collisions Assigning IP Addresses to Services"
date: 2022-05-23
slug: service-ip-dynamic-and-static-allocation
---
**Author:** Antonio Ojea (Red Hat)
In Kubernetes, [Services](/docs/concepts/services-networking/service/) are an abstract way to expose
an application running on a set of Pods. Services
can have a cluster-scoped virtual IP address (using a Service of `type: ClusterIP`).
Clients can connect using that virtual IP address, and Kubernetes then load-balances traffic to that
Service across the different backing Pods.
## How are Service ClusterIPs allocated?
A Service `ClusterIP` can be assigned:
_dynamically_
: the cluster's control plane automatically picks a free IP address from within the configured IP range for `type: ClusterIP` Services.
_statically_
: you specify an IP address of your choice, from within the configured IP range for Services.
Across your whole cluster, every Service `ClusterIP` must be unique.
Trying to create a Service with a specific `ClusterIP` that has already
been allocated will return an error.
## Why do you need to reserve Service Cluster IPs?
Sometimes you may want to have Services running at well-known IP addresses, so other components and
users in the cluster can use them.
The best example is the DNS Service for the cluster. Some Kubernetes installers assign the 10th address from
the Service IP range to the DNS service. Assuming you configured your cluster with Service IP range
10.96.0.0/16 and you want your DNS Service IP to be 10.96.0.10, you'd have to create a Service like
this:
```yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: kube-dns
    kubernetes.io/cluster-service: "true"
    kubernetes.io/name: CoreDNS
  name: kube-dns
  namespace: kube-system
spec:
  clusterIP: 10.96.0.10
  ports:
    - name: dns
      port: 53
      protocol: UDP
      targetPort: 53
    - name: dns-tcp
      port: 53
      protocol: TCP
      targetPort: 53
  selector:
    k8s-app: kube-dns
  type: ClusterIP
```
but as I explained before, the IP address 10.96.0.10 has not been reserved; if other Services are created
before or in parallel with dynamic allocation, there is a chance they can allocate this IP, hence,
you will not be able to create the DNS Service because it will fail with a conflict error.
## How can you avoid Service ClusterIP conflicts? {#avoid-ClusterIP-conflict}
In Kubernetes 1.24, you can enable a new feature gate `ServiceIPStaticSubrange`.
Turning this on allows you to use a different IP
allocation strategy for Services, reducing the risk of collision.
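The feature gate is enabled on the kube-apiserver; as an illustrative sketch (the service CIDR shown is an example value):
```shell
kube-apiserver --service-cluster-ip-range=10.96.0.0/16 --feature-gates=ServiceIPStaticSubrange=true
```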
The `ClusterIP` range will be divided, based on the formula `min(max(16, cidrSize / 16), 256)`,
described as _never less than 16 or more than 256 with a graduated step between them_.
Dynamic IP assignment will use the upper band by default; once this has been exhausted, it will
use the lower range. This will allow users to use static allocations on the lower band with a low
risk of collision.
Examples:
#### Service IP CIDR block: 10.96.0.0/24
Range Size: 2<sup>8</sup> - 2 = 254
Band Offset: `min(max(16,256/16),256)` = `min(16,256)` = 16
Static band start: 10.96.0.1
Static band end: 10.96.0.16
Range end: 10.96.0.254
{{< mermaid >}}
pie showData
title 10.96.0.0/24
"Static" : 16
"Dynamic" : 238
{{< /mermaid >}}
#### Service IP CIDR block: 10.96.0.0/20
Range Size: 2<sup>12</sup> - 2 = 4094
Band Offset: `min(max(16,4096/16),256)` = `min(256,256)` = 256
Static band start: 10.96.0.1
Static band end: 10.96.1.0
Range end: 10.96.15.254
{{< mermaid >}}
pie showData
title 10.96.0.0/20
"Static" : 256
"Dynamic" : 3838
{{< /mermaid >}}
#### Service IP CIDR block: 10.96.0.0/16
Range Size: 2<sup>16</sup> - 2 = 65534
Band Offset: `min(max(16,65536/16),256)` = `min(4096,256)` = 256
Static band start: 10.96.0.1
Static band end: 10.96.1.0
Range end: 10.96.255.254
{{< mermaid >}}
pie showData
title 10.96.0.0/16
"Static" : 256
"Dynamic" : 65278
{{< /mermaid >}}
## Get involved with SIG Network
The current SIG-Network [KEPs](https://github.com/orgs/kubernetes/projects/10) and [issues](https://github.com/kubernetes/kubernetes/issues?q=is%3Aopen+is%3Aissue+label%3Asig%2Fnetwork) on GitHub illustrate the SIG's areas of emphasis.
[SIG Network meetings](https://github.com/kubernetes/community/tree/master/sig-network) are a friendly, welcoming venue for you to connect with the community and share your ideas.
Looking forward to hearing from you!

View File

@ -0,0 +1,251 @@
---
layout: blog
title: "Contextual Logging in Kubernetes 1.24"
date: 2022-05-25
slug: contextual-logging
canonicalUrl: https://kubernetes.dev/blog/2022/05/25/contextual-logging/
---
**Author:** Patrick Ohly (Intel)
The [Structured Logging Working
Group](https://github.com/kubernetes/community/blob/master/wg-structured-logging/README.md)
has added new capabilities to the logging infrastructure in Kubernetes
1.24. This blog post explains how developers can take advantage of those to
make log output more useful and how they can get involved with improving Kubernetes.
## Structured logging
The goal of [structured
logging](https://github.com/kubernetes/enhancements/blob/master/keps/sig-instrumentation/1602-structured-logging/README.md)
is to replace C-style formatting and the resulting opaque log strings with log
entries that have a well-defined syntax for storing message and parameters
separately, for example as a JSON struct.
When using the traditional klog text output format for structured log calls,
strings were originally printed with `\n` escape sequences, except when
embedded inside a struct. For structs, log entries could still span multiple
lines, with no clean way to split the log stream into individual entries:
```
I1112 14:06:35.783529 328441 structured_logging.go:51] "using InfoS" longData={Name:long Data:Multiple
lines
with quite a bit
of text. internal:0}
I1112 14:06:35.783549 328441 structured_logging.go:52] "using InfoS with\nthe message across multiple lines" int=1 stringData="long: Multiple\nlines\nwith quite a bit\nof text." str="another value"
```
Now, the `<` and `>` markers along with indentation are used to ensure that splitting at a
klog header at the start of a line is reliable and the resulting output is human-readable:
```
I1126 10:31:50.378204 121736 structured_logging.go:59] "using InfoS" longData=<
	{Name:long Data:Multiple
	lines
	with quite a bit
	of text. internal:0}
 >
I1126 10:31:50.378228 121736 structured_logging.go:60] "using InfoS with\nthe message across multiple lines" int=1 stringData=<
	long: Multiple
	lines
	with quite a bit
	of text.
 > str="another value"
```
Note that the log message itself is printed with quoting. It is meant to be a
fixed string that identifies a log entry, so newlines should be avoided there.
Before Kubernetes 1.24, some log calls in kube-scheduler still used `klog.Info`
for multi-line strings to avoid the unreadable output. Now all log calls have
been updated to support structured logging.
## Contextual logging
[Contextual logging](https://github.com/kubernetes/enhancements/blob/master/keps/sig-instrumentation/3077-contextual-logging/README.md)
is based on the [go-logr API](https://github.com/go-logr/logr#a-minimal-logging-api-for-go). The key
idea is that libraries are passed a logger instance by their caller and use
that for logging instead of accessing a global logger. The binary decides about
the logging implementation, not the libraries. The go-logr API is designed
around structured logging and supports attaching additional information to a
logger.
This enables additional use cases:
- The caller can attach additional information to a logger:
- [`WithName`](https://pkg.go.dev/github.com/go-logr/logr#Logger.WithName) adds a prefix
- [`WithValues`](https://pkg.go.dev/github.com/go-logr/logr#Logger.WithValues) adds key/value pairs
  When passing this extended logger into a function, and that function uses it
instead of the global logger, the additional information is
then included in all log entries, without having to modify the code that
generates the log entries. This is useful in highly parallel applications
where it can become hard to identify all log entries for a certain operation
because the output from different operations gets interleaved.
- When running unit tests, log output can be associated with the current test.
Then when a test fails, only the log output of the failed test gets shown
by `go test`. That output can also be more verbose by default because it
will not get shown for successful tests. Tests can be run in parallel
without interleaving their output.
One of the design decisions for contextual logging was to allow attaching a
logger as value to a `context.Context`. Since the logger encapsulates all
aspects of the intended logging for the call, it is *part* of the context and
not just *using* it. A practical advantage is that many APIs already have a
`ctx` parameter or adding one has additional advantages, like being able to get
rid of `context.TODO()` calls inside the functions.
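Here is a minimal sketch of that pattern, assuming a hypothetical `reconcile` function (this is not code from any Kubernetes component): the caller attaches a name and key/value pairs, stores the logger in the context, and the callee retrieves it with `FromContext`.

```go
package main

import (
	"context"

	"k8s.io/klog/v2"
)

// reconcile is a hypothetical callee: it takes its logger from the context
// instead of using the global klog logger.
func reconcile(ctx context.Context, item string) {
	logger := klog.FromContext(ctx)
	logger.Info("processing", "item", item)
}

func main() {
	// The caller decides which extra information all log entries carry.
	// LoggerWithName and LoggerWithValues are the klog wrappers that honor
	// the ContextualLogging feature gate.
	logger := klog.Background()
	logger = klog.LoggerWithName(logger, "example")
	logger = klog.LoggerWithValues(logger, "foo", "bar")
	ctx := klog.NewContext(context.Background(), logger)

	reconcile(ctx, "item-42")
}
```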
Another decision was to not break compatibility with klog v2:
- Libraries that use the traditional klog logging calls in a binary that has
set up contextual logging will work and log through the logging backend
chosen by the binary. However, such log output will not include the
additional information and will not work well in unit tests, so libraries
should be modified to support contextual logging. The [migration guide](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-instrumentation/migration-to-structured-logging.md)
for structured logging has been extended to also cover contextual logging.
- When a library supports contextual logging and retrieves a logger from its
context, it will still work in a binary that does not initialize contextual
logging because it will get a logger that logs through klog.
In Kubernetes 1.24, contextual logging is a new alpha feature with
`ContextualLogging` as feature gate. When disabled (the default), the new klog
API calls for contextual logging (see below) become no-ops to avoid performance
or functional regressions.
No Kubernetes component has been converted yet. An [example program](https://github.com/kubernetes/kubernetes/blob/v1.24.0-beta.0/staging/src/k8s.io/component-base/logs/example/cmd/logger.go)
in the Kubernetes repository demonstrates how to enable contextual logging in a
binary and how the output depends on the binary's parameters:
```console
$ cd $GOPATH/src/k8s.io/kubernetes/staging/src/k8s.io/component-base/logs/example/cmd/
$ go run . --help
...
--feature-gates mapStringBool A set of key=value pairs that describe feature gates for alpha/experimental features. Options are:
AllAlpha=true|false (ALPHA - default=false)
AllBeta=true|false (BETA - default=false)
ContextualLogging=true|false (ALPHA - default=false)
$ go run . --feature-gates ContextualLogging=true
...
I0404 18:00:02.916429 451895 logger.go:94] "example/myname: runtime" foo="bar" duration="1m0s"
I0404 18:00:02.916447 451895 logger.go:95] "example: another runtime" foo="bar" duration="1m0s"
```
The `example` prefix and `foo="bar"` were added by the caller of the function
which logs the `runtime` message and `duration="1m0s"` value.
The sample code for klog includes an
[example](https://github.com/kubernetes/klog/blob/v2.60.1/ktesting/example/example_test.go)
for a unit test with per-test output.
## klog enhancements
### Contextual logging API
The following calls manage the lookup of a logger:
[`FromContext`](https://pkg.go.dev/k8s.io/klog/v2#FromContext)
: from a `context` parameter, with fallback to the global logger
[`Background`](https://pkg.go.dev/k8s.io/klog/v2#Background)
: the global fallback, with no intention to support contextual logging
[`TODO`](https://pkg.go.dev/k8s.io/klog/v2#TODO)
: the global fallback, but only as a temporary solution until the function gets extended to accept
a logger through its parameters
[`SetLoggerWithOptions`](https://pkg.go.dev/k8s.io/klog/v2#SetLoggerWithOptions)
: changes the fallback logger; when called with [`ContextualLogger(true)`](https://pkg.go.dev/k8s.io/klog/v2#ContextualLogger),
the logger is ready to be called directly, in which case logging will be done
without going through klog
To support the feature gate mechanism in Kubernetes, klog has wrapper calls for
the corresponding go-logr calls and a global boolean controlling their behavior:
- [`LoggerWithName`](https://pkg.go.dev/k8s.io/klog/v2#LoggerWithName)
- [`LoggerWithValues`](https://pkg.go.dev/k8s.io/klog/v2#LoggerWithValues)
- [`NewContext`](https://pkg.go.dev/k8s.io/klog/v2#NewContext)
- [`EnableContextualLogging`](https://pkg.go.dev/k8s.io/klog/v2#EnableContextualLogging)
Usage of those functions in Kubernetes code is enforced with a linter
check. The klog default for contextual logging is to enable the functionality
because it is considered stable in klog. It is only in Kubernetes binaries
where that default gets overridden and (in some binaries) controlled via the
`--feature-gates` parameter.
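As a sketch, wiring this up in a binary could look like the following; the `Setup` helper, its `backend` logger, and the `contextualLoggingEnabled` flag are assumptions for illustration, not actual Kubernetes component code.

```go
package logsetup

import (
	"github.com/go-logr/logr"

	"k8s.io/klog/v2"
)

// Setup installs the chosen go-logr backend (for example zapr for JSON
// output) and applies the contextual logging feature gate.
func Setup(backend logr.Logger, contextualLoggingEnabled bool) {
	// The backend is ready to be called directly, bypassing klog,
	// for contextual log calls.
	klog.SetLoggerWithOptions(backend, klog.ContextualLogger(true))

	// When the gate is disabled, LoggerWithName, LoggerWithValues and
	// NewContext become no-ops and everything falls back to klog.
	klog.EnableContextualLogging(contextualLoggingEnabled)
}
```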
### ktesting logger
The new [ktesting](https://pkg.go.dev/k8s.io/klog/v2@v2.60.1/ktesting) package
implements logging through `testing.T` using klog's text output format. It has
a [single API call](https://pkg.go.dev/k8s.io/klog/v2@v2.60.1/ktesting#NewTestContext) for
instrumenting a test case and [support for command line flags](https://pkg.go.dev/k8s.io/klog/v2@v2.60.1/ktesting/init).
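In a test, that usage could look roughly like this (the `doSomething` function under test is hypothetical):

```go
package example

import (
	"context"
	"testing"

	"k8s.io/klog/v2"
	"k8s.io/klog/v2/ktesting"
)

// doSomething is a hypothetical function that supports contextual logging.
func doSomething(ctx context.Context) {
	klog.FromContext(ctx).Info("doing something")
}

func TestDoSomething(t *testing.T) {
	// The logger writes through testing.T, so its output only shows up for
	// failing tests and does not interleave with other parallel tests.
	logger, ctx := ktesting.NewTestContext(t)
	logger.Info("starting test")

	doSomething(ctx)
}
```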
### klogr
[`klog/klogr`](https://pkg.go.dev/k8s.io/klog/v2@v2.60.1/klogr) continues to be
supported and its default behavior is unchanged: it formats structured log
entries using its own, custom format and prints the result via klog.
However, this usage is discouraged because that format is neither
machine-readable (in contrast to real JSON output as produced by zapr, the
go-logr implementation used by Kubernetes) nor human-friendly (in contrast to
the klog text format).
Instead, a klogr instance should be created with
[`WithFormat(FormatKlog)`](https://pkg.go.dev/k8s.io/klog/v2@v2.60.1/klogr#WithFormat)
which chooses the klog text format. A simpler construction method with the same
result is the new
[`klog.NewKlogr`](https://pkg.go.dev/k8s.io/klog/v2#NewKlogr). That is the
logger that klog returns as fallback when nothing else is configured.
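For example, constructing the recommended text-format logger is a one-liner (a minimal sketch):

```go
package main

import "k8s.io/klog/v2"

func main() {
	// Preferred: a logr.Logger that emits the klog text format.
	logger := klog.NewKlogr()
	logger.Info("started", "component", "example")
}
```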
### Reusable output test
A lot of go-logr implementations have very similar unit tests where they check
the result of certain log calls. If a developer didn't know about certain
caveats, for example a `String` function that panics when called, then it
is likely that both the handling of such caveats and the corresponding unit test are missing.
[`klog.test`](https://pkg.go.dev/k8s.io/klog/v2@v2.60.1/test) is a reusable set
of test cases that can be applied to a go-logr implementation.
### Output flushing
klog used to start a goroutine unconditionally during `init` which flushed
buffered data at a hard-coded interval. Now that goroutine is only started on
demand (i.e. when writing to files with buffering) and can be controlled with
[`StopFlushDaemon`](https://pkg.go.dev/k8s.io/klog/v2#StopFlushDaemon) and
[`StartFlushDaemon`](https://pkg.go.dev/k8s.io/klog/v2#StartFlushDaemon).
When a go-logr implementation buffers data, flushing that data can be
integrated into [`klog.Flush`](https://pkg.go.dev/k8s.io/klog/v2#Flush) by
registering the logger with the
[`FlushLogger`](https://pkg.go.dev/k8s.io/klog/v2#FlushLogger) option.
### Various other changes
For a description of all other enhancements, see the [release notes](https://github.com/kubernetes/klog/releases).
## logcheck
Originally designed as a linter for structured log calls, the
[`logcheck`](https://github.com/kubernetes/klog/tree/788efcdee1e9be0bfbe5b076343d447314f2377e/hack/tools/logcheck)
tool has been enhanced to also support contextual logging and traditional klog
log calls. These enhanced checks already found bugs in Kubernetes, like calling
`klog.Info` instead of `klog.Infof` with a format string and parameters.
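To illustrate the class of mistake these checks catch (a sketch, not an actual snippet from Kubernetes):

```go
package example

import "k8s.io/klog/v2"

func report(numPods int) {
	// Bug: klog.Info does not interpret a format string; the literal
	// "%d" ends up in the log with the argument appended. logcheck
	// flags calls like this.
	klog.Info("found %d pods", numPods)

	// Correct traditional call:
	klog.Infof("found %d pods", numPods)

	// Correct structured call:
	klog.InfoS("found pods", "count", numPods)
}
```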
The tool can be included as a plugin in a `golangci-lint` invocation, which is how
[Kubernetes uses it now](https://github.com/kubernetes/kubernetes/commit/17e3c555c5115f8c9176bae10ba45baa04d23a7b),
or be invoked stand-alone.
We are in the process of [moving the tool](https://github.com/kubernetes/klog/issues/312) into a new repository because it isn't
really related to klog and its releases should be tracked and tagged properly.
## Next steps
The [Structured Logging WG](https://github.com/kubernetes/community/tree/master/wg-structured-logging)
is always looking for new contributors. The migration
away from C-style logging is now going to target structured, contextual logging
in one step to reduce the overall code churn and number of PRs. Changing log
calls is a good first contribution to Kubernetes and an opportunity to get to
know code in various areas.

View File

@ -0,0 +1,148 @@
---
layout: blog
title: 'Kubernetes 1.24: Maximum Unavailable Replicas for StatefulSet'
date: 2022-05-27
slug: maxunavailable-for-statefulset
---
**Author:** Mayank Kumar (Salesforce)
Kubernetes [StatefulSets](/docs/concepts/workloads/controllers/statefulset/), since their introduction in
1.5 and becoming stable in 1.9, have been widely used to run stateful applications. They provide stable pod identity, persistent
per pod storage and ordered graceful deployment, scaling and rolling updates. You can think of StatefulSet as the atomic building
block for running complex stateful applications. As the use of Kubernetes has grown, so has the number of scenarios requiring
StatefulSets. Many of these scenarios require faster rolling updates than the currently supported one-pod-at-a-time updates, in the
case where you're using the `OrderedReady` Pod management policy for a StatefulSet.
Here are some examples:
- I am using a StatefulSet to orchestrate a multi-instance, cache based application where the size of the cache is large. The cache
starts cold and requires a significant amount of time before the container can start. There could be more initial startup tasks
that are required. A RollingUpdate on this StatefulSet would take a lot of time before the application is fully updated. If the
StatefulSet supported updating more than one pod at a time, it would result in a much faster update.
- My stateful application is composed of leaders and followers or one writer and multiple readers. I have multiple readers or
followers and my application can tolerate multiple pods going down at the same time. I want to update this application more than
one pod at a time so that I get the new updates rolled out quickly, especially if the number of instances of my application is
large. Note that my application still requires a unique identity per pod.
In order to support such scenarios, Kubernetes 1.24 includes a new alpha feature to help. Before you can use the new feature you must
enable the `MaxUnavailableStatefulSet` feature flag. Once you enable that, you can specify a new field called `maxUnavailable`, part
of the `spec` for a StatefulSet. For example:
```
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: web
namespace: default
spec:
podManagementPolicy: OrderedReady # you must set OrderedReady
replicas: 5
selector:
matchLabels:
app: nginx
template:
metadata:
labels:
app: nginx
spec:
containers:
- image: k8s.gcr.io/nginx-slim:0.8
imagePullPolicy: IfNotPresent
name: nginx
updateStrategy:
rollingUpdate:
maxUnavailable: 2 # this is the new alpha field, whose default value is 1
partition: 0
type: RollingUpdate
```
If you enable the new feature and you don't specify a value for `maxUnavailable` in a StatefulSet, Kubernetes applies a default
`maxUnavailable: 1`. This matches the behavior you would see if you don't enable the new feature.
I'll run through a scenario based on that example manifest to demonstrate how this feature works. I will deploy a StatefulSet that
has 5 replicas, with `maxUnavailable` set to 2 and `partition` set to 0.
I can trigger a rolling update by changing the image to `k8s.gcr.io/nginx-slim:0.9`. Once I initiate the rolling update, I can
watch the pods update 2 at a time, because the current value of `maxUnavailable` is 2. The output below shows a span of time and is not
complete. The `maxUnavailable` value can be an absolute number (for example, 2) or a percentage of desired Pods (for example, 10%). The
absolute number is calculated from the percentage by rounding down (for example, 50% of 5 replicas rounds down to 2).
```
kubectl get pods --watch
```
```
NAME READY STATUS RESTARTS AGE
web-0 1/1 Running 0 85s
web-1 1/1 Running 0 2m6s
web-2 1/1 Running 0 106s
web-3 1/1 Running 0 2m47s
web-4 1/1 Running 0 2m27s
web-4 1/1 Terminating 0 5m43s ----> start terminating 4
web-3 1/1 Terminating 0 6m3s ----> start terminating 3
web-3 0/1 Terminating 0 6m7s
web-3 0/1 Pending 0 0s
web-3 0/1 Pending 0 0s
web-4 0/1 Terminating 0 5m48s
web-4 0/1 Terminating 0 5m48s
web-3 0/1 ContainerCreating 0 2s
web-3 1/1 Running 0 2s
web-4 0/1 Pending 0 0s
web-4 0/1 Pending 0 0s
web-4 0/1 ContainerCreating 0 0s
web-4 1/1 Running 0 1s
web-2 1/1 Terminating 0 5m46s ----> start terminating 2 (only after both 4 and 3 are running)
web-1 1/1 Terminating 0 6m6s ----> start terminating 1
web-2 0/1 Terminating 0 5m47s
web-1 0/1 Terminating 0 6m7s
web-1 0/1 Pending 0 0s
web-1 0/1 Pending 0 0s
web-1 0/1 ContainerCreating 0 1s
web-1 1/1 Running 0 2s
web-2 0/1 Pending 0 0s
web-2 0/1 Pending 0 0s
web-2 0/1 ContainerCreating 0 0s
web-2 1/1 Running 0 1s
web-0 1/1 Terminating 0 6m6s ----> start terminating 0 (only after 2 and 1 are running)
web-0 0/1 Terminating 0 6m7s
web-0 0/1 Pending 0 0s
web-0 0/1 Pending 0 0s
web-0 0/1 ContainerCreating 0 0s
web-0 1/1 Running 0 1s
```
Note that as soon as the rolling update starts, both 4 and 3 (the two highest ordinal pods) start terminating at the same time. Pods
with ordinal 4 and 3 may become ready at their own pace. As soon as both pods 4 and 3 are ready, pods 2 and 1 start terminating at the
same time. When pods 2 and 1 are both running and ready, pod 0 starts terminating.
In Kubernetes, updates to StatefulSets follow a strict ordering when updating Pods. In this example, the update starts at replica 4, then
replica 3, then replica 2, and so on, one pod at a time. When going one pod at a time, it's not possible for 3 to be running and ready
before 4. When `maxUnavailable` is more than 1 (in the example scenario I set `maxUnavailable` to 2), it is possible that replica 3 becomes
ready and running before replica 4 is ready&mdash;and that is ok. If you're a developer and you set `maxUnavailable` to more than 1, you should
know that this outcome is possible and you must ensure that your application is able to handle such ordering issues, if any occur.
When you set `maxUnavailable` greater than 1, the ordering is guaranteed between each batch of pods being updated. That guarantee
means that pods in the second update batch (replicas 2 and 1) cannot start updating until the pods from the first batch (replicas 4 and 3) are ready.
Although Kubernetes refers to these as _replicas_, your stateful application may have a different view and each pod of the StatefulSet may
be holding completely different data than other pods. The important thing here is that updates to StatefulSets happen in batches, and you can
now have a batch size larger than 1 (as an alpha feature).
Also note that the above behavior is with `podManagementPolicy: OrderedReady`. If you defined a StatefulSet with `podManagementPolicy: Parallel`,
not only are `maxUnavailable` replicas terminated at the same time; `maxUnavailable` replicas also start in the `ContainerCreating`
phase at the same time. This is called bursting.
So, now you may have questions such as:
- What is the behavior when you set `podManagementPolicy: Parallel`?
- What is the behavior when you set `partition` to a value other than `0`?
It might be better to try it and see for yourself. This is an alpha feature, and the Kubernetes contributors are looking for feedback on it. Did
this help you achieve your stateful scenarios? Did you find a bug, or do you think the behavior as implemented is not intuitive or can
break applications or catch them by surprise? Please [open an issue](https://github.com/kubernetes/kubernetes/issues) to let us know.
## Further reading and next steps {#next-steps}
- [Maximum unavailable Pods](/docs/concepts/workloads/controllers/statefulset/#maximum-unavailable-pods)
- [KEP for MaxUnavailable for StatefulSet](https://github.com/kubernetes/enhancements/tree/master/keps/sig-apps/961-maxunavailable-for-statefulset)
- [Implementation](https://github.com/kubernetes/kubernetes/pull/82162/files)
- [Enhancement Tracking Issue](https://github.com/kubernetes/enhancements/issues/961)

View File

@ -0,0 +1,19 @@
---
layout: blog
title: "Annual Report Summary 2021"
date: 2022-06-01
slug: annual-report-summary-2021
---
**Author:** Paris Pittman (Steering Committee)
Last year, we published our first [Annual Report Summary](/blog/2021/06/28/announcing-kubernetes-community-group-annual-reports/) for 2020 and it's already time for our second edition!
[2021 Annual Report Summary](https://www.cncf.io/reports/kubernetes-annual-report-2021/)
This summary reflects the work that has been done in 2021 and the initiatives on deck for the rest of 2022. Please forward to organizations and individuals participating in upstream activities, planning cloud native strategies, and/or those looking to help out. To find a specific community group's complete report, go to the [kubernetes/community repo](https://github.com/kubernetes/community) under the groups folder. Example: [sig-api-machinery/annual-report-2021.md](https://github.com/kubernetes/community/blob/master/sig-api-machinery/annual-report-2021.md)
You'll see that this report summary is a growth area in itself. It takes us roughly 6 months to prepare and execute, which isn't helpful or valuable to anyone as a fast moving project with short and long term needs. How can we make this better? Provide your feedback here: https://github.com/kubernetes/steering/issues/242
Reference:
[Annual Report Documentation](https://github.com/kubernetes/community/blob/master/committee-steering/governance/annual-reports.md)

View File

@ -312,16 +312,18 @@ controller deletes the node from its list of nodes.
The third is monitoring the nodes' health. The node controller is
responsible for:
- In the case that a node becomes unreachable, updating the NodeReady condition
of within the Node's `.status`. In this case the node controller sets the
NodeReady condition to `ConditionUnknown`.
- In the case that a node becomes unreachable, updating the `Ready` condition
in the Node's `.status` field. In this case the node controller sets the
`Ready` condition to `Unknown`.
- If a node remains unreachable: triggering
[API-initiated eviction](/docs/concepts/scheduling-eviction/api-eviction/)
for all of the Pods on the unreachable node. By default, the node controller
waits 5 minutes between marking the node as `ConditionUnknown` and submitting
waits 5 minutes between marking the node as `Unknown` and submitting
the first eviction request.
The node controller checks the state of each node every `--node-monitor-period` seconds.
By default, the node controller checks the state of each node every 5 seconds.
This period can be configured using the `--node-monitor-period` flag on the
`kube-controller-manager` component.
### Rate limits on eviction
@ -331,7 +333,7 @@ from more than 1 node per 10 seconds.
The node eviction behavior changes when a node in a given availability zone
becomes unhealthy. The node controller checks what percentage of nodes in the zone
are unhealthy (NodeReady condition is `ConditionUnknown` or `ConditionFalse`) at
are unhealthy (the `Ready` condition is `Unknown` or `False`) at
the same time:
- If the fraction of unhealthy nodes is at least `--unhealthy-zone-threshold`
@ -384,7 +386,7 @@ If you want to explicitly reserve resources for non-Pod processes, see
## Node topology
{{< feature-state state="alpha" for_k8s_version="v1.16" >}}
{{< feature-state state="beta" for_k8s_version="v1.18" >}}
If you have enabled the `TopologyManager`
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/), then
@ -412,7 +414,7 @@ enabled by default in 1.21.
Note that by default, both configuration options described below,
`shutdownGracePeriod` and `shutdownGracePeriodCriticalPods` are set to zero,
thus not activating Graceful node shutdown functionality.
thus not activating the graceful node shutdown functionality.
To activate the feature, the two kubelet config settings should be configured appropriately and
set to non-zero values.
@ -450,6 +452,56 @@ Reason: Terminated
Message: Pod was terminated in response to imminent node shutdown.
```
{{< /note >}}
## Non Graceful node shutdown {#non-graceful-node-shutdown}
{{< feature-state state="alpha" for_k8s_version="v1.24" >}}
A node shutdown action may not be detected by kubelet's Node Shutdown Manager,
either because the command does not trigger the inhibitor locks mechanism used by
the kubelet, or because of a user error, i.e., the `shutdownGracePeriod` and
`shutdownGracePeriodCriticalPods` settings are not configured properly. Please refer to the
[Graceful Node Shutdown](#graceful-node-shutdown) section above for more details.
When a node is shutdown but not detected by kubelet's Node Shutdown Manager, the pods
that are part of a StatefulSet will be stuck in terminating status on
the shutdown node and cannot move to a new running node. This is because kubelet on
the shutdown node is not available to delete the pods so the StatefulSet cannot
create a new pod with the same name. If there are volumes used by the pods, the
VolumeAttachments will not be deleted from the original shutdown node so the volumes
used by these pods cannot be attached to a new running node. As a result, the
application running on the StatefulSet cannot function properly. If the original
shutdown node comes up, the pods will be deleted by kubelet and new pods will be
created on a different running node. If the original shutdown node does not come up,
these pods will be stuck in terminating status on the shutdown node forever.
To mitigate the above situation, a user can manually add the taint
`node.kubernetes.io/out-of-service` with either the `NoExecute` or `NoSchedule` effect to
a Node, marking it out-of-service.
If the `NodeOutOfServiceVolumeDetach`
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/) is enabled on
`kube-controller-manager`, and a Node is marked out-of-service with this taint, the
pods on the node will be forcefully deleted if there are no matching tolerations on
them, and volume detach operations for the pods terminating on the node will happen
immediately. This allows the Pods on the out-of-service node to recover quickly on a
different node.
During a non-graceful shutdown, Pods are terminated in two phases:
1. Force delete the Pods that do not have matching `out-of-service` tolerations.
2. Immediately perform the volume detach operation for such pods.
{{< note >}}
- Before adding the taint `node.kubernetes.io/out-of-service`, it should be verified
that the node is already in a shutdown or power-off state (not in the middle of
restarting).
- The user is required to manually remove the out-of-service taint after the pods are
moved to a new node and the user has checked that the shutdown node has been
recovered, since the user was the one who originally added the taint.
{{< /note >}}
### Pod Priority based graceful node shutdown {#pod-priority-graceful-node-shutdown}
@ -534,10 +586,18 @@ next priority class value range.
If this feature is enabled and no configuration is provided, then no ordering
action will be taken.
Using this feature, requires enabling the
`GracefulNodeShutdownBasedOnPodPriority` feature gate, and setting the kubelet
config's `ShutdownGracePeriodByPodPriority` to the desired configuration
containing the pod priority class values and their respective shutdown periods.
Using this feature requires enabling the `GracefulNodeShutdownBasedOnPodPriority`
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
, and setting `ShutdownGracePeriodByPodPriority` in the
[kubelet config](/docs/reference/config-api/kubelet-config.v1beta1/)
to the desired configuration containing the pod priority class values and
their respective shutdown periods.
{{< note >}}
The ability to take Pod priority into account during graceful node shutdown was introduced
as an Alpha feature in Kubernetes v1.23. In Kubernetes {{< skew currentVersion >}}
the feature is Beta and is enabled by default.
{{< /note >}}
Metrics `graceful_shutdown_start_time_seconds` and `graceful_shutdown_end_time_seconds`
are emitted under the kubelet subsystem to monitor node shutdowns.

View File

@ -59,7 +59,7 @@ Before choosing a guide, here are some considerations:
* [Using Sysctls in a Kubernetes Cluster](/docs/tasks/administer-cluster/sysctl-cluster/) describes to an administrator how to use the `sysctl` command-line tool to set kernel parameters .
* [Auditing](/docs/tasks/debug-application-cluster/audit/) describes how to interact with Kubernetes' audit logs.
* [Auditing](/docs/tasks/debug/debug-cluster/audit/) describes how to interact with Kubernetes' audit logs.
### Securing the kubelet
* [Control Plane-Node communication](/docs/concepts/architecture/control-plane-node-communication/)

View File

@ -20,7 +20,7 @@ This page lists some of the available add-ons and links to their respective inst
* [Calico](https://docs.projectcalico.org/latest/introduction/) is a networking and network policy provider. Calico supports a flexible set of networking options so you can choose the most efficient option for your situation, including non-overlay and overlay networks, with or without BGP. Calico uses the same engine to enforce network policy for hosts, pods, and (if using Istio & Envoy) applications at the service mesh layer.
* [Canal](https://github.com/tigera/canal/tree/master/k8s-install) unites Flannel and Calico, providing networking and network policy.
* [Cilium](https://github.com/cilium/cilium) is a L3 network and network policy plugin that can enforce HTTP/API/L7 policies transparently. Both routing and overlay/encapsulation mode are supported, and it can work on top of other CNI plugins.
* [CNI-Genie](https://github.com/Huawei-PaaS/CNI-Genie) enables Kubernetes to seamlessly connect to a choice of CNI plugins, such as Calico, Canal, Flannel, Romana, or Weave.
* [CNI-Genie](https://github.com/Huawei-PaaS/CNI-Genie) enables Kubernetes to seamlessly connect to a choice of CNI plugins, such as Calico, Canal, Flannel, or Weave.
* [Contrail](https://www.juniper.net/us/en/products-services/sdn/contrail/contrail-networking/), based on [Tungsten Fabric](https://tungsten.io), is an open source, multi-cloud network virtualization and policy management platform. Contrail and Tungsten Fabric are integrated with orchestration systems such as Kubernetes, OpenShift, OpenStack and Mesos, and provide isolation modes for virtual machines, containers/pods and bare metal workloads.
* [Flannel](https://github.com/flannel-io/flannel#deploying-flannel-manually) is an overlay network provider that can be used with Kubernetes.
* [Knitter](https://github.com/ZTE/Knitter/) is a plugin to support multiple network interfaces in a Kubernetes pod.
@ -29,7 +29,7 @@ This page lists some of the available add-ons and links to their respective inst
* [OVN4NFV-K8S-Plugin](https://github.com/opnfv/ovn4nfv-k8s-plugin) is OVN based CNI controller plugin to provide cloud native based Service function chaining(SFC), Multiple OVN overlay networking, dynamic subnet creation, dynamic creation of virtual networks, VLAN Provider network, Direct provider network and pluggable with other Multi-network plugins, ideal for edge based cloud native workloads in Multi-cluster networking
* [NSX-T](https://docs.vmware.com/en/VMware-NSX-T/2.0/nsxt_20_ncp_kubernetes.pdf) Container Plug-in (NCP) provides integration between VMware NSX-T and container orchestrators such as Kubernetes, as well as integration between NSX-T and container-based CaaS/PaaS platforms such as Pivotal Container Service (PKS) and OpenShift.
* [Nuage](https://github.com/nuagenetworks/nuage-kubernetes/blob/v5.1.1-1/docs/kubernetes-1-installation.rst) is an SDN platform that provides policy-based networking between Kubernetes Pods and non-Kubernetes environments with visibility and security monitoring.
* **Romana** is a Layer 3 networking solution for pod networks that also supports the [NetworkPolicy API](/docs/concepts/services-networking/network-policies/). Kubeadm add-on installation details available [here](https://github.com/romana/romana/tree/master/containerize).
* [Romana](https://github.com/romana) is a Layer 3 networking solution for pod networks that also supports the [NetworkPolicy](/docs/concepts/services-networking/network-policies/) API.
* [Weave Net](https://www.weave.works/docs/net/latest/kubernetes/kube-addon/) provides networking and network policy, will carry on working on both sides of a network partition, and does not require an external database.
## Service Discovery

View File

@ -331,7 +331,7 @@ Thus, in a situation with a mixture of servers of different versions
there may be thrashing as long as different servers have different
opinions of the proper content of these objects.
Each `kube-apiserver` makes an inital maintenance pass over the
Each `kube-apiserver` makes an initial maintenance pass over the
mandatory and suggested configuration objects, and after that does
periodic maintenance (once per minute) of those objects.

View File

@ -461,7 +461,7 @@ That's it! The Deployment will declaratively update the deployed nginx applicati
## {{% heading "whatsnext" %}}
- Learn about [how to use `kubectl` for application introspection and debugging](/docs/tasks/debug-application-cluster/debug-application-introspection/).
- Learn about [how to use `kubectl` for application introspection and debugging](/docs/tasks/debug/debug-application/debug-running-pod/).
- See [Configuration Best Practices and Tips](/docs/concepts/configuration/overview/).

View File

@ -110,6 +110,55 @@ I1025 00:15:15.525108 1 example.go:116] "Example" data="This is text with
second line.}
```
### Contextual Logging
{{< feature-state for_k8s_version="v1.24" state="alpha" >}}
Contextual logging builds on top of structured logging. It is primarily about
how developers use logging calls: code based on that concept is more flexible
and supports additional use cases as described in the [Contextual Logging
KEP](https://github.com/kubernetes/enhancements/tree/master/keps/sig-instrumentation/3077-contextual-logging).
If developers use additional functions like `WithValues` or `WithName` in
their components, then log entries contain additional information that gets
passed into functions by their caller.
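A call site in such a component could look roughly like this sketch (the `sync` function and `podName` value are illustrative):

```go
package main

import (
	"context"

	"k8s.io/klog/v2"
)

// sync is a hypothetical component function.
func sync(ctx context.Context, podName string) {
	// Information attached via WithName/WithValues is carried by the logger
	// and appears in every entry, without modifying the called code.
	logger := klog.FromContext(ctx).WithName("example").WithValues("podName", podName)
	logger.Info("syncing")
}

func main() {
	sync(context.Background(), "httpd-1")
}
```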
Currently contextual logging is gated behind the `ContextualLogging` feature gate and
disabled by default. The infrastructure for this was added in 1.24 without
modifying components. The
[`component-base/logs/example`](https://github.com/kubernetes/kubernetes/blob/v1.24.0-beta.0/staging/src/k8s.io/component-base/logs/example/cmd/logger.go)
command demonstrates how to use the new logging calls and how a component
behaves that supports contextual logging.
```console
$ cd $GOPATH/src/k8s.io/kubernetes/staging/src/k8s.io/component-base/logs/example/cmd/
$ go run . --help
...
--feature-gates mapStringBool A set of key=value pairs that describe feature gates for alpha/experimental features. Options are:
AllAlpha=true|false (ALPHA - default=false)
AllBeta=true|false (BETA - default=false)
ContextualLogging=true|false (ALPHA - default=false)
$ go run . --feature-gates ContextualLogging=true
...
I0404 18:00:02.916429 451895 logger.go:94] "example/myname: runtime" foo="bar" duration="1m0s"
I0404 18:00:02.916447 451895 logger.go:95] "example: another runtime" foo="bar" duration="1m0s"
```
The `example` prefix and `foo="bar"` were added by the caller of the function
which logs the `runtime` message and `duration="1m0s"` value, without having to
modify that function.
With contextual logging disabled, `WithValues` and `WithName` do nothing and log
calls go through the global klog logger. Therefore this additional information
is no longer in the log output:
```console
$ go run . --feature-gates ContextualLogging=false
...
I0404 18:03:31.171945 452150 logger.go:94] "runtime" duration="1m0s"
I0404 18:03:31.171962 452150 logger.go:95] "another runtime" duration="1m0s"
```
### JSON log format
{{< feature-state for_k8s_version="v1.19" state="alpha" >}}
@ -150,27 +199,6 @@ List of components currently supporting JSON format:
* {{< glossary_tooltip term_id="kube-scheduler" text="kube-scheduler" >}}
* {{< glossary_tooltip term_id="kubelet" text="kubelet" >}}
### Log sanitization
{{< feature-state for_k8s_version="v1.20" state="alpha" >}}
{{<warning >}}
Log sanitization might incur significant computation overhead and therefore should not be enabled in production.
{{< /warning >}}
The `--experimental-logging-sanitization` flag enables the klog sanitization filter.
If enabled all log arguments are inspected for fields tagged as sensitive data (e.g. passwords, keys, tokens) and logging of these fields will be prevented.
List of components currently supporting log sanitization:
* kube-controller-manager
* kube-apiserver
* kube-scheduler
* kubelet
{{< note >}}
The Log sanitization filter does not prevent user workload logs from leaking sensitive data.
{{< /note >}}
### Log verbosity level
The `-v` flag controls log verbosity. Increasing the value increases the number of logged events. Decreasing the value decreases the number of logged events.
@ -197,5 +225,6 @@ The `logrotate` tool rotates logs daily, or once the log size is greater than 10
* Read about the [Kubernetes Logging Architecture](/docs/concepts/cluster-administration/logging/)
* Read about [Structured Logging](https://github.com/kubernetes/enhancements/tree/master/keps/sig-instrumentation/1602-structured-logging)
* Read about [Contextual Logging](https://github.com/kubernetes/enhancements/tree/master/keps/sig-instrumentation/3077-contextual-logging)
* Read about [deprecation of klog flags](https://github.com/kubernetes/enhancements/tree/master/keps/sig-instrumentation/2845-deprecate-klog-specific-flags-in-k8s-components)
* Read about the [Conventions for logging severity](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-instrumentation/logging.md)

View File

@ -279,5 +279,6 @@ to the deleted ConfigMap, it is recommended to recreate these pods.
* Read about [Secrets](/docs/concepts/configuration/secret/).
* Read [Configure a Pod to Use a ConfigMap](/docs/tasks/configure-pod-container/configure-pod-configmap/).
* Read about [changing a ConfigMap (or any other Kubernetes object)](/docs/tasks/manage-kubernetes-objects/update-api-object-kubectl-patch/)
* Read [The Twelve-Factor App](https://12factor.net/) to understand the motivation for
separating code from configuration.

View File

@ -47,10 +47,9 @@ or by enforcement (the system prevents the container from ever exceeding the lim
runtimes can have different ways to implement the same restrictions.
{{< note >}}
If a container specifies its own memory limit, but does not specify a memory request, Kubernetes
automatically assigns a memory request that matches the limit. Similarly, if a container specifies its own
CPU limit, but does not specify a CPU request, Kubernetes automatically assigns a CPU request that matches
the limit.
If you specify a limit for a resource, but do not specify any request, and no admission-time
mechanism has applied a default request for that resource, then Kubernetes copies the limit
you specified and uses it as the requested value for the resource.
{{< /note >}}
## Resource types
@ -229,9 +228,9 @@ see the [Troubleshooting](#troubleshooting) section.
The kubelet reports the resource usage of a Pod as part of the Pod
[`status`](/docs/concepts/overview/working-with-objects/kubernetes-objects/#object-spec-and-status).
If optional [tools for monitoring](/docs/tasks/debug-application-cluster/resource-usage-monitoring/)
If optional [tools for monitoring](/docs/tasks/debug/debug-cluster/resource-usage-monitoring/)
are available in your cluster, then Pod resource usage can be retrieved either
from the [Metrics API](/docs/tasks/debug-application-cluster/resource-metrics-pipeline/#metrics-api)
from the [Metrics API](/docs/tasks/debug/debug-cluster/resource-metrics-pipeline/#metrics-api)
directly or from your monitoring tools.
## Local ephemeral storage

View File

@ -247,6 +247,8 @@ You can still [manually create](/docs/tasks/configure-pod-container/configure-se
a service account token Secret; for example, if you need a token that never expires.
However, using the [TokenRequest](/docs/reference/kubernetes-api/authentication-resources/token-request-v1/)
subresource to obtain a token to access the API is recommended instead.
You can use the [`kubectl create token`](/docs/reference/generated/kubectl/kubectl-commands#-em-token-em-)
command to obtain a token from the `TokenRequest` API.
{{< /note >}}
#### Projection of Secret keys to specific paths
@ -886,15 +888,30 @@ In this case, `0` means you have created an empty Secret.
### Service account token Secrets
A `kubernetes.io/service-account-token` type of Secret is used to store a
token that identifies a
token credential that identifies a
{{< glossary_tooltip text="service account" term_id="service-account" >}}.
Since 1.22, this type of Secret is no longer used to mount credentials into Pods,
and obtaining tokens via the [TokenRequest](/docs/reference/kubernetes-api/authentication-resources/token-request-v1/)
API is recommended instead of using service account token Secret objects.
Tokens obtained from the `TokenRequest` API are more secure than ones stored in Secret objects,
because they have a bounded lifetime and are not readable by other API clients.
You can use the [`kubectl create token`](/docs/reference/generated/kubectl/kubectl-commands#-em-token-em-)
command to obtain a token from the `TokenRequest` API.
You should only create a service account token Secret object
if you can't use the `TokenRequest` API to obtain a token,
and the security exposure of persisting a non-expiring token credential
in a readable API object is acceptable to you.
When using this Secret type, you need to ensure that the
`kubernetes.io/service-account.name` annotation is set to an existing
service account name. A Kubernetes
{{< glossary_tooltip text="controller" term_id="controller" >}} fills in some
other fields such as the `kubernetes.io/service-account.uid` annotation, and the
`token` key in the `data` field, which is set to contain an authentication
token.
service account name. If you are creating both the ServiceAccount and
the Secret objects, you should create the ServiceAccount object first.
After the Secret is created, a Kubernetes {{< glossary_tooltip text="controller" term_id="controller" >}}
fills in some other fields such as the `kubernetes.io/service-account.uid` annotation, and the
`token` key in the `data` field, which is populated with an authentication token.
The following example configuration declares a service account token Secret:
@ -911,20 +928,14 @@ data:
extra: YmFyCg==
```
When creating a `Pod`, Kubernetes automatically finds or creates a service account
Secret and then automatically modifies your Pod to use this Secret. The service account
token Secret contains credentials for accessing the Kubernetes API.
The automatic creation and use of API credentials can be disabled or
overridden if desired. However, if all you need to do is securely access the
API server, this is the recommended workflow.
After creating the Secret, wait for Kubernetes to populate the `token` key in the `data` field.
See the [ServiceAccount](/docs/tasks/configure-pod-container/configure-service-account/)
documentation for more information on how service accounts work.
You can also check the `automountServiceAccountToken` field and the
`serviceAccountName` field of the
[`Pod`](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#pod-v1-core)
for information on referencing service account from Pods.
for information on referencing service account credentials from within Pods.
### Docker config Secrets
@ -982,7 +993,7 @@ kubectl create secret docker-registry secret-tiger-docker \
```
That command creates a Secret of type `kubernetes.io/dockerconfigjson`.
If you dump the `.data.dockercfgjson` field from that new Secret and then
If you dump the `.data.dockerconfigjson` field from that new Secret and then
decode it from base64:
```shell
@ -1291,7 +1302,7 @@ on that node.
- When deploying applications that interact with the Secret API, you should
limit access using
[authorization policies](/docs/reference/access-authn-authz/authorization/) such as
[RBAC]( /docs/reference/access-authn-authz/rbac/).
[RBAC](/docs/reference/access-authn-authz/rbac/).
- In the Kubernetes API, `watch` and `list` requests for Secrets within a namespace
are extremely powerful capabilities. Avoid granting this access where feasible, since
listing Secrets allows the clients to inspect the values of every Secret in that
@ -1310,7 +1321,7 @@ have access to run a Pod that then exposes the Secret.
- When deploying applications that interact with the Secret API, you should
limit access using
[authorization policies](/docs/reference/access-authn-authz/authorization/) such as
[RBAC]( /docs/reference/access-authn-authz/rbac/).
[RBAC](/docs/reference/access-authn-authz/rbac/).
- In the API server, objects (including Secrets) are persisted into
{{< glossary_tooltip term_id="etcd" >}}; therefore:
- only allow cluster administrators to access etcd (this includes read-only access);

View File

@ -0,0 +1,83 @@
---
reviewers:
- jayunit100
- jsturtevant
- marosset
- perithompson
title: Resource Management for Windows nodes
content_type: concept
weight: 75
---
<!-- overview -->
This page outlines the differences in how resources are managed between Linux and Windows.
<!-- body -->
On Linux nodes, {{< glossary_tooltip text="cgroups" term_id="cgroup" >}} are used
as a pod boundary for resource control. Containers are created within that boundary
for network, process and file system isolation. The Linux cgroup APIs can be used to
gather CPU, I/O, and memory use statistics.
In contrast, Windows uses a [_job object_](https://docs.microsoft.com/windows/win32/procthread/job-objects) per container with a system namespace filter
to contain all processes in a container and provide logical isolation from the
host.
(Job objects are a Windows process isolation mechanism and are different from
what Kubernetes refers to as a {{< glossary_tooltip term_id="job" text="Job" >}}).
There is no way to run a Windows container without the namespace filtering in
place. This means that system privileges cannot be asserted in the context of the
host, and thus privileged containers are not available on Windows.
Containers cannot assume an identity from the host because the Security Account Manager
(SAM) is separate.
## Memory reservations {#resource-management-memory}
Windows does not have an out-of-memory process killer as Linux does. Windows always
treats all user-mode memory allocations as virtual, and pagefiles are mandatory.
Windows nodes do not overcommit memory for processes running in containers. The
net effect is that Windows won't reach out of memory conditions the same way Linux
does, and processes page to disk instead of being subject to out of memory (OOM)
termination. If memory is over-provisioned and all physical memory is exhausted,
then paging can slow down performance.
You can place bounds on memory use for workloads using the kubelet
parameters `--kube-reserved` and/or `--system-reserved`; these account
for memory usage on the node (outside of containers), and reduce
[NodeAllocatable](/docs/tasks/administer-cluster/reserve-compute-resources/#node-allocatable).
As you deploy workloads, set resource limits on containers. This also subtracts from
`NodeAllocatable` and prevents the scheduler from adding more pods once a node is full.
{{< note >}}
When you set memory resource limits for Windows containers, you should either set a
limit and leave the memory request unspecified, or set the request equal to the limit.
{{< /note >}}
On Windows, a good practice to avoid over-provisioning is to configure the kubelet
with a system reserved memory of at least 2GiB to account for Windows, Kubernetes
and container runtime overheads.
## CPU reservations {#resource-management-cpu}
To account for CPU use by the operating system, the container runtime, and by
Kubernetes host processes such as the kubelet, you can (and should) reserve a
percentage of total CPU. You should determine this CPU reservation taking account of
to the number of CPU cores available on the node. To decide on the CPU percentage to
reserve, identify the maximum pod density for each node and monitor the CPU usage of
the system services running there, then choose a value that meets your workload needs.
You can place bounds on CPU usage for workloads using the
kubelet parameters `--kube-reserved` and/or `--system-reserved` to
account for CPU usage on the node (outside of containers).
This reduces `NodeAllocatable`.
The cluster-wide scheduler then takes this reservation into account when determining
pod placement.
On Windows, the kubelet supports a command-line flag to set the priority of the
kubelet process: `--windows-priorityclass`. This flag allows the kubelet process to get
more CPU time slices when compared to other processes running on the Windows host.
More information on the allowable values and their meaning is available at
[Windows Priority Classes](https://docs.microsoft.com/en-us/windows/win32/procthread/scheduling-priorities#priority-class).
To ensure that running Pods do not starve the kubelet of CPU cycles, set this flag to `ABOVE_NORMAL_PRIORITY_CLASS` or above.

View File

@ -1,7 +1,7 @@
---
reviewers:
- tallclair
- dchen1107
- tallclair
- dchen1107
title: Runtime Class
content_type: concept
weight: 20
@ -16,9 +16,6 @@ This page describes the RuntimeClass resource and runtime selection mechanism.
RuntimeClass is a feature for selecting the container runtime configuration. The container runtime
configuration is used to run a Pod's containers.
<!-- body -->
## Motivation
@ -62,12 +59,15 @@ The RuntimeClass resource currently only has 2 significant fields: the RuntimeCl
(`metadata.name`) and the handler (`handler`). The object definition looks like this:
```yaml
apiVersion: node.k8s.io/v1 # RuntimeClass is defined in the node.k8s.io API group
# RuntimeClass is defined in the node.k8s.io API group
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
name: myclass # The name the RuntimeClass will be referenced by
# RuntimeClass is a non-namespaced resource
handler: myconfiguration # The name of the corresponding CRI configuration
# The name the RuntimeClass will be referenced by.
# RuntimeClass is a non-namespaced resource.
name: myclass
# The name of the corresponding CRI configuration
handler: myconfiguration
```
The name of a RuntimeClass object must be a valid
@ -75,14 +75,14 @@ The name of a RuntimeClass object must be a valid
{{< note >}}
It is recommended that RuntimeClass write operations (create/update/patch/delete) be
restricted to the cluster administrator. This is typically the default. See [Authorization
Overview](/docs/reference/access-authn-authz/authorization/) for more details.
restricted to the cluster administrator. This is typically the default. See
[Authorization Overview](/docs/reference/access-authn-authz/authorization/) for more details.
{{< /note >}}
## Usage
Once RuntimeClasses are configured for the cluster, using them is very simple. Specify a
`runtimeClassName` in the Pod spec. For example:
Once RuntimeClasses are configured for the cluster, you can specify a
`runtimeClassName` in the Pod spec to use it. For example:
```yaml
apiVersion: v1
@ -97,7 +97,7 @@ spec:
This will instruct the kubelet to use the named RuntimeClass to run this pod. If the named
RuntimeClass does not exist, or the CRI cannot run the corresponding handler, the pod will enter the
`Failed` terminal [phase](/docs/concepts/workloads/pods/pod-lifecycle/#pod-phase). Look for a
corresponding [event](/docs/tasks/debug-application-cluster/debug-application-introspection/) for an
corresponding [event](/docs/tasks/debug/debug-application/debug-running-pod/) for an
error message.
If no `runtimeClassName` is specified, the default RuntimeHandler will be used, which is equivalent
@ -107,16 +107,6 @@ to the behavior when the RuntimeClass feature is disabled.
For more details on setting up CRI runtimes, see [CRI installation](/docs/setup/production-environment/container-runtimes/).
#### dockershim
{{< feature-state for_k8s_version="v1.20" state="deprecated" >}}
Dockershim is deprecated as of Kubernetes v1.20, and will be removed in v1.24. For more information on the deprecation,
see [dockershim deprecation](/blog/2020/12/08/kubernetes-1-20-release-announcement/#dockershim-deprecation)
RuntimeClasses with dockershim must set the runtime handler to `docker`. Dockershim does not support
custom configurable runtime handlers.
#### {{< glossary_tooltip term_id="containerd" >}}
Runtime handlers are configured through containerd's configuration at
@ -126,14 +116,14 @@ Runtime handlers are configured through containerd's configuration at
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.${HANDLER_NAME}]
```
See containerd's config documentation for more details:
https://github.com/containerd/cri/blob/master/docs/config.md
See containerd's [config documentation](https://github.com/containerd/cri/blob/master/docs/config.md)
for more details.
#### {{< glossary_tooltip term_id="cri-o" >}}
Runtime handlers are configured through CRI-O's configuration at `/etc/crio/crio.conf`. Valid
handlers are configured under the [crio.runtime
table](https://github.com/cri-o/cri-o/blob/master/docs/crio.conf.5.md#crioruntime-table):
handlers are configured under the
[crio.runtime table](https://github.com/cri-o/cri-o/blob/master/docs/crio.conf.5.md#crioruntime-table):
```
[crio.runtime.runtimes.${HANDLER_NAME}]
@ -161,27 +151,24 @@ can add `tolerations` to the RuntimeClass. As with the `nodeSelector`, the toler
with the pod's tolerations in admission, effectively taking the union of the set of nodes tolerated
by each.
To learn more about configuring the node selector and tolerations, see [Assigning Pods to
Nodes](/docs/concepts/scheduling-eviction/assign-pod-node/).
To learn more about configuring the node selector and tolerations, see
[Assigning Pods to Nodes](/docs/concepts/scheduling-eviction/assign-pod-node/).
### Pod Overhead
{{< feature-state for_k8s_version="v1.18" state="beta" >}}
{{< feature-state for_k8s_version="v1.24" state="stable" >}}
You can specify _overhead_ resources that are associated with running a Pod. Declaring overhead allows
the cluster (including the scheduler) to account for it when making decisions about Pods and resources.
To use Pod overhead, you must have the PodOverhead [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
enabled (it is on by default).
Pod overhead is defined in RuntimeClass through the `overhead` fields. Through the use of these fields,
Pod overhead is defined in RuntimeClass through the `overhead` field. Through the use of this field,
you can specify the overhead of running pods utilizing this RuntimeClass and ensure these overheads
are accounted for in Kubernetes.
## {{% heading "whatsnext" %}}
- [RuntimeClass Design](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/585-runtime-class/README.md)
- [RuntimeClass Scheduling Design](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/585-runtime-class/README.md#runtimeclass-scheduling)
- Read about the [Pod Overhead](/docs/concepts/scheduling-eviction/pod-overhead/) concept
- [PodOverhead Feature Design](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/688-pod-overhead)

View File

@ -346,8 +346,6 @@ Here are some examples of device plugin implementations:
* The [AMD GPU device plugin](https://github.com/RadeonOpenCompute/k8s-device-plugin)
* The [Intel device plugins](https://github.com/intel/intel-device-plugins-for-kubernetes) for Intel GPU, FPGA, QAT, VPU, SGX, DSA, DLB and IAA devices
* The [KubeVirt device plugins](https://github.com/kubevirt/kubernetes-device-plugins) for hardware-assisted virtualization
* The [NVIDIA GPU device plugin](https://github.com/NVIDIA/k8s-device-plugin)
* Requires [nvidia-docker](https://github.com/NVIDIA/nvidia-docker) 2.0, which allows you to run GPU-enabled Docker containers.
* The [NVIDIA GPU device plugin for Container-Optimized OS](https://github.com/GoogleCloudPlatform/container-engine-accelerators/tree/master/cmd/nvidia_gpu)
* The [RDMA device plugin](https://github.com/hustcat/k8s-rdma-device-plugin)
* The [SocketCAN device plugin](https://github.com/collabora/k8s-socketcan)

View File

@ -11,36 +11,52 @@ weight: 10
<!-- overview -->
Network plugins in Kubernetes come in a few flavors:
Kubernetes {{< skew currentVersion >}} supports [Container Network Interface](https://github.com/containernetworking/cni)
(CNI) plugins for cluster networking. You must use a CNI plugin that is compatible with your cluster and that suits your needs. Different plugins are available (both open- and closed-source) in the wider Kubernetes ecosystem.
* CNI plugins: adhere to the [Container Network Interface](https://github.com/containernetworking/cni) (CNI) specification, designed for interoperability.
* Kubernetes follows the [v0.4.0](https://github.com/containernetworking/cni/blob/spec-v0.4.0/SPEC.md) release of the CNI specification.
* Kubenet plugin: implements basic `cbr0` using the `bridge` and `host-local` CNI plugins
A CNI plugin is required to implement the [Kubernetes network model](/docs/concepts/services-networking/#the-kubernetes-network-model).
You must use a CNI plugin that is compatible with the
[v0.4.0](https://github.com/containernetworking/cni/blob/spec-v0.4.0/SPEC.md) or later
releases of the CNI specification. The Kubernetes project recommends using a plugin that is
compatible with the [v1.0.0](https://github.com/containernetworking/cni/blob/spec-v1.0.0/SPEC.md)
CNI specification (plugins can be compatible with multiple spec versions).
<!-- body -->
## Installation
The kubelet has a single default network plugin, and a default network common to the entire cluster. It probes for plugins when it starts up, remembers what it finds, and executes the selected plugin at appropriate times in the pod lifecycle (this is only true for Docker, as CRI manages its own CNI plugins). There are two Kubelet command line parameters to keep in mind when using plugins:
A Container Runtime, in the networking context, is a daemon on a node configured to provide CRI Services for kubelet. In particular, the Container Runtime must be configured to load the CNI plugins required to implement the Kubernetes network model.
* `cni-bin-dir`: Kubelet probes this directory for plugins on startup
* `network-plugin`: The network plugin to use from `cni-bin-dir`. It must match the name reported by a plugin probed from the plugin directory. For CNI plugins, this is `cni`.
{{< note >}}
Prior to Kubernetes 1.24, the CNI plugins could also be managed by the kubelet using the `cni-bin-dir` and `network-plugin` command-line parameters.
These command-line parameters were removed in Kubernetes 1.24, with management of the CNI no longer in scope for kubelet.
See [Troubleshooting CNI plugin-related errors](/docs/tasks/administer-cluster/migrating-from-dockershim/troubleshooting-cni-plugin-related-errors/)
if you are facing issues following the removal of dockershim.
{{< /note >}}
For specific information about how a Container Runtime manages the CNI plugins, see the documentation for that Container Runtime, for example:
- [containerd](https://github.com/containerd/containerd/blob/main/script/setup/install-cni)
- [CRI-O](https://github.com/cri-o/cri-o/blob/main/contrib/cni/README.md)
For specific information about how to install and manage a CNI plugin, see the documentation for that plugin or [networking provider](/docs/concepts/cluster-administration/networking/#how-to-implement-the-kubernetes-networking-model).
## Network Plugin Requirements
Besides providing the [`NetworkPlugin` interface](https://github.com/kubernetes/kubernetes/tree/{{< param "fullversion" >}}/pkg/kubelet/dockershim/network/plugins.go) to configure and clean up pod networking, the plugin may also need specific support for kube-proxy. The iptables proxy obviously depends on iptables, and the plugin may need to ensure that container traffic is made available to iptables. For example, if the plugin connects containers to a Linux bridge, the plugin must set the `net/bridge/bridge-nf-call-iptables` sysctl to `1` to ensure that the iptables proxy functions correctly. If the plugin does not use a Linux bridge (but instead something like Open vSwitch or some other mechanism) it should ensure container traffic is appropriately routed for the proxy.
For plugin developers and users who regularly build or deploy Kubernetes, the plugin may also need specific configuration to support kube-proxy.
The iptables proxy depends on iptables, and the plugin may need to ensure that container traffic is made available to iptables.
For example, if the plugin connects containers to a Linux bridge, the plugin must set the `net/bridge/bridge-nf-call-iptables` sysctl to `1` to ensure that the iptables proxy functions correctly.
If the plugin does not use a Linux bridge, but uses something like Open vSwitch or some other mechanism instead, it should ensure container traffic is appropriately routed for the proxy.
By default if no kubelet network plugin is specified, the `noop` plugin is used, which sets `net/bridge/bridge-nf-call-iptables=1` to ensure simple configurations (like Docker with a bridge) work correctly with the iptables proxy.
By default, if no kubelet network plugin is specified, the `noop` plugin is used, which sets `net/bridge/bridge-nf-call-iptables=1` to ensure simple configurations (like Docker with a bridge) work correctly with the iptables proxy.
### CNI
### Loopback CNI
The CNI plugin is selected by passing Kubelet the `--network-plugin=cni` command-line option. Kubelet reads a file from `--cni-conf-dir` (default `/etc/cni/net.d`) and uses the CNI configuration from that file to set up each pod's network. The CNI configuration file must match the [CNI specification](https://github.com/containernetworking/cni/blob/master/SPEC.md#network-configuration), and any required CNI plugins referenced by the configuration must be present in `--cni-bin-dir` (default `/opt/cni/bin`).
In addition to the CNI plugin installed on the nodes for implementing the Kubernetes network model, Kubernetes also requires the container runtimes to provide a loopback interface `lo`, which is used for each sandbox (pod sandboxes, VM sandboxes, ...).
Implementing the loopback interface can be accomplished by re-using the [CNI loopback plugin](https://github.com/containernetworking/plugins/blob/master/plugins/main/loopback/loopback.go), or by developing your own code to achieve this (see [this example from CRI-O](https://github.com/cri-o/ocicni/blob/release-1.24/pkg/ocicni/util_linux.go#L91)).
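For reference, a loopback CNI network configuration can be as small as the following sketch; the exact file name, location, and `cniVersion` depend on your container runtime and are assumptions here:

```json
{
  "cniVersion": "0.4.0",
  "name": "cni-loopback",
  "type": "loopback"
}
```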
If there are multiple CNI configuration files in the directory, the kubelet uses the configuration file that comes first by name in lexicographic order.
In addition to the CNI plugin specified by the configuration file, Kubernetes requires the standard CNI [`lo`](https://github.com/containernetworking/plugins/blob/master/plugins/main/loopback/loopback.go) plugin, at minimum version 0.2.0
#### Support hostPort
### Support hostPort
The CNI networking plugin supports `hostPort`. You can use the official [portmap](https://github.com/containernetworking/plugins/tree/master/plugins/meta/portmap)
plugin offered by the CNI plugin team or use your own plugin with portMapping functionality.
@ -77,7 +93,7 @@ For example:
}
```
#### Support traffic shaping
### Support traffic shaping
**Experimental Feature**
@ -129,37 +145,4 @@ metadata:
...
```
### kubenet
Kubenet is a very basic, simple network plugin, on Linux only. It does not, of itself, implement more advanced features like cross-node networking or network policy. It is typically used together with a cloud provider that sets up routing rules for communication between nodes, or in single-node environments.
Kubenet creates a Linux bridge named `cbr0` and creates a veth pair for each pod with the host end of each pair connected to `cbr0`. The pod end of the pair is assigned an IP address allocated from a range assigned to the node either through configuration or by the controller-manager. `cbr0` is assigned an MTU matching the smallest MTU of an enabled normal interface on the host.
The plugin requires a few things:
* The standard CNI `bridge`, `lo` and `host-local` plugins are required, at minimum version 0.2.0. Kubenet will first search for them in `/opt/cni/bin`. Specify `cni-bin-dir` to supply additional search path. The first found match will take effect.
* Kubelet must be run with the `--network-plugin=kubenet` argument to enable the plugin
* Kubelet should also be run with the `--non-masquerade-cidr=<clusterCidr>` argument to ensure traffic to IPs outside this range will use IP masquerade.
* The node must be assigned an IP subnet through either the `--pod-cidr` kubelet command-line option or the `--allocate-node-cidrs=true --cluster-cidr=<cidr>` controller-manager command-line options.
### Customizing the MTU (with kubenet)
The MTU should always be configured correctly to get the best networking performance. Network plugins will usually try
to infer a sensible MTU, but sometimes the logic will not result in an optimal MTU. For example, if the
Docker bridge or another interface has a small MTU, kubenet will currently select that MTU. Or if you are
using IPSEC encapsulation, the MTU must be reduced, and this calculation is out-of-scope for
most network plugins.
Where needed, you can specify the MTU explicitly with the `network-plugin-mtu` kubelet option. For example,
on AWS the `eth0` MTU is typically 9001, so you might specify `--network-plugin-mtu=9001`. If you're using IPSEC you
might reduce it to allow for encapsulation overhead; for example: `--network-plugin-mtu=8873`.
This option is provided to the network-plugin; currently **only kubenet supports `network-plugin-mtu`**.
## Usage Summary
* `--network-plugin=cni` specifies that we use the `cni` network plugin with actual CNI plugin binaries located in `--cni-bin-dir` (default `/opt/cni/bin`) and CNI plugin configuration located in `--cni-conf-dir` (default `/etc/cni/net.d`).
* `--network-plugin=kubenet` specifies that we use the `kubenet` network plugin with CNI `bridge`, `lo` and `host-local` plugins placed in `/opt/cni/bin` or `cni-bin-dir`.
* `--network-plugin-mtu=9001` specifies the MTU to use, currently only used by the `kubenet` network plugin.
## {{% heading "whatsnext" %}}

View File

@ -111,6 +111,7 @@ Operator.
{{% thirdparty-content %}}
* [Charmed Operator Framework](https://juju.is/)
* [Java Operator SDK](https://github.com/java-operator-sdk/java-operator-sdk)
* [Kopf](https://github.com/nolar/kopf) (Kubernetes Operator Pythonic Framework)
* [kubebuilder](https://book.kubebuilder.io/)
* [KubeOps](https://buehler.github.io/dotnet-operator-sdk/) (.NET operator SDK)

View File

@ -114,7 +114,7 @@ Containers started by Kubernetes automatically include this DNS server in their
### Container Resource Monitoring
[Container Resource Monitoring](/docs/tasks/debug-application-cluster/resource-usage-monitoring/) records generic time-series metrics
[Container Resource Monitoring](/docs/tasks/debug/debug-cluster/resource-usage-monitoring/) records generic time-series metrics
about containers in a central database, and provides a UI for browsing that data.
### Cluster-level Logging

View File

@ -82,18 +82,42 @@ packages that define the API objects.
### OpenAPI V3
{{< feature-state state="alpha" for_k8s_version="v1.23" >}}
{{< feature-state state="beta" for_k8s_version="v1.24" >}}
Kubernetes v1.23 offers initial support for publishing its APIs as OpenAPI v3; this is an
alpha feature that is disabled by default.
You can enable the alpha feature by turning on the
Kubernetes {{< param "version" >}} offers beta support for publishing its APIs as OpenAPI v3; this is a
beta feature that is enabled by default.
You can disable the beta feature by turning off the
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/) named `OpenAPIV3`
for the kube-apiserver component.
With the feature enabled, the Kubernetes API server serves an
aggregated OpenAPI v3 spec per Kubernetes group version at the
`/openapi/v3/apis/<group>/<version>` endpoint. Please refer to the
table below for accepted request headers.
A discovery endpoint `/openapi/v3` is provided to see a list of all
group/versions available. This endpoint only returns JSON. These group/versions
are provided in the following format:
```json
{
"paths": {
...
"api/v1": {
"serverRelativeURL": "/openapi/v3/api/v1?hash=CC0E9BFD992D8C59AEC98A1E2336F899E8318D3CF4C68944C3DEC640AF5AB52D864AC50DAA8D145B3494F75FA3CFF939FCBDDA431DAD3CA79738B297795818CF"
},
"apis/admissionregistration.k8s.io/v1": {
"serverRelativeURL": "/openapi/v3/apis/admissionregistration.k8s.io/v1?hash=E19CC93A116982CE5422FC42B590A8AFAD92CDE9AE4D59B5CAAD568F083AD07946E6CB5817531680BCE6E215C16973CD39003B0425F3477CFD854E89A9DB6597"
},
...
}
```
The relative URLs point to immutable OpenAPI descriptions, in
order to improve client-side caching. The proper HTTP caching headers
are also set by the API server for that purpose (`Expires` to 1 year in
the future, and `Cache-Control` to `immutable`). When an obsolete URL is
used, the API server returns a redirect to the newest URL.
The Kubernetes API server publishes an OpenAPI v3 spec per Kubernetes
group version at the `/openapi/v3/apis/<group>/<version>?hash=<hash>`
endpoint.
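If you want to explore these endpoints yourself, one approach (a sketch; whether you may read these raw paths depends on your RBAC configuration) is to use `kubectl get --raw`:

```bash
# List the available group/versions (the discovery document shown above)
kubectl get --raw /openapi/v3

# Fetch the OpenAPI v3 document for the core v1 group version
kubectl get --raw /openapi/v3/api/v1
```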
Refer to the table below for accepted request headers.
<table>
<caption style="display:none">Valid request header values for OpenAPI v3 queries</caption>
@ -126,9 +150,6 @@ table below for accepted request headers.
</tbody>
</table>
A discovery endpoint `/openapi/v3` is provided to see a list of all
group/versions available. This endpoint only returns JSON.
## Persistence
Kubernetes stores the serialized state of objects by writing them into

View File

@ -442,7 +442,7 @@ pods 0 10
### Cross-namespace Pod Affinity Quota
{{< feature-state for_k8s_version="v1.22" state="beta" >}}
{{< feature-state for_k8s_version="v1.24" state="stable" >}}
Operators can use `CrossNamespacePodAffinity` quota scope to limit which namespaces are allowed to
have pods with affinity terms that cross namespaces. Specifically, it controls which pods are allowed
@ -493,10 +493,6 @@ With the above configuration, pods can use `namespaces` and `namespaceSelector`
if the namespace where they are created has a resource quota object with
`CrossNamespaceAffinity` scope and a hard limit greater than or equal to the number of pods using those fields.
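As a sketch of what such a quota object could look like (the namespace name is illustrative, and a hard limit of `0` disallows cross-namespace affinity terms entirely):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: disable-cross-namespace-affinity
  namespace: foo-ns
spec:
  hard:
    pods: "0"            # no Pods in foo-ns may use cross-namespace affinity terms
  scopeSelector:
    matchExpressions:
    - scopeName: CrossNamespacePodAffinity
      operator: Exists
```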
This feature is beta and enabled by default. You can disable it using the
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
`PodAffinityNamespaceSelector` in both kube-apiserver and kube-scheduler.
## Requests compared to Limits {#requests-vs-limits}
When allocating compute resources, each container may specify a request and a limit value for either CPU or memory.

View File

@ -120,7 +120,7 @@ your Pod spec.
For example, consider the following Pod spec:
{{<codenew file="pods/pod-with-node-affinity.yaml">}}
{{< codenew file="pods/pod-with-node-affinity.yaml" >}}
In this example, the following rules apply:
@ -167,7 +167,7 @@ scheduling decision for the Pod.
For example, consider the following Pod spec:
{{<codenew file="pods/pod-with-affinity-anti-affinity.yaml">}}
{{< codenew file="pods/pod-with-affinity-anti-affinity.yaml" >}}
If there are two possible nodes that match the
`requiredDuringSchedulingIgnoredDuringExecution` rule, one with the
@ -302,9 +302,8 @@ the Pod onto a node that is in the same zone as one or more Pods with the label
`topology.kubernetes.io/zone=R` label if there are other nodes in the
same zone currently running Pods with the `Security=S2` Pod label.
See the
[design doc](https://git.k8s.io/community/contributors/design-proposals/scheduling/podaffinity.md)
for many more examples of Pod affinity and anti-affinity.
To get yourself more familiar with the examples of Pod affinity and anti-affinity,
refer to the [design proposal](https://github.com/kubernetes/design-proposals-archive/blob/main/scheduling/podaffinity.md).
You can use the `In`, `NotIn`, `Exists` and `DoesNotExist` values in the
`operator` field for Pod affinity and anti-affinity.
@ -326,19 +325,13 @@ If omitted or empty, `namespaces` defaults to the namespace of the Pod where the
affinity/anti-affinity definition appears.
#### Namespace selector
{{< feature-state for_k8s_version="v1.22" state="beta" >}}
{{< feature-state for_k8s_version="v1.24" state="stable" >}}
You can also select matching namespaces using `namespaceSelector`, which is a label query over the set of namespaces.
The affinity term is applied to namespaces selected by both `namespaceSelector` and the `namespaces` field.
Note that an empty `namespaceSelector` ({}) matches all namespaces, while a null or empty `namespaces` list and
null `namespaceSelector` matches the namespace of the Pod where the rule is defined.
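A minimal sketch of a Pod using `namespaceSelector` (the labels, names, and image are illustrative); the affinity term below matches Pods labeled `app: cache` in any namespace labeled `team: backend`, rather than only in the Pod's own namespace:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: with-namespace-selector
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: cache
        namespaceSelector:
          matchLabels:
            team: backend
        topologyKey: topology.kubernetes.io/zone
  containers:
  - name: app
    image: nginx
```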
{{<note>}}
This feature is beta and enabled by default. You can disable it via the
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
`PodAffinityNamespaceSelector` in both kube-apiserver and kube-scheduler.
{{</note>}}
#### More practical use-cases
Inter-pod affinity and anti-affinity can be even more useful when they are used with higher

View File

@ -10,17 +10,12 @@ weight: 30
<!-- overview -->
{{< feature-state for_k8s_version="v1.18" state="beta" >}}
{{< feature-state for_k8s_version="v1.24" state="stable" >}}
When you run a Pod on a Node, the Pod itself takes an amount of system resources. These
resources are additional to the resources needed to run the container(s) inside the Pod.
_Pod Overhead_ is a feature for accounting for the resources consumed by the Pod infrastructure
on top of the container requests & limits.
In Kubernetes, _Pod Overhead_ is a way to account for the resources consumed by the Pod
infrastructure on top of the container requests & limits.
<!-- body -->
@ -29,33 +24,30 @@ In Kubernetes, the Pod's overhead is set at
time according to the overhead associated with the Pod's
[RuntimeClass](/docs/concepts/containers/runtime-class/).
When Pod Overhead is enabled, the overhead is considered in addition to the sum of container
resource requests when scheduling a Pod. Similarly, the kubelet will include the Pod overhead when sizing
the Pod cgroup, and when carrying out Pod eviction ranking.
A pod's overhead is considered in addition to the sum of container resource requests when
scheduling a Pod. Similarly, the kubelet will include the Pod overhead when sizing the Pod cgroup,
and when carrying out Pod eviction ranking.
## Enabling Pod Overhead {#set-up}
## Configuring Pod overhead {#set-up}
You need to make sure that the `PodOverhead`
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/) is enabled (it is on by default as of 1.18)
across your cluster, and a `RuntimeClass` is utilized which defines the `overhead` field.
You need to make sure a `RuntimeClass` is utilized which defines the `overhead` field.
## Usage example
To use the PodOverhead feature, you need a RuntimeClass that defines the `overhead` field. As
an example, you could use the following RuntimeClass definition with a virtualizing container runtime
that uses around 120MiB per Pod for the virtual machine and the guest OS:
To work with Pod overhead, you need a RuntimeClass that defines the `overhead` field. As
an example, you could use the following RuntimeClass definition with a virtualization container
runtime that uses around 120MiB per Pod for the virtual machine and the guest OS:
```yaml
---
kind: RuntimeClass
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
name: kata-fc
name: kata-fc
handler: kata-fc
overhead:
podFixed:
memory: "120Mi"
cpu: "250m"
podFixed:
memory: "120Mi"
cpu: "250m"
```
Workloads which are created which specify the `kata-fc` RuntimeClass handler will take the memory and
@ -92,13 +84,15 @@ updates the workload's PodSpec to include the `overhead` as described in the Run
the Pod will be rejected. In the given example, since only the RuntimeClass name is specified, the admission controller mutates the Pod
to include an `overhead`.
After the RuntimeClass admission controller, you can check the updated PodSpec:
After the RuntimeClass admission controller has made modifications, you can check the updated
Pod overhead value:
```bash
kubectl get pod test-pod -o jsonpath='{.spec.overhead}'
```
The output is:
```
map[cpu:250m memory:120Mi]
```
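For context, a Pod that exercises this path might look like the following sketch; the Pod name and the container resource values are assumptions chosen to match the totals discussed below:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  runtimeClassName: kata-fc   # the admission controller adds spec.overhead from this RuntimeClass
  containers:
  - name: busybox-ctr
    image: busybox:1.28
    command: ["sleep", "24h"]
    resources:
      limits:
        cpu: 500m
        memory: 100Mi
  - name: nginx-ctr
    image: nginx
    resources:
      limits:
        cpu: 1500m
        memory: 100Mi
```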
@ -110,44 +104,50 @@ When the kube-scheduler is deciding which node should run a new Pod, the schedul
`overhead` as well as the sum of container requests for that Pod. For this example, the scheduler adds the
requests and the overhead, then looks for a node that has 2.25 CPU and 320 MiB of memory available.
Once a Pod is scheduled to a node, the kubelet on that node creates a new {{< glossary_tooltip text="cgroup" term_id="cgroup" >}}
for the Pod. It is within this pod that the underlying container runtime will create containers.
Once a Pod is scheduled to a node, the kubelet on that node creates a new {{< glossary_tooltip
text="cgroup" term_id="cgroup" >}} for the Pod. It is within this pod that the underlying
container runtime will create containers.
If the resource has a limit defined for each container (Guaranteed QoS or Bustrable QoS with limits defined),
If the resource has a limit defined for each container (Guaranteed QoS or Burstable QoS with limits defined),
the kubelet will set an upper limit for the pod cgroup associated with that resource (cpu.cfs_quota_us for CPU
and memory.limit_in_bytes memory). This upper limit is based on the sum of the container limits plus the `overhead`
defined in the PodSpec.
For CPU, if the Pod is Guaranteed or Burstable QoS, the kubelet will set `cpu.shares` based on the sum of container
requests plus the `overhead` defined in the PodSpec.
For CPU, if the Pod is Guaranteed or Burstable QoS, the kubelet will set `cpu.shares` based on the
sum of container requests plus the `overhead` defined in the PodSpec.
Looking at our example, verify the container requests for the workload:
```bash
kubectl get pod test-pod -o jsonpath='{.spec.containers[*].resources.limits}'
```
The total container requests are 2000m CPU and 200MiB of memory:
```
map[cpu: 500m memory:100Mi] map[cpu:1500m memory:100Mi]
```
Check this against what is observed by the node:
```bash
kubectl describe node | grep test-pod -B2
```
The output shows 2250m CPU and 320MiB of memory are requested, which includes PodOverhead:
The output shows requests for 2250m CPU, and for 320MiB of memory. The requests include Pod overhead:
```
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
--------- ---- ------------ ---------- --------------- ------------- ---
default test-pod 2250m (56%) 2250m (56%) 320Mi (1%) 320Mi (1%) 36m
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
--------- ---- ------------ ---------- --------------- ------------- ---
default test-pod 2250m (56%) 2250m (56%) 320Mi (1%) 320Mi (1%) 36m
```
## Verify Pod cgroup limits
Check the Pod's memory cgroups on the node where the workload is running. In the following example, [`crictl`](https://github.com/kubernetes-sigs/cri-tools/blob/master/docs/crictl.md)
Check the Pod's memory cgroups on the node where the workload is running. In the following example,
[`crictl`](https://github.com/kubernetes-sigs/cri-tools/blob/master/docs/crictl.md)
is used on the node, which provides a CLI for CRI-compatible container runtimes. This is an
advanced example to show PodOverhead behavior, and it is not expected that users should need to check
advanced example to show Pod overhead behavior, and it is not expected that users should need to check
cgroups directly on the node.
First, on the particular node, determine the Pod identifier:
@ -158,17 +158,21 @@ POD_ID="$(sudo crictl pods --name test-pod -q)"
```
From this, you can determine the cgroup path for the Pod:
```bash
# Run this on the node where the Pod is scheduled
sudo crictl inspectp -o=json $POD_ID | grep cgroupsPath
```
The resulting cgroup path includes the Pod's `pause` container. The Pod level cgroup is one directory above.
```
"cgroupsPath": "/kubepods/podd7f4b509-cf94-4951-9417-d1087c92a5b2/7ccf55aee35dd16aca4189c952d83487297f3cd760f1bbf09620e206e7d0c27a"
"cgroupsPath": "/kubepods/podd7f4b509-cf94-4951-9417-d1087c92a5b2/7ccf55aee35dd16aca4189c952d83487297f3cd760f1bbf09620e206e7d0c27a"
```
In this specific case, the pod cgroup path is `kubepods/podd7f4b509-cf94-4951-9417-d1087c92a5b2`. Verify the Pod level cgroup setting for memory:
In this specific case, the pod cgroup path is `kubepods/podd7f4b509-cf94-4951-9417-d1087c92a5b2`.
Verify the Pod level cgroup setting for memory:
```bash
# Run this on the node where the Pod is scheduled.
# Also, change the name of the cgroup to match the cgroup allocated for your pod.
@ -176,22 +180,20 @@ In this specific case, the pod cgroup path is `kubepods/podd7f4b509-cf94-4951-94
```
This is 320 MiB, as expected:
```
335544320
```
### Observability
A `kube_pod_overhead` metric is available in [kube-state-metrics](https://github.com/kubernetes/kube-state-metrics)
to help identify when PodOverhead is being utilized and to help observe stability of workloads
running with a defined Overhead. This functionality is not available in the 1.9 release of
kube-state-metrics, but is expected in a following release. Users will need to build kube-state-metrics
from source in the meantime.
Some `kube_pod_overhead_*` metrics are available in [kube-state-metrics](https://github.com/kubernetes/kube-state-metrics)
to help identify when Pod overhead is being utilized and to help observe stability of workloads
running with a defined overhead.
## {{% heading "whatsnext" %}}
* Learn more about [RuntimeClass](/docs/concepts/containers/runtime-class/)
* Read the [PodOverhead Design](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/688-pod-overhead)
enhancement proposal for extra context
* [RuntimeClass](/docs/concepts/containers/runtime-class/)
* [PodOverhead Design](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/688-pod-overhead)

View File

@ -104,7 +104,7 @@ description: "This priority class should be used for XYZ service pods only."
## Non-preempting PriorityClass {#non-preempting-priority-class}
{{< feature-state for_k8s_version="v1.19" state="beta" >}}
{{< feature-state for_k8s_version="v1.24" state="stable" >}}
Pods with `preemptionPolicy: Never` will be placed in the scheduling queue
ahead of lower-priority pods,
@ -203,9 +203,10 @@ resources reserved for Pod P and also gives users information about preemptions
in their clusters.
Please note that Pod P is not necessarily scheduled to the "nominated Node".
The scheduler always tries the "nominated Node" before iterating over any other nodes.
After victim Pods are preempted, they get their graceful termination period. If
another node becomes available while scheduler is waiting for the victim Pods to
terminate, scheduler will use the other node to schedule Pod P. As a result
terminate, scheduler may use the other node to schedule Pod P. As a result
`nominatedNodeName` and `nodeName` of Pod spec are not always the same. Also, if
scheduler preempts Pods on Node N, but then a higher priority Pod than Pod P
arrives, scheduler may give Node N to the new higher priority Pod. In such a

View File

@ -134,7 +134,7 @@ for the corresponding API object, and then written to the object store (shown as
Kubernetes auditing provides a security-relevant, chronological set of records documenting the sequence of actions in a cluster.
The cluster audits the activities generated by users, by applications that use the Kubernetes API, and by the control plane itself.
For more information, see [Auditing](/docs/tasks/debug-application-cluster/audit/).
For more information, see [Auditing](/docs/tasks/debug/debug-cluster/audit/).
## API server ports and IPs

View File

@ -19,7 +19,7 @@ The Kubernetes [Pod Security Standards](/docs/concepts/security/pod-security-sta
different isolation levels for Pods. These standards let you define how you want to restrict the
behavior of pods in a clear, consistent fashion.
As a Beta feature, Kubernetes offers a built-in _Pod Security_ {{< glossary_tooltip
As a beta feature, Kubernetes offers a built-in _Pod Security_ {{< glossary_tooltip
text="admission controller" term_id="admission-controller" >}}, the successor
to [PodSecurityPolicies](/docs/concepts/security/pod-security-policy/). Pod security restrictions
are applied at the {{< glossary_tooltip text="namespace" term_id="namespace" >}} level when pods
@ -30,25 +30,21 @@ The PodSecurityPolicy API is deprecated and will be
[removed](/docs/reference/using-api/deprecation-guide/#v1-25) from Kubernetes in v1.25.
{{< /note >}}
<!-- body -->
## Enabling the `PodSecurity` admission plugin
## {{% heading "prerequisites" %}}
In v1.23, the `PodSecurity` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
is a Beta feature and is enabled by default.
To use this mechanism, your cluster must enforce Pod Security admission.
In v1.22, the `PodSecurity` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
is an Alpha feature and must be enabled in `kube-apiserver` in order to use the built-in admission plugin.
### Built-in Pod Security admission enforcement
```shell
--feature-gates="...,PodSecurity=true"
```
In Kubernetes v{{< skew currentVersion >}}, the `PodSecurity` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
is a beta feature and is enabled by default. You must have this feature gate enabled.
If you are running a different version of Kubernetes, consult the documentation for that release.
## Alternative: installing the `PodSecurity` admission webhook {#webhook}
### Alternative: installing the `PodSecurity` admission webhook {#webhook}
For environments where the built-in `PodSecurity` admission plugin cannot be used,
either because the cluster is older than v1.22, or the `PodSecurity` feature cannot be enabled,
the `PodSecurity` admission logic is also available as a Beta [validating admission webhook](https://git.k8s.io/pod-security-admission/webhook).
The `PodSecurity` admission logic is also available as a [validating admission webhook](https://git.k8s.io/pod-security-admission/webhook). This implementation is also beta.
For environments where the built-in `PodSecurity` admission plugin cannot be enabled, you can instead enable that logic via a validating admission webhook.
A pre-built container image, certificate generation scripts, and example manifests
are available at [https://git.k8s.io/pod-security-admission/webhook](https://git.k8s.io/pod-security-admission/webhook).
@ -66,6 +62,8 @@ The generated certificate is valid for 2 years. Before it expires,
regenerate the certificate or remove the webhook in favor of the built-in admission plugin.
{{< /note >}}
<!-- body -->
## Pod Security levels
Pod Security admission places requirements on a Pod's [Security
@ -88,7 +86,7 @@ takes if a potential violation is detected:
Mode | Description
:---------|:------------
**enforce** | Policy violations will cause the pod to be rejected.
**audit** | Policy violations will trigger the addition of an audit annotation to the event recorded in the [audit log](/docs/tasks/debug-application-cluster/audit/), but are otherwise allowed.
**audit** | Policy violations will trigger the addition of an audit annotation to the event recorded in the [audit log](/docs/tasks/debug/debug-cluster/audit/), but are otherwise allowed.
**warn** | Policy violations will trigger a user-facing warning, but are otherwise allowed.
{{< /table >}}
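These modes are configured per namespace using labels; as a minimal sketch (the namespace name and the chosen levels are illustrative):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: my-app
  labels:
    pod-security.kubernetes.io/enforce: baseline
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
```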

View File

@ -658,8 +658,7 @@ added. Capabilities listed in `RequiredDropCapabilities` must not be included in
**DefaultAddCapabilities** - The capabilities which are added to containers by
default, in addition to the runtime defaults. See the
[Docker documentation](https://docs.docker.com/engine/reference/run/#runtime-privilege-and-linux-capabilities)
for the default list of capabilities when using the Docker runtime.
documentation for your container runtime for information on working with Linux capabilities.
### SELinux

View File

@ -29,10 +29,9 @@ This guide outlines the requirements of each policy.
**The _Privileged_ policy is purposely-open, and entirely unrestricted.** This type of policy is
typically aimed at system- and infrastructure-level workloads managed by privileged, trusted users.
The Privileged policy is defined by an absence of restrictions. For allow-by-default enforcement
mechanisms (such as gatekeeper), the Privileged policy may be an absence of applied constraints
rather than an instantiated profile. In contrast, for a deny-by-default mechanism (such as Pod
Security Policy) the Privileged policy should enable all controls (disable all restrictions).
The Privileged policy is defined by an absence of restrictions. Allow-by-default
mechanisms (such as gatekeeper) may be Privileged by default. In contrast, for a deny-by-default mechanism (such as Pod
Security Policy) the Privileged policy should disable all restrictions.
### Baseline
@ -458,6 +457,16 @@ of individual policies are not defined here.
- {{< example file="policy/baseline-psp.yaml" >}}Baseline{{< /example >}}
- {{< example file="policy/restricted-psp.yaml" >}}Restricted{{< /example >}}
### Alternatives
{{% thirdparty-content %}}
Other alternatives for enforcing policies are being developed in the Kubernetes ecosystem, such as:
- [Kubewarden](https://github.com/kubewarden)
- [Kyverno](https://kyverno.io/policies/pod-security/)
- [OPA Gatekeeper](https://github.com/open-policy-agent/gatekeeper)
## FAQ
### Why isn't there a profile between privileged and baseline?
@ -481,14 +490,6 @@ as well as other related parameters outside the Security Context. As of July 202
[Pod Security Policies](/docs/concepts/security/pod-security-policy/) are deprecated in favor of the
built-in [Pod Security Admission Controller](/docs/concepts/security/pod-security-admission/).
{{% thirdparty-content %}}
Other alternatives for enforcing security profiles are being developed in the Kubernetes
ecosystem, such as:
- [OPA Gatekeeper](https://github.com/open-policy-agent/gatekeeper).
- [Kubewarden](https://github.com/kubewarden).
- [Kyverno](https://kyverno.io/policies/pod-security/).
### What profiles should I apply to my Windows Pods?
Windows in Kubernetes has some limitations and differentiators from standard Linux-based

View File

@ -0,0 +1,179 @@
---
reviewers:
title: Role Based Access Control Good Practices
description: >
Principles and practices for good RBAC design for cluster operators.
content_type: concept
---
<!-- overview -->
Kubernetes {{< glossary_tooltip text="RBAC" term_id="rbac" >}} is a key security control
to ensure that cluster users and workloads have only the access to resources required to
execute their roles. It is important to ensure that, when designing permissions for cluster
users, the cluster administrator understands the areas where privilege escalation could occur,
to reduce the risk of excessive access leading to security incidents.
The good practices laid out here should be read in conjunction with the general [RBAC documentation](/docs/reference/access-authn-authz/rbac/#restrictions-on-role-creation-or-update).
<!-- body -->
## General good practice
### Least privilege
Ideally, minimal RBAC rights should be assigned to users and service accounts. Only permissions
explicitly required for their operation should be used. Whilst each cluster will be different,
some general rules that can be applied are:
- Assign permissions at the namespace level where possible. Use RoleBindings as opposed to
ClusterRoleBindings to give users rights only within a specific namespace (see the sketch after this list).
- Avoid providing wildcard permissions when possible, especially to all resources.
As Kubernetes is an extensible system, providing wildcard access gives rights
not just to all object types presently in the cluster, but also to any object types
that are created in the future.
- Administrators should not use `cluster-admin` accounts except where specifically needed.
Providing a low privileged account with [impersonation rights](/docs/reference/access-authn-authz/authentication/#user-impersonation)
can avoid accidental modification of cluster resources.
- Avoid adding users to the `system:masters` group. Any user who is a member of this group
bypasses all RBAC rights checks and will always have unrestricted superuser access, which cannot be
revoked by removing RoleBindings or ClusterRoleBindings. As an aside, if a cluster is
using an authorization webhook, membership of this group also bypasses that webhook (requests
from users who are members of that group are never sent to the webhook).
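As a sketch of namespace-scoped access (all names are illustrative), a Role and RoleBinding grant a single user read-only access to Pods in one namespace:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: app-team
  name: pod-reader
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: app-team
  name: read-pods
subjects:
- kind: User
  name: jane
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```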
### Minimize distribution of privileged tokens
Ideally, pods shouldn't be assigned service accounts that have been granted powerful permissions (for example, any of the rights listed under
[privilege escalation risks](#privilege-escalation-risks)).
In cases where a workload requires powerful permissions, consider the following practices:
- Limit the number of nodes running powerful pods. Ensure that any DaemonSets you run
are necessary and are run with least privilege to limit the blast radius of container escapes.
- Avoid running powerful pods alongside untrusted or publicly-exposed ones. Consider using
[Taints and Toleration](/docs/concepts/scheduling-eviction/taint-and-toleration/), [NodeAffinity](/docs/concepts/scheduling-eviction/assign-pod-node/#node-affinity), or [PodAntiAffinity](/docs/concepts/scheduling-eviction/assign-pod-node/#inter-pod-affinity-and-anti-affinity) to ensure
pods don't run alongside untrusted or less-trusted Pods. Pay special attention to
situations where less-trustworthy Pods do not meet the **Restricted** Pod Security Standard.
### Hardening
Kubernetes defaults to providing access which may not be required in every cluster. Reviewing
the RBAC rights provided by default can provide opportunities for security hardening.
In general, changes should not be made to the rights provided to `system:` accounts. However, some options
to harden cluster rights exist:
- Review bindings for the `system:unauthenticated` group and remove where possible, as this gives
access to anyone who can contact the API server at a network level.
- Avoid the default auto-mounting of service account tokens by setting
`automountServiceAccountToken: false`, as shown in the sketch below. For more details, see
[using default service account token](/docs/tasks/configure-pod-container/configure-service-account/#use-the-default-service-account-to-access-the-api-server).
Setting this value for a Pod will override the service account setting; workloads
which require service account tokens can still mount them.
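A minimal sketch of opting a Pod out of automatic token mounting (the Pod, ServiceAccount, and image names are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: no-token-pod
spec:
  serviceAccountName: app-sa
  automountServiceAccountToken: false   # no token volume is mounted into the containers
  containers:
  - name: app
    image: nginx
```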
### Periodic review
It is vital to periodically review the Kubernetes RBAC settings for redundant entries and
possible privilege escalations.
If an attacker is able to create a user account with the same name as a deleted user,
they automatically inherit all of the rights that were assigned to that user, because
any role bindings that reference the old user name still apply.
## Kubernetes RBAC - privilege escalation risks {#privilege-escalation-risks}
Within Kubernetes RBAC there are a number of privileges which, if granted, can allow a user or a service account
to escalate their privileges in the cluster or affect systems outside the cluster.
This section is intended to provide visibility of the areas where cluster operators
should take care, to ensure that they do not inadvertently allow for more access to clusters than intended.
### Listing secrets
It is generally clear that allowing `get` access on Secrets will allow a user to read their contents.
It is also important to note that `list` and `watch` access also effectively allow for users to reveal the Secret contents.
For instance, when a List response is returned (for example, via `kubectl get secrets -A -o yaml`), the response
includes the contents of all Secrets.
### Workload creation
Users who are able to create workloads (either Pods, or
[workload resources](/docs/concepts/workloads/controllers/) that manage Pods) will
be able to gain access to the underlying node unless restrictions based on the Kubernetes
[Pod Security Standards](/docs/concepts/security/pod-security-standards/) are in place.
Users who can run privileged Pods can use that access to gain node access and potentially to
further elevate their privileges. Where you do not fully trust a user or other principal
with the ability to create suitably secure and isolated Pods, you should enforce either the
**Baseline** or **Restricted** Pod Security Standard.
You can use [Pod Security admission](/docs/concepts/security/pod-security-admission/)
or other (third party) mechanisms to implement that enforcement.
You can also use the deprecated [PodSecurityPolicy](/docs/concepts/policy/pod-security-policy/) mechanism
to restrict users' abilities to create privileged Pods (N.B. PodSecurityPolicy is scheduled for removal
in version 1.25).
Creating a workload in a namespace also grants indirect access to Secrets in that namespace.
Creating a pod in kube-system or a similarly privileged namespace can grant a user access to
Secrets they would not have through RBAC directly.
### Persistent volume creation
As noted in the [PodSecurityPolicy](/docs/concepts/policy/pod-security-policy/#volumes-and-file-systems) documentation, access to create PersistentVolumes can allow for escalation of access to the underlying host. Where access to persistent storage is required, trusted administrators should create
PersistentVolumes, and constrained users should use PersistentVolumeClaims to access that storage.
### Access to `proxy` subresource of Nodes
Users with access to the `proxy` sub-resource of Node objects have rights to the Kubelet API,
which allows for command execution on every pod on the node(s) to which they have such rights.
This access bypasses audit logging and admission control, so care should be taken before
granting rights to this resource.
### Escalate verb
Generally the RBAC system prevents users from creating clusterroles with more rights than
they possess. The exception to this is the `escalate` verb. As noted in the [RBAC documentation](/docs/reference/access-authn-authz/rbac/#restrictions-on-role-creation-or-update),
users with this right can effectively escalate their privileges.
### Bind verb
Similar to the `escalate` verb, granting users this right allows for bypass of Kubernetes
in-built protections against privilege escalation, allowing users to create bindings to
roles with rights they do not already have.
### Impersonate verb
This verb allows users to impersonate and gain the rights of other users in the cluster.
Care should be taken when granting it, to ensure that excessive permissions cannot be gained
via one of the impersonated accounts.
### CSRs and certificate issuing
The CSR API allows for users with `create` rights to CSRs and `update` rights on `certificatesigningrequests/approval`
where the signer is `kubernetes.io/kube-apiserver-client` to create new client certificates
which allow users to authenticate to the cluster. Those client certificates can have arbitrary
names including duplicates of Kubernetes system components. This will effectively allow for privilege escalation.
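As a sketch (names are illustrative) of the permission combination described above, a ClusterRole like the following is the sort of grant to watch for, because a subject bound to it can both create CSRs and approve them for the kube-apiserver client signer:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: csr-creator-and-approver
rules:
- apiGroups: ["certificates.k8s.io"]
  resources: ["certificatesigningrequests"]
  verbs: ["create", "get", "list", "watch"]
- apiGroups: ["certificates.k8s.io"]
  resources: ["certificatesigningrequests/approval"]
  verbs: ["update"]
- apiGroups: ["certificates.k8s.io"]
  resources: ["signers"]
  resourceNames: ["kubernetes.io/kube-apiserver-client"]
  verbs: ["approve"]
```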
### Token request
Users with `create` rights on `serviceaccounts/token` can create TokenRequests to issue
tokens for existing service accounts.
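A brief sketch of what this looks like from the command line (the ServiceAccount name is illustrative; `kubectl create token` issues a TokenRequest on your behalf):

```bash
# Issue a short-lived token for an existing ServiceAccount
kubectl create token build-robot --namespace default --duration=1h
```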
### Control admission webhooks
Users with control over `validatingwebhookconfigurations` or `mutatingwebhookconfigurations`
can control webhooks that can read any object admitted to the cluster, and in the case of
mutating webhooks, also mutate admitted objects.
## Kubernetes RBAC - denial of service risks {#denial-of-service-risks}
### Object creation denial-of-service {#object-creation-dos}
Users who have rights to create objects in a cluster may be able to create sufficiently large
objects to cause a denial of service condition, either based on the size or the number of objects, as discussed in
[etcd used by Kubernetes is vulnerable to OOM attack](https://github.com/kubernetes/kubernetes/issues/107325). This may be
specifically relevant in multi-tenant clusters if semi-trusted or untrusted users
are allowed limited access to a system.
One option for mitigation of this issue would be to use [resource quotas](/docs/concepts/policy/resource-quotas/#object-count-quota)
to limit the quantity of objects which can be created.
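For example, an object-count quota might look like the following sketch (the namespace name and limits are illustrative):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: object-counts
  namespace: tenant-a
spec:
  hard:
    count/configmaps: "100"
    count/secrets: "100"
    count/services: "20"
```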
## {{% heading "whatsnext" %}}
* To learn more about RBAC, see the [RBAC documentation](/docs/reference/access-authn-authz/rbac/).

View File

@ -0,0 +1,55 @@
---
reviewers:
- jayunit100
- jsturtevant
- marosset
- perithompson
title: Security For Windows Nodes
content_type: concept
weight: 75
---
<!-- overview -->
This page describes security considerations and best practices specific to the Windows operating system.
<!-- body -->
## Protection for Secret data on nodes
On Windows, data from Secrets are written out in clear text onto the node's local
storage (as compared to using tmpfs / in-memory filesystems on Linux). As a cluster
operator, you should take both of the following additional measures:
1. Use file ACLs to secure the Secrets' file location.
1. Apply volume-level encryption using [BitLocker](https://docs.microsoft.com/windows/security/information-protection/bitlocker/bitlocker-how-to-deploy-on-windows-server).
## Container users
[RunAsUsername](/docs/tasks/configure-pod-container/configure-runasusername)
can be specified for Windows Pods or containers to execute the container
processes as a specific user. This is roughly equivalent to
[RunAsUser](/docs/concepts/policy/pod-security-policy/#users-and-groups).
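As a brief sketch (the Pod name, image, and username are assumptions), the username is set through the Windows-specific security context options:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: run-as-username-demo
spec:
  securityContext:
    windowsOptions:
      runAsUserName: "ContainerUser"   # applies to all containers unless overridden per container
  containers:
  - name: app
    image: mcr.microsoft.com/windows/servercore:ltsc2022
    command: ["ping", "-t", "localhost"]
  nodeSelector:
    kubernetes.io/os: windows
```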
Windows containers offer two default user accounts, ContainerUser and ContainerAdministrator.
The differences between these two user accounts are covered in
[When to use ContainerAdmin and ContainerUser user accounts](https://docs.microsoft.com/virtualization/windowscontainers/manage-containers/container-security#when-to-use-containeradmin-and-containeruser-user-accounts) within Microsoft's _Secure Windows containers_ documentation.
Local users can be added to container images during the container build process.
{{< note >}}
* [Nano Server](https://hub.docker.com/_/microsoft-windows-nanoserver) based images run as `ContainerUser` by default
* [Server Core](https://hub.docker.com/_/microsoft-windows-servercore) based images run as `ContainerAdministrator` by default
{{< /note >}}
Windows containers can also run as Active Directory identities by utilizing [Group Managed Service Accounts](/docs/tasks/configure-pod-container/configure-gmsa/).
## Pod-level security isolation
Linux-specific pod security context mechanisms (such as SELinux, AppArmor, Seccomp, or custom
POSIX capabilities) are not supported on Windows nodes.
Privileged containers are [not supported](/docs/concepts/windows/intro/#compatibility-v1-pod-spec-containers-securitycontext) on Windows.
Instead, [HostProcess containers](/docs/tasks/configure-pod-container/create-hostprocess-pod) can be used on Windows to perform many of the tasks performed by privileged containers on Linux.

View File

@ -7,26 +7,25 @@ description: >
## The Kubernetes network model
Every [`Pod`](/docs/concepts/workloads/pods/) gets its own IP address.
Every [`Pod`](/docs/concepts/workloads/pods/) in a cluster gets its own unique cluster-wide IP address.
This means you do not need to explicitly create links between `Pods` and you
almost never need to deal with mapping container ports to host ports.
This creates a clean, backwards-compatible model where `Pods` can be treated
much like VMs or physical hosts from the perspectives of port allocation,
naming, service discovery, [load balancing](/docs/concepts/services-networking/ingress/#load-balancing), application configuration,
and migration.
naming, service discovery, [load balancing](/docs/concepts/services-networking/ingress/#load-balancing),
application configuration, and migration.
Kubernetes imposes the following fundamental requirements on any networking
implementation (barring any intentional network segmentation policies):
* pods on a [node](/docs/concepts/architecture/nodes/) can communicate with all pods on all nodes without NAT
* pods can communicate with all other pods on any other [node](/docs/concepts/architecture/nodes/)
without NAT
* agents on a node (e.g. system daemons, kubelet) can communicate with all
pods on that node
Note: For those platforms that support `Pods` running in the host network (e.g.
Linux):
* pods in the host network of a node can communicate with all pods on all
nodes without NAT
Linux), when pods are attached to the host network of a node they can still communicate
with all pods on all nodes without NAT.
This model is not only less complex overall, but it is principally compatible
with the desire for Kubernetes to enable low-friction porting of apps from VMs

View File

@ -8,8 +8,8 @@ weight: 20
---
<!-- overview -->
Kubernetes creates DNS records for services and pods. You can contact
services with consistent DNS names instead of IP addresses.
Kubernetes creates DNS records for Services and Pods. You can contact
Services with consistent DNS names instead of IP addresses.
<!-- body -->
@ -25,20 +25,20 @@ Pod's own namespace and the cluster's default domain.
### Namespaces of Services
A DNS query may return different results based on the namespace of the pod making
it. DNS queries that don't specify a namespace are limited to the pod's
namespace. Access services in other namespaces by specifying it in the DNS query.
A DNS query may return different results based on the namespace of the Pod making
it. DNS queries that don't specify a namespace are limited to the Pod's
namespace. Access Services in other namespaces by specifying the namespace in the DNS query.
For example, consider a pod in a `test` namespace. A `data` service is in
For example, consider a Pod in a `test` namespace. A `data` Service is in
the `prod` namespace.
A query for `data` returns no results, because it uses the pod's `test` namespace.
A query for `data` returns no results, because it uses the Pod's `test` namespace.
A query for `data.prod` returns the intended result, because it specifies the
namespace.
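As an illustration (a sketch; this assumes a Pod image that ships `nslookup`, and the Service and namespace names from the example above):

```bash
# Run from a Pod in the "test" namespace
nslookup data        # fails: expands to data.test.svc.cluster.local
nslookup data.prod   # succeeds: resolves the "data" Service in the "prod" namespace
```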
DNS queries may be expanded using the pod's `/etc/resolv.conf`. Kubelet
sets this file for each pod. For example, a query for just `data` may be
DNS queries may be expanded using the Pod's `/etc/resolv.conf`. Kubelet
sets this file for each Pod. For example, a query for just `data` may be
expanded to `data.test.svc.cluster.local`. The values of the `search` option
are used to expand queries. To learn more about DNS queries, see
[the `resolv.conf` manual page.](https://www.man7.org/linux/man-pages/man5/resolv.conf.5.html)
@ -49,7 +49,7 @@ search <namespace>.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
```
In summary, a pod in the _test_ namespace can successfully resolve either
In summary, a Pod in the _test_ namespace can successfully resolve either
`data.prod` or `data.prod.svc.cluster.local`.
### DNS Records
@ -70,14 +70,14 @@ For more up-to-date specification, see
### A/AAAA records
"Normal" (not headless) Services are assigned a DNS A or AAAA record,
depending on the IP family of the service, for a name of the form
depending on the IP family of the Service, for a name of the form
`my-svc.my-namespace.svc.cluster-domain.example`. This resolves to the cluster IP
of the Service.
"Headless" (without a cluster IP) Services are also assigned a DNS A or AAAA record,
depending on the IP family of the service, for a name of the form
depending on the IP family of the Service, for a name of the form
`my-svc.my-namespace.svc.cluster-domain.example`. Unlike normal
Services, this resolves to the set of IPs of the pods selected by the Service.
Services, this resolves to the set of IPs of the Pods selected by the Service.
Clients are expected to consume the set or else use standard round-robin
selection from the set.
@ -87,36 +87,36 @@ SRV Records are created for named ports that are part of normal or [Headless
Services](/docs/concepts/services-networking/service/#headless-services).
For each named port, the SRV record would have the form
`_my-port-name._my-port-protocol.my-svc.my-namespace.svc.cluster-domain.example`.
For a regular service, this resolves to the port number and the domain name:
For a regular Service, this resolves to the port number and the domain name:
`my-svc.my-namespace.svc.cluster-domain.example`.
For a headless service, this resolves to multiple answers, one for each pod
that is backing the service, and contains the port number and the domain name of the pod
For a headless Service, this resolves to multiple answers, one for each Pod
that is backing the Service, and contains the port number and the domain name of the Pod
of the form `auto-generated-name.my-svc.my-namespace.svc.cluster-domain.example`.
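For example, you might query such a record like this (a sketch; the port, Service, and namespace names are placeholders, `dig` must be available in the image, and the default `cluster.local` domain is assumed):

```bash
dig +short SRV _http._tcp.my-svc.my-namespace.svc.cluster.local
```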
## Pods
### A/AAAA records
In general a pod has the following DNS resolution:
In general a Pod has the following DNS resolution:
`pod-ip-address.my-namespace.pod.cluster-domain.example`.
For example, if a pod in the `default` namespace has the IP address 172.17.0.3,
For example, if a Pod in the `default` namespace has the IP address 172.17.0.3,
and the domain name for your cluster is `cluster.local`, then the Pod has a DNS name:
`172-17-0-3.default.pod.cluster.local`.
Any pods exposed by a Service have the following DNS resolution available:
Any Pods exposed by a Service have the following DNS resolution available:
`pod-ip-address.service-name.my-namespace.svc.cluster-domain.example`.
### Pod's hostname and subdomain fields
Currently when a pod is created, its hostname is the Pod's `metadata.name` value.
Currently when a Pod is created, its hostname is the Pod's `metadata.name` value.
The Pod spec has an optional `hostname` field, which can be used to specify the
Pod's hostname. When specified, it takes precedence over the Pod's name to be
the hostname of the pod. For example, given a Pod with `hostname` set to
the hostname of the Pod. For example, given a Pod with `hostname` set to
"`my-host`", the Pod will have its hostname set to "`my-host`".
The Pod spec also has an optional `subdomain` field which can be used to specify
@ -173,14 +173,14 @@ spec:
name: busybox
```
If there exists a headless service in the same namespace as the pod and with
If there exists a headless Service in the same namespace as the Pod and with
the same name as the subdomain, the cluster's DNS Server also returns an A or AAAA
record for the Pod's fully qualified hostname.
For example, given a Pod with the hostname set to "`busybox-1`" and the subdomain set to
"`default-subdomain`", and a headless Service named "`default-subdomain`" in
the same namespace, the pod will see its own FQDN as
the same namespace, the Pod will see its own FQDN as
"`busybox-1.default-subdomain.my-namespace.svc.cluster-domain.example`". DNS serves an
A or AAAA record at that name, pointing to the Pod's IP. Both pods "`busybox1`" and
A or AAAA record at that name, pointing to the Pod's IP. Both Pods "`busybox1`" and
"`busybox2`" can have their distinct A or AAAA records.
The Endpoints object can specify the `hostname` for any endpoint addresses,
@ -189,7 +189,7 @@ along with its IP.
{{< note >}}
Because A or AAAA records are not created for Pod names, `hostname` is required for the Pod's A or AAAA
record to be created. A Pod with no `hostname` but with `subdomain` will only create the
A or AAAA record for the headless service (`default-subdomain.my-namespace.svc.cluster-domain.example`),
A or AAAA record for the headless Service (`default-subdomain.my-namespace.svc.cluster-domain.example`),
pointing to the Pod's IP address. Also, Pod needs to become ready in order to have a
record unless `publishNotReadyAddresses=True` is set on the Service.
{{< /note >}}
@ -205,17 +205,17 @@ When you set `setHostnameAsFQDN: true` in the Pod spec, the kubelet writes the P
{{< note >}}
In Linux, the hostname field of the kernel (the `nodename` field of `struct utsname`) is limited to 64 characters.
If a Pod enables this feature and its FQDN is longer than 64 character, it will fail to start. The Pod will remain in `Pending` status (`ContainerCreating` as seen by `kubectl`) generating error events, such as Failed to construct FQDN from pod hostname and cluster domain, FQDN `long-FQDN` is too long (64 characters is the max, 70 characters requested). One way of improving user experience for this scenario is to create an [admission webhook controller](/docs/reference/access-authn-authz/extensible-admission-controllers/#admission-webhooks) to control FQDN size when users create top level objects, for example, Deployment.
If a Pod enables this feature and its FQDN is longer than 64 character, it will fail to start. The Pod will remain in `Pending` status (`ContainerCreating` as seen by `kubectl`) generating error events, such as Failed to construct FQDN from Pod hostname and cluster domain, FQDN `long-FQDN` is too long (64 characters is the max, 70 characters requested). One way of improving user experience for this scenario is to create an [admission webhook controller](/docs/reference/access-authn-authz/extensible-admission-controllers/#admission-webhooks) to control FQDN size when users create top level objects, for example, Deployment.
{{< /note >}}
### Pod's DNS Policy
DNS policies can be set on a per-pod basis. Currently Kubernetes supports the
following pod-specific DNS policies. These policies are specified in the
DNS policies can be set on a per-Pod basis. Currently Kubernetes supports the
following Pod-specific DNS policies. These policies are specified in the
`dnsPolicy` field of a Pod Spec.
- "`Default`": The Pod inherits the name resolution configuration from the node
that the pods run on.
that the Pods run on.
See [related discussion](/docs/tasks/administer-cluster/dns-custom-nameservers)
for more details.
- "`ClusterFirst`": Any DNS query that does not match the configured cluster
@ -226,6 +226,7 @@ following pod-specific DNS policies. These policies are specified in the
for details on how DNS queries are handled in those cases.
- "`ClusterFirstWithHostNet`": For Pods running with hostNetwork, you should
explicitly set its DNS policy "`ClusterFirstWithHostNet`".
- Note: This is not supported on Windows. See [below](#dns-windows) for details
- "`None`": It allows a Pod to ignore DNS settings from the Kubernetes
environment. All DNS settings are supposed to be provided using the
`dnsConfig` field in the Pod Spec.
@ -306,7 +307,7 @@ For IPv6 setup, search path and name server should be setup like this:
kubectl exec -it dns-example -- cat /etc/resolv.conf
```
The output is similar to this:
```shell
```
nameserver fd00:79:30::a
search default.svc.cluster-domain.example svc.cluster-domain.example cluster-domain.example
options ndots:5
@ -323,8 +324,25 @@ If the feature gate `ExpandedDNSConfig` is enabled for the kube-apiserver and
the kubelet, it is allowed for Kubernetes to have at most 32 search domains and
a list of search domains of up to 2048 characters.
## {{% heading "whatsnext" %}}
## DNS resolution on Windows nodes {#dns-windows}
- ClusterFirstWithHostNet is not supported for Pods that run on Windows nodes.
Windows treats all names with a `.` as a FQDN and skips FQDN resolution.
- On Windows, there are multiple DNS resolvers that can be used. As these come with
slightly different behaviors, using the
[`Resolve-DNSName`](https://docs.microsoft.com/powershell/module/dnsclient/resolve-dnsname)
PowerShell cmdlet for name query resolutions is recommended.
- On Linux, you have a DNS suffix list, which is used after resolution of a name as fully
qualified has failed.
On Windows, you can only have 1 DNS suffix, which is the DNS suffix associated with that
Pod's namespace (example: `mydns.svc.cluster.local`). Windows can resolve FQDNs, Services,
or network names that can be resolved with this single suffix. For example, a Pod spawned
in the `default` namespace will have the DNS suffix `default.svc.cluster.local`.
Inside a Windows Pod, you can resolve both `kubernetes.default.svc.cluster.local`
and `kubernetes`, but not the partially qualified names (`kubernetes.default` or
`kubernetes.default.svc`).
## {{% heading "whatsnext" %}}
For guidance on administering DNS configurations, check
[Configure DNS Service](/docs/tasks/administer-cluster/dns-custom-nameservers/)
View File
@ -43,7 +43,7 @@ The following prerequisites are needed in order to utilize IPv4/IPv6 dual-stack
Kubernetes versions, refer to the documentation for that version
of Kubernetes.
* Provider support for dual-stack networking (Cloud provider or otherwise must be able to provide Kubernetes nodes with routable IPv4/IPv6 network interfaces)
* A network plugin that supports dual-stack (such as Kubenet or Calico)
* A [network plugin](/docs/concepts/extend-kubernetes/compute-storage-net/network-plugins/) that supports dual-stack networking.
## Configure IPv4/IPv6 dual-stack
@ -239,6 +239,21 @@ If you want to enable egress traffic in order to reach off-cluster destinations
Ensure your {{< glossary_tooltip text="CNI" term_id="cni" >}} provider supports IPv6.
{{< /note >}}
## Windows support
Kubernetes on Windows does not support single-stack "IPv6-only" networking. However,
dual-stack IPv4/IPv6 networking for pods and nodes with single-family services
is supported.
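As a sketch, a single-family Service pinned to IPv4 on such a cluster could look like this; the Service name and the `app: my-app` selector are assumptions used only for illustration.
```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service          # hypothetical name
spec:
  ipFamilyPolicy: SingleStack
  ipFamilies:
  - IPv4                    # pin the Service to a single address family
  selector:
    app: my-app             # assumed workload label
  ports:
  - protocol: TCP
    port: 80
```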
You can use IPv4/IPv6 dual-stack networking with `l2bridge` networks.
{{< note >}}
Overlay (VXLAN) networks on Windows **do not** support dual-stack networking.
{{< /note >}}
You can read more about the different network modes for Windows within the
[Networking on Windows](/docs/concepts/services-networking/windows-networking#network-modes) topic.
## {{% heading "whatsnext" %}}
View File
@ -30,23 +30,8 @@ For clarity, this guide defines the following terms:
Traffic routing is controlled by rules defined on the Ingress resource.
Here is a simple example where an Ingress sends all its traffic to one Service:
{{< mermaid >}}
graph LR;
client([client])-. Ingress-managed <br> load balancer .->ingress[Ingress];
ingress-->|routing rule|service[Service];
subgraph cluster
ingress;
service-->pod1[Pod];
service-->pod2[Pod];
end
classDef plain fill:#ddd,stroke:#fff,stroke-width:4px,color:#000;
classDef k8s fill:#326ce5,stroke:#fff,stroke-width:4px,color:#fff;
classDef cluster fill:#fff,stroke:#bbb,stroke-width:2px,color:#326ce5;
class ingress,service,pod1,pod2 k8s;
class client plain;
class cluster cluster;
{{</ mermaid >}}
{{< figure src="/docs/images/ingress.svg" alt="ingress-diagram" class="diagram-large" caption="Figure. Ingress" link="https://mermaid.live/edit#pako:eNqNkstuwyAQRX8F4U0r2VHqPlSRKqt0UamLqlnaWWAYJygYLB59KMm_Fxcix-qmGwbuXA7DwAEzzQETXKutof0Ovb4vaoUQkwKUu6pi3FwXM_QSHGBt0VFFt8DRU2OWSGrKUUMlVQwMmhVLEV1Vcm9-aUksiuXRaO_CEhkv4WjBfAgG1TrGaLa-iaUw6a0DcwGI-WgOsF7zm-pN881fvRx1UDzeiFq7ghb1kgqFWiElyTjnuXVG74FkbdumefEpuNuRu_4rZ1pqQ7L5fL6YQPaPNiFuywcG9_-ihNyUkm6YSONWkjVNM8WUIyaeOJLO3clTB_KhL8NQDmVe-OJjxgZM5FhFiiFTK5zjDkxHBQ9_4zB4a-x20EGNSZhyaKmXrg7f5hSsvufUwTMXThtMWiot5Jh6p9ffimHijIezaSVoeN0uiqcfMJvf7w" >}}
An Ingress may be configured to give Services externally-reachable URLs, load balance traffic, terminate SSL / TLS, and offer name-based virtual hosting. An [Ingress controller](/docs/concepts/services-networking/ingress-controllers) is responsible for fulfilling the Ingress, usually with a load balancer, though it may also configure your edge router or additional frontends to help handle the traffic.
@ -74,7 +59,7 @@ A minimal Ingress resource example:
{{< codenew file="service/networking/minimal-ingress.yaml" >}}
As with all other Kubernetes resources, an Ingress needs `apiVersion`, `kind`, and `metadata` fields.
An Ingress needs `apiVersion`, `kind`, `metadata` and `spec` fields.
The name of an Ingress object must be a valid
[DNS subdomain name](/docs/concepts/overview/working-with-objects/names#dns-subdomain-names).
For general information about working with config files, see [deploying applications](/docs/tasks/run-application/run-stateless-application-deployment/), [configuring containers](/docs/tasks/configure-pod-container/configure-pod-configmap/), [managing resources](/docs/concepts/cluster-administration/manage-deployment/).
@ -398,25 +383,8 @@ A fanout configuration routes traffic from a single IP address to more than one
based on the HTTP URI being requested. An Ingress allows you to keep the number of load balancers
down to a minimum. For example, a setup like:
{{< mermaid >}}
graph LR;
client([client])-. Ingress-managed <br> load balancer .->ingress[Ingress, 178.91.123.132];
ingress-->|/foo|service1[Service service1:4200];
ingress-->|/bar|service2[Service service2:8080];
subgraph cluster
ingress;
service1-->pod1[Pod];
service1-->pod2[Pod];
service2-->pod3[Pod];
service2-->pod4[Pod];
end
classDef plain fill:#ddd,stroke:#fff,stroke-width:4px,color:#000;
classDef k8s fill:#326ce5,stroke:#fff,stroke-width:4px,color:#fff;
classDef cluster fill:#fff,stroke:#bbb,stroke-width:2px,color:#326ce5;
class ingress,service1,service2,pod1,pod2,pod3,pod4 k8s;
class client plain;
class cluster cluster;
{{</ mermaid >}}
{{< figure src="/docs/images/ingressFanOut.svg" alt="ingress-fanout-diagram" class="diagram-large" caption="Figure. Ingress Fan Out" link="https://mermaid.live/edit#pako:eNqNUslOwzAQ_RXLvYCUhMQpUFzUUzkgcUBwbHpw4klr4diR7bCo8O8k2FFbFomLPZq3jP00O1xpDpjijWHtFt09zAuFUCUFKHey8vf6NE7QrdoYsDZumGIb4Oi6NAskNeOoZJKpCgxK4oXwrFVgRyi7nCVXWZKRPMlysv5yD6Q4Xryf1Vq_WzDPooJs9egLNDbolKTpT03JzKgh3zWEztJZ0Niu9L-qZGcdmAMfj4cxvWmreba613z9C0B-AMQD-V_AdA-A4j5QZu0SatRKJhSqhZR0wjmPrDP6CeikrutQxy-Cuy2dtq9RpaU2dJKm6fzI5Glmg0VOLio4_5dLjx27hFSC015KJ2VZHtuQvY2fuHcaE43G0MaCREOow_FV5cMxHZ5-oPX75UM5avuXhXuOI9yAaZjg_aLuBl6B3RYaKDDtSw4166QrcKE-emrXcubghgunDaY1kxYizDqnH99UhakzHYykpWD9hjS--fEJoIELqQ" >}}
would require an Ingress such as:
@ -460,25 +428,7 @@ you are using, you may need to create a default-http-backend
Name-based virtual hosts support routing HTTP traffic to multiple host names at the same IP address.
{{< mermaid >}}
graph LR;
client([client])-. Ingress-managed <br> load balancer .->ingress[Ingress, 178.91.123.132];
ingress-->|Host: foo.bar.com|service1[Service service1:80];
ingress-->|Host: bar.foo.com|service2[Service service2:80];
subgraph cluster
ingress;
service1-->pod1[Pod];
service1-->pod2[Pod];
service2-->pod3[Pod];
service2-->pod4[Pod];
end
classDef plain fill:#ddd,stroke:#fff,stroke-width:4px,color:#000;
classDef k8s fill:#326ce5,stroke:#fff,stroke-width:4px,color:#fff;
classDef cluster fill:#fff,stroke:#bbb,stroke-width:2px,color:#326ce5;
class ingress,service1,service2,pod1,pod2,pod3,pod4 k8s;
class client plain;
class cluster cluster;
{{</ mermaid >}}
{{< figure src="/docs/images/ingressNameBased.svg" alt="ingress-namebase-diagram" class="diagram-large" caption="Figure. Ingress Name Based Virtual hosting" link="https://mermaid.live/edit#pako:eNqNkl9PwyAUxb8KYS-atM1Kp05m9qSJJj4Y97jugcLtRqTQAPVPdN_dVlq3qUt8gZt7zvkBN7xjbgRgiteW1Rt0_zjLNUJcSdD-ZBn21WmcoDu9tuBcXDHN1iDQVWHnSBkmUMEU0xwsSuK5DK5l745QejFNLtMkJVmSZmT1Re9NcTz_uDXOU1QakxTMJtxUHw7ss-SQLhehQEODTsdH4l20Q-zFyc84-Y67pghv5apxHuweMuj9eS2_NiJdPhix-kMgvwQShOyYMNkJoEUYM3PuGkpUKyY1KqVSdCSEiJy35gnoqCzLvo5fpPAbOqlfI26UsXQ0Ho9nB5CnqesRGTnncPYvSqsdUvqp9KRdlI6KojjEkB0mnLgjDRONhqENBYm6oXbLV5V1y6S7-l42_LowlIN2uFm_twqOcAW2YlK0H_i9c-bYb6CCHNO2FFCyRvkc53rbWptaMA83QnpjMS2ZchBh1nizeNMcU28bGEzXkrV_pArN7Sc0rBTu" >}}
The following Ingress tells the backing load balancer to route requests based on
View File
@ -45,42 +45,7 @@ See the [NetworkPolicy](/docs/reference/generated/kubernetes-api/{{< param "vers
An example NetworkPolicy might look like this:
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: test-network-policy
namespace: default
spec:
podSelector:
matchLabels:
role: db
policyTypes:
- Ingress
- Egress
ingress:
- from:
- ipBlock:
cidr: 172.17.0.0/16
except:
- 172.17.1.0/24
- namespaceSelector:
matchLabels:
project: myproject
- podSelector:
matchLabels:
role: frontend
ports:
- protocol: TCP
port: 6379
egress:
- to:
- ipBlock:
cidr: 10.0.0.0/24
ports:
- protocol: TCP
port: 5978
```
{{< codenew file="service/networking/networkpolicy.yaml" >}}
{{< note >}}
POSTing this to the API server for your cluster will have no effect unless your chosen networking solution supports network policy.
@ -89,7 +54,7 @@ POSTing this to the API server for your cluster will have no effect unless your
__Mandatory Fields__: As with all other Kubernetes config, a NetworkPolicy
needs `apiVersion`, `kind`, and `metadata` fields. For general information
about working with config files, see
[Configure Containers Using a ConfigMap](/docs/tasks/configure-pod-container/configure-pod-configmap/),
[Configure a Pod to Use a ConfigMap](/docs/tasks/configure-pod-container/configure-pod-configmap/),
and [Object Management](/docs/concepts/overview/working-with-objects/object-management).
__spec__: NetworkPolicy [spec](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#spec-and-status) has all the information needed to define a particular network policy in the given namespace.
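As a minimal sketch that shows just the mandatory fields plus a `spec`, the following policy denies all ingress traffic to every Pod in an assumed `default` namespace.
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: default            # assumed namespace
spec:
  podSelector: {}               # selects every Pod in the namespace
  policyTypes:
  - Ingress                     # no ingress rules are listed, so all ingress traffic is denied
```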
View File
@ -122,7 +122,7 @@ metadata:
spec:
containers:
- name: nginx
image: nginx:11.14.2
image: nginx:stable
ports:
- containerPort: 80
name: http-web-svc
@ -192,6 +192,7 @@ where it's running, by adding an Endpoints object manually:
apiVersion: v1
kind: Endpoints
metadata:
# the name here should match the name of the Service
name: my-service
subsets:
- addresses:
@ -203,6 +204,10 @@ subsets:
The name of the Endpoints object must be a valid
[DNS subdomain name](/docs/concepts/overview/working-with-objects/names#dns-subdomain-names).
When you create an [Endpoints](/docs/reference/kubernetes-api/service-resources/endpoints-v1/)
object for a Service, you set the name of the new object to be the same as that
of the Service.
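To make that pairing concrete, here is a hedged sketch of a complete Endpoints object for a Service named `my-service`; the IP address and port are placeholders for your own workload.
```yaml
apiVersion: v1
kind: Endpoints
metadata:
  # must match the name of the Service
  name: my-service
subsets:
- addresses:
  - ip: 192.0.2.42          # placeholder address of the external workload
  ports:
  - port: 9376              # placeholder port
```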
{{< note >}}
The endpoint IPs _must not_ be: loopback (127.0.0.0/8 for IPv4, ::1/128 for IPv6), or
link-local (169.254.0.0/16 and 224.0.0.0/24 for IPv4, fe80::/64 for IPv6).
@ -394,6 +399,10 @@ You can also set the maximum session sticky time by setting
`service.spec.sessionAffinityConfig.clientIP.timeoutSeconds` appropriately.
(the default value is 10800, which works out to be 3 hours).
{{< note >}}
On Windows, setting the maximum session sticky time for Services is not supported.
{{< /note >}}
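A sketch of a Service that enables client IP based session affinity with a one hour timeout might look like this; the Service name and selector are placeholders.
```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-session-service   # hypothetical name
spec:
  selector:
    app: my-app              # assumed workload label
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 3600   # maximum session sticky time (the default is 10800)
```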
## Multi-Port Services
For some Services, you need to expose more than one port.
@ -447,7 +456,7 @@ server will return a 422 HTTP status code to indicate that there's a problem.
You can set the `spec.externalTrafficPolicy` field to control how traffic from external sources is routed.
Valid values are `Cluster` and `Local`. Set the field to `Cluster` to route external traffic to all ready endpoints
and `Local` to only route to ready node-local endpoints. If the traffic policy is `Local` and there are are no node-local
and `Local` to only route to ready node-local endpoints. If the traffic policy is `Local` and there are no node-local
endpoints, the kube-proxy does not forward any traffic for the relevant Service.
{{< note >}}
@ -701,23 +710,25 @@ Specify the assigned IP address as loadBalancerIP. Ensure that you have updated
#### Load balancers with mixed protocol types
{{< feature-state for_k8s_version="v1.20" state="alpha" >}}
{{< feature-state for_k8s_version="v1.24" state="beta" >}}
By default, for LoadBalancer type of Services, when there is more than one port defined, all
ports must have the same protocol, and the protocol must be one which is supported
by the cloud provider.
If the feature gate `MixedProtocolLBService` is enabled for the kube-apiserver it is allowed to use different protocols when there is more than one port defined.
The feature gate `MixedProtocolLBService` (enabled by default for the kube-apiserver as of v1.24) allows the use of
different protocols for LoadBalancer type of Services, when there is more than one port defined.
{{< note >}}
The set of protocols that can be used for LoadBalancer type of Services is still defined by the cloud provider.
The set of protocols that can be used for LoadBalancer type of Services is still defined by the cloud provider. If a
cloud provider does not support mixed protocols they will provide only a single protocol.
{{< /note >}}
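Assuming your cloud provider supports it, a LoadBalancer Service that exposes the same port over both TCP and UDP could be declared as in this sketch; the name, selector, and ports are placeholders.
```yaml
apiVersion: v1
kind: Service
metadata:
  name: mixed-protocol-lb    # hypothetical name
spec:
  type: LoadBalancer
  selector:
    app: dns-app             # assumed workload label
  ports:
  - name: dns-tcp
    protocol: TCP
    port: 53
  - name: dns-udp
    protocol: UDP
    port: 53
```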
#### Disabling load balancer NodePort allocation {#load-balancer-nodeport-allocation}
{{< feature-state for_k8s_version="v1.22" state="beta" >}}
{{< feature-state for_k8s_version="v1.24" state="stable" >}}
You can optionally disable node port allocation for a Service of `type=LoadBalancer`, by setting
the field `spec.allocateLoadBalancerNodePorts` to `false`. This should only be used for load balancer implementations
@ -725,20 +736,12 @@ that route traffic directly to pods as opposed to using node ports. By default,
is `true` and type LoadBalancer Services will continue to allocate node ports. If `spec.allocateLoadBalancerNodePorts`
is set to `false` on an existing Service with allocated node ports, those node ports will **not** be de-allocated automatically.
You must explicitly remove the `nodePorts` entry in every Service port to de-allocate those node ports.
Your cluster must have the `ServiceLBNodePortControl`
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
enabled to use this field.
For Kubernetes v{{< skew currentVersion >}}, this feature gate is enabled by default,
and you can use the `spec.allocateLoadBalancerNodePorts` field. For clusters running
other versions of Kubernetes, check the documentation for that release.
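A hedged sketch of a Service that opts out of node port allocation; the name and selector are placeholders.
```yaml
apiVersion: v1
kind: Service
metadata:
  name: no-node-ports        # hypothetical name
spec:
  type: LoadBalancer
  allocateLoadBalancerNodePorts: false   # rely on the load balancer routing directly to Pods
  selector:
    app: my-app              # assumed workload label
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
```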
#### Specifying class of load balancer implementation {#load-balancer-class}
{{< feature-state for_k8s_version="v1.22" state="beta" >}}
{{< feature-state for_k8s_version="v1.24" state="stable" >}}
`spec.loadBalancerClass` enables you to use a load balancer implementation other than the cloud provider default.
Your cluster must have the `ServiceLoadBalancerClass` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) enabled to use this field. For Kubernetes v{{< skew currentVersion >}}, this feature gate is enabled by default. For clusters running
other versions of Kubernetes, check the documentation for that release.
By default, `spec.loadBalancerClass` is `nil` and a `LoadBalancer` type of Service uses
the cloud provider's default load balancer implementation if the cluster is configured with
a cloud provider using the `--cloud-provider` component flag.
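For example, a Service that asks a non-default implementation to handle it might look like the sketch below; the `example.com/internal-vip` class name is hypothetical and must match whatever your load balancer controller actually watches for.
```yaml
apiVersion: v1
kind: Service
metadata:
  name: internal-app             # hypothetical name
spec:
  type: LoadBalancer
  loadBalancerClass: example.com/internal-vip   # hypothetical class name
  selector:
    app: my-app                  # assumed workload label
  ports:
  - protocol: TCP
    port: 8080
```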
@ -1254,7 +1257,8 @@ someone else's choice. That is an isolation failure.
In order to allow you to choose a port number for your Services, we must
ensure that no two Services can collide. Kubernetes does that by allocating each
Service its own IP address.
Service its own IP address from within the `service-cluster-ip-range`
CIDR range that is configured for the API server.
To ensure each Service receives a unique IP, an internal allocator atomically
updates a global allocation map in {{< glossary_tooltip term_id="etcd" >}}
@ -1268,6 +1272,25 @@ in-memory locking). Kubernetes also uses controllers to check for invalid
assignments (eg due to administrator intervention) and for cleaning up allocated
IP addresses that are no longer used by any Services.
#### IP address ranges for `type: ClusterIP` Services {#service-ip-static-sub-range}
{{< feature-state for_k8s_version="v1.24" state="alpha" >}}
However, there is a problem with this `ClusterIP` allocation strategy, because a user
can also [choose their own address for the service](#choosing-your-own-ip-address).
This could result in a conflict if the internal allocator selects the same IP address
for another Service.
If you enable the `ServiceIPStaticSubrange`
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/),
the allocation strategy divides the `ClusterIP` range into two bands, based on
the size of the configured `service-cluster-ip-range`, using the formula
`min(max(16, cidrSize / 16), 256)`, described as _never less than 16 or more than 256,
with a graduated step function between them_. For example, a `/24` range yields a lower band
of 16 addresses, while a `/16` or larger range is capped at 256. Dynamic IP allocations are
preferentially chosen from the upper band, reducing the risk of conflicts with the IPs
assigned from the lower band.
This allows users to use the lower band of the `service-cluster-ip-range` for their
Services with static IPs assigned with a very low risk of running into conflicts.
### Service IP addresses {#ips-and-vips}
Unlike Pod IP addresses, which actually route to a fixed destination,
View File
@ -0,0 +1,164 @@
---
reviewers:
- aravindhp
- jayunit100
- jsturtevant
- marosset
title: Networking on Windows
content_type: concept
weight: 75
---
<!-- overview -->
Kubernetes supports running nodes on either Linux or Windows. You can mix both kinds of node
within a single cluster.
This page provides an overview to networking specific to the Windows operating system.
<!-- body -->
## Container networking on Windows {#networking}
Networking for Windows containers is exposed through
[CNI plugins](/docs/concepts/extend-kubernetes/compute-storage-net/network-plugins/).
Windows containers function similarly to virtual machines in regards to
networking. Each container has a virtual network adapter (vNIC) which is connected
to a Hyper-V virtual switch (vSwitch). The Host Networking Service (HNS) and the
Host Compute Service (HCS) work together to create containers and attach container
vNICs to networks. HCS is responsible for the management of containers whereas HNS
is responsible for the management of networking resources such as:
* Virtual networks (including creation of vSwitches)
* Endpoints / vNICs
* Namespaces
* Policies including packet encapsulations, load-balancing rules, ACLs, and NAT rules.
The Windows HNS and vSwitch implement namespacing and can
create virtual NICs as needed for a pod or container. However, many configurations such
as DNS, routes, and metrics are stored in the Windows registry database rather than as
files inside `/etc`, which is how Linux stores those configurations. The Windows registry for the container
is separate from that of the host, so concepts like mapping `/etc/resolv.conf` from
the host into a container don't have the same effect they would on Linux. These must
be configured using Windows APIs run in the context of that container. Therefore
CNI implementations need to call the HNS instead of relying on file mappings to pass
network details into the pod or container.
## Network modes
Windows supports five different networking drivers/modes: L2bridge, L2tunnel,
Overlay (Beta), Transparent, and NAT. In a heterogeneous cluster with Windows and Linux
worker nodes, you need to select a networking solution that is compatible with both
Windows and Linux. The following table lists the out-of-tree plugins that are supported on Windows,
with recommendations on when to use each CNI:
| Network Driver | Description | Container Packet Modifications | Network Plugins | Network Plugin Characteristics |
| -------------- | ----------- | ------------------------------ | --------------- | ------------------------------ |
| L2bridge | Containers are attached to an external vSwitch. Containers are attached to the underlay network, although the physical network doesn't need to learn the container MACs because they are rewritten on ingress/egress. | MAC is rewritten to host MAC, IP may be rewritten to host IP using HNS OutboundNAT policy. | [win-bridge](https://github.com/containernetworking/plugins/tree/master/plugins/main/windows/win-bridge), [Azure-CNI](https://github.com/Azure/azure-container-networking/blob/master/docs/cni.md), Flannel host-gateway uses win-bridge | win-bridge uses L2bridge network mode, connects containers to the underlay of hosts, offering best performance. Requires user-defined routes (UDR) for inter-node connectivity. |
| L2Tunnel | This is a special case of l2bridge, but only used on Azure. All packets are sent to the virtualization host where SDN policy is applied. | MAC rewritten, IP visible on the underlay network | [Azure-CNI](https://github.com/Azure/azure-container-networking/blob/master/docs/cni.md) | Azure-CNI allows integration of containers with Azure vNET, and allows them to leverage the set of capabilities that [Azure Virtual Network provides](https://azure.microsoft.com/en-us/services/virtual-network/). For example, securely connect to Azure services or use Azure NSGs. See [azure-cni for some examples](https://docs.microsoft.com/azure/aks/concepts-network#azure-cni-advanced-networking) |
| Overlay | Containers are given a vNIC connected to an external vSwitch. Each overlay network gets its own IP subnet, defined by a custom IP prefix. The overlay network driver uses VXLAN encapsulation. | Encapsulated with an outer header. | [win-overlay](https://github.com/containernetworking/plugins/tree/master/plugins/main/windows/win-overlay), Flannel VXLAN (uses win-overlay) | win-overlay should be used when virtual container networks are desired to be isolated from underlay of hosts (e.g. for security reasons). Allows for IPs to be re-used for different overlay networks (which have different VNID tags) if you are restricted on IPs in your datacenter. This option requires [KB4489899](https://support.microsoft.com/help/4489899) on Windows Server 2019. |
| Transparent (special use case for [ovn-kubernetes](https://github.com/openvswitch/ovn-kubernetes)) | Requires an external vSwitch. Containers are attached to an external vSwitch which enables intra-pod communication via logical networks (logical switches and routers). | Packet is encapsulated either via [GENEVE](https://datatracker.ietf.org/doc/draft-gross-geneve/) or [STT](https://datatracker.ietf.org/doc/draft-davie-stt/) tunneling to reach pods which are not on the same host. <br/> Packets are forwarded or dropped via the tunnel metadata information supplied by the ovn network controller. <br/> NAT is done for north-south communication. | [ovn-kubernetes](https://github.com/openvswitch/ovn-kubernetes) | [Deploy via ansible](https://github.com/openvswitch/ovn-kubernetes/tree/master/contrib). Distributed ACLs can be applied via Kubernetes policies. IPAM support. Load-balancing can be achieved without kube-proxy. NATing is done without using iptables/netsh. |
| NAT (*not used in Kubernetes*) | Containers are given a vNIC connected to an internal vSwitch. DNS/DHCP is provided using an internal component called [WinNAT](https://techcommunity.microsoft.com/t5/virtualization/windows-nat-winnat-capabilities-and-limitations/ba-p/382303) | MAC and IP is rewritten to host MAC/IP. | [nat](https://github.com/Microsoft/windows-container-networking/tree/master/plugins/nat) | Included here for completeness |
As outlined above, the [Flannel](https://github.com/coreos/flannel)
[CNI plugin](https://github.com/flannel-io/cni-plugin)
is also [supported](https://github.com/flannel-io/cni-plugin#windows-support-experimental) on Windows via the
[VXLAN network backend](https://github.com/coreos/flannel/blob/master/Documentation/backends.md#vxlan) (**Beta support** ; delegates to win-overlay)
and [host-gateway network backend](https://github.com/coreos/flannel/blob/master/Documentation/backends.md#host-gw) (stable support; delegates to win-bridge).
This plugin supports delegating to one of the reference CNI plugins (win-overlay,
win-bridge), to work in conjunction with Flannel daemon on Windows (Flanneld) for
automatic node subnet lease assignment and HNS network creation. This plugin reads
in its own configuration file (cni.conf), and aggregates it with the environment
variables from the FlannelD generated subnet.env file. It then delegates to one of
the reference CNI plugins for network plumbing, and sends the correct configuration
containing the node-assigned subnet to the IPAM plugin (for example: `host-local`).
For Node, Pod, and Service objects, the following network flows are supported for
TCP/UDP traffic:
* Pod → Pod (IP)
* Pod → Pod (Name)
* Pod → Service (Cluster IP)
* Pod → Service (PQDN, but only if there are no ".")
* Pod → Service (FQDN)
* Pod → external (IP)
* Pod → external (DNS)
* Node → Pod
* Pod → Node
## IP address management (IPAM) {#ipam}
The following IPAM options are supported on Windows:
* [host-local](https://github.com/containernetworking/plugins/tree/master/plugins/ipam/host-local)
* [azure-vnet-ipam](https://github.com/Azure/azure-container-networking/blob/master/docs/ipam.md) (for azure-cni only)
* [Windows Server IPAM](https://docs.microsoft.com/windows-server/networking/technologies/ipam/ipam-top) (fallback option if no IPAM is set)
## Load balancing and Services
A Kubernetes {{< glossary_tooltip text="Service" term_id="service" >}} is an abstraction
that defines a logical set of Pods and a means to access them over a network.
In a cluster that includes Windows nodes, you can use the following types of Service:
* `NodePort`
* `ClusterIP`
* `LoadBalancer`
* `ExternalName`
Windows container networking differs in some important ways from Linux networking.
The [Microsoft documentation for Windows Container Networking](https://docs.microsoft.com/en-us/virtualization/windowscontainers/container-networking/architecture)
provides additional details and background.
On Windows, you can use the following settings to configure Services and load
balancing behavior:
{{< table caption="Windows Service Settings" >}}
| Feature | Description | Minimum Supported Windows OS build | How to enable |
| ------- | ----------- | -------------------------- | ------------- |
| Session affinity | Ensures that connections from a particular client are passed to the same Pod each time. | Windows Server 2022 | Set `service.spec.sessionAffinity` to "ClientIP" |
| Direct Server Return (DSR) | Load balancing mode where the IP address fixups and the LBNAT occurs at the container vSwitch port directly; service traffic arrives with the source IP set as the originating pod IP. | Windows Server 2019 | Set the following flags in kube-proxy: `--feature-gates="WinDSR=true" --enable-dsr=true` |
| Preserve-Destination | Skips DNAT of service traffic, thereby preserving the virtual IP of the target service in packets reaching the backend Pod. Also disables node-node forwarding. | Windows Server, version 1903 | Set `"preserve-destination": "true"` in service annotations and enable DSR in kube-proxy. |
| IPv4/IPv6 dual-stack networking | Native IPv4-to-IPv4 in parallel with IPv6-to-IPv6 communications to, from, and within a cluster | Windows Server 2019 | See [IPv4/IPv6 dual-stack](#ipv4ipv6-dual-stack) |
| Client IP preservation | Ensures that source IP of incoming ingress traffic gets preserved. Also disables node-node forwarding. | Windows Server 2019 | Set `service.spec.externalTrafficPolicy` to "Local" and enable DSR in kube-proxy |
{{< /table >}}
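Tying a few of these settings together, a hedged sketch of a Service that preserves the client IP and skips DNAT on Windows (assuming DSR is enabled in kube-proxy, as described in the table above) might look like this; the name and selector are placeholders.
```yaml
apiVersion: v1
kind: Service
metadata:
  name: win-web-service          # hypothetical name
  annotations:
    preserve-destination: "true" # skip DNAT of service traffic (see the table above)
spec:
  externalTrafficPolicy: Local   # preserve the client source IP
  selector:
    app: win-webserver           # assumed workload label
  ports:
  - protocol: TCP
    port: 80
```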
{{< warning >}}
There is a known issue with NodePort Services on overlay networking if the destination node is running Windows Server 2022.
To avoid the issue entirely, you can configure the service with `externalTrafficPolicy: Local`.
There are known issues with Pod to Pod connectivity on l2bridge network on Windows Server 2022 with KB5005619 or higher installed.
To work around the issue and restore Pod to Pod connectivity, you can disable the WinDSR feature in kube-proxy.
These issues require OS fixes.
Please follow https://github.com/microsoft/Windows-Containers/issues/204 for updates.
{{< /warning >}}
## Limitations
The following networking functionality is _not_ supported on Windows nodes:
* Host networking mode
* Local NodePort access from the node itself (works for other nodes or external clients)
* More than 64 backend pods (or unique destination addresses) for a single Service
* IPv6 communication between Windows pods connected to overlay networks
* Local Traffic Policy in non-DSR mode
* Outbound communication using the ICMP protocol via the `win-overlay`, `win-bridge`, or using the Azure-CNI plugin.\
Specifically, the Windows data plane ([VFP](https://www.microsoft.com/research/project/azure-virtual-filtering-platform/))
doesn't support ICMP packet transpositions, and this means:
* ICMP packets directed to destinations within the same network (such as pod to pod communication via ping)
work as expected;
* TCP/UDP packets work as expected;
* ICMP packets directed to pass through a remote network (e.g. pod to external internet communication via ping)
cannot be transposed and thus will not be routed back to their source;
* Since TCP/UDP packets can still be transposed, you can substitute `ping <destination>` with
`curl <destination>` when debugging connectivity with the outside world.
Other limitations:
* Windows reference network plugins win-bridge and win-overlay do not implement
[CNI spec](https://github.com/containernetworking/cni/blob/master/SPEC.md) v0.4.0,
due to a missing `CHECK` implementation.
* The Flannel VXLAN CNI plugin has the following limitations on Windows:
* Node-pod connectivity is only possible for local pods with Flannel v0.12.0 (or higher).
* Flannel is restricted to using VNI 4096 and UDP port 4789. See the official
[Flannel VXLAN](https://github.com/coreos/flannel/blob/master/Documentation/backends.md#vxlan)
backend docs for more details on these parameters.
View File
@ -127,14 +127,17 @@ instructions.
### CSI driver restrictions
{{< feature-state for_k8s_version="v1.21" state="deprecated" >}}
CSI ephemeral volumes allow users to provide `volumeAttributes`
directly to the CSI driver as part of the Pod spec. A CSI driver
allowing `volumeAttributes` that are typically restricted to
administrators is NOT suitable for use in an inline ephemeral volume.
For example, parameters that are normally defined in the StorageClass
should not be exposed to users through the use of inline ephemeral volumes.
As a cluster administrator, you can use a [PodSecurityPolicy](/docs/concepts/security/pod-security-policy/) to control which CSI drivers can be used in a Pod, specified with the
[`allowedCSIDrivers` field](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#podsecuritypolicyspec-v1beta1-policy).
{{< note >}}
PodSecurityPolicy is deprecated and will be removed in the Kubernetes v1.25 release.
{{< /note >}}
Cluster administrators who need to restrict the CSI drivers that are
allowed to be used as inline volumes within a Pod spec may do so by:
- Removing `Ephemeral` from `volumeLifecycleModes` in the CSIDriver spec, which prevents the driver from being used as an inline ephemeral volume (see the sketch after this list).
- Using an [admission webhook](/docs/reference/access-authn-authz/extensible-admission-controllers/) to restrict how this driver is used.
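A hedged sketch of the first option, for a hypothetical driver named `inline.storage.example.com`:
```yaml
apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  name: inline.storage.example.com   # hypothetical driver name
spec:
  volumeLifecycleModes:
  - Persistent    # "Ephemeral" is omitted, so the driver cannot back an inline ephemeral volume
```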
### Generic ephemeral volumes
@ -248,14 +251,8 @@ same namespace, so that these conflicts can't occur.
Enabling the GenericEphemeralVolume feature allows users to create
PVCs indirectly if they can create Pods, even if they do not have
permission to create PVCs directly. Cluster administrators must be
aware of this. If this does not fit their security model, they have
two choices:
- Use an [admission webhook](/docs/reference/access-authn-authz/extensible-admission-controllers/)
that rejects objects like Pods that have a generic ephemeral
volume.
- Use a [Pod Security Policy](/docs/concepts/security/pod-security-policy/)
where the `volumes` list does not contain the `ephemeral` volume type
(deprecated since Kubernetes 1.21).
aware of this. If this does not fit their security model, they should
use an [admission webhook](/docs/reference/access-authn-authz/extensible-admission-controllers/) that rejects objects like Pods that have a generic ephemeral volume.
The normal [namespace quota for PVCs](/docs/concepts/policy/resource-quotas/#storage-resource-quota) still applies, so
even if users are allowed to use this new mechanism, they cannot use
View File
@ -175,6 +175,74 @@ spec:
However, the particular path specified in the custom recycler Pod template in the `volumes` part is replaced with the particular path of the volume that is being recycled.
### PersistentVolume deletion protection finalizer
{{< feature-state for_k8s_version="v1.23" state="alpha" >}}
Finalizers can be added to a PersistentVolume to ensure that PersistentVolumes
with a `Delete` reclaim policy are deleted only after the backing storage is deleted.
The newly introduced finalizers `kubernetes.io/pv-controller` and `external-provisioner.volume.kubernetes.io/finalizer`
are only added to dynamically provisioned volumes.
The finalizer `kubernetes.io/pv-controller` is added to in-tree plugin volumes. The following is an example:
```shell
kubectl describe pv pvc-74a498d6-3929-47e8-8c02-078c1ece4d78
Name: pvc-74a498d6-3929-47e8-8c02-078c1ece4d78
Labels: <none>
Annotations: kubernetes.io/createdby: vsphere-volume-dynamic-provisioner
pv.kubernetes.io/bound-by-controller: yes
pv.kubernetes.io/provisioned-by: kubernetes.io/vsphere-volume
Finalizers: [kubernetes.io/pv-protection kubernetes.io/pv-controller]
StorageClass: vcp-sc
Status: Bound
Claim: default/vcp-pvc-1
Reclaim Policy: Delete
Access Modes: RWO
VolumeMode: Filesystem
Capacity: 1Gi
Node Affinity: <none>
Message:
Source:
Type: vSphereVolume (a Persistent Disk resource in vSphere)
VolumePath: [vsanDatastore] d49c4a62-166f-ce12-c464-020077ba5d46/kubernetes-dynamic-pvc-74a498d6-3929-47e8-8c02-078c1ece4d78.vmdk
FSType: ext4
StoragePolicyName: vSAN Default Storage Policy
Events: <none>
```
The finalizer `external-provisioner.volume.kubernetes.io/finalizer` is added for CSI volumes.
The following is an example:
```shell
Name: pvc-2f0bab97-85a8-4552-8044-eb8be45cf48d
Labels: <none>
Annotations: pv.kubernetes.io/provisioned-by: csi.vsphere.vmware.com
Finalizers: [kubernetes.io/pv-protection external-provisioner.volume.kubernetes.io/finalizer]
StorageClass: fast
Status: Bound
Claim: demo-app/nginx-logs
Reclaim Policy: Delete
Access Modes: RWO
VolumeMode: Filesystem
Capacity: 200Mi
Node Affinity: <none>
Message:
Source:
Type: CSI (a Container Storage Interface (CSI) volume source)
Driver: csi.vsphere.vmware.com
FSType: ext4
VolumeHandle: 44830fa8-79b4-406b-8b58-621ba25353fd
ReadOnly: false
VolumeAttributes: storage.kubernetes.io/csiProvisionerIdentity=1648442357185-8081-csi.vsphere.vmware.com
type=vSphere CNS Block Volume
Events: <none>
```
Enabling the `CSIMigration` feature for a specific in-tree volume plugin will remove
the `kubernetes.io/pv-controller` finalizer, while adding the `external-provisioner.volume.kubernetes.io/finalizer`
finalizer. Similarly, disabling `CSIMigration` will remove the `external-provisioner.volume.kubernetes.io/finalizer`
finalizer, while adding the `kubernetes.io/pv-controller` finalizer.
### Reserving a PersistentVolume
The control plane can [bind PersistentVolumeClaims to matching PersistentVolumes](#binding) in the
@ -284,18 +352,13 @@ FlexVolumes (deprecated since Kubernetes v1.23) allow resize if the driver is co
#### Resizing an in-use PersistentVolumeClaim
{{< feature-state for_k8s_version="v1.15" state="beta" >}}
{{< note >}}
Expanding in-use PVCs is available as beta since Kubernetes 1.15, and as alpha since 1.11. The `ExpandInUsePersistentVolumes` feature must be enabled, which is the case automatically for many clusters for beta features. Refer to the [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) documentation for more information.
{{< /note >}}
{{< feature-state for_k8s_version="v1.24" state="stable" >}}
In this case, you don't need to delete and recreate a Pod or deployment that is using an existing PVC.
Any in-use PVC automatically becomes available to its Pod as soon as its file system has been expanded.
This feature has no effect on PVCs that are not in use by a Pod or deployment. You must create a Pod that
uses the PVC before the expansion can complete.
Similar to other volume types - FlexVolume volumes can also be expanded when in-use by a Pod.
{{< note >}}
@ -329,7 +392,7 @@ If expanding underlying storage fails, the cluster administrator can manually re
Recovery from failing PVC expansion by users is available as an alpha feature since Kubernetes 1.23. The `RecoverVolumeExpansionFailure` feature must be enabled for this feature to work. Refer to the [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) documentation for more information.
{{< /note >}}
If the feature gates `ExpandPersistentVolumes` and `RecoverVolumeExpansionFailure` are both
If the feature gate `RecoverVolumeExpansionFailure` is
enabled in your cluster, and expansion has failed for a PVC, you can retry expansion with a
smaller size than the previously requested value. To request a new expansion attempt with a
smaller proposed size, edit `.spec.resources` for that PVC and choose a value that is less than the
@ -477,6 +540,15 @@ In the CLI, the access modes are abbreviated to:
* RWX - ReadWriteMany
* RWOP - ReadWriteOncePod
{{< note >}}
Kubernetes uses volume access modes to match PersistentVolumeClaims and PersistentVolumes.
In some cases, the volume access modes also constrain where the PersistentVolume can be mounted.
Volume access modes do **not** enforce write protection once the storage has been mounted.
Even if the access modes are specified as ReadWriteOnce, ReadOnlyMany, or ReadWriteMany, they don't set any constraints on the volume.
For example, even if a PersistentVolume is created as ReadOnlyMany, there is no guarantee that it will be read-only.
If the access modes are specified as ReadWriteOncePod, the volume is constrained and can be mounted on only a single Pod.
{{< /note >}}
> __Important!__ A volume can only be mounted using one access mode at a time, even if it supports many. For example, a GCEPersistentDisk can be mounted as ReadWriteOnce by a single node or ReadOnlyMany by many nodes, but not at the same time.
@ -849,17 +921,12 @@ spec:
## Volume populators and data sources
{{< feature-state for_k8s_version="v1.22" state="alpha" >}}
{{< feature-state for_k8s_version="v1.24" state="beta" >}}
{{< note >}}
Kubernetes supports custom volume populators; this alpha feature was introduced
in Kubernetes 1.18. Kubernetes 1.22 reimplemented the mechanism with a redesigned API.
Check that you are reading the version of the Kubernetes documentation that matches your
cluster. {{% version-check %}}
Kubernetes supports custom volume populators.
To use custom volume populators, you must enable the `AnyVolumeDataSource`
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/) for
the kube-apiserver and kube-controller-manager.
{{< /note >}}
Volume populators take advantage of a PVC spec field called `dataSourceRef`. Unlike the
`dataSource` field, which can only contain either a reference to another PersistentVolumeClaim
@ -877,6 +944,7 @@ contents.
There are two differences between the `dataSourceRef` field and the `dataSource` field that
users should be aware of:
* The `dataSource` field ignores invalid values (as if the field was blank) while the
`dataSourceRef` field never ignores values and will cause an error if an invalid value is
used. Invalid values are any core object (objects with no apiGroup) except for PVCs.
View File
@ -16,37 +16,41 @@ Storage capacity is limited and may vary depending on the node on
which a pod runs: network-attached storage might not be accessible by
all nodes, or storage is local to a node to begin with.
{{< feature-state for_k8s_version="v1.21" state="beta" >}}
{{< feature-state for_k8s_version="v1.24" state="stable" >}}
This page describes how Kubernetes keeps track of storage capacity and
how the scheduler uses that information to schedule Pods onto nodes
how the scheduler uses that information to [schedule Pods](/docs/concepts/scheduling-eviction/) onto nodes
that have access to enough storage capacity for the remaining missing
volumes. Without storage capacity tracking, the scheduler may choose a
node that doesn't have enough capacity to provision a volume and
multiple scheduling retries will be needed.
Tracking storage capacity is supported for {{< glossary_tooltip
text="Container Storage Interface" term_id="csi" >}} (CSI) drivers and
[needs to be enabled](#enabling-storage-capacity-tracking) when installing a CSI driver.
## {{% heading "prerequisites" %}}
Kubernetes v{{< skew currentVersion >}} includes cluster-level API support for
storage capacity tracking. To use this you must also be using a CSI driver that
supports capacity tracking. Consult the documentation for the CSI drivers that
you use to find out whether this support is available and, if so, how to use
it. If you are not running Kubernetes v{{< skew currentVersion >}}, check the
documentation for that version of Kubernetes.
<!-- body -->
## API
There are two API extensions for this feature:
- CSIStorageCapacity objects:
- [CSIStorageCapacity](/docs/reference/kubernetes-api/config-and-storage-resources/csi-storage-capacity-v1/) objects:
these get produced by a CSI driver in the namespace
where the driver is installed. Each object contains capacity
information for one storage class and defines which nodes have
access to that storage (see the sketch after this list).
- [The `CSIDriverSpec.StorageCapacity` field](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#csidriverspec-v1-storage-k8s-io):
- [The `CSIDriverSpec.StorageCapacity` field](/docs/reference/kubernetes-api/config-and-storage-resources/csi-driver-v1/#CSIDriverSpec):
when set to `true`, the Kubernetes scheduler will consider storage
capacity for volumes that use the CSI driver.
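A hedged sketch of a CSIStorageCapacity object as a driver might publish it; the namespace, StorageClass name, topology label, and capacity value are all hypothetical.
```yaml
apiVersion: storage.k8s.io/v1
kind: CSIStorageCapacity
metadata:
  name: example-capacity-zone-a   # hypothetical name
  namespace: csi-driver           # assumed namespace where the driver is installed
storageClassName: fast-storage    # assumed StorageClass handled by the driver
capacity: 100Gi
nodeTopology:
  matchLabels:
    topology.example.com/zone: zone-a   # hypothetical topology label
```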
## Scheduling
Storage capacity information is used by the Kubernetes scheduler if:
- the `CSIStorageCapacity` feature gate is true,
- a Pod uses a volume that has not been created yet,
- that volume uses a {{< glossary_tooltip text="StorageClass" term_id="storage-class" >}} which references a CSI driver and
uses `WaitForFirstConsumer` [volume binding
@ -97,20 +101,9 @@ multiple volumes: one volume might have been created already in a
topology segment which then does not have enough capacity left for
another volume. Manual intervention is necessary to recover from this,
for example by increasing capacity or deleting the volume that was
already created. [Further
work](https://github.com/kubernetes/enhancements/pull/1703) is needed
to handle this automatically.
## Enabling storage capacity tracking
Storage capacity tracking is a beta feature and enabled by default in
a Kubernetes cluster since Kubernetes 1.21. In addition to having the
feature enabled in the cluster, a CSI driver also has to support
it. Please refer to the driver's documentation for details.
already created.
## {{% heading "whatsnext" %}}
- For more information on the design, see the
[Storage Capacity Constraints for Pod Scheduling KEP](https://github.com/kubernetes/enhancements/blob/master/keps/sig-storage/1472-storage-capacity-tracking/README.md).
- For more information on further development of this feature, see the [enhancement tracking issue #1472](https://github.com/kubernetes/enhancements/issues/1472).
- Learn about [Kubernetes Scheduler](/docs/concepts/scheduling-eviction/kube-scheduler/)
View File
@ -49,7 +49,7 @@ metadata:
name: standard
provisioner: kubernetes.io/aws-ebs
parameters:
type: gp3
type: gp2
reclaimPolicy: Retain
allowVolumeExpansion: true
mountOptions:
@ -271,9 +271,9 @@ parameters:
fsType: ext4
```
* `type`: `io1`, `gp2`, `gp3`, `sc1`, `st1`. See
* `type`: `io1`, `gp2`, `sc1`, `st1`. See
[AWS docs](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSVolumeTypes.html)
for details. Default: `gp3`.
for details. Default: `gp2`.
* `zone` (Deprecated): AWS zone. If neither `zone` nor `zones` is specified, volumes are
generally round-robin-ed across all active zones where Kubernetes cluster
has a node. `zone` and `zones` parameters must not be used at the same time.
View File
@ -24,7 +24,7 @@ If a CSI Driver supports Volume Health Monitoring feature from the controller si
The External Health Monitor {{< glossary_tooltip text="controller" term_id="controller" >}} also watches for node failure events. You can enable node failure monitoring by setting the `enable-node-watcher` flag to true. When the external health monitor detects a node failure event, the controller reports an Event on the PVC to indicate that Pods using this PVC are on a failed node.
If a CSI Driver supports Volume Health Monitoring feature from the node side, an Event will be reported on every Pod using the PVC when an abnormal volume condition is detected on a CSI volume.
If a CSI Driver supports the Volume Health Monitoring feature from the node side, an Event will be reported on every Pod using the PVC when an abnormal volume condition is detected on a CSI volume. In addition, Volume Health information is exposed as kubelet VolumeStats metrics through the new metric `kubelet_volume_stats_health_status_abnormal`. This metric has two labels, `namespace` and `persistentvolumeclaim`, and its value is either 1 (the volume is unhealthy) or 0 (the volume is healthy). For more information, see the [KEP](https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/1432-volume-health-monitor#kubelet-metrics-changes).
{{< note >}}
You need to enable the `CSIVolumeHealth` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) to use this feature from the node side.
View File
@ -120,6 +120,7 @@ spec:
driver: hostpath.csi.k8s.io
source:
volumeHandle: ee0cfb94-f8d4-11e9-b2d8-0242ac110002
sourceVolumeMode: Filesystem
volumeSnapshotClassName: csi-hostpath-snapclass
volumeSnapshotRef:
name: new-snapshot-test
@ -141,6 +142,7 @@ spec:
driver: hostpath.csi.k8s.io
source:
snapshotHandle: 7bdd0de3-aaeb-11e8-9aae-0242ac110002
sourceVolumeMode: Filesystem
volumeSnapshotRef:
name: new-snapshot-test
namespace: default
@ -148,6 +150,51 @@ spec:
`snapshotHandle` is the unique identifier of the volume snapshot created on the storage backend. This field is required for the pre-provisioned snapshots. It specifies the CSI snapshot id on the storage system that this `VolumeSnapshotContent` represents.
`sourceVolumeMode` is the mode of the volume whose snapshot is taken. The value
of the `sourceVolumeMode` field can be either `Filesystem` or `Block`. If the
source volume mode is not specified, Kubernetes treats the snapshot as if the
source volume's mode is unknown.
## Converting the volume mode of a Snapshot {#convert-volume-mode}
If the `VolumeSnapshots` API installed on your cluster supports the `sourceVolumeMode`
field, then the API has the capability to prevent unauthorized users from converting
the mode of a volume.
To check if your cluster has capability for this feature, run the following command:
```shell
kubectl get crd volumesnapshotcontent -o yaml
```
If you want to allow users to create a `PersistentVolumeClaim` from an existing
`VolumeSnapshot`, but with a different volume mode than the source, the annotation
`snapshot.storage.kubernetes.io/allowVolumeModeChange: "true"` needs to be added to
the `VolumeSnapshotContent` that corresponds to the `VolumeSnapshot`.
For pre-provisioned snapshots, `Spec.SourceVolumeMode` needs to be populated
by the cluster administrator.
An example `VolumeSnapshotContent` resource with this feature enabled would look like:
```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotContent
metadata:
name: new-snapshot-content-test
annotations:
snapshot.storage.kubernetes.io/allowVolumeModeChange: "true"
spec:
deletionPolicy: Delete
driver: hostpath.csi.k8s.io
source:
snapshotHandle: 7bdd0de3-aaeb-11e8-9aae-0242ac110002
sourceVolumeMode: Filesystem
volumeSnapshotRef:
name: new-snapshot-test
namespace: default
```
## Provisioning Volumes from Snapshots
You can provision a new volume, pre-populated with data from a snapshot, by using
View File
@ -64,7 +64,9 @@ a different volume.
Kubernetes supports several types of volumes.
### awsElasticBlockStore {#awselasticblockstore}
### awsElasticBlockStore (deprecated) {#awselasticblockstore}
{{< feature-state for_k8s_version="v1.17" state="deprecated" >}}
An `awsElasticBlockStore` volume mounts an Amazon Web Services (AWS)
[EBS volume](https://aws.amazon.com/ebs/) into your pod. Unlike
@ -135,7 +137,9 @@ beta features must be enabled.
To disable the `awsElasticBlockStore` storage plugin from being loaded by the controller manager
and the kubelet, set the `InTreePluginAWSUnregister` flag to `true`.
### azureDisk {#azuredisk}
### azureDisk (deprecated) {#azuredisk}
{{< feature-state for_k8s_version="v1.19" state="deprecated" >}}
The `azureDisk` volume type mounts a Microsoft Azure [Data Disk](https://docs.microsoft.com/en-us/azure/aks/csi-storage-drivers) into a pod.
@ -143,14 +147,13 @@ For more details, see the [`azureDisk` volume plugin](https://github.com/kuberne
#### azureDisk CSI migration
{{< feature-state for_k8s_version="v1.19" state="beta" >}}
{{< feature-state for_k8s_version="v1.24" state="stable" >}}
The `CSIMigration` feature for `azureDisk`, when enabled, redirects all plugin operations
from the existing in-tree plugin to the `disk.csi.azure.com` Container
Storage Interface (CSI) Driver. In order to use this feature, the [Azure Disk CSI
Driver](https://github.com/kubernetes-sigs/azuredisk-csi-driver)
must be installed on the cluster and the `CSIMigration` and `CSIMigrationAzureDisk`
features must be enabled.
Storage Interface (CSI) Driver. In order to use this feature, the
[Azure Disk CSI Driver](https://github.com/kubernetes-sigs/azuredisk-csi-driver)
must be installed on the cluster and the `CSIMigration` feature must be enabled.
#### azureDisk CSI migration complete
@ -159,7 +162,9 @@ features must be enabled.
To disable the `azureDisk` storage plugin from being loaded by the controller manager
and the kubelet, set the `InTreePluginAzureDiskUnregister` flag to `true`.
### azureFile {#azurefile}
### azureFile (deprecated) {#azurefile}
{{< feature-state for_k8s_version="v1.21" state="deprecated" >}}
The `azureFile` volume type mounts a Microsoft Azure File volume (SMB 2.1 and 3.0)
into a pod.
@ -177,7 +182,8 @@ Driver](https://github.com/kubernetes-sigs/azurefile-csi-driver)
must be installed on the cluster and the `CSIMigration` and `CSIMigrationAzureFile`
[feature gates](/docs/reference/command-line-tools-reference/feature-gates/) must be enabled.
Azure File CSI driver does not support using same volume with different fsgroups, if Azurefile CSI migration is enabled, using same volume with different fsgroups won't be supported at all.
Azure File CSI driver does not support using the same volume with different fsgroups. If
`CSIMigrationAzureFile` is enabled, using the same volume with different fsgroups won't be supported at all.
#### azureFile CSI migration complete
@ -201,7 +207,9 @@ You must have your own Ceph server running with the share exported before you ca
See the [CephFS example](https://github.com/kubernetes/examples/tree/master/volumes/cephfs/) for more details.
### cinder
### cinder (deprecated) {#cinder}
{{< feature-state for_k8s_version="v1.18" state="deprecated" >}}
{{< note >}}
Kubernetes must be configured with the OpenStack cloud provider.
@ -233,17 +241,17 @@ spec:
#### OpenStack CSI migration
{{< feature-state for_k8s_version="v1.21" state="beta" >}}
{{< feature-state for_k8s_version="v1.24" state="stable" >}}
The `CSIMigration` feature for Cinder is enabled by default in Kubernetes 1.21.
The `CSIMigration` feature for Cinder is enabled by default since Kubernetes 1.21.
It redirects all plugin operations from the existing in-tree plugin to the
`cinder.csi.openstack.org` Container Storage Interface (CSI) Driver.
[OpenStack Cinder CSI Driver](https://github.com/kubernetes/cloud-provider-openstack/blob/master/docs/cinder-csi-plugin/using-cinder-csi-plugin.md)
must be installed on the cluster.
You can disable Cinder CSI migration for your cluster by setting the `CSIMigrationOpenStack`
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/) to `false`.
If you disable the `CSIMigrationOpenStack` feature, the in-tree Cinder volume plugin takes responsibility
for all aspects of Cinder volume storage management.
To disable the in-tree Cinder plugin from being loaded by the controller manager
and the kubelet, you can enable the `InTreePluginOpenStackUnregister`
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/).
### configMap
@ -390,7 +398,9 @@ You must have your own Flocker installation running before you can use it.
See the [Flocker example](https://github.com/kubernetes/examples/tree/master/staging/volumes/flocker) for more details.
### gcePersistentDisk
### gcePersistentDisk (deprecated) {#gcepersistentdisk}
{{< feature-state for_k8s_version="v1.17" state="deprecated" >}}
A `gcePersistentDisk` volume mounts a Google Compute Engine (GCE)
[persistent disk](https://cloud.google.com/compute/docs/disks) (PD) into your Pod.
@ -969,66 +979,15 @@ spec:
For more information about StorageOS, dynamic provisioning, and PersistentVolumeClaims, see the
[StorageOS examples](https://github.com/kubernetes/examples/blob/master/volumes/storageos).
### vsphereVolume {#vspherevolume}
### vsphereVolume (deprecated) {#vspherevolume}
{{< note >}}
You must configure the Kubernetes vSphere Cloud Provider. For cloudprovider
configuration, refer to the [vSphere Getting Started guide](https://vmware.github.io/vsphere-storage-for-kubernetes/documentation/).
We recommend using the vSphere CSI out-of-tree driver instead.
{{< /note >}}
A `vsphereVolume` is used to mount a vSphere VMDK volume into your Pod. The contents
of a volume are preserved when it is unmounted. It supports both VMFS and VSAN datastore.
{{< note >}}
You must create vSphere VMDK volume using one of the following methods before using with a Pod.
{{< /note >}}
#### Creating a VMDK volume {#creating-vmdk-volume}
Choose one of the following methods to create a VMDK.
{{< tabs name="tabs_volumes" >}}
{{% tab name="Create using vmkfstools" %}}
First ssh into ESX, then use the following command to create a VMDK:
```shell
vmkfstools -c 2G /vmfs/volumes/DatastoreName/volumes/myDisk.vmdk
```
{{% /tab %}}
{{% tab name="Create using vmware-vdiskmanager" %}}
Use the following command to create a VMDK:
```shell
vmware-vdiskmanager -c -t 0 -s 40GB -a lsilogic myDisk.vmdk
```
{{% /tab %}}
{{< /tabs >}}
#### vSphere VMDK configuration example {#vsphere-vmdk-configuration}
```yaml
apiVersion: v1
kind: Pod
metadata:
name: test-vmdk
spec:
containers:
- image: k8s.gcr.io/test-webserver
name: test-container
volumeMounts:
- mountPath: /test-vmdk
name: test-volume
volumes:
- name: test-volume
# This VMDK volume must already exist.
vsphereVolume:
volumePath: "[DatastoreName] volumes/myDisk"
fsType: ext4
```
For more information, see the [vSphere volume](https://github.com/kubernetes/examples/tree/master/staging/volumes/vsphere) examples.
#### vSphere CSI migration {#vsphere-csi-migration}
@ -1040,8 +999,15 @@ from the existing in-tree plugin to the `csi.vsphere.vmware.com` {{< glossary_to
[vSphere CSI driver](https://github.com/kubernetes-sigs/vsphere-csi-driver)
must be installed on the cluster and the `CSIMigration` and `CSIMigrationvSphere`
[feature gates](/docs/reference/command-line-tools-reference/feature-gates/) must be enabled.
You can find additional advice on how to migrate in VMware's
documentation page [Migrating In-Tree vSphere Volumes to vSphere Container Storage Plug-in](https://docs.vmware.com/en/VMware-vSphere-Container-Storage-Plug-in/2.0/vmware-vsphere-csp-getting-started/GUID-968D421F-D464-4E22-8127-6CB9FF54423F.html).
This also requires minimum vSphere vCenter/ESXi Version to be 7.0u1 and minimum HW Version to be VM version 15.
Kubernetes v{{< skew currentVersion >}} requires that you are using vSphere 7.0u2 or later
in order to migrate to the out-of-tree CSI driver.
If you are running a version of Kubernetes other than v{{< skew currentVersion >}}, consult
the documentation for that version of Kubernetes.
If you are running Kubernetes v{{< skew currentVersion >}} and an older version of vSphere,
consider upgrading to at least vSphere 7.0u2.
{{< note >}}
The following StorageClass parameters from the built-in `vsphereVolume` plugin are not supported by the vSphere CSI driver:
@ -1211,7 +1177,6 @@ A `csi` volume can be used in a Pod in three different ways:
* through a reference to a [PersistentVolumeClaim](#persistentvolumeclaim)
* with a [generic ephemeral volume](/docs/concepts/storage/ephemeral-volumes/#generic-ephemeral-volume)
(alpha feature)
* with a [CSI ephemeral volume](/docs/concepts/storage/ephemeral-volumes/#csi-ephemeral-volume)
if the driver supports that (beta feature)
@ -1285,6 +1250,20 @@ for more information.
For more information on how to develop a CSI driver, refer to the
[kubernetes-csi documentation](https://kubernetes-csi.github.io/docs/)
#### Windows CSI proxy
{{< feature-state for_k8s_version="v1.22" state="stable" >}}
CSI node plugins need to perform various privileged
operations like scanning of disk devices and mounting of file systems. These operations
differ for each host operating system. For Linux worker nodes, containerized CSI node
plugins are typically deployed as privileged containers. For Windows worker nodes,
privileged operations for containerized CSI node plugins are supported using
[csi-proxy](https://github.com/kubernetes-csi/csi-proxy), a community-managed,
stand-alone binary that needs to be pre-installed on each Windows node.
For more details, refer to the deployment guide of the CSI plugin you wish to deploy.
#### Migrating to CSI drivers from in-tree plugins
{{< feature-state for_k8s_version="v1.17" state="beta" >}}
@ -1301,6 +1280,14 @@ provisioning/delete, attach/detach, mount/unmount and resizing of volumes.
In-tree plugins that support `CSIMigration` and have a corresponding CSI driver implemented
are listed in [Types of Volumes](#volume-types).
The following in-tree plugins support persistent storage on Windows nodes:
* [`awsElasticBlockStore`](#awselasticblockstore)
* [`azureDisk`](#azuredisk)
* [`azureFile`](#azurefile)
* [`gcePersistentDisk`](#gcepersistentdisk)
* [`vsphereVolume`](#vspherevolume)
### flexVolume
{{< feature-state for_k8s_version="v1.23" state="deprecated" >}}
@ -1312,6 +1299,12 @@ volume plugin path on each node and in some cases the control plane nodes as wel
Pods interact with FlexVolume drivers through the `flexVolume` in-tree volume plugin.
For more details, see the FlexVolume [README](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-storage/flexvolume.md#readme) document.
The following FlexVolume [plugins](https://github.com/Microsoft/K8s-Storage-Plugins/tree/master/flexvolume/windows),
deployed as PowerShell scripts on the host, support Windows nodes:
* [SMB](https://github.com/microsoft/K8s-Storage-Plugins/tree/master/flexvolume/windows/plugins/microsoft.com~smb.cmd)
* [iSCSI](https://github.com/microsoft/K8s-Storage-Plugins/tree/master/flexvolume/windows/plugins/microsoft.com~iscsi.cmd)
{{< note >}}
FlexVolume is deprecated. Using an out-of-tree CSI driver is the recommended way to integrate external storage with Kubernetes.

View File

@ -0,0 +1,71 @@
---
reviewers:
- jingxu97
- mauriciopoppe
- jayunit100
- jsturtevant
- marosset
- aravindhp
title: Windows Storage
content_type: concept
---
<!-- overview -->
This page provides a storage overview specific to the Windows operating system.
<!-- body -->
## Persistent storage {#storage}
Windows has a layered filesystem driver to mount container layers and create a copy
filesystem based on NTFS. All file paths in the container are resolved only within
the context of that container.
* With Docker, volume mounts can only target a directory in the container, and not
an individual file. This limitation does not apply to containerd.
* Volume mounts cannot project files or directories back to the host filesystem.
* Read-only filesystems are not supported because write access is always required
for the Windows registry and SAM database. However, read-only volumes are supported.
* Volume user-masks and permissions are not available. Because the SAM is not shared
between the host & container, there's no mapping between them. All permissions are
resolved within the context of the container.
As a result, the following storage functionality is not supported on Windows nodes:
* Volume subpath mounts: only the entire volume can be mounted in a Windows container
* Subpath volume mounting for Secrets
* Host mount projection
* Read-only root filesystem (mapped volumes still support `readOnly`)
* Block device mapping
* Memory as the storage medium (for example, `emptyDir.medium` set to `Memory`)
* File system features like uid/gid; per-user Linux filesystem permissions
* Setting [secret permissions with DefaultMode](/docs/concepts/configuration/secret/#secret-files-permissions) (due to UID/GID dependency)
* NFS based storage/volume support
* Expanding the mounted volume (resizefs)
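For instance, a disk-backed `emptyDir` volume works on Windows nodes as long as you leave `medium` unset. A minimal sketch (the container image, tag, and command are only examples):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: empty-dir-windows
spec:
  nodeSelector:
    kubernetes.io/os: windows
  containers:
    - name: app
      image: mcr.microsoft.com/windows/servercore:ltsc2019
      command: ["cmd", "/c", "ping -t localhost"]
      volumeMounts:
        - name: cache
          mountPath: C:\cache
  volumes:
    - name: cache
      emptyDir: {}   # do not set medium: Memory on Windows
```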
Kubernetes {{< glossary_tooltip text="volumes" term_id="volume" >}} enable complex
applications, with data persistence and Pod volume sharing requirements, to be deployed
on Kubernetes. Management of persistent volumes associated with a specific storage
back-end or protocol includes actions such as provisioning/de-provisioning/resizing
of volumes, attaching/detaching a volume to/from a Kubernetes node and
mounting/dismounting a volume to/from individual containers in a pod that needs to
persist data.
Volume management components are shipped as Kubernetes volume
[plugin](/docs/concepts/storage/volumes/#types-of-volumes).
The following broad classes of Kubernetes volume plugins are supported on Windows:
* [`FlexVolume plugins`](/docs/concepts/storage/volumes/#flexVolume)
* Please note that FlexVolumes have been deprecated as of 1.23
* [`CSI Plugins`](/docs/concepts/storage/volumes/#csi)
##### In-tree volume plugins
The following in-tree plugins support persistent storage on Windows nodes:
* [`awsElasticBlockStore`](/docs/concepts/storage/volumes/#awselasticblockstore)
* [`azureDisk`](/docs/concepts/storage/volumes/#azuredisk)
* [`azureFile`](/docs/concepts/storage/volumes/#azurefile)
* [`gcePersistentDisk`](/docs/concepts/storage/volumes/#gcepersistentdisk)
* [`vsphereVolume`](/docs/concepts/storage/volumes/#vspherevolume)

View File

@ -0,0 +1,384 @@
---
reviewers:
- jayunit100
- jsturtevant
- marosset
- perithompson
title: Windows containers in Kubernetes
content_type: concept
weight: 65
---
<!-- overview -->
Windows applications constitute a large portion of the services and applications that
run in many organizations. [Windows containers](https://aka.ms/windowscontainers)
provide a way to encapsulate processes and package dependencies, making it easier
to use DevOps practices and follow cloud native patterns for Windows applications.
Organizations with investments in Windows-based applications and Linux-based
applications don't have to look for separate orchestrators to manage their workloads,
leading to increased operational efficiencies across their deployments, regardless
of operating system.
<!-- body -->
## Windows nodes in Kubernetes
To enable the orchestration of Windows containers in Kubernetes, include Windows nodes
in your existing Linux cluster. Scheduling Windows containers in
{{< glossary_tooltip text="Pods" term_id="pod" >}} on Kubernetes is similar to
scheduling Linux-based containers.
In order to run Windows containers, your Kubernetes cluster must include
multiple operating systems.
While you can only run the {{< glossary_tooltip text="control plane" term_id="control-plane" >}} on Linux,
you can deploy worker nodes running either Windows or Linux.
Windows {{< glossary_tooltip text="nodes" term_id="node" >}} are
[supported](#windows-os-version-support) provided that the operating system is
Windows Server 2019.
This document uses the term *Windows containers* to mean Windows containers with
process isolation. Kubernetes does not support running Windows containers with
[Hyper-V isolation](https://docs.microsoft.com/en-us/virtualization/windowscontainers/manage-containers/hyperv-container).
## Compatibility and limitations {#limitations}
Some node features are only available if you use a specific
[container runtime](#container-runtime); others are not available on Windows nodes,
including:
* HugePages: not supported for Windows containers
* Privileged containers: not supported for Windows containers
* TerminationGracePeriod: requires containerd
Not all features of shared namespaces are supported. See [API compatibility](#api)
for more details.
See [Windows OS version compatibility](#windows-os-version-support) for details on
the Windows versions that Kubernetes is tested against.
From an API and kubectl perspective, Windows containers behave in much the same
way as Linux-based containers. However, there are some notable differences in key
functionality which are outlined in this section.
### Comparison with Linux {#compatibility-linux-similarities}
Key Kubernetes elements work the same way in Windows as they do in Linux. This
section refers to several key workload abstractions and how they map to Windows.
* [Pods](/docs/concepts/workloads/pods/)
A Pod is the basic building block of Kubernetes: the smallest and simplest unit in
the Kubernetes object model that you create or deploy. You may not deploy Windows and
Linux containers in the same Pod. All containers in a Pod are scheduled onto a single
Node where each Node represents a specific platform and architecture. The following
Pod capabilities, properties and events are supported with Windows containers:
* Single or multiple containers per Pod with process isolation and volume sharing
* Pod `status` fields
* Readiness and Liveness probes
* postStart & preStop container lifecycle hooks
* ConfigMap, Secrets: as environment variables or volumes
* `emptyDir` volumes
* Named pipe host mounts
* Resource limits
* OS field:
The `.spec.os.name` field should be set to `windows` to indicate that the current Pod uses Windows containers.
The `IdentifyPodOS` feature gate needs to be enabled for this field to be recognized.
{{< note >}}
Starting from Kubernetes 1.24, the `IdentifyPodOS` feature gate is in beta and is enabled by default.
{{< /note >}}
If the `IdentifyPodOS` feature gate is enabled and you set the `.spec.os.name` field to `windows`,
you must not set the following fields in the `.spec` of that Pod:
* `spec.hostPID`
* `spec.hostIPC`
* `spec.securityContext.seLinuxOptions`
* `spec.securityContext.seccompProfile`
* `spec.securityContext.fsGroup`
* `spec.securityContext.fsGroupChangePolicy`
* `spec.securityContext.sysctls`
* `spec.shareProcessNamespace`
* `spec.securityContext.runAsUser`
* `spec.securityContext.runAsGroup`
* `spec.securityContext.supplementalGroups`
* `spec.containers[*].securityContext.seLinuxOptions`
* `spec.containers[*].securityContext.seccompProfile`
* `spec.containers[*].securityContext.capabilities`
* `spec.containers[*].securityContext.readOnlyRootFilesystem`
* `spec.containers[*].securityContext.privileged`
* `spec.containers[*].securityContext.allowPrivilegeEscalation`
* `spec.containers[*].securityContext.procMount`
* `spec.containers[*].securityContext.runAsUser`
* `spec.containers[*].securityContext.runAsGroup`
In the above list, wildcards (`*`) indicate all elements in a list.
For example, `spec.containers[*].securityContext` refers to the SecurityContext object
for all containers. If any of these fields is specified, the Pod will
not be admitted by the API server.
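As an illustration, a minimal Windows Pod that sets the OS field, along with a node selector so that it lands on a Windows node, might look like this sketch; the container image and tag are only examples:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: windows-os-field-example
spec:
  os:
    name: windows
  nodeSelector:
    kubernetes.io/os: windows
  containers:
    - name: iis
      image: mcr.microsoft.com/windows/servercore/iis:windowsservercore-ltsc2019
      ports:
        - containerPort: 80
```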
* [Workload resources](/docs/concepts/workloads/controllers/) including:
* ReplicaSet
* Deployment
* StatefulSet
* DaemonSet
* Job
* CronJob
* ReplicationController
* {{< glossary_tooltip text="Services" term_id="service" >}}
See [Load balancing and Services](#load-balancing-and-services) for more details.
Pods, workload resources, and Services are critical elements to managing Windows
workloads on Kubernetes. However, on their own they are not enough to enable
the proper lifecycle management of Windows workloads in a dynamic cloud native
environment. Kubernetes also supports:
* `kubectl exec`
* Pod and container metrics
* {{< glossary_tooltip text="Horizontal pod autoscaling" term_id="horizontal-pod-autoscaler" >}}
* {{< glossary_tooltip text="Resource quotas" term_id="resource-quota" >}}
* Scheduler preemption
### Command line options for the kubelet {#kubelet-compatibility}
Some kubelet command line options behave differently on Windows, as described below:
* The `--windows-priorityclass` flag lets you set the scheduling priority of the kubelet process
(see [CPU resource management](/docs/concepts/configuration/windows-resource-management/#resource-management-cpu))
* The `--kube-reserved`, `--system-reserved`, and `--eviction-hard` flags update
  [NodeAllocatable](/docs/tasks/administer-cluster/reserve-compute-resources/#node-allocatable)
* Eviction by using `--enforce-node-allocatable` is not implemented
* Eviction by using `--eviction-hard` and `--eviction-soft` are not implemented
* A kubelet running on a Windows node does not have memory
  restrictions. `--kube-reserved` and `--system-reserved` do not set limits on
  the kubelet or on processes running on the host. This means the kubelet or a process on the host
  could consume memory beyond what NodeAllocatable accounts for and starve other workloads, without the scheduler being aware of it.
* The `MemoryPressure` Condition is not implemented
* The kubelet does not take OOM eviction actions
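As a sketch, the reservations mentioned above can also be expressed in a kubelet configuration file rather than as command-line flags; the values below are arbitrary examples:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Equivalent to --kube-reserved / --system-reserved / --eviction-hard.
# On Windows these settings only adjust NodeAllocatable; they do not
# enforce limits on the kubelet or on host processes.
kubeReserved:
  cpu: "500m"
  memory: "500Mi"
systemReserved:
  cpu: "500m"
  memory: "500Mi"
# evictionHard still feeds into the NodeAllocatable calculation even though
# eviction itself is not implemented on Windows.
evictionHard:
  memory.available: "500Mi"
```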
### API compatibility {#api}
There are subtle differences in the way the Kubernetes APIs work for Windows due to the OS
and container runtime. Some workload properties were designed for Linux, and fail to run on Windows.
At a high level, these OS concepts are different:
* Identity - Linux uses userID (UID) and groupID (GID) which
are represented as integer types. User and group names
are not canonical - they are just an alias in `/etc/group`
or `/etc/passwd` back to UID+GID. Windows uses a larger binary
[security identifier](https://docs.microsoft.com/en-us/windows/security/identity-protection/access-control/security-identifiers) (SID)
which is stored in the Windows Security Access Manager (SAM) database. This
database is not shared between the host and containers, or between containers.
* File permissions - Windows uses an access control list based on SIDs, whereas
POSIX systems such as Linux use a bitmask based on object permissions and UID+GID,
plus _optional_ access control lists.
* File paths - the convention on Windows is to use `\` instead of `/`. The Go IO
libraries typically accept both and just make it work, but when you're setting a
path or command line that's interpreted inside a container, `\` may be needed.
* Signals - Windows interactive apps handle termination differently, and can
implement one or more of these:
* A UI thread handles well-defined messages including `WM_CLOSE`.
* Console apps handle Ctrl-C or Ctrl-break using a Control Handler.
* Services register a Service Control Handler function that can accept
`SERVICE_CONTROL_STOP` control codes.
Container exit codes follow the same convention where 0 is success, and nonzero is failure.
The specific error codes may differ across Windows and Linux. However, exit codes
passed from the Kubernetes components (kubelet, kube-proxy) are unchanged.
##### Field compatibility for container specifications {#compatibility-v1-pod-spec-containers}
The following list documents differences between how Pod container specifications
work between Windows and Linux:
* Huge pages are not implemented in the Windows container
runtime, and are not available. They require [asserting a user
privilege](https://docs.microsoft.com/en-us/windows/desktop/Memory/large-page-support)
that's not configurable for containers.
* `requests.cpu` and `requests.memory` - requests are subtracted
from node available resources, so they can be used to avoid overprovisioning a
node. However, they cannot be used to guarantee resources in an overprovisioned
node. They should be applied to all containers as a best practice if the operator
wants to avoid overprovisioning entirely.
* `securityContext.allowPrivilegeEscalation` -
not possible on Windows; none of the capabilities are hooked up
* `securityContext.capabilities` -
POSIX capabilities are not implemented on Windows
* `securityContext.privileged` -
Windows doesn't support privileged containers
* `securityContext.procMount` -
Windows doesn't have a `/proc` filesystem
* `securityContext.readOnlyRootFilesystem` -
not possible on Windows; write access is required for registry & system
processes to run inside the container
* `securityContext.runAsGroup` -
not possible on Windows as there is no GID support
* `securityContext.runAsNonRoot` -
this setting will prevent containers from running as `ContainerAdministrator`
which is the closest equivalent to a root user on Windows.
* `securityContext.runAsUser` -
use [`runAsUserName`](/docs/tasks/configure-pod-container/configure-runasusername)
instead
* `securityContext.seLinuxOptions` -
not possible on Windows as SELinux is Linux-specific
* `terminationMessagePath` -
this has some limitations in that Windows doesn't support mapping single files. The
default value is `/dev/termination-log`, which does work because it does not
exist on Windows by default.
##### Field compatibility for Pod specifications {#compatibility-v1-pod}
The following list documents differences between how Pod specifications work between Windows and Linux:
* `hostIPC` and `hostPID` - host namespace sharing is not possible on Windows
* `hostNetwork` - There is no Windows OS support to share the host network
* `dnsPolicy` - setting the Pod `dnsPolicy` to `ClusterFirstWithHostNet` is
not supported on Windows because host networking is not provided. Pods always
run with a container network.
* `podSecurityContext` (see below)
* `shareProcessNamespace` - this is a beta feature, and depends on Linux namespaces
which are not implemented on Windows. Windows cannot share process namespaces or
the container's root filesystem. Only the network can be shared.
* `terminationGracePeriodSeconds` - this is not fully implemented in Docker on Windows,
see the [GitHub issue](https://github.com/moby/moby/issues/25982).
The behavior today is that the ENTRYPOINT process is sent CTRL_SHUTDOWN_EVENT,
then Windows waits 5 seconds by default, and finally shuts down
all processes using the normal Windows shutdown behavior. The 5
second default is actually in the Windows registry
[inside the container](https://github.com/moby/moby/issues/25982#issuecomment-426441183),
so it can be overridden when the container is built.
* `volumeDevices` - this is a beta feature, and is not implemented on Windows.
Windows cannot attach raw block devices to pods.
* `volumes`
* If you define an `emptyDir` volume, you cannot set its medium to `Memory`.
* You cannot enable `mountPropagation` for volume mounts as this is not
supported on Windows.
##### Field compatibility for Pod security context {#compatibility-v1-pod-spec-containers-securitycontext}
None of the Pod [`securityContext`](/docs/reference/kubernetes-api/workload-resources/pod-v1/#security-context) fields work on Windows.
### Node problem detector
The node problem detector (see
[Monitor Node Health](/docs/tasks/debug/debug-cluster/monitor-node-health/))
is not compatible with Windows.
### Pause container
In a Kubernetes Pod, an infrastructure or “pause” container is first created
to host the container. In Linux, the cgroups and namespaces that make up a pod
need a process to maintain their continued existence; the pause process provides
this. Containers that belong to the same pod, including infrastructure and worker
containers, share a common network endpoint (same IPv4 and / or IPv6 address, same
network port spaces). Kubernetes uses pause containers to allow for worker containers
crashing or restarting without losing any of the networking configuration.
Kubernetes maintains a multi-architecture image that includes support for Windows.
For Kubernetes v{{< skew currentVersion >}} the recommended pause image is `k8s.gcr.io/pause:3.6`.
The [source code](https://github.com/kubernetes/kubernetes/tree/master/build/pause)
is available on GitHub.
Microsoft maintains a different multi-architecture image, with Linux and Windows
amd64 support, that you can find as `mcr.microsoft.com/oss/kubernetes/pause:3.6`.
This image is built from the same source as the Kubernetes maintained image but
all of the Windows binaries are [authenticode signed](https://docs.microsoft.com/en-us/windows-hardware/drivers/install/authenticode) by Microsoft.
The Kubernetes project recommends using the Microsoft maintained image if you are
deploying to a production or production-like environment that requires signed
binaries.
### Container runtimes {#container-runtime}
You need to install a
{{< glossary_tooltip text="container runtime" term_id="container-runtime" >}}
into each node in the cluster so that Pods can run there.
The following container runtimes work with Windows:
{{% thirdparty-content %}}
#### cri-containerd
{{< feature-state for_k8s_version="v1.20" state="stable" >}}
You can use {{< glossary_tooltip term_id="containerd" text="ContainerD" >}} 1.4.0+
as the container runtime for Kubernetes nodes that run Windows.
Learn how to [install ContainerD on a Windows node](/docs/setup/production-environment/container-runtimes/#install-containerd).
{{< note >}}
There is a [known limitation](/docs/tasks/configure-pod-container/configure-gmsa/#gmsa-limitations)
when using GMSA with containerd to access Windows network shares, which requires a
kernel patch.
{{< /note >}}
#### Mirantis Container Runtime {#mcr}
[Mirantis Container Runtime](https://docs.mirantis.com/mcr/20.10/overview.html) (MCR) is available as a container runtime for all Windows Server 2019 and later versions.
See [Install MCR on Windows Servers](https://docs.mirantis.com/mcr/20.10/install/mcr-windows.html) for more information.
## Windows OS version compatibility {#windows-os-version-support}
On Windows nodes, strict compatibility rules apply where the host OS version must
match the container base image OS version. Only Windows containers with a container
operating system of Windows Server 2019 are fully supported.
For Kubernetes v{{< skew currentVersion >}}, operating system compatibility for Windows nodes (and Pods)
is as follows:
Windows Server LTSC release
: Windows Server 2019
: Windows Server 2022
Windows Server SAC release
: Windows Server version 20H2
The Kubernetes [version-skew policy](/docs/setup/release/version-skew-policy/) also applies.
## Getting help and troubleshooting {#troubleshooting}
Your main source of help for troubleshooting your Kubernetes cluster should start
with the [Troubleshooting](/docs/tasks/debug/)
page.
Some additional, Windows-specific troubleshooting help is included
in this section. Logs are an important element of troubleshooting
issues in Kubernetes. Make sure to include them any time you seek
troubleshooting assistance from other contributors. Follow the
instructions in the
SIG Windows [contributing guide on gathering logs](https://github.com/kubernetes/community/blob/master/sig-windows/CONTRIBUTING.md#gathering-logs).
### Reporting issues and feature requests
If you have what looks like a bug, or you would like to
make a feature request, please follow the [SIG Windows contributing guide](https://github.com/kubernetes/community/blob/master/sig-windows/CONTRIBUTING.md#reporting-issues-and-feature-requests) to create a new issue.
First search the list of existing issues in case the problem was reported previously; if it was,
comment with your experience on that issue and add any additional logs.
SIG Windows Slack is also a great avenue to get some initial support and
troubleshooting ideas prior to creating a ticket.
## {{% heading "whatsnext" %}}
### Deployment tools
The kubeadm tool helps you to deploy a Kubernetes cluster, providing the control
plane to manage the cluster, and nodes to run your workloads.
[Adding Windows nodes](/docs/tasks/administer-cluster/kubeadm/adding-windows-nodes/)
explains how to deploy Windows nodes to your cluster using kubeadm.
The Kubernetes [Cluster API](https://cluster-api.sigs.k8s.io/) project also provides a means to automate the deployment of Windows nodes.
### Windows distribution channels
For a detailed explanation of Windows distribution channels see the [Microsoft documentation](https://docs.microsoft.com/en-us/windows-server/get-started-19/servicing-channels-19).
Information on the different Windows Server servicing channels
including their support models can be found at
[Windows Server servicing channels](https://docs.microsoft.com/en-us/windows-server/get-started/servicing-channels-comparison).

View File

@ -3,7 +3,6 @@ reviewers:
- jayunit100
- jsturtevant
- marosset
- perithompson
title: Guide for scheduling Windows containers in Kubernetes
content_type: concept
weight: 75
@ -11,31 +10,29 @@ weight: 75
<!-- overview -->
Windows applications constitute a large portion of the services and applications that run in many organizations.
This guide walks you through the steps to configure and deploy Windows containers in Kubernetes.
<!-- body -->
## Objectives
* Configure an example deployment to run Windows containers on the Windows node
* (Optional) Configure an Active Directory Identity for your Pod using Group Managed Service Accounts (GMSA)
* Highlight Windows-specific functionality in Kubernetes
## Before you begin
* Create a Kubernetes cluster that includes a
control plane and a [worker node running Windows Server](/docs/tasks/administer-cluster/kubeadm/adding-windows-nodes/)
* It is important to note that creating and deploying services and workloads on Kubernetes
behaves in much the same way for Linux and Windows containers.
[Kubectl commands](/docs/reference/kubectl/) to interface with the cluster are identical.
The example in the section below is provided to jumpstart your experience with Windows containers.
## Getting Started: Deploying a Windows container
To deploy a Windows container on Kubernetes, you must first create an example application.
The example YAML file below deploys a simple webserver application running inside a Windows container.
Create a service spec named `win-webserver.yaml` with the contents below:
```yaml
@ -83,8 +80,8 @@ spec:
```
{{< note >}}
Port mapping is also supported, but for simplicity this example exposes
port 80 of the container directly to the Service.
{{< /note >}}
1. Check that all nodes are healthy:
@ -104,20 +101,19 @@ the container port 80 is exposed directly to the service.
1. Check that the deployment succeeded. To verify:
* Two containers per pod on the Windows node, use `docker ps`
* Two pods listed from the Linux control plane node, use `kubectl get pods`
* Node-to-pod communication across the network, `curl` port 80 of your pod IPs from the Linux control plane node
to check for a web server response
* Pod-to-pod communication, ping between pods (and across hosts, if you have more than one Windows node)
using docker exec or kubectl exec
* Service-to-pod communication, `curl` the virtual service IP (seen under `kubectl get services`)
from the Linux control plane node and from individual pods
* Service discovery, `curl` the service name with the Kubernetes [default DNS suffix](/docs/concepts/services-networking/dns-pod-service/#services)
* Inbound connectivity, `curl` the NodePort from the Linux control plane node or machines outside of the cluster
* Outbound connectivity, `curl` external IPs from inside the pod using kubectl exec
{{< note >}}
Windows container hosts are not able to access the IP of services scheduled on them due to current platform limitations of the Windows networking stack.
Only Windows pods are able to access service IPs.
{{< /note >}}
@ -125,78 +121,85 @@ Only Windows pods are able to access service IPs.
### Capturing logs from workloads
Logs are an important element of observability; they enable users to gain insights
into the operational aspect of workloads and are a key ingredient to troubleshooting issues.
Because Windows containers and workloads inside Windows containers behave differently from Linux containers,
users had a hard time collecting logs, limiting operational visibility.
Windows workloads for example are usually configured to log to ETW (Event Tracing for Windows)
or push entries to the application event log.
[LogMonitor](https://github.com/microsoft/windows-container-tools/tree/master/LogMonitor), an open source tool by Microsoft,
is the recommended way to monitor configured log sources inside a Windows container.
LogMonitor supports monitoring event logs, ETW providers, and custom application logs,
piping them to STDOUT for consumption by `kubectl logs <pod>`.
Follow the instructions in the LogMonitor GitHub page to copy its binaries and configuration files
to all your containers and add the necessary entrypoints for LogMonitor to push your logs to STDOUT.
## Configuring container user
### Using configurable Container usernames
Windows containers can be configured to run their entrypoints and processes
with different usernames than the image defaults.
Learn more about it [here](/docs/tasks/configure-pod-container/configure-runasusername/).
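For reference, a minimal sketch of a Pod that runs its containers as the built-in `ContainerUser` account could look like the following; the image tag and command are only examples:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: run-as-username-example
spec:
  securityContext:
    windowsOptions:
      runAsUserName: "ContainerUser"   # applies to all containers unless overridden per container
  containers:
    - name: app
      image: mcr.microsoft.com/windows/servercore:ltsc2019
      command: ["ping", "-t", "localhost"]
  nodeSelector:
    kubernetes.io/os: windows
```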
### Managing Workload Identity with Group Managed Service Accounts
Windows container workloads can be configured to use Group Managed Service Accounts (GMSA).
Group Managed Service Accounts are a specific type of Active Directory account that provides automatic password management,
simplified service principal name (SPN) management, and the ability to delegate the management to other administrators across multiple servers.
Containers configured with a GMSA can access external Active Directory Domain resources while carrying the identity configured with the GMSA.
Learn more about configuring and using GMSA for Windows containers [here](/docs/tasks/configure-pod-container/configure-gmsa/).
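At the Pod level, referencing a GMSA credential spec is a single field in the Windows security options. In the sketch below, the credential spec name is hypothetical and must match a `GMSACredentialSpec` resource that you have already deployed to the cluster:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gmsa-example
spec:
  securityContext:
    windowsOptions:
      gmsaCredentialSpecName: my-gmsa-credspec   # hypothetical credential spec name
  containers:
    - name: app
      image: mcr.microsoft.com/windows/servercore:ltsc2019
      command: ["ping", "-t", "localhost"]
  nodeSelector:
    kubernetes.io/os: windows
```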
## Taints and Tolerations
Users need to use some combination of taints and node selectors in order to
schedule Linux and Windows workloads to their respective OS-specific nodes.
The recommended approach is outlined below,
with one of its main goals being that this approach should not break compatibility for existing Linux workloads.
{{< note >}}
If the `IdentifyPodOS` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) is
enabled, you can (and should) set `.spec.os.name` for a Pod to indicate the operating system
that the containers in that Pod are designed for. For Pods that run Linux containers, set
`.spec.os.name` to `linux`. For Pods that run Windows containers, set `.spec.os.name`
to `windows`.

Starting from Kubernetes 1.24, the `IdentifyPodOS` feature gate is in beta and is enabled by default.
The scheduler does not use the value of `.spec.os.name` when assigning Pods to nodes. You should
use normal Kubernetes mechanisms for
[assigning pods to nodes](/docs/concepts/scheduling-eviction/assign-pod-node/)
to ensure that the control plane for your cluster places pods onto nodes that are running the
appropriate operating system.
The `.spec.os.name` value has no effect on the scheduling of the Windows pods,
so taints and tolerations and node selectors are still required
to ensure that the Windows pods land onto appropriate Windows nodes.
{{< /note >}}
### Ensuring OS-specific workloads land on the appropriate container host
Users can ensure Windows containers can be scheduled on the appropriate host using Taints and Tolerations.
All Kubernetes nodes today have the following default labels:
* kubernetes.io/os = [windows|linux]
* kubernetes.io/arch = [amd64|arm64|...]
If a Pod specification does not specify a nodeSelector like `"kubernetes.io/os": windows`,
it is possible the Pod can be scheduled on any host, Windows or Linux.
This can be problematic since a Windows container can only run on Windows and a Linux container can only run on Linux.
The best practice is to use a nodeSelector.
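For example, a minimal sketch of a Pod pinned to Windows nodes with such a node selector (the image and command are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: windows-nodeselector-example
spec:
  nodeSelector:
    kubernetes.io/os: windows
  containers:
    - name: app
      image: mcr.microsoft.com/windows/servercore:ltsc2019   # placeholder image
      command: ["ping", "-t", "localhost"]
```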
However, we understand that in many cases users have a pre-existing large number of deployments for Linux containers,
as well as an ecosystem of off-the-shelf configurations, such as community Helm charts, and programmatic Pod generation cases, such as with Operators.
In those situations, you may be hesitant to make the configuration change to add nodeSelectors.
The alternative is to use Taints. Because the kubelet can set Taints during registration,
it could easily be modified to automatically add a taint when running on Windows only.
For example: `--register-with-taints='os=windows:NoSchedule'`
By adding a taint to all Windows nodes, nothing will be scheduled on them (that includes existing Linux Pods).
In order for a Windows Pod to be scheduled on a Windows node,
it would need both the nodeSelector and the appropriate matching toleration to choose Windows.
@ -216,26 +219,24 @@ tolerations:
The Windows Server version used by each pod must match that of the node. If you want to use multiple Windows
Server versions in the same cluster, then you should set additional node labels and nodeSelectors.
Kubernetes 1.17 automatically adds a new label `node.kubernetes.io/windows-build` to simplify this.
If you're running an older version, then it's recommended to add this label manually to Windows nodes.
This label reflects the Windows major, minor, and build number that need to match for compatibility.
Here are values used today for each Windows Server version.
| Product Name | Build Number(s) |
|--------------------------------------|------------------------|
| Windows Server 2019 | 10.0.17763 |
| Windows Server version 1809 | 10.0.17763 |
| Windows Server version 1903 | 10.0.18362 |
| Windows Server, Version 20H2 | 10.0.19042 |
| Windows Server 2022 | 10.0.20348 |
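For example, a sketch of a node selector fragment that pins a workload to Windows Server 2019 nodes using this label (the build number is taken from the table above):

```yaml
# Fragment of a Pod (or Pod template) spec selecting Windows Server 2019 nodes
nodeSelector:
  kubernetes.io/os: windows
  node.kubernetes.io/windows-build: '10.0.17763'
```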
### Simplifying with RuntimeClass
[RuntimeClass] can be used to simplify the process of using taints and tolerations.
A cluster administrator can create a `RuntimeClass` object which is used to encapsulate these taints and tolerations.
1. Save this file to `runtimeClasses.yml`. It includes the appropriate `nodeSelector`
for the Windows OS, architecture, and version.
```yaml
@ -306,7 +307,4 @@ spec:
app: iis-2019
```
[RuntimeClass]: https://kubernetes.io/docs/concepts/containers/runtime-class/

Some files were not shown because too many files have changed in this diff.