Merge branch 'main' into fixes-31657
commit 3272b3577c

@@ -1,6 +1,7 @@
[submodule "themes/docsy"]
  path = themes/docsy
  url = https://github.com/google/docsy.git
  branch = v0.2.0
[submodule "api-ref-generator"]
  path = api-ref-generator
  url = https://github.com/kubernetes-sigs/reference-docs

@@ -27,16 +27,18 @@ RUN mkdir $HOME/src && \

FROM golang:1.16-alpine

RUN apk add --no-cache \
    runuser \
    git \
    openssh-client \
    rsync \
    npm && \
    npm install -D autoprefixer postcss-cli

RUN mkdir -p /usr/local/src && \
    cd /usr/local/src && \
RUN mkdir -p /var/hugo && \
    addgroup -Sg 1000 hugo && \
    adduser -Sg hugo -u 1000 -h /src hugo
    adduser -Sg hugo -u 1000 -h /var/hugo hugo && \
    chown -R hugo: /var/hugo && \
    runuser -u hugo -- git config --global --add safe.directory /src

COPY --from=0 /go/bin/hugo /usr/local/bin/hugo

Makefile

@@ -71,6 +71,9 @@ container-image: ## Build a container image for the preview of the website
        --tag $(CONTAINER_IMAGE) \
        --build-arg HUGO_VERSION=$(HUGO_VERSION)

container-push: container-image ## Push container image for the preview of the website
    $(CONTAINER_ENGINE) push $(CONTAINER_IMAGE)

container-build: module-check
    $(CONTAINER_RUN) --read-only --mount type=tmpfs,destination=/tmp,tmpfs-mode=01777 $(CONTAINER_IMAGE) sh -c "npm ci && hugo --minify --environment development"
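
As a usage sketch (the engine value below is an illustrative default, not something this diff sets), these targets are typically driven like so:

```bash
# Build the preview image, then run the production-like build in a read-only container
make container-image CONTAINER_ENGINE=docker
make container-build CONTAINER_ENGINE=docker

# Pushing requires credentials for the registry baked into CONTAINER_IMAGE
make container-push CONTAINER_ENGINE=docker
```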
@@ -127,6 +127,7 @@ aliases:
    # MasayaAoyama
    - nasa9084
    # oke-py
    - ptux
  sig-docs-ko-owners: # Admins for Korean content
    - ClaudiaJKang
    - gochist

@@ -178,6 +179,7 @@ aliases:
    - tanjunchen
    - tengqm
    - xichengliudui
    - ydFu
    # zhangxiaoyu-zidif
  sig-docs-pt-owners: # Admins for Portuguese content
    - edsoncelio

@@ -248,7 +250,6 @@ aliases:
    - cpanato # SIG Technical Lead
    - jeremyrickard # SIG Technical Lead
    - justaugustus # SIG Chair
    - LappleApple # SIG Program Manager
    - puerco # SIG Technical Lead
    - saschagrunert # SIG Chair
  release-engineering-approvers:

README-ja.md

@@ -4,6 +4,9 @@

This repository contains all of the assets required to build the [Kubernetes website and documentation](https://kubernetes.io/). Thank you for your interest in contributing!

- [Contributing to the docs](#contributing-to-the-docs)
- [List of localized `README.md` files](#localization-readmemds)

# Using this repository

You can run the website locally using Hugo (Extended version), or you can run it in a container runtime. We strongly recommend using the container runtime, as it gives deployment consistency with the production website.

@@ -56,6 +59,43 @@ make serve

This starts the Hugo server on port 1313. Open http://localhost:1313 in your browser. When you make changes to the source files in the repository, Hugo updates the website and refreshes your browser.

## Building the API reference pages

The API reference pages located in `content/en/docs/reference/kubernetes-api` are built from the Swagger specification using <https://github.com/kubernetes-sigs/reference-docs/tree/master/gen-resourcesdocs>.

To update the reference pages for a new Kubernetes release, follow these steps:

1. Pull in the `api-ref-generator` submodule:

```bash
git submodule update --init --recursive --depth 1
```

2. Update the Swagger specification:

```bash
curl 'https://raw.githubusercontent.com/kubernetes/kubernetes/master/api/openapi-spec/swagger.json' > api-ref-assets/api/swagger.json
```
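
If you need the spec for a particular release rather than the development branch, the same URL pattern works with a release branch (the branch below is only an example, not part of this change):

```bash
# Fetch the published OpenAPI spec for a specific release branch
curl 'https://raw.githubusercontent.com/kubernetes/kubernetes/release-1.24/api/openapi-spec/swagger.json' > api-ref-assets/api/swagger.json
```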

3. In `api-ref-assets/config/`, adapt the files `toc.yaml` and `fields.yaml` to reflect the changes of the new release.

4. Next, build the pages:

```bash
make api-reference
```

You can test the results locally by building and serving the site from a container image:

```bash
make container-image
make container-serve
```

To view the API reference, open <http://localhost:1313/docs/reference/kubernetes-api/> in your browser.

5. When all changes of the new contract are reflected into the configuration files `toc.yaml` and `fields.yaml`, create a pull request with the newly generated API reference pages.

## Troubleshooting

### error: failed to transform resource: TOCSS: failed to transform "scss/main.scss" (text/x-scss): this feature is not available in your current Hugo version

@@ -107,7 +147,7 @@ sudo launchctl load -w /Library/LaunchDaemons/limit.maxfiles.plist
- [Slack](https://kubernetes.slack.com/messages/kubernetes-docs-ja)
- [Mailing list](https://groups.google.com/forum/#!forum/kubernetes-sig-docs)

## Contributing to the docs
## Contributing to the docs {#contributing-to-the-docs}

Click the **Fork** button in the upper-right area of the GitHub screen to create a copy of this repository in your GitHub account. This copy is called a *fork*. Make any changes you want in your fork, and when you are ready to send those changes to us, create a pull request from your fork to this repository.

@@ -124,7 +164,15 @@ For more information about contributing to the Kubernetes documentation, see:
* [Documentation Style Guide](https://kubernetes.io/docs/contribute/style/style-guide/)
* [Localizing Kubernetes Documentation](https://kubernetes.io/docs/contribute/localization/)

## List of localized `README.md` files
### New Contributor Ambassadors

If you need help at any point while contributing, the [New Contributor Ambassadors](https://kubernetes.io/docs/contribute/advanced/#serve-as-a-new-contributor-ambassador) are a good point of contact. They are SIG Docs approvers whose responsibilities include mentoring new contributors and guiding them through their first few pull requests. The best place to contact the New Contributor Ambassadors is the [Kubernetes Slack](https://slack.k8s.io). The current New Contributor Ambassadors for SIG Docs are:

| Name | Slack | GitHub |
| -------------------------- | -------------------------- | -------------------------- |
| Arsh Sharma | @arsh | @RinkiyaKeDad |

## List of localized `README.md` files {#localization-readmemds}

| Language | Language |
|---|---|

README-zh.md

@@ -13,7 +13,14 @@ This repository contains the assets required to build the [Kubernetes website an
We are very glad that you want to contribute!

<!--
# Using this repository
- [Contributing to the docs](#contributing-to-the-docs)
- [Localization ReadMes](#localization-readmemds)
-->
- [Contributing to the docs](#为文档做贡献)
- [Localizing README.md](#readmemd-本地化)

<!--
## Using this repository

You can run the website locally using Hugo (Extended version), or you can run it in a container runtime. We strongly recommend using the container runtime, as it gives deployment consistency with the live website.
-->

@@ -46,7 +53,7 @@ Before you start, install the dependencies. Clone the repository and navigate to
-->
Before you start, install the dependencies. Clone the repository and navigate to the directory:

```
```bash
git clone https://github.com/kubernetes/website.git
cd website
```

@@ -57,7 +64,7 @@ The Kubernetes website uses the [Docsy Hugo theme](https://github.com/google/doc

The Kubernetes website uses the [Docsy Hugo theme](https://github.com/google/docsy#readme). Even if you plan to run the website in a container, we strongly recommend pulling in the submodule and other development dependencies by running the following:

```
```bash
# pull in the Docsy submodule
git submodule update --init --recursive --depth 1
```

@@ -72,15 +79,23 @@ To build the site in a container, run the following to build the container image

To build the website in a container, run the following commands to build the container image and run it:

```
```bash
make container-image
make container-serve
```

<!--
Open up your browser to http://localhost:1313 to view the website. As you make changes to the source files, Hugo updates the website and forces a browser refresh.
If you see errors, it probably means that the hugo container did not have enough computing resources available. To solve it, increase the amount of allowed CPU and memory usage for Docker on your machine ([MacOSX](https://docs.docker.com/docker-for-mac/#resources) and [Windows](https://docs.docker.com/docker-for-windows/#resources)).
-->
Open your browser to http://localhost:1313 to view the website.
If you see errors, it probably means that the hugo container did not have enough computing resources available.
To solve this, increase the amount of CPU and memory that Docker is allowed to use on your machine
([MacOSX](https://docs.docker.com/docker-for-mac/#resources) and
[Windows](https://docs.docker.com/docker-for-windows/#resources)).

<!--
Open up your browser to <http://localhost:1313> to view the website. As you make changes to the source files, Hugo updates the website and forces a browser refresh.
-->
Open your browser to <http://localhost:1313> to view the website.
As you make changes to the source files, Hugo updates the website and forces a browser refresh.

<!--

@@ -104,18 +119,84 @@ make serve
```

<!--
This will start the local Hugo server on port 1313. Open up your browser to http://localhost:1313 to view the website. As you make changes to the source files, Hugo updates the website and forces a browser refresh.
This will start the local Hugo server on port 1313. Open up your browser to <http://localhost:1313> to view the website. As you make changes to the source files, Hugo updates the website and forces a browser refresh.
-->
The command above starts the local Hugo server on port 1313.
Open your browser to http://localhost:1313 to view the website.
Open your browser to <http://localhost:1313> to view the website.
As you make changes to the source files, Hugo updates the website and forces a browser refresh.

<!--
## Building the API reference pages
-->
## Building the API reference pages

<!--
The API reference pages located in `content/en/docs/reference/kubernetes-api` are built from the Swagger specification, using <https://github.com/kubernetes-sigs/reference-docs/tree/master/gen-resourcesdocs>.

To update the reference pages for a new Kubernetes release follow these steps:
-->
The API reference pages located in `content/en/docs/reference/kubernetes-api` are built from the Swagger specification, using <https://github.com/kubernetes-sigs/reference-docs/tree/master/gen-resourcesdocs>.

To update the reference pages for a new Kubernetes release, follow these steps:

<!--
1. Pull in the `api-ref-generator` submodule:
-->
1. Pull in the `api-ref-generator` submodule:

```bash
git submodule update --init --recursive --depth 1
```

<!--
2. Update the Swagger specification:
-->
2. Update the Swagger specification:

```bash
curl 'https://raw.githubusercontent.com/kubernetes/kubernetes/master/api/openapi-spec/swagger.json' > api-ref-assets/api/swagger.json
```

<!--
3. In `api-ref-assets/config/`, adapt the files `toc.yaml` and `fields.yaml` to reflect the changes of the new release.
-->
3. In `api-ref-assets/config/`, adapt the files `toc.yaml` and `fields.yaml` to reflect the changes of the new release.

<!--
4. Next, build the pages:
-->
4. Next, build the pages:

```bash
make api-reference
```

<!--
You can test the results locally by making and serving the site from a container image:
-->
You can test the results locally by building and serving the site from a container image:

```bash
make container-image
make container-serve
```

<!--
In a web browser, go to <http://localhost:1313/docs/reference/kubernetes-api/> to view the API reference.
-->
In a web browser, go to <http://localhost:1313/docs/reference/kubernetes-api/> to view the API reference.

<!--
5. When all changes of the new contract are reflected into the configuration files `toc.yaml` and `fields.yaml`, create a Pull Request with the newly generated API reference pages.
-->
5. When all changes of the new contract are reflected into the configuration files `toc.yaml` and `fields.yaml`, create a Pull Request with the newly generated API reference pages.

<!--
## Troubleshooting

### error: failed to transform resource: TOCSS: failed to transform "scss/main.scss" (text/x-scss): this feature is not available in your current Hugo version

Hugo is shipped in two sets of binaries for technical reasons. The current website runs based on the **Hugo Extended** version only. In the [release page](https://github.com/gohugoio/hugo/releases) look for archives with `extended` in the name. To confirm, run `hugo version` and look for the word `extended`.

-->
## Troubleshooting

@@ -131,22 +212,28 @@ Hugo is shipped in two set of binaries for technical reasons. The current websit
If you run `make serve` on macOS and receive the following error:

-->
### Troubleshooting too many open files on macOs
### Troubleshooting too many open files on macOS

If you run `make serve` on macOS and receive the following error:

```
```bash
ERROR 2020/08/01 19:09:18 Error: listen tcp 127.0.0.1:1313: socket: too many open files
make: *** [serve] Error 1
```

<!--
Try checking the current limit for open files:
-->
Try checking the current limit for open files:

`launchctl limit maxfiles`

Then run the following commands (adapted from https://gist.github.com/tombigel/d503800a282fcadbee14b537735d202c):
<!--
Then run the following commands (adapted from <https://gist.github.com/tombigel/d503800a282fcadbee14b537735d202c>):
-->
Then run the following commands (adapted from <https://gist.github.com/tombigel/d503800a282fcadbee14b537735d202c>):

```
```shell
#!/bin/sh

# These are the original gist links, linking to my gists now.

@@ -165,6 +252,9 @@ sudo chown root:wheel /Library/LaunchDaemons/limit.maxproc.plist
sudo launchctl load -w /Library/LaunchDaemons/limit.maxfiles.plist
```

<!--
This works for Catalina as well as Mojave macOS.
-->
This works for Catalina as well as Mojave macOS.
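
To confirm the new limits took effect after loading the LaunchDaemons (a quick check, not part of the original instructions):

```bash
# Inspect the soft and hard limits currently in force
launchctl limit maxfiles
ulimit -n
```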

<!--
@@ -174,7 +264,8 @@ Learn more about SIG Docs Kubernetes community and meetings on the [community pa

You can also reach the maintainers of this project at:

- [Slack](https://kubernetes.slack.com/messages/sig-docs) [Get an invite for this Slack](https://slack.k8s.io/)
- [Slack](https://kubernetes.slack.com/messages/sig-docs)
- [Get an invite for this Slack](https://slack.k8s.io/)
- [Mailing List](https://groups.google.com/forum/#!forum/kubernetes-sig-docs)
-->
# Get involved with SIG Docs

@@ -184,20 +275,21 @@ You can also reach the maintainers of this project at:

You can also reach the maintainers of this project at:

- [Slack](https://kubernetes.slack.com/messages/sig-docs) [Join Slack](https://slack.k8s.io/)
- [Slack](https://kubernetes.slack.com/messages/sig-docs)
- [Get an invite for this Slack](https://slack.k8s.io/)
- [Mailing list](https://groups.google.com/forum/#!forum/kubernetes-sig-docs)

<!--
## Contributing to the docs

You can click the **Fork** button in the upper-right area of the screen to create a copy of this repository in your GitHub account. This copy is called a *fork*. Make any changes you want in your fork, and when you are ready to send those changes to us, go to your fork and create a new pull request to let us know about it.
You can click the **Fork** button in the upper-right area of the screen to create a copy of this repository in your GitHub account. This copy is called a _fork_. Make any changes you want in your fork, and when you are ready to send those changes to us, go to your fork and create a new pull request to let us know about it.

Once your pull request is created, a Kubernetes reviewer will take responsibility for providing clear, actionable feedback. As the owner of the pull request, **it is your responsibility to modify your pull request to address the feedback that has been provided to you by the Kubernetes reviewer.**
-->
# Contributing to the docs

You can also click the **Fork** button in the upper-right area of the screen to create a copy of this repository under your own GitHub
account. This copy is called a *fork*.
account. This copy is called a _fork_.
Make any changes you want in your fork, and when you are ready to send those changes to us,
create a pull request from your fork to let us know about it.

@@ -220,17 +312,65 @@ Furthermore, in some cases, one of your reviewers might ask for a technical revi
<!--
For more information about contributing to the Kubernetes documentation, see:

* [Contribute to Kubernetes docs](https://kubernetes.io/docs/contribute/)
* [Page Content Types](https://kubernetes.io/docs/contribute/style/page-content-types/)
* [Documentation Style Guide](https://kubernetes.io/docs/contribute/style/style-guide/)
* [Localizing Kubernetes Documentation](https://kubernetes.io/docs/contribute/localization/)
- [Contribute to Kubernetes docs](https://kubernetes.io/docs/contribute/)
- [Page Content Types](https://kubernetes.io/docs/contribute/style/page-content-types/)
- [Documentation Style Guide](https://kubernetes.io/docs/contribute/style/style-guide/)
- [Localizing Kubernetes Documentation](https://kubernetes.io/docs/contribute/localization/)
-->
For more information about contributing to the Kubernetes documentation, see:

* [Contribute to Kubernetes docs](https://kubernetes.io/docs/contribute/)
* [Page Content Types](https://kubernetes.io/docs/contribute/style/page-content-types/)
* [Documentation Style Guide](https://kubernetes.io/docs/contribute/style/style-guide/)
* [Localizing Kubernetes Documentation](https://kubernetes.io/docs/contribute/localization/)
- [Contribute to Kubernetes docs](https://kubernetes.io/docs/contribute/)
- [Page Content Types](https://kubernetes.io/docs/contribute/style/page-content-types/)
- [Documentation Style Guide](https://kubernetes.io/docs/contribute/style/style-guide/)
- [Localizing Kubernetes Documentation](https://kubernetes.io/docs/contribute/localization/)

<!--
### New contributor ambassadors
-->
### New contributor ambassadors

<!--
If you need help at any point when contributing, the [New Contributor Ambassadors](https://kubernetes.io/docs/contribute/advanced/#serve-as-a-new-contributor-ambassador) are a good point of contact. These are SIG Docs approvers whose responsibilities include mentoring new contributors and helping them through their first few pull requests. The best place to contact the New Contributors Ambassadors would be on the [Kubernetes Slack](https://slack.k8s.io/). Current New Contributors Ambassadors for SIG Docs:
-->
If you need help at any point when contributing, the [New Contributor Ambassadors](https://kubernetes.io/docs/contribute/advanced/#serve-as-a-new-contributor-ambassador) are a good point of contact.
These are SIG Docs approvers whose responsibilities include mentoring new contributors and helping them through their first few pull requests.
The best place to contact the New Contributor Ambassadors is the [Kubernetes Slack](https://slack.k8s.io/).
Current New Contributor Ambassadors for SIG Docs:

<!--
| Name | Slack | GitHub |
| -------------------------- | -------------------------- | -------------------------- |
| Arsh Sharma | @arsh | @RinkiyaKeDad |
-->
| Name | Slack | GitHub |
| -------------------------- | -------------------------- | -------------------------- |
| Arsh Sharma | @arsh | @RinkiyaKeDad |

<!--
## Localization `README.md`'s
-->
## Localizing `README.md`

<!--
| Language | Language |
| -------------------------- | -------------------------- |
| [Chinese](README-zh.md) | [Korean](README-ko.md) |
| [French](README-fr.md) | [Polish](README-pl.md) |
| [German](README-de.md) | [Portuguese](README-pt.md) |
| [Hindi](README-hi.md) | [Russian](README-ru.md) |
| [Indonesian](README-id.md) | [Spanish](README-es.md) |
| [Italian](README-it.md) | [Ukrainian](README-uk.md) |
| [Japanese](README-ja.md) | [Vietnamese](README-vi.md) |
-->
| Language | Language |
| -------------------------- | -------------------------- |
| [Chinese](README-zh.md) | [Korean](README-ko.md) |
| [French](README-fr.md) | [Polish](README-pl.md) |
| [German](README-de.md) | [Portuguese](README-pt.md) |
| [Hindi](README-hi.md) | [Russian](README-ru.md) |
| [Indonesian](README-id.md) | [Spanish](README-es.md) |
| [Italian](README-it.md) | [Ukrainian](README-uk.md) |
| [Japanese](README-ja.md) | [Vietnamese](README-vi.md) |

# Chinese localization

@@ -241,19 +381,19 @@ For more information about contributing to the Kubernetes documentation, see:
* [Slack channel](https://kubernetes.slack.com/messages/kubernetes-docs-zh)

<!--
### Code of conduct
## Code of conduct

Participation in the Kubernetes community is governed by the [CNCF Code of Conduct](https://github.com/cncf/foundation/blob/master/code-of-conduct.md).
-->
# Code of conduct
## Code of conduct

Participation in the Kubernetes community is governed by the [CNCF Code of Conduct](https://github.com/cncf/foundation/blob/master/code-of-conduct.md).

<!--
## Thank you!
## Thank you

Kubernetes thrives on community participation, and we appreciate your contributions to our website and our documentation!
-->
# Thank you!
## Thank you

Kubernetes thrives on community participation, and we appreciate your contributions to our website and our documentation!

File diff suppressed because it is too large

@@ -405,6 +405,7 @@
    - fields:
        - jobTemplate
        - schedule
        - timeZone
        - concurrencyPolicy
        - startingDeadlineSeconds
        - suspend

@@ -124,7 +124,7 @@ parts:
      version: v1
  - name: CSIStorageCapacity
    group: storage.k8s.io
    version: v1beta1
    version: v1
- name: Authentication Resources
  chapters:
  - name: ServiceAccount

@@ -634,12 +634,12 @@ body.td-documentation {

  a {
    color: inherit;
    border-bottom: 1px solid #fff;
    text-decoration: underline;
  }

  a:hover {
    color: inherit;
    border-bottom: none;
    text-decoration: initial;
  }
}

@@ -648,6 +648,9 @@ body.td-documentation {
}

#announcement {
  // default background is blue; overrides are possible
  color: #fff;

  .announcement-main {
    margin-left: auto;
    margin-right: auto;

@@ -660,9 +663,8 @@ body.td-documentation {
  }

  /* always white */
  h1, h2, h3, h4, h5, h6, p * {
    color: #ffffff;
    color: inherit; /* defaults to white */
    background: transparent;

  img.event-logo {

@@ -9,17 +9,20 @@ options:
steps:
  # It's fine to bump the tag to a recent version, as needed
  - name: "gcr.io/k8s-staging-test-infra/gcb-docker-gcloud:v20210917-12df099d55"
    entrypoint: make
    entrypoint: 'bash'
    env:
      - DOCKER_CLI_EXPERIMENTAL=enabled
      - TAG=$_GIT_TAG
      - BASE_REF=$_PULL_BASE_REF
    args:
      - container-image
      - -c
      - |
        gcloud auth configure-docker \
        && make container-push
substitutions:
  # _GIT_TAG will be filled with a git-based tag for the image, of the form vYYYYMMDD-hash, and
  # can be used as a substitution
  _GIT_TAG: "12345"
  # _PULL_BASE_REF will contain the ref that was pushed to to trigger this build -
  # a branch like 'master' or 'release-0.2', or a tag like 'v0.2'.
  _PULL_BASE_REF: "master"
  # a branch like 'main' or 'release-0.2', or a tag like 'v0.2'.
  _PULL_BASE_REF: "main"
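
As a rough local equivalent of that build step (the tag value is a stand-in mirroring the `_GIT_TAG` substitution, not something defined in this change):

```bash
# Mimic the Cloud Build step on a workstation; TAG feeds the image tag used by the Makefile
export DOCKER_CLI_EXPERIMENTAL=enabled
export TAG=v20220512-local
gcloud auth configure-docker && make container-push
```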
config.toml

@@ -139,10 +139,10 @@ time_format_default = "January 02, 2006 at 3:04 PM PST"
description = "Production-Grade Container Orchestration"
showedit = true

latest = "v1.23"
latest = "v1.24"

fullversion = "v1.23.0"
version = "v1.23"
fullversion = "v1.24.0"
version = "v1.24"
githubbranch = "main"
docsbranch = "main"
deprecated = false

@@ -155,10 +155,10 @@ githubWebsiteRaw = "raw.githubusercontent.com/kubernetes/website"
# GitHub repository link for editing a page and opening issues.
github_repo = "https://github.com/kubernetes/website"

#Searching
# Searching
k8s_search = true

#The following search parameters are specific to Docsy's implementation. Kubernetes implements its own search-related partials and scripts.
# The following search parameters are specific to Docsy's implementation. Kubernetes implements its own search-related partials and scripts.

# Google Custom Search Engine ID. Remove or comment out to disable search.
#gcs_engine_id = "011737558837375720776:fsdu1nryfng"

@@ -179,44 +179,46 @@ js = [
]

[[params.versions]]
fullversion = "v1.23.0"
version = "v1.23"
githubbranch = "v1.23.0"
fullversion = "v1.24.0"
version = "v1.24"
githubbranch = "v1.24.0"
docsbranch = "main"
url = "https://kubernetes.io"

[[params.versions]]
fullversion = "v1.22.4"
fullversion = "v1.23.6"
version = "v1.23"
githubbranch = "v1.23.6"
docsbranch = "release-1.23"
url = "https://v1-23.docs.kubernetes.io"

[[params.versions]]
fullversion = "v1.22.9"
version = "v1.22"
githubbranch = "v1.22.4"
githubbranch = "v1.22.9"
docsbranch = "release-1.22"
url = "https://v1-22.docs.kubernetes.io"

[[params.versions]]
fullversion = "v1.21.7"
fullversion = "v1.21.12"
version = "v1.21"
githubbranch = "v1.21.7"
githubbranch = "v1.21.12"
docsbranch = "release-1.21"
url = "https://v1-21.docs.kubernetes.io"

[[params.versions]]
fullversion = "v1.20.13"
fullversion = "v1.20.15"
version = "v1.20"
githubbranch = "v1.20.13"
githubbranch = "v1.20.15"
docsbranch = "release-1.20"
url = "https://v1-20.docs.kubernetes.io"

[[params.versions]]
fullversion = "v1.19.16"
version = "v1.19"
githubbranch = "v1.19.16"
docsbranch = "release-1.19"
url = "https://v1-19.docs.kubernetes.io"

# User interface configuration
[params.ui]
# Enable to show the side bar menu in its compact state.
sidebar_menu_compact = false
# Show expand/collapse icon for sidebar sections.
sidebar_menu_foldable = true
# https://github.com/gohugoio/hugo/issues/8918#issuecomment-903314696
sidebar_cache_limit = 1
# Set to true to disable breadcrumb navigation.

@@ -295,7 +297,7 @@ no = 'Sorry to hear that. Please <a href="https://github.com/USERNAME/REPOSITORY
[languages.en]
title = "Kubernetes"
description = "Production-Grade Container Orchestration"
languageName ="English"
languageName = "English"
# Weight used for sorting.
weight = 1
languagedirection = "ltr"

@@ -342,7 +344,7 @@ language_alternatives = ["en"]
[languages.fr]
title = "Kubernetes"
description = "Solution professionnelle d’orchestration de conteneurs"
languageName ="Français (French)"
languageName = "Français (French)"
languageNameLatinScript = "Français"
weight = 5
contentDir = "content/fr"

@@ -370,7 +372,7 @@ language_alternatives = ["en"]
[languages.no]
title = "Kubernetes"
description = "Production-Grade Container Orchestration"
languageName ="Norsk (Norwegian)"
languageName = "Norsk (Norwegian)"
languageNameLatinScript = "Norsk"
weight = 7
contentDir = "content/no"

@@ -384,7 +386,7 @@ language_alternatives = ["en"]
[languages.de]
title = "Kubernetes"
description = "Produktionsreife Container-Orchestrierung"
languageName ="Deutsch (German)"
languageName = "Deutsch (German)"
languageNameLatinScript = "Deutsch"
weight = 8
contentDir = "content/de"

@@ -398,7 +400,7 @@ language_alternatives = ["en"]
[languages.es]
title = "Kubernetes"
description = "Orquestación de contenedores para producción"
languageName ="Español (Spanish)"
languageName = "Español (Spanish)"
languageNameLatinScript = "Español"
weight = 9
contentDir = "content/es"

@@ -412,9 +414,10 @@ language_alternatives = ["en"]
[languages.pt-br]
title = "Kubernetes"
description = "Orquestração de contêineres em nível de produção"
languageName ="Português (Portuguese)"
languageName = "Português (Portuguese)"
languageNameLatinScript = "Português"
weight = 9
weight = 10

contentDir = "content/pt-br"
languagedirection = "ltr"

@@ -428,7 +431,7 @@ title = "Kubernetes"
description = "Orkestrasi Kontainer dengan Skala Produksi"
languageName ="Bahasa Indonesia"
languageNameLatinScript = "Bahasa Indonesia"
weight = 10
weight = 11
contentDir = "content/id"
languagedirection = "ltr"

@@ -442,7 +445,7 @@ title = "Kubernetes"
description = "Production-Grade Container Orchestration"
languageName = "हिन्दी (Hindi)"
languageNameLatinScript = "Hindi"
weight = 11
weight = 12
contentDir = "content/hi"
languagedirection = "ltr"

@@ -456,7 +459,7 @@ description = "Giải pháp điều phối container trong môi trường produc
languageName = "Tiếng Việt (Vietnamese)"
languageNameLatinScript = "Tiếng Việt"
contentDir = "content/vi"
weight = 12
weight = 13
languagedirection = "ltr"

[languages.ru]

@@ -464,7 +467,7 @@ title = "Kubernetes"
description = "Первоклассная оркестрация контейнеров"
languageName = "Русский (Russian)"
languageNameLatinScript = "Russian"
weight = 12
weight = 14
contentDir = "content/ru"
languagedirection = "ltr"

@@ -478,7 +481,7 @@ title = "Kubernetes"
description = "Produkcyjny system zarządzania kontenerami"
languageName = "Polski (Polish)"
languageNameLatinScript = "Polski"
weight = 13
weight = 15
contentDir = "content/pl"
languagedirection = "ltr"

@@ -492,7 +495,7 @@ title = "Kubernetes"
description = "Довершена система оркестрації контейнерів"
languageName = "Українська (Ukrainian)"
languageNameLatinScript = "Ukrainian"
weight = 14
weight = 16
contentDir = "content/uk"
languagedirection = "ltr"

@@ -1,4 +1,4 @@
---
title: "Kubernets erweitern"
title: "Kubernetes erweitern"
weight: 110
---

@@ -248,7 +248,7 @@ a few restrictions:
  entry may be added to the `metadata.finalizers` list.
- Pod updates may not change any fields; the exceptions are
  `spec.containers[*].image`,
  `spec.initContainers[*].image`,` spec.activeDeadlineSeconds` or
  `spec.initContainers[*].image`, `spec.activeDeadlineSeconds` or
  `spec.tolerations`. For `spec.tolerations` you can only add new entries.
- For `spec.activeDeadlineSeconds` only two kinds of change are allowed:

@@ -1,6 +1,6 @@
---
content_type: concept
title: Zur Kubernets-Dokumentation beitragen
title: Zur Kubernetes-Dokumentation beitragen
linktitle: Mitmachen
main_menu: true
weight: 80

@@ -76,6 +76,6 @@ To close a pull request, leave a `/close` comment on the

{{< note >}}

The [`fejta-bot`](https://github.com/fejta-bot) bot marks issues as stale after 90 days of inactivity. After another 30 days it marks them as rotten and closes them. PR wranglers should close issues after 14-30 days of inactivity.
The [`k8s-ci-robot`](https://github.com/k8s-ci-robot) bot marks issues as stale after 90 days of inactivity. After another 30 days it marks them as rotten and closes them. PR wranglers should close issues after 14-30 days of inactivity.

{{< /note >}}

@@ -322,7 +322,7 @@ Output format | Description

### Kubectl output verbosity and debugging

Kubectl verbosity is controlled with the `-v` or `--v ` flags, followed by an integer representing the log level. General Kubernetes logging conventions and the associated log levels are described [here](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-instrumentation/logging.md).
Kubectl verbosity is controlled with the `-v` or `--v` flags, followed by an integer representing the log level. General Kubernetes logging conventions and the associated log levels are described [here](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-instrumentation/logging.md).

Verbosity | Description
--------------| -----------

@@ -1,5 +1,6 @@
---
title: "Jobs ausführen"
description: Führen Sie Jobs mit paralleler Verarbeitung aus.
weight: 50
---

@@ -46,7 +46,7 @@ card:
</div>

<div id="basics-modules" class="content__modules">
    <h2>Kubernets Basics Modules</h2>
    <h2>Kubernetes Basics Modules</h2>
    <div class="row">
        <div class="col-md-12">
            <div class="row">

@@ -1 +1 @@
You either need a dynamic PersistentVolume provisioner with a [default StorageClass](/docs/concepts/storage/storage-classes/), or you statically provision [PersistentVolumes](/docs/user-guide/persistent-volumes/#provisioning) yourself to satisfy the [PersistentVolumeClaims](/docs/user-guide/persistent-volumes/#persistentvolumeclaims) used here.
You either need a dynamic PersistentVolume provisioner with a [default StorageClass](/docs/concepts/storage/persistent-volumes/#provisioning), or you statically provision [PersistentVolumes](/docs/concepts/storage/persistent-volumes/#provisioning) yourself to satisfy the [PersistentVolumeClaims](/docs/concepts/storage/persistent-volumes/#persistentvolumeclaims) used here.
@@ -19,7 +19,7 @@ Security:
- [Node authorizer](/docs/reference/access-authn-authz/node/) and admission control plugin are new additions that restrict kubelet’s access to secrets, pods and other objects based on its node.
- [Encryption for Secrets](/docs/tasks/administer-cluster/encrypt-data/), and other resources in etcd, is now available as alpha.
- [Kubelet TLS bootstrapping](/docs/admin/kubelet-tls-bootstrapping/) now supports client and server certificate rotation.
- [Audit logs](/docs/tasks/debug-application-cluster/audit/) stored by the API server are now more customizable and extensible with support for event filtering and webhooks. They also provide richer data for system audit.
- [Audit logs](/docs/tasks/debug/debug-cluster/audit/) stored by the API server are now more customizable and extensible with support for event filtering and webhooks. They also provide richer data for system audit.

Stateful workloads:

@@ -117,7 +117,7 @@ To achieve the best possible isolation, each function call would have to happen
By using Landlock, we could isolate function calls from each other within the same container, making a temporary file created by one function call inaccessible to the next function call, for example. Integration between Landlock and technologies like Kubernetes-based serverless frameworks would be a ripe area for further exploration.

## Auditing kubectl-exec with eBPF
In Kubernetes 1.7 the [audit proposal](/docs/tasks/debug-application-cluster/audit/) started making its way in. It's currently pre-stable with plans to be stable in the 1.10 release. As the name implies, it allows administrators to log and audit events that take place in a Kubernetes cluster.
In Kubernetes 1.7 the [audit proposal](/docs/tasks/debug/debug-cluster/audit/) started making its way in. It's currently pre-stable with plans to be stable in the 1.10 release. As the name implies, it allows administrators to log and audit events that take place in a Kubernetes cluster.

While these events log Kubernetes events, they don't currently provide the level of visibility that some may require. For example, while we can see that someone has used `kubectl exec` to enter a container, we are not able to see what commands were executed in that session. With eBPF one can attach a BPF program that would record any commands executed in the `kubectl exec` session and pass those commands to a user-space program that logs those events. We could then play that session back and know the exact sequence of events that took place.
## Learn more about eBPF

@@ -6,6 +6,8 @@ date: 2018-07-11

**Author**: Michael Taufen (Google)

**Editor’s note: The feature has been removed in the version 1.24 after deprecation in 1.22.**

**Editor’s note: this post is part of a [series of in-depth articles](https://kubernetes.io/blog/2018/06/27/kubernetes-1.11-release-announcement/) on what’s new in Kubernetes 1.11**

## Why Dynamic Kubelet Configuration?

@@ -66,7 +66,7 @@ There are plenty of [good examples](https://docs.bitnami.com/kubernetes/how-to/c

Incorrect or excessively permissive RBAC policies are a security threat in case of a compromised pod. Maintaining least privilege, and continuously reviewing and improving RBAC rules, should be considered part of the "technical debt hygiene" that teams build into their development lifecycle.

[Audit Logging](/docs/tasks/debug-application-cluster/audit/) (beta in 1.10) provides customisable API logging at the payload (e.g. request and response), and also metadata levels. Log levels can be tuned to your organisation's security policy - [GKE](https://cloud.google.com/kubernetes-engine/docs/how-to/audit-logging#audit_policy) provides sane defaults to get you started.
[Audit Logging](/docs/tasks/debug/debug-cluster/audit/) (beta in 1.10) provides customisable API logging at the payload (e.g. request and response), and also metadata levels. Log levels can be tuned to your organisation's security policy - [GKE](https://cloud.google.com/kubernetes-engine/docs/how-to/audit-logging#audit_policy) provides sane defaults to get you started.

For read requests such as get, list, and watch, only the request object is saved in the audit logs; the response object is not. For requests involving sensitive data such as Secret and ConfigMap, only the metadata is exported. For all other requests, both request and response objects are saved in audit logs.

@@ -174,7 +174,7 @@ Cluster-distributed stateful services (e.g., Cassandra) can benefit from splitti

## Other considerations

[Logs](/docs/concepts/cluster-administration/logging/) and [metrics](/docs/tasks/debug-application-cluster/resource-usage-monitoring/) (if collected and persistently retained) are valuable to diagnose outages, but given the variety of technologies available it will not be addressed in this blog. If Internet connectivity is available, it may be desirable to retain logs and metrics externally at a central location.
[Logs](/docs/concepts/cluster-administration/logging/) and [metrics](/docs/tasks/debug/debug-cluster/resource-usage-monitoring/) (if collected and persistently retained) are valuable to diagnose outages, but given the variety of technologies available it will not be addressed in this blog. If Internet connectivity is available, it may be desirable to retain logs and metrics externally at a central location.

Your production deployment should utilize an automated installation, configuration and update tool (e.g., [Ansible](https://github.com/kubernetes-incubator/kubespray), [BOSH](https://github.com/cloudfoundry-incubator/kubo-deployment), [Chef](https://github.com/chef-cookbooks/kubernetes), [Juju](/docs/getting-started-guides/ubuntu/installation/), [kubeadm](/docs/reference/setup-tools/kubeadm/), [Puppet](https://forge.puppet.com/puppetlabs/kubernetes), etc.). A manual process will have repeatability issues, be labor intensive, error prone, and difficult to scale. [Certified distributions](https://www.cncf.io/certification/software-conformance/#logos) are likely to include a facility for retaining configuration settings across updates, but if you implement your own install and config toolchain, then retention, backup and recovery of the configuration artifacts is essential. Consider keeping your deployment components and settings under a version control system such as Git.

@@ -360,7 +360,7 @@ So let's fix the issue by installing the missing package:
sudo apt install -y conntrack
```

![minikube-install-conntrack](/images/blog/2020-05-21-wsl2-dockerdesktop-k8s/wsl2-minikube-install conntrack.png)
![minikube-install-conntrack](/images/blog/2020-05-21-wsl2-dockerdesktop-k8s/wsl2-minikube-install-conntrack.png)

Let's try to launch it again:

@@ -177,7 +177,7 @@ group_right() apiserver_request_total

Metrics are a fast way to check whether deprecated APIs are being used, and at what rate,
but they don't include enough information to identify particular clients or API objects.
Starting in Kubernetes v1.19, [audit events](/docs/tasks/debug-application-cluster/audit/)
Starting in Kubernetes v1.19, [audit events](/docs/tasks/debug/debug-cluster/audit/)
for requests to deprecated APIs include an audit annotation of `"k8s.io/deprecated":"true"`.
Administrators can use those audit events to identify specific clients or objects that need to be updated.
@@ -20,7 +20,7 @@ The paper attempts to _not_ focus on any specific [cloud native project](https:/
When using Kubernetes as a workload orchestrator, some of the security controls this version of the whitepaper recommends are:
* [Pod Security Policies](/docs/concepts/security/pod-security-policy/): Implement a single source of truth for “least privilege” workloads across the entire cluster
* [Resource requests and limits](/docs/concepts/configuration/manage-resources-containers/#requests-and-limits): Apply requests (soft constraint) and limits (hard constraint) for shared resources such as memory and CPU
* [Audit log analysis](/docs/tasks/debug-application-cluster/audit/): Enable Kubernetes API auditing and filtering for security relevant events
* [Audit log analysis](/docs/tasks/debug/debug-cluster/audit/): Enable Kubernetes API auditing and filtering for security relevant events
* [Control plane authentication and certificate root of trust](/docs/concepts/architecture/control-plane-node-communication/): Enable mutual TLS authentication with a trusted CA for communication within the cluster
* [Secrets management](/docs/concepts/configuration/secret/): Integrate with a built-in or external secrets store

@@ -14,7 +14,7 @@ on the deprecation of Docker as a container runtime for Kubernetes kubelets, and
what that means, check out the blog post
[Don't Panic: Kubernetes and Docker](/blog/2020/12/02/dont-panic-kubernetes-and-docker/).

Also, you can read [check whether Dockershim deprecation affects you](/docs/tasks/administer-cluster/migrating-from-dockershim/check-if-dockershim-deprecation-affects-you/) to check whether it does.
Also, you can read [check whether Dockershim removal affects you](/docs/tasks/administer-cluster/migrating-from-dockershim/check-if-dockershim-removal-affects-you/) to check whether it does.

### Why is dockershim being deprecated?

@@ -155,7 +155,7 @@ runtime where possible.

Another thing to look out for is anything expecting to run for system maintenance
or nested inside a container when building images will no longer work. For the
former, you can use the [`crictl`][cr] tool as a drop-in replacement (see [mapping from docker cli to crictl](https://kubernetes.io/docs/tasks/debug-application-cluster/crictl/#mapping-from-docker-cli-to-crictl)) and for the
former, you can use the [`crictl`][cr] tool as a drop-in replacement (see [mapping from dockercli to crictl](/docs/reference/tools/map-crictl-dockercli/)) and for the
latter you can use newer container build options like [img], [buildah],
[kaniko], or [buildkit-cli-for-kubectl] that don’t require Docker.

@@ -3,13 +3,17 @@ layout: blog
title: "Don't Panic: Kubernetes and Docker"
date: 2020-12-02
slug: dont-panic-kubernetes-and-docker
evergreen: true
---

**Update:** _Kubernetes support for Docker via `dockershim` is now removed.
For more information, read the [removal FAQ](/dockershim).
You can also discuss the deprecation via a dedicated [GitHub issue](https://github.com/kubernetes/kubernetes/issues/106917)._

---

**Authors:** Jorge Castro, Duffie Cooley, Kat Cosgrove, Justin Garrison, Noah Kantrowitz, Bob Killen, Rey Lejano, Dan “POP” Papandrea, Jeffrey Sica, Davanum “Dims” Srinivas

_Update: Kubernetes support for Docker via `dockershim` is now deprecated.
For more information, read the [deprecation notice](/blog/2020/12/08/kubernetes-1-20-release-announcement/#dockershim-deprecation).
You can also discuss the deprecation via a dedicated [GitHub issue](https://github.com/kubernetes/kubernetes/issues/106917)._

Kubernetes is [deprecating
Docker](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.20.md#deprecation)

@@ -28,7 +32,7 @@ shouldn’t, use Docker as a development tool anymore. Docker is still a useful
tool for building containers, and the images that result from running `docker
build` can still run in your Kubernetes cluster.

If you’re using a managed Kubernetes service like GKE, EKS, or AKS (which [defaults to containerd](https://github.com/Azure/AKS/releases/tag/2020-11-16)) you will need to
If you’re using a managed Kubernetes service like AKS, EKS or GKE, you will need to
make sure your worker nodes are using a supported container runtime before
Docker support is removed in a future version of Kubernetes. If you have node
customizations you may need to update them based on your environment and runtime

@@ -37,8 +41,8 @@ testing and planning.

If you’re rolling your own clusters, you will also need to make changes to avoid
your clusters breaking. At v1.20, you will get a deprecation warning for Docker.
When Docker runtime support is removed in a future release (currently planned
for the 1.22 release in late 2021) of Kubernetes it will no longer be supported
When Docker runtime support is removed in a future release (<del>currently planned
for the 1.22 release in late 2021</del>) of Kubernetes it will no longer be supported
and you will need to switch to one of the other compliant container runtimes,
like containerd or CRI-O. Just make sure that the runtime you choose supports
the docker daemon configurations you currently use (e.g. logging).
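
One way to check which runtime your nodes are already using (a quick illustration; the `jsonpath` below reads a standard node status field):

```bash
# Per-node container runtime and version
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.containerRuntimeVersion}{"\n"}{end}'
```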
@@ -32,7 +32,7 @@ The `kubectl alpha debug` features graduates to beta in 1.20, becoming `kubectl

Note that as a new built-in command, `kubectl debug` takes priority over any kubectl plugin named “debug”. You must rename the affected plugin.

Invocations using `kubectl alpha debug` are now deprecated and will be removed in a subsequent release. Update your scripts to use `kubectl debug`. For more information about `kubectl debug`, see [Debugging Running Pods](https://kubernetes.io/docs/tasks/debug-application-cluster/debug-running-pod/).
Invocations using `kubectl alpha debug` are now deprecated and will be removed in a subsequent release. Update your scripts to use `kubectl debug`. For more information about `kubectl debug`, see [Debugging Running Pods](https://kubernetes.io/docs/tasks/debug/debug-application/debug-running-pod/).
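
As a usage sketch (the pod, image, and container names are placeholders), the ephemeral-container form of the new command looks like:

```bash
# Attach an interactive debugging container to a running pod
kubectl debug -it mypod --image=busybox:1.28 --target=app-container
```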

### Beta: API Priority and Fairness

@@ -62,7 +62,7 @@ spec:
Note that completion mode is an alpha feature in the 1.21 release. To be able to
use it in your cluster, make sure to enable the `IndexedJob` [feature
gate](/docs/reference/command-line-tools-reference/feature-gates/) on the
[API server](docs/reference/command-line-tools-reference/kube-apiserver/) and
[API server](/docs/reference/command-line-tools-reference/kube-apiserver/) and
the [controller manager](/docs/reference/command-line-tools-reference/kube-controller-manager/).
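
Concretely, enabling the gate means adding a flag like the following to both components (a minimal sketch; the other flags each component needs are omitted):

```bash
kube-apiserver --feature-gates=IndexedJob=true
kube-controller-manager --feature-gates=IndexedJob=true
```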

When you run the example, you will see that each of the three created Pods gets a

@@ -108,7 +108,7 @@ metadata:
  uid: 93a37fed-23e3-45e8-b6ee-b2521db81638
```

In short, what’s happened is that the object was updated, not deleted. That’s because Kubernetes saw that the object contained finalizers and put it into a read-only state. The deletion timestamp signals that the object can only be read, with the exception of removing the finalizer key updates. In other words, the deletion will not be complete until we edit the object and remove the finalizer.
In short, what’s happened is that the object was updated, not deleted. That’s because Kubernetes saw that the object contained finalizers and blocked removal of the object from etcd. The deletion timestamp signals that deletion was requested, but the deletion will not be complete until we edit the object and remove the finalizer.

Here's a demonstration of using the `patch` command to remove finalizers. If we want to delete an object, we can simply patch it on the command line to remove the finalizers. In this way, the deletion that was running in the background will complete and the object will be deleted. When we attempt to `get` that configmap, it will be gone.
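
A minimal sketch of that patch, assuming the ConfigMap in the post is named `mymap` (the name is an assumption):

```bash
# Clear the finalizers so the pending deletion can complete
kubectl patch configmap/mymap --type=merge -p '{"metadata":{"finalizers":null}}'
kubectl get configmap/mymap   # should now return NotFound
```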
@@ -255,7 +255,7 @@ The minimum required versions are:

## What’s next?

As part of the beta graduation for this feature, SIG Storage plans to update the Kubenetes scheduler to support pod preemption in relation to ReadWriteOncePod storage.
As part of the beta graduation for this feature, SIG Storage plans to update the Kubernetes scheduler to support pod preemption in relation to ReadWriteOncePod storage.
This means if two pods request a PersistentVolumeClaim with ReadWriteOncePod, the pod with highest priority will gain access to the PersistentVolumeClaim and any pod with lower priority will be preempted from the node and be unable to access the PersistentVolumeClaim.

## How can I learn more?

@@ -317,7 +317,7 @@ RequestResponse's including metadata and request / response bodies. While helpfu

Each organization needs to evaluate their
own threat model and build an audit policy that complements or helps troubleshooting incident response. Think
about how someone would attack your organization and what audit trail could identify it. Review more advanced options for tuning audit logs in the official [audit logging documentation](/docs/tasks/debug-application-cluster/audit/#audit-policy).
about how someone would attack your organization and what audit trail could identify it. Review more advanced options for tuning audit logs in the official [audit logging documentation](/docs/tasks/debug/debug-cluster/audit/#audit-policy).
It's crucial to tune your audit logs to only include events that meet your threat model. A minimal audit policy that logs everything at `metadata` level can also be a good starting point.
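
That minimal starting point can be written down as a small policy file, for example (the file path here is an assumption):

```bash
cat <<'EOF' > /etc/kubernetes/audit-policy.yaml
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  - level: Metadata
EOF
```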

Audit logging configurations can also be tested with

@@ -8,8 +8,8 @@ slug: kubernetes-1-23-statefulset-pvc-auto-deletion
**Author:** Matthew Cary (Google)

Kubernetes v1.23 introduced a new, alpha-level policy for
[StatefulSets](docs/concepts/workloads/controllers/statefulset/) that controls the lifetime of
[PersistentVolumeClaims](docs/concepts/storage/persistent-volumes/) (PVCs) generated from the
[StatefulSets](/docs/concepts/workloads/controllers/statefulset/) that controls the lifetime of
[PersistentVolumeClaims](/docs/concepts/storage/persistent-volumes/) (PVCs) generated from the
StatefulSet spec template for cases when they should be deleted automatically when the StatefulSet
is deleted or pods in the StatefulSet are scaled down.

|
|||
new replicas will automatically use them.
|
||||
|
||||
Visit the
|
||||
[documentation](docs/concepts/workloads/controllers/statefulset/#persistentvolumeclaim-policies) to
|
||||
[documentation](/docs/concepts/workloads/controllers/statefulset/#persistentvolumeclaim-policies) to
|
||||
see all the details.
|
||||
|
||||
## What’s next?
|
||||
|
|
|
@@ -12,7 +12,7 @@ to reaffirm our community values by supporting open source container runtimes,
enabling a smaller kubelet, and increasing engineering velocity for teams using
Kubernetes. If you [use Docker Engine as a container runtime](/docs/tasks/administer-cluster/migrating-from-dockershim/find-out-runtime-you-use/)
for your Kubernetes cluster, get ready to migrate in 1.24! To check if you're
affected, refer to [Check whether dockershim deprecation affects you](/docs/tasks/administer-cluster/migrating-from-dockershim/check-if-dockershim-deprecation-affects-you/).
affected, refer to [Check whether dockershim removal affects you](/docs/tasks/administer-cluster/migrating-from-dockershim/check-if-dockershim-removal-affects-you/).

## Why we’re moving away from dockershim

@@ -7,31 +7,37 @@ slug: dockershim-faq
aliases: [ '/dockershim' ]
---

**This is an update to the original [Dockershim Deprecation FAQ](/blog/2020/12/02/dockershim-faq/) article,
published in late 2020.**
**This supersedes the original
[Dockershim Deprecation FAQ](/blog/2020/12/02/dockershim-faq/) article,
published in late 2020. The article includes updates from the v1.24
release of Kubernetes.**

---

This document goes over some frequently asked questions regarding the
deprecation and removal of _dockershim_, that was
removal of _dockershim_ from Kubernetes. The removal was originally
[announced](/blog/2020/12/08/kubernetes-1-20-release-announcement/)
as a part of the Kubernetes v1.20 release. For more detail
on what that means, check out the blog post
as a part of the Kubernetes v1.20 release. The Kubernetes
[v1.24 release](/releases/#release-v1-24) actually removed the dockershim
from Kubernetes.

For more on what that means, check out the blog post
[Don't Panic: Kubernetes and Docker](/blog/2020/12/02/dont-panic-kubernetes-and-docker/).

Also, you can read [check whether dockershim removal affects you](/docs/tasks/administer-cluster/migrating-from-dockershim/check-if-dockershim-deprecation-affects-you/)
to determine how much impact the removal of dockershim would have for you
or for your organization.
To determine the impact that the removal of dockershim would have for you or your organization,
you can read [Check whether dockershim removal affects you](/docs/tasks/administer-cluster/migrating-from-dockershim/check-if-dockershim-removal-affects-you/).

As the Kubernetes 1.24 release has become imminent, we've been working hard to try to make this a smooth transition.
In the months and days leading up to the Kubernetes 1.24 release, Kubernetes contributors worked hard to try to make this a smooth transition.

- We've written a blog post detailing our [commitment and next steps](/blog/2022/01/07/kubernetes-is-moving-on-from-dockershim/).
- We believe there are no major blockers to migration to [other container runtimes](/docs/setup/production-environment/container-runtimes/#container-runtimes).
- There is also a [Migrating from dockershim](/docs/tasks/administer-cluster/migrating-from-dockershim/) guide available.
- We've also created a page to list
- A blog post detailing our [commitment and next steps](/blog/2022/01/07/kubernetes-is-moving-on-from-dockershim/).
- Checking if there were major blockers to migration to [other container runtimes](/docs/setup/production-environment/container-runtimes/#container-runtimes).
- Adding a [migrating from dockershim](/docs/tasks/administer-cluster/migrating-from-dockershim/) guide.
- Creating a list of
  [articles on dockershim removal and on using CRI-compatible runtimes](/docs/reference/node/topics-on-dockershim-and-cri-compatible-runtimes/).
  That list includes some of the already mentioned docs, and also covers selected external sources
  (including vendor guides).

### Why is the dockershim being removed from Kubernetes?
### Why was the dockershim removed from Kubernetes?

Early versions of Kubernetes only worked with a specific container runtime:
Docker Engine. Later, Kubernetes added support for working with other container runtimes.

@ -49,36 +55,18 @@ In fact, maintaining dockershim had become a heavy burden on the Kubernetes main
|
|||
|
||||
Additionally, features that were largely incompatible with the dockershim, such
|
||||
as cgroups v2 and user namespaces, are being implemented in these newer CRI
|
||||
runtimes. Removing support for the dockershim will allow further development in
|
||||
those areas.
|
||||
runtimes. Removing the dockershim from Kubernetes allows further development in those areas.
|
||||
|
||||
[drkep]: https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2221-remove-dockershim
|
||||
|
||||
### Can I still use Docker Engine in Kubernetes 1.23?
|
||||
### Are Docker and containers the same thing?
|
||||
|
||||
Yes, the only thing changed in 1.20 is a single warning log printed at [kubelet]
|
||||
startup if using Docker Engine as the runtime. You'll see this warning in all versions up to 1.23. The dockershim removal occurs in Kubernetes 1.24.
|
||||
|
||||
[kubelet]: /docs/reference/command-line-tools-reference/kubelet/
|
||||
|
||||
### When will dockershim be removed?
|
||||
|
||||
Given the impact of this change, we are using an extended deprecation timeline.
|
||||
Removal of dockershim is scheduled for Kubernetes v1.24, see [Dockershim Removal Kubernetes Enhancement Proposal][drkep].
|
||||
The Kubernetes project will be working closely with vendors and other ecosystem groups to ensure
|
||||
a smooth transition and will evaluate things as the situation evolves.
|
||||
|
||||
### Can I still use Docker Engine as my container runtime?
|
||||
|
||||
First off, if you use Docker on your own PC to develop or test containers: nothing changes.
|
||||
You can still use Docker locally no matter what container runtime(s) you use for your
|
||||
Kubernetes clusters. Containers make this kind of interoperability possible.
|
||||
|
||||
Mirantis and Docker have [committed][mirantis] to maintaining a replacement adapter for
|
||||
Docker Engine, and to maintain that adapter even after the in-tree dockershim is removed
|
||||
from Kubernetes. The replacement adapter is named [`cri-dockerd`](https://github.com/Mirantis/cri-dockerd).
|
||||
|
||||
[mirantis]: https://www.mirantis.com/blog/mirantis-to-take-over-support-of-kubernetes-dockershim-2/
|
||||
Docker popularized the Linux containers pattern and has been instrumental in
|
||||
developing the underlying technology, however containers in Linux have existed
|
||||
for a long time. The container ecosystem has grown to be much broader than just
|
||||
Docker. Standards like OCI and CRI have helped many tools grow and thrive in our
|
||||
ecosystem, some replacing aspects of Docker while others enhance existing
|
||||
functionality.
|
||||
|
||||
### Will my existing container images still work?
|
||||
|
||||
|
@ -90,14 +78,41 @@ All your existing images will still work exactly the same.
|
|||
Yes. All CRI runtimes support the same pull secrets configuration used in
|
||||
Kubernetes, either via the PodSpec or ServiceAccount.
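For illustration, pulling from a private registry looks the same regardless of the CRI runtime; here is a minimal sketch using the PodSpec form (the Pod, Secret, and image names are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: private-image-demo                 # placeholder Pod name
spec:
  containers:
  - name: app
    image: registry.example/team/app:1.0   # placeholder private image
  imagePullSecrets:
  - name: regcred                          # a docker-registry Secret; honored by any CRI runtime
```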
|
||||
|
||||
### Are Docker and containers the same thing?
|
||||
### Can I still use Docker Engine in Kubernetes 1.23?
|
||||
|
||||
Docker popularized the Linux containers pattern and has been instrumental in
|
||||
developing the underlying technology, however containers in Linux have existed
|
||||
for a long time. The container ecosystem has grown to be much broader than just
|
||||
Docker. Standards like OCI and CRI have helped many tools grow and thrive in our
|
||||
ecosystem, some replacing aspects of Docker while others enhance existing
|
||||
functionality.
|
||||
Yes, the only thing that changed in 1.20 was a single warning log printed at [kubelet]
|
||||
startup if using Docker Engine as the runtime. You'll see this warning in all versions up to 1.23. The dockershim removal occurred
|
||||
in Kubernetes 1.24.
|
||||
|
||||
If you're running Kubernetes v1.24 or later, see [Can I still use Docker Engine as my container runtime?](#can-i-still-use-docker-engine-as-my-container-runtime).
|
||||
(Remember, you can switch away from the dockershim if you're using any supported Kubernetes release; from release v1.24, you
|
||||
**must** switch as Kubernetes no longer includes the dockershim).
|
||||
|
||||
[kubelet]: /docs/reference/command-line-tools-reference/kubelet/
|
||||
|
||||
### Which CRI implementation should I use?
|
||||
|
||||
That’s a complex question and it depends on a lot of factors. If Docker Engine is
|
||||
working for you, moving to containerd should be a relatively easy swap and
|
||||
will have strictly better performance and less overhead. However, we encourage you
|
||||
to explore all the options from the [CNCF landscape] in case another would be an
|
||||
even better fit for your environment.
|
||||
|
||||
[CNCF landscape]: https://landscape.cncf.io/card-mode?category=container-runtime&grouping=category
|
||||
|
||||
#### Can I still use Docker Engine as my container runtime?
|
||||
|
||||
First off, if you use Docker on your own PC to develop or test containers: nothing changes.
|
||||
You can still use Docker locally no matter what container runtime(s) you use for your
|
||||
Kubernetes clusters. Containers make this kind of interoperability possible.
|
||||
|
||||
Mirantis and Docker have [committed][mirantis] to maintaining a replacement adapter for
|
||||
Docker Engine, and to maintain that adapter even after the in-tree dockershim is removed
|
||||
from Kubernetes. The replacement adapter is named [`cri-dockerd`](https://github.com/Mirantis/cri-dockerd).
|
||||
|
||||
You can install `cri-dockerd` and use it to connect the kubelet to Docker Engine. Read [Migrate Docker Engine nodes from dockershim to cri-dockerd](/docs/tasks/administer-cluster/migrating-from-dockershim/migrate-dockershim-dockerd/) to learn more.
|
||||
|
||||
[mirantis]: https://www.mirantis.com/blog/mirantis-to-take-over-support-of-kubernetes-dockershim-2/
|
||||
|
||||
### Are there examples of folks using other runtimes in production today?
|
||||
|
||||
|
@ -135,16 +150,6 @@ provide an end-to-end standard for managing containers.
|
|||
[runc]: https://github.com/opencontainers/runc
|
||||
[containerd]: https://containerd.io/
|
||||
|
||||
### Which CRI implementation should I use?
|
||||
|
||||
That’s a complex question and it depends on a lot of factors. If Docker is
|
||||
working for you, moving to containerd should be a relatively easy swap and
|
||||
will have strictly better performance and less overhead. However, we encourage you
|
||||
to explore all the options from the [CNCF landscape] in case another would be an
|
||||
even better fit for your environment.
|
||||
|
||||
[CNCF landscape]: https://landscape.cncf.io/card-mode?category=container-runtime&grouping=category
|
||||
|
||||
### What should I look out for when changing CRI implementations?
|
||||
|
||||
While the underlying containerization code is the same between Docker and most
|
||||
|
@ -153,8 +158,8 @@ common things to consider when migrating are:
|
|||
|
||||
- Logging configuration
|
||||
- Runtime resource limitations
|
||||
- Node provisioning scripts that call docker or use docker via it's control socket
|
||||
- Kubectl plugins that require docker CLI or the control socket
|
||||
- Node provisioning scripts that call docker or use Docker Engine via its control socket
|
||||
- Plugins for `kubectl` that require the `docker` CLI or the Docker Engine control socket
|
||||
- Tools from the Kubernetes project that require direct access to Docker Engine
|
||||
(for example: the deprecated `kube-imagepuller` tool)
|
||||
- Configuration of functionality like `registry-mirrors` and insecure registries
|
||||
|
@ -163,14 +168,15 @@ common things to consider when migrating are:
|
|||
- GPUs or special hardware and how they integrate with your runtime and Kubernetes
|
||||
|
||||
If you use Kubernetes resource requests/limits or file-based log collection
|
||||
DaemonSets then they will continue to work the same, but if you’ve customized
|
||||
DaemonSets then they will continue to work the same, but if you've customized
|
||||
your `dockerd` configuration, you’ll need to adapt that for your new container
|
||||
runtime where possible.
|
||||
|
||||
Another thing to look out for is anything expecting to run for system maintenance
|
||||
or nested inside a container when building images will no longer work. For the
|
||||
former, you can use the [`crictl`][cr] tool as a drop-in replacement (see [mapping from docker cli to crictl](https://kubernetes.io/docs/tasks/debug-application-cluster/crictl/#mapping-from-docker-cli-to-crictl)) and for the
|
||||
latter you can use newer container build options like [img], [buildah],
|
||||
former, you can use the [`crictl`][cr] tool as a drop-in replacement (see
|
||||
[mapping from docker cli to crictl](https://kubernetes.io/docs/tasks/debug/debug-cluster/crictl/#mapping-from-docker-cli-to-crictl))
|
||||
and for the latter you can use newer container build options like [img], [buildah],
|
||||
[kaniko], or [buildkit-cli-for-kubectl] that don’t require Docker.
|
||||
|
||||
[cr]: https://github.com/kubernetes-sigs/cri-tools
|
||||
|
@ -204,7 +210,7 @@ discussion of the changes.
|
|||
|
||||
[dep]: https://dev.to/inductor/wait-docker-is-deprecated-in-kubernetes-now-what-do-i-do-e4m
|
||||
|
||||
### Is there any tooling that can help me find dockershim in use
|
||||
### Is there any tooling that can help me find dockershim in use?
|
||||
|
||||
Yes! The [Detector for Docker Socket (DDS)][dds] is a kubectl plugin that you can
|
||||
install and then use to check your cluster. DDS can detect if active Kubernetes workloads
|
||||
|
|
|
@ -12,7 +12,7 @@ Way back in December of 2020, Kubernetes announced the [deprecation of Dockershi
|
|||
|
||||
## First, does this even affect you?
|
||||
|
||||
If you are rolling your own cluster or are otherwise unsure whether or not this removal affects you, stay on the safe side and [check to see if you have any dependencies on Docker Engine](/docs/tasks/administer-cluster/migrating-from-dockershim/check-if-dockershim-deprecation-affects-you/). Please note that using Docker Desktop to build your application containers is not a Docker dependency for your cluster. Container images created by Docker are compliant with the [Open Container Initiative (OCI)](https://opencontainers.org/), a Linux Foundation governance structure that defines industry standards around container formats and runtimes. They will work just fine on any container runtime supported by Kubernetes.
|
||||
If you are rolling your own cluster or are otherwise unsure whether or not this removal affects you, stay on the safe side and [check to see if you have any dependencies on Docker Engine](/docs/tasks/administer-cluster/migrating-from-dockershim/check-if-dockershim-removal-affects-you/). Please note that using Docker Desktop to build your application containers is not a Docker dependency for your cluster. Container images created by Docker are compliant with the [Open Container Initiative (OCI)](https://opencontainers.org/), a Linux Foundation governance structure that defines industry standards around container formats and runtimes. They will work just fine on any container runtime supported by Kubernetes.
|
||||
|
||||
If you are using a managed Kubernetes service from a cloud provider, and you haven’t explicitly changed the container runtime, there may be nothing else for you to do. Amazon EKS, Azure AKS, and Google GKE all default to containerd now, though you should make sure they do not need updating if you have any node customizations. To check the runtime of your nodes, follow [Find Out What Container Runtime is Used on a Node](/docs/tasks/administer-cluster/migrating-from-dockershim/find-out-runtime-you-use/).
|
||||
|
||||
|
|
|
@ -0,0 +1,155 @@
|
|||
---
|
||||
layout: blog
|
||||
title: 'Increasing the security bar in Ingress-NGINX v1.2.0'
|
||||
date: 2022-04-28
|
||||
slug: ingress-nginx-1-2-0
|
||||
---
|
||||
|
||||
**Authors:** Ricardo Katz (VMware), James Strong (Chainguard)
|
||||
|
||||
The [Ingress](/docs/concepts/services-networking/ingress/) may be one of the most targeted components
|
||||
of Kubernetes. An Ingress typically defines an HTTP reverse proxy, exposed to the Internet, containing
|
||||
multiple websites, and with some privileged access to Kubernetes API (such as to read Secrets relating to
|
||||
TLS certificates and their private keys).
|
||||
|
||||
While it is a risky component in your architecture, it is still the most popular way to properly expose your services.
|
||||
|
||||
Ingress-NGINX has been through security assessments that identified a serious problem: we don't
|
||||
properly sanitize all of the configuration before turning it into an `nginx.conf` file, which may lead to information
|
||||
disclosure risks.
|
||||
|
||||
While we understand this risk and the real need to fix this, it's not an easy process to do, so we took another approach to reduce (but not remove!) this risk in the current (v1.2.0) release.
|
||||
|
||||
## Meet Ingress NGINX v1.2.0 and the chrooted NGINX process
|
||||
|
||||
One of the main challenges is that Ingress-NGINX runs the web proxy server (NGINX) alongside the Ingress
|
||||
controller (the component that has access to the Kubernetes API and that creates the `nginx.conf` file).
|
||||
|
||||
So, NGINX has the same access to the filesystem of the controller (and to the Kubernetes service account token, and other configuration from the container). While splitting those components is our end goal, the project needed a fast response; that led us to the idea of using `chroot()`.
|
||||
|
||||
Let's take a look into what an Ingress-NGINX container looked like before this change:
|
||||
|
||||
![Ingress NGINX pre chroot](ingress-pre-chroot.png)
|
||||
|
||||
As we can see, the same container (not the Pod, the container!) that provides the HTTP proxy is the one that watches Ingress objects and writes to the container volume.
|
||||
|
||||
Now, meet the new architecture:
|
||||
|
||||
![Ingress NGINX post chroot](ingress-post-chroot.png)
|
||||
|
||||
What does all of this mean? A basic summary is that we are isolating the NGINX service as a container inside the
|
||||
controller container.
|
||||
|
||||
While this is not strictly true, to understand what was done here, it's good to understand how
|
||||
Linux containers (and underlying mechanisms such as kernel namespaces) work.
|
||||
You can read about cgroups in the Kubernetes glossary: [`cgroup`](https://kubernetes.io/docs/reference/glossary/?fundamental=true#term-cgroup) and learn more about how cgroups interact with namespaces in the NGINX project article
|
||||
[What Are Namespaces and cgroups, and How Do They Work?](https://www.nginx.com/blog/what-are-namespaces-cgroups-how-do-they-work/).
|
||||
(As you read that, bear in mind that Linux kernel namespaces are a different thing from
|
||||
[Kubernetes namespaces](/docs/concepts/overview/working-with-objects/namespaces/)).
|
||||
|
||||
## Skip the talk, what do I need to use this new approach?
|
||||
|
||||
While this increases security, we made this feature opt-in in this release so you can have
|
||||
time to make the right adjustments in your environment(s). This new feature is only available from
|
||||
release v1.2.0 of the Ingress-NGINX controller.
|
||||
|
||||
There are two required changes in your deployments to use this feature:
|
||||
* Append the suffix "-chroot" to the container image name. For example: `gcr.io/k8s-staging-ingress-nginx/controller-chroot:v1.2.0`
|
||||
* In your Pod template for the Ingress controller, find where you add the capability `NET_BIND_SERVICE` and add the capability `SYS_CHROOT`. After you edit the manifest, you'll see a snippet like:
|
||||
|
||||
```yaml
|
||||
capabilities:
|
||||
drop:
|
||||
- ALL
|
||||
add:
|
||||
- NET_BIND_SERVICE
|
||||
- SYS_CHROOT
|
||||
```
|
||||
|
||||
If you deploy the controller using the official Helm chart then change the following setting in
|
||||
`values.yaml`:
|
||||
|
||||
```yaml
|
||||
controller:
|
||||
image:
|
||||
chroot: true
|
||||
```
|
||||
|
||||
Ingress controllers are normally set up cluster-wide (the IngressClass API is cluster scoped). If you manage the
|
||||
Ingress-NGINX controller but you're not the overall cluster operator, then check with your cluster admin about
|
||||
whether you can use the `SYS_CHROOT` capability, **before** you enable it in your deployment.
|
||||
|
||||
## OK, but how does this increase the security of my Ingress controller?
|
||||
|
||||
Take the following configuration snippet and imagine that, for some reason, it was added to your `nginx.conf`:
|
||||
```
|
||||
location /randomthing/ {
|
||||
alias /;
|
||||
autoindex on;
|
||||
}
|
||||
```
|
||||
|
||||
If you deploy this configuration, someone can call `http://website.example/randomthing` and get a listing of (and access to) the whole filesystem of the Ingress controller.
|
||||
|
||||
Now, can you spot the difference between the chrooted and non-chrooted NGINX in the listings below?
|
||||
|
||||
| Without extra `chroot()` | With extra `chroot()` |
|
||||
|----------------------------|--------|
|
||||
| `bin` | `bin` |
|
||||
| `dev` | `dev` |
|
||||
| `etc` | `etc` |
|
||||
| `home` | |
|
||||
| `lib` | `lib` |
|
||||
| `media` | |
|
||||
| `mnt` | |
|
||||
| `opt` | `opt` |
|
||||
| `proc` | `proc` |
|
||||
| `root` | |
|
||||
| `run` | `run` |
|
||||
| `sbin` | |
|
||||
| `srv` | |
|
||||
| `sys` | |
|
||||
| `tmp` | `tmp` |
|
||||
| `usr` | `usr` |
|
||||
| `var` | `var` |
|
||||
| `dbg` | |
|
||||
| `nginx-ingress-controller` | |
|
||||
| `wait-shutdown` | |
|
||||
|
||||
The one on the left side is not chrooted, so NGINX has full access to the filesystem. The one on the right side is chrooted, so a new filesystem containing only the files required to make NGINX work is created.
|
||||
|
||||
## What about other security improvements in this release?
|
||||
|
||||
We know that the new `chroot()` mechanism helps address some portion of the risk, but still, someone
|
||||
can try to inject commands to read, for example, the `nginx.conf` file and extract sensitive information.
|
||||
|
||||
So, another change in this release (this is opt-out!) is the _deep inspector_.
|
||||
We know that some directives or regular expressions may be dangerous to NGINX, so the deep inspector
|
||||
checks all fields from an Ingress object (during its reconciliation, and also with a
|
||||
[validating admission webhook](/docs/reference/access-authn-authz/admission-controllers/#validatingadmissionwebhook))
|
||||
to verify whether any field contains these dangerous directives.
|
||||
|
||||
The ingress controller already does this for annotations, and our goal is to move this existing validation to happen inside
|
||||
deep inspection as part of a future release.
|
||||
|
||||
You can take a look into the existing rules in [https://github.com/kubernetes/ingress-nginx/blob/main/internal/ingress/inspector/rules.go](https://github.com/kubernetes/ingress-nginx/blob/main/internal/ingress/inspector/rules.go).
|
||||
|
||||
Due to the nature of inspecting and matching all strings within relevant Ingress objects, this new feature may consume a bit more CPU. You can disable it by running the ingress controller with the command line argument `--deep-inspect=false`.
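If you manage the controller Deployment yourself rather than through the Helm chart, that flag goes in the container arguments. A minimal sketch follows; the Deployment name, labels, and trimmed-down argument list are illustrative, and a real manifest carries many more settings:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ingress-nginx-controller           # illustrative name
  namespace: ingress-nginx
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: ingress-nginx
  template:
    metadata:
      labels:
        app.kubernetes.io/name: ingress-nginx
    spec:
      containers:
      - name: controller
        image: gcr.io/k8s-staging-ingress-nginx/controller-chroot:v1.2.0   # chroot variant mentioned above
        args:
        - /nginx-ingress-controller
        - --deep-inspect=false             # opt out of the deep inspector
```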
|
||||
|
||||
## What's next?
|
||||
|
||||
This is not our final goal. Our final goal is to split the control plane and the data plane processes.
|
||||
In fact, doing so will help us also achieve a [Gateway](https://gateway-api.sigs.k8s.io/) API implementation,
|
||||
as we may have a different controller as soon as it "knows" what to provide to the data plane
|
||||
(we need some help here!!).
|
||||
|
||||
Some other projects in Kubernetes already take this approach
|
||||
(like [KPNG](https://github.com/kubernetes-sigs/kpng), the proposed replacement for `kube-proxy`),
|
||||
and we plan to align with them and get the same experience for Ingress-NGINX.
|
||||
|
||||
## Further reading
|
||||
|
||||
If you want to see how chrooting was done in Ingress-NGINX, take a look
|
||||
at [https://github.com/kubernetes/ingress-nginx/pull/8337](https://github.com/kubernetes/ingress-nginx/pull/8337).
|
||||
The release v1.2.0 containing all the changes can be found at
|
||||
[https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v1.2.0](https://github.com/kubernetes/ingress-nginx/releases/tag/controller-v1.2.0)
|
Binary file not shown.
After Width: | Height: | Size: 59 KiB |
Binary file not shown.
After Width: | Height: | Size: 51 KiB |
|
@ -0,0 +1,319 @@
|
|||
---
|
||||
layout: blog
|
||||
title: "Frontiers, fsGroups and frogs: the Kubernetes 1.23 release interview"
|
||||
date: 2022-04-29
|
||||
---
|
||||
|
||||
**Author**: Craig Box (Google)
|
||||
|
||||
One of the highlights of hosting the weekly [Kubernetes Podcast from Google](https://kubernetespodcast.com/) is talking to the release managers for each new Kubernetes version. The release team is constantly refreshing: many members work their way up from small documentation fixes to shadow roles, and then eventually lead a release.
|
||||
|
||||
As we prepare for the 1.24 release next week, [in accordance with long-standing tradition](https://www.google.com/search?q=%22release+interview%22+site%3Akubernetes.io%2Fblog), I'm pleased to bring you a look back at the story of 1.23. The release was led by [Rey Lejano](https://twitter.com/reylejano), a Field Engineer at SUSE. [I spoke to Rey](https://kubernetespodcast.com/episode/167-kubernetes-1.23/) in December, as he was awaiting the birth of his first child.
|
||||
|
||||
Make sure you [subscribe, wherever you get your podcasts](https://kubernetespodcast.com/subscribe/), so you hear all our stories from the Cloud Native community, including the story of 1.24 next week.
|
||||
|
||||
*This transcript has been lightly edited and condensed for clarity.*
|
||||
|
||||
---
|
||||
|
||||
**CRAIG BOX: I'd like to start with what is, of course, on top of everyone's mind at the moment. Let's talk African clawed frogs!**
|
||||
|
||||
REY LEJANO: [CHUCKLES] Oh, you mean [Xenopus lavis](https://en.wikipedia.org/wiki/African_clawed_frog), the scientific name for the African clawed frog?
|
||||
|
||||
**CRAIG BOX: Of course.**
|
||||
|
||||
REY LEJANO: Not many people know, but my background and my degree is actually in microbiology, from the University of California Davis. I did some research for about four years in biochemistry, in a biochemistry lab, and I [do have a research paper published](https://www.sciencedirect.com/science/article/pii/). It's actually on glycoproteins, particularly something called "cortical granule lectin". We used frogs, because they generate lots and lots of eggs, from which we can extract the protein. That protein prevents polyspermy. When the sperm goes into the egg, the egg releases a glycoprotein, cortical granule lectin, to the membrane, and prevents any other sperm from going inside the egg.
|
||||
|
||||
**CRAIG BOX: Were you able to take anything from the testing that we did on frogs and generalize that to higher-order mammals, perhaps?**
|
||||
|
||||
REY LEJANO: Yes. Since mammals also have cortical granule lectin, we were able to analyze both the convergence and the evolutionary pattern, not just from multiple species of frogs, but also into mammals as well.
|
||||
|
||||
**CRAIG BOX: Now, there's a couple of different threads to unravel here. When you were young, what led you into the fields of biology, and perhaps more the technical side of it?**
|
||||
|
||||
REY LEJANO: I think it was mostly from family, since I do have a family history in the medical field that goes back generations. So I kind of felt like that was the natural path going into college.
|
||||
|
||||
**CRAIG BOX: Now, of course, you're working in a more abstract tech field. What led you out of microbiology?**
|
||||
|
||||
REY LEJANO: [CHUCKLES] Well, I've always been interested in tech. Taught myself a little programming when I was younger, before high school, did some web dev stuff. Just kind of got burnt out being in a lab. I was literally in the basement. I had a great opportunity to join a consultancy that specialized in [ITIL](https://www.axelos.com/certifications/itil-service-management/what-is-itil). I actually started off with application performance management, went into monitoring, went into operation management and also ITIL, which is aligning your IT asset management and service managements with business services. Did that for a good number of years, actually.
|
||||
|
||||
**CRAIG BOX: It's very interesting, as people describe the things that they went through and perhaps the technologies that they worked on, you can pretty much pinpoint how old they might be. There's a lot of people who come into tech these days that have never heard of ITIL. They have no idea what it is. It's basically just SRE with more process.**
|
||||
|
||||
REY LEJANO: Yes, absolutely. It's not very cloud native. [CHUCKLES]
|
||||
|
||||
**CRAIG BOX: Not at all.**
|
||||
|
||||
REY LEJANO: You don't really hear about it in the cloud native landscape. Definitely, you can tell someone's been in the field for a little bit, if they specialize or have worked with ITIL before.
|
||||
|
||||
**CRAIG BOX: You mentioned that you wanted to get out of the basement. That is quite often where people put the programmers. Did they just give you a bit of light in the new basement?**
|
||||
|
||||
REY LEJANO: [LAUGHS] They did give us much better lighting. Able to get some vitamin D sometimes, as well.
|
||||
|
||||
**CRAIG BOX: To wrap up the discussion about your previous career — over the course of the last year, with all of the things that have happened in the world, I could imagine that microbiology skills may be more in demand than perhaps they were when you studied them?**
|
||||
|
||||
REY LEJANO: Oh, absolutely. I could definitely see a big increase of numbers of people going into the field. Also, reading what's going on with the world currently kind of brings back all the education I've learned in the past, as well.
|
||||
|
||||
**CRAIG BOX: Do you keep in touch with people you went through school with?**
|
||||
|
||||
REY LEJANO: Just some close friends, but not in the microbiology field.
|
||||
|
||||
**CRAIG BOX: One thing that I think will probably happen as a result of the pandemic is a renewed interest in some of these STEM fields. It will be interesting to see what impact that has on society at large.**
|
||||
|
||||
REY LEJANO: Yeah. I think that'll be great.
|
||||
|
||||
**CRAIG BOX: You mentioned working at a consultancy doing IT management, application performance monitoring, and so on. When did Kubernetes come into your professional life?**
|
||||
|
||||
REY LEJANO: One of my good friends at the company I worked at, left in mid-2015. He went on to a company that was pretty heavily into Docker. He taught me a little bit. I did my first "docker run" around 2015, maybe 2016. Then, one of the applications we were using for the ITIL framework was containerized around 2018 or so, also in Kubernetes. At that time, it was pretty buggy. That was my initial introduction to Kubernetes and containerised applications.
|
||||
|
||||
Then I left that company, and I actually joined my friend over at [RX-M](https://rx-m.com/), which is a cloud native consultancy and training firm. They specialize in Docker and Kubernetes. I was able to get my feet wet. I got my CKD, got my CKA as well. And they were really, really great at encouraging us to learn more about Kubernetes and also to be involved in the community.
|
||||
|
||||
**CRAIG BOX: You will have seen, then, the life cycle of people adopting Kubernetes and containerization at large, through your own initial journey and then through helping customers. How would you characterize how that journey has changed from the early days to perhaps today?**
|
||||
|
||||
REY LEJANO: I think the early days, there was a lot of questions of, why do I have to containerize? Why can't I just stay with virtual machines?
|
||||
|
||||
**CRAIG BOX: It's a line item on your CV.**
|
||||
|
||||
REY LEJANO: [CHUCKLES] It is. And nowadays, I think people know the value of using containers, of orchestrating containers with Kubernetes. I don't want to say "jumping on the bandwagon", but it's become the de-facto standard to orchestrate containers.
|
||||
|
||||
**CRAIG BOX: It's not something that a consultancy needs to go out and pitch to customers that they should be doing. They're just taking it as, that will happen, and starting a bit further down the path, perhaps.**
|
||||
|
||||
REY LEJANO: Absolutely.
|
||||
|
||||
**CRAIG BOX: Working at a consultancy like that, how much time do you get to work on improving process, perhaps for multiple customers, and then looking at how you can upstream that work, versus paid work that you do for just an individual customer at a time?**
|
||||
|
||||
REY LEJANO: Back then, it would vary. They helped me introduce myself, and I learned a lot about the cloud native landscape and Kubernetes itself. They helped educate me as to how the cloud native landscape, and the tools around it, can be used together. My boss at that company, Randy, he actually encouraged us to start contributing upstream, and encouraged me to join the release team. He just said, this is a great opportunity. Definitely helped me with starting with the contributions early on.
|
||||
|
||||
**CRAIG BOX: Was the release team the way that you got involved with upstream Kubernetes contribution?**
|
||||
|
||||
REY LEJANO: Actually, no. My first contribution was with SIG Docs. I met Taylor Dolezal — he was the release team lead for 1.19, but he is involved with SIG Docs as well. I met him at KubeCon 2019, I sat at his table during a luncheon. I remember Paris Pittman was hosting this luncheon at the Marriott. Taylor says he was involved with SIG Docs. He encouraged me to join. I started joining into meetings, started doing a few drive-by PRs. That's what we call them — drive-by — little typo fixes. Then did a little bit more, started to send better or higher quality pull requests, and also reviewing PRs.
|
||||
|
||||
**CRAIG BOX: When did you first formally take your release team role?**
|
||||
|
||||
REY LEJANO: That was in [1.18](https://github.com/kubernetes/sig-release/blob/master/releases/release-1.18/release_team.md), in December. My boss at the time encouraged me to apply. I did, was lucky enough to get accepted for the release notes shadow. Then from there, stayed in with release notes for a few cycles, then went into Docs, naturally then led Docs, then went to Enhancements, and now I'm the release lead for 1.23.
|
||||
|
||||
**CRAIG BOX: I don't know that a lot of people think about what goes into a good release note. What would you say does?**
|
||||
|
||||
REY LEJANO: [CHUCKLES] You have to tell the end user what has changed or what effect that they might see in the release notes. It doesn't have to be highly technical. It could just be a few lines, and just saying what has changed, what they have to do if they have to do anything as well.
|
||||
|
||||
**CRAIG BOX: As you moved through the process of shadowing, how did you learn from the people who were leading those roles?**
|
||||
|
||||
REY LEJANO: I said this a few times when I was the release lead for this cycle. You get out of the release team as much as you put in, or it directly aligns to how much you put in. I learned a lot. I went into the release team having that mindset of learning from the role leads, learning from the other shadows, as well. That's actually a saying that my first role lead told me. I still carry it to heart, and that was back in 1.18. That was Eddie, in the very first meeting we had, and I still carry it to heart.
|
||||
|
||||
**CRAIG BOX: You, of course, were [the release lead for 1.23](https://github.com/kubernetes/sig-release/tree/master/releases/release-1.23). First of all, congratulations on the release.**
|
||||
|
||||
REY LEJANO: Thank you very much.
|
||||
|
||||
**CRAIG BOX: The theme for this release is [The Next Frontier](https://kubernetes.io/blog/2021/12/07/kubernetes-1-23-release-announcement/). Tell me the story of how we came to the theme and then the logo.**
|
||||
|
||||
REY LEJANO: The Next Frontier represents a few things. It not only represents the next enhancements in this release, but Kubernetes itself also has a history of Star Trek references. The original codename for Kubernetes was Project Seven, a reference to Seven of Nine, originally from Star Trek Voyager. Also the seven spokes in the helm in the logo of Kubernetes as well. And, of course, Borg, the predecessor to Kubernetes.
|
||||
|
||||
The Next Frontier continues that Star Trek reference. It's a fusion of two titles in the Star Trek universe. One is [Star Trek V, the Final Frontier](https://en.wikipedia.org/wiki/Star_Trek_V:_The_Final_Frontier), and the Star Trek: The Next Generation.
|
||||
|
||||
**CRAIG BOX: Do you have any opinion on the fact that Star Trek V was an odd-numbered movie, and they are [canonically referred to as being lesser than the even-numbered ones](https://screenrant.com/star-trek-movies-odd-number-curse-explained/)?**
|
||||
|
||||
REY LEJANO: I can't say, because I am such a sci-fi nerd that I love all of them even though they're bad. Even the post-Next Generation movies, after the series, I still liked all of them, even though I know some weren't that great.
|
||||
|
||||
**CRAIG BOX: Am I right in remembering that Star Trek V was the one directed by William Shatner?**
|
||||
|
||||
REY LEJANO: Yes, that is correct.
|
||||
|
||||
**CRAIG BOX: I think that says it all.**
|
||||
|
||||
REY LEJANO: [CHUCKLES] Yes.
|
||||
|
||||
**CRAIG BOX: Now, I understand that the theme comes from a part of the [SIG Release charter](https://github.com/kubernetes/community/blob/master/sig-release/charter.md)?**
|
||||
|
||||
REY LEJANO: Yes. There's a line in the SIG Release charter, "ensure there is a consistent group of community members in place to support the release process across time." With the release team, we have new shadows that join every single release cycle. With this, we're growing with this community. We're growing the release team members. We're growing SIG Release. We're growing the Kubernetes community itself. For a lot of people, this is their first time contributing to open source, so that's why I say it's their new open source frontier.
|
||||
|
||||
**CRAIG BOX: And the logo is obviously very Star Trek-inspired. It sort of surprised me that it took that long for someone to go this route.**
|
||||
|
||||
REY LEJANO: I was very surprised as well. I had to relearn Adobe Illustrator to create the logo.
|
||||
|
||||
**CRAIG BOX: This your own work, is it?**
|
||||
|
||||
REY LEJANO: This is my own work.
|
||||
|
||||
**CRAIG BOX: It's very nice.**
|
||||
|
||||
REY LEJANO: Thank you very much. Funny, the galaxy actually took me the longest time versus the ship. Took me a few days to get that correct. I'm always fine-tuning it, so there might be a final change when this is actually released.
|
||||
|
||||
**CRAIG BOX: No frontier is ever truly final.**
|
||||
|
||||
REY LEJANO: True, very true.
|
||||
|
||||
**CRAIG BOX: Moving now from the theme of the release to the substance, perhaps, what is new in 1.23?**
|
||||
|
||||
REY LEJANO: We have 47 enhancements. I'm going to run through most of the stable ones, if not all of them, some of the key Beta ones, and a few of the Alpha enhancements for 1.23.
|
||||
|
||||
One of the key enhancements is [dual-stack IPv4/IPv6](https://github.com/kubernetes/enhancements/issues/563), which went GA in 1.23.
|
||||
|
||||
Some background info: dual-stack was introduced as Alpha in 1.15. You probably saw a keynote at KubeCon 2019. Back then, the way dual-stack worked was that you needed two services — you needed a service per IP family. You would need a service for IPv4 and a service for IPv6. It was refactored in 1.20. In 1.21, it was in Beta; clusters were enabled to be dual-stack by default.
|
||||
|
||||
And then in 1.23 we did remove the IPv6 dual-stack feature flag. It's not mandatory to use dual-stack. It's actually not "default" still. The pods, the services still default to single-stack. There are some requirements to be able to use dual-stack. The nodes have to be routable on IPv4 and IPv6 network interfaces. You need a CNI plugin that supports dual-stack. The pods themselves have to be configured to be dual-stack. And the services need the ipFamilyPolicy field to specify prefer dual-stack, or require dual-stack.
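For reference, a Service that opts in to dual-stack might look like the sketch below; the Service name, selector, and ports are made up:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: demo-dual-stack            # hypothetical Service
spec:
  ipFamilyPolicy: PreferDualStack  # or RequireDualStack
  ipFamilies:
  - IPv4
  - IPv6
  selector:
    app: demo
  ports:
  - port: 80
    targetPort: 8080
```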
|
||||
|
||||
**CRAIG BOX: This sounds like there's an implication in this that v4 is still required. Do you see a world where we can actually move to v6-only clusters?**
|
||||
|
||||
REY LEJANO: I think we'll be talking about IPv4 and IPv6 for many, many years to come. I remember a long time ago, they kept saying "it's going to be all IPv6", and that was decades ago.
|
||||
|
||||
**CRAIG BOX: I think I may have mentioned on the show before, but there was [a meeting in London that Vint Cerf attended](https://www.youtube.com/watch?v=AEaJtZVimqs), and he gave a public presentation at the time to say, now is the time of v6. And that was 10 years ago at least. It's still not the time of v6, and my desktop still doesn't have Linux on it. One day.**
|
||||
|
||||
REY LEJANO: [LAUGHS] In my opinion, that's one of the big key features that went stable for 1.23.
|
||||
|
||||
One of the other highlights of 1.23 is [pod security admission going to Beta](/blog/2021/12/09/pod-security-admission-beta/). I know this feature is going to Beta, but I highlight this because as some people might know, PodSecurityPolicy, which was deprecated in 1.21, is targeted to be removed in 1.25. Pod security admission replaces pod security policy. It's an admission controller. It evaluates the pods against a predefined set of pod security standards to either admit or deny the pod for running.
|
||||
|
||||
There's three levels of pod security standards. Privileged, that's totally open. Baseline, where known privilege escalations are minimized. Or Restricted, which is hardened. And you can set pod security standards to run in one of three modes. Enforce: reject any pods that are in violation. Audit: pods are allowed to be created, but the violations are recorded. Or warn: it will send a warning message to the user, and the pod is allowed.
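For reference, those standards and modes are applied per namespace through labels; a sketch, with a hypothetical namespace name:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: team-a                                     # hypothetical namespace
  labels:
    pod-security.kubernetes.io/enforce: baseline   # reject pods that violate the baseline standard
    pod-security.kubernetes.io/audit: restricted   # record violations of the restricted standard
    pod-security.kubernetes.io/warn: restricted    # warn users about violations of the restricted standard
```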
|
||||
|
||||
**CRAIG BOX: You mentioned there that PodSecurityPolicy is due to be deprecated in two releases' time. Are we lining up these features so that pod security admission will be GA at that time?**
|
||||
|
||||
REY LEJANO: Yes. Absolutely. I'll talk about that for another feature in a little bit as well. There's also another feature that went to GA. It was an API that went to GA, and therefore the Beta API is now deprecated. I'll talk about that a little bit.
|
||||
|
||||
**CRAIG BOX: All right. Let's talk about what's next on the list.**
|
||||
|
||||
REY LEJANO: Let's move on to more stable enhancements. One is the [TTL controller](https://github.com/kubernetes/enhancements/issues/592). This cleans up jobs and pods after the jobs are finished. There is a TTL timer that starts when the job or pod is finished. This TTL controller watches all the jobs, and ttlSecondsAfterFinished needs to be set. The controller will see if the ttlSecondsAfterFinished, combined with the last transition time, if it's greater than now. If it is, then it will delete the job and the pods of that job.
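For reference, a sketch of that field on a Job; the Job itself is hypothetical, and `ttlSecondsAfterFinished` is the field described here:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: ttl-demo                  # hypothetical Job
spec:
  ttlSecondsAfterFinished: 300    # delete the Job and its Pods 5 minutes after it finishes
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: work
        image: busybox:1.35
        command: ["sh", "-c", "echo done"]
```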
|
||||
|
||||
**CRAIG BOX: Loosely, it could be called a garbage collector?**
|
||||
|
||||
REY LEJANO: Yes. Garbage collector for pods and jobs, or jobs and pods.
|
||||
|
||||
**CRAIG BOX: If Kubernetes is truly becoming a programming language, it of course has to have a garbage collector implemented.**
|
||||
|
||||
REY LEJANO: Yeah. There's another one, too, coming in Alpha. [CHUCKLES]
|
||||
|
||||
**CRAIG BOX: Tell me about that.**
|
||||
|
||||
REY LEJANO: That one is coming in Alpha. It's actually one of my favorite features, because there's only a few that I'm going to highlight today. [PVCs for StatefulSet will be cleaned up](https://github.com/kubernetes/enhancements/issues/1847). It will auto-delete PVCs created by StatefulSets when you delete that StatefulSet.
|
||||
|
||||
**CRAIG BOX: What's next on our tour of stable features?**
|
||||
|
||||
REY LEJANO: Next one is, [skip volume ownership change goes to stable](https://github.com/kubernetes/enhancements/issues/695). This is from SIG Storage. There are times when you're running a stateful application, like many databases, they're sensitive to permission bits changing underneath. Currently, when a volume is bind mounted inside the container, the permissions of that volume will change recursively. It might take a really long time.
|
||||
|
||||
Now, there's a field, the fsGroupChangePolicy, which allows you, as a user, to tell Kubernetes how you want the permission and ownership change for that volume to happen. You can set it to always, to always change permissions, or just on mismatch, to only do it when the permission ownership changes at the top level is different from what is expected.
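For reference, a minimal sketch of that field in a Pod's security context; the Pod, image, and volume are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: fsgroup-demo                      # hypothetical Pod
spec:
  securityContext:
    fsGroup: 2000
    fsGroupChangePolicy: OnRootMismatch   # only change ownership when the volume root does not match
  containers:
  - name: app
    image: busybox:1.35
    command: ["sh", "-c", "sleep 3600"]
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    emptyDir: {}
```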
|
||||
|
||||
**CRAIG BOX: It does feel like a lot of these enhancements came from a very particular use case where someone said, "hey, this didn't work for me and I've plumbed in a feature that works with exactly the thing I need to have".**
|
||||
|
||||
REY LEJANO: Absolutely. People create issues for these, then create Kubernetes enhancement proposals, and then get targeted for releases.
|
||||
|
||||
**CRAIG BOX: Another GA feature in this release — ephemeral volumes.**
|
||||
|
||||
REY LEJANO: We've always been able to use empty dir for ephemeral volumes, but now we could actually have [ephemeral inline volumes](https://github.com/kubernetes/enhancements/issues/1698), meaning that you could take your standard CSI driver and be able to use ephemeral volumes with it.
|
||||
|
||||
**CRAIG BOX: And, a long time coming, [CronJobs](https://github.com/kubernetes/enhancements/issues/19).**
|
||||
|
||||
REY LEJANO: CronJobs is a funny one, because it was stable before 1.23. For 1.23, it was still tracked, but it was just cleaning up some of the old controller. With CronJobs, there's a v2 controller. What was cleaned up in 1.23 is just the old v1 controller.
|
||||
|
||||
**CRAIG BOX: Were there any other duplications or major cleanups of note in this release?**
|
||||
|
||||
REY LEJANO: Yeah. There were a few you might see in the major themes. One's a little tricky, around FlexVolumes. This is one of the efforts from SIG Storage. They have an effort to migrate in-tree plugins to CSI drivers. This is a little tricky, because FlexVolumes were actually deprecated in November 2020. We're [formally announcing it in 1.23](https://github.com/kubernetes/community/blob/master/sig-storage/volume-plugin-faq.md#kubernetes-volume-plugin-faq-for-storage-vendors).
|
||||
|
||||
**CRAIG BOX: FlexVolumes, in my mind, predate CSI as a concept. So it's about time to get rid of them.**
|
||||
|
||||
REY LEJANO: Yes, it is. There's another deprecation, just some [klog specific flags](https://kubernetes.io/docs/concepts/cluster-administration/system-logs/#klog), but other than that, there are no other big deprecations in 1.23.
|
||||
|
||||
**CRAIG BOX: The buzzword of the last KubeCon, and in some ways the theme of the last 12 months, has been secure software supply chain. What work is Kubernetes doing to improve in this area?**
|
||||
|
||||
REY LEJANO: For 1.23, Kubernetes is now SLSA compliant at Level 1, which means that provenance attestation files that describe the staging and release phases of the release process are satisfactory for the SLSA framework.
|
||||
|
||||
**CRAIG BOX: What needs to happen to step up to further levels?**
|
||||
|
||||
REY LEJANO: Level 1 means a few things — that the build is scripted; that the provenance is available, meaning that the artifacts are verified and they're handed over from one phase to the next; and describes how the artifact is produced. Level 2 means that the source is version-controlled, which it is, provenance is authenticated, provenance is service-generated, and there is a build service. There are four levels of SLSA compliance.
|
||||
|
||||
**CRAIG BOX: It does seem like the levels were largely influenced by what it takes to build a big, secure project like this. It doesn't seem like it will take a lot of extra work to move up to verifiable provenance, for example. There's probably just a few lines of script required to meet many of those requirements.**
|
||||
|
||||
REY LEJANO: Absolutely. I feel like we're almost there; we'll see what will come out of 1.24. And I do want to give a big shout-out to SIG Release and Release Engineering, primarily to Adolfo García Veytia, who is aka Puerco on GitHub and on Slack. He's been driving this forward.
|
||||
|
||||
**CRAIG BOX: You've mentioned some APIs that are being graduated in time to replace their deprecated version. Tell me about the new HPA API.**
|
||||
|
||||
REY LEJANO: The [horizontal pod autoscaler v2 API](https://github.com/kubernetes/enhancements/issues/2702) is now stable, which means that the v2beta2 API is deprecated. Just for everyone's knowledge, the v1 API is not being deprecated. The difference is that v2 adds support for multiple and custom metrics to be used for HPA.
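For reference, a sketch of the stable `autoscaling/v2` API with a resource metric; the workload names are made up:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa                  # hypothetical HPA
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                    # hypothetical Deployment to scale
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```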
|
||||
|
||||
**CRAIG BOX: There's also now a facility to validate my CRDs with an expression language.**
|
||||
|
||||
REY LEJANO: Yeah. You can use the [Common Expression Language, or CEL](https://github.com/google/cel-spec), to validate your CRDs, so you no longer need to use webhooks. This also makes the CRDs more self-contained and declarative, because the rules are now kept within the CRD object definition.
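For reference, a sketch of an in-schema CEL rule on a hypothetical CRD; the group, kind, and fields are made up, and `x-kubernetes-validations` is the mechanism being described:

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: widgets.example.com            # hypothetical CRD
spec:
  group: example.com
  names:
    plural: widgets
    singular: widget
    kind: Widget
  scope: Namespaced
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            x-kubernetes-validations:
            - rule: "self.minReplicas <= self.maxReplicas"   # CEL expression evaluated against spec
              message: "minReplicas must not exceed maxReplicas"
            properties:
              minReplicas:
                type: integer
              maxReplicas:
                type: integer
```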
|
||||
|
||||
**CRAIG BOX: What new features, perhaps coming in Alpha or Beta, have taken your interest?**
|
||||
|
||||
REY LEJANO: Aside from pod security policies, I really love [ephemeral containers](https://github.com/kubernetes/enhancements/issues/277) supporting kubectl debug. It launches an ephemeral container in a running pod, shares those pod namespaces, and you can do all your troubleshooting with just running kubectl debug.
|
||||
|
||||
**CRAIG BOX: There's also been some interesting changes in the way that events are handled with kubectl.**
|
||||
|
||||
REY LEJANO: Yeah. kubectl events has always had some issues, like how things weren't sorted. [kubectl events improved](https://github.com/kubernetes/enhancements/issues/1440) that, so now you can do `--watch`, and it will also sort with the `--watch` option as well. That is something new. You can actually combine fields and custom columns. Also, you can list events in the timeline for the last N minutes. And you can sort events using other criteria as well.
|
||||
|
||||
**CRAIG BOX: You are a field engineer at SUSE. Are there any things that are coming in that your individual customers that you deal with are looking out for?**
|
||||
|
||||
REY LEJANO: More of what I look out for to help the customers.
|
||||
|
||||
**CRAIG BOX: Right.**
|
||||
|
||||
REY LEJANO: I really love kubectl events. Really love the PVCs being cleaned up with StatefulSets. Most of it's for selfish reasons that it will improve troubleshooting efforts. [CHUCKLES]
|
||||
|
||||
**CRAIG BOX: I have always hoped that a release team lead would say to me, "yes, I have selfish reasons. And I finally got something I wanted in."**
|
||||
|
||||
REY LEJANO: [LAUGHS]
|
||||
|
||||
**CRAIG BOX: Perhaps I should run to be release team lead, just so I can finally get init containers fixed once and for all.**
|
||||
|
||||
REY LEJANO: Oh, init containers, I've been looking for that for a while. I've actually created animated GIFs on how init containers will be run with that Kubernetes enhancement proposal, but it's halted currently.
|
||||
|
||||
**CRAIG BOX: One day.**
|
||||
|
||||
REY LEJANO: One day. Maybe it shouldn't stay halted.
|
||||
|
||||
**CRAIG BOX: You mentioned there are obviously the things you look out for. Are there any things that are coming down the line, perhaps Alpha features or maybe even just proposals you've seen lately, that you're personally really looking forward to seeing which way they go?**
|
||||
|
||||
REY LEJANO: Yeah. One is a very interesting one; it affects the whole community, so it's not just for personal reasons. As you may know, Dockershim is deprecated. And we did release a blog saying that it will be removed in 1.24.
|
||||
|
||||
**CRAIG BOX: Scared a bunch of people.**
|
||||
|
||||
REY LEJANO: Scared a bunch of people. From a survey, we saw that a lot of people are still using Docker and Dockershim. One of the enhancements for 1.23 is, [kubelet CRI goes to Beta](https://github.com/kubernetes/enhancements/issues/2040). This promotes the CRI API, which is required. This had to be in Beta for Dockershim to be removed in 1.24.
|
||||
|
||||
**CRAIG BOX: Now, in the last release team lead interview, [we spoke with Savitha Raghunathan](https://kubernetespodcast.com/episode/157-kubernetes-1.22/), and she talked about what she would advise you as her successor. It was to look out for the mental health of the team members. How were you able to take that advice on board?**
|
||||
|
||||
REY LEJANO: That was great advice from Savitha. A few things I've made note of with each release team meeting. After each release team meeting, I stop the recording, because we do record all the meetings and post them on YouTube. And I open up the floor to anyone who wants to say anything that's not recorded, that's not going to be on the agenda. Also, I tell people not to work on weekends. I broke this rule once, but other than that, I told people it could wait. Just be mindful of your mental health.
|
||||
|
||||
**CRAIG BOX: It's just been announced that [James Laverack from Jetstack](https://twitter.com/JamesLaverack/status/1466834312993644551) will be the release team lead for 1.24. James and I shared an interesting Mexican dinner at the last KubeCon in San Diego.**
|
||||
|
||||
REY LEJANO: Oh, nice. I didn't know you knew James.
|
||||
|
||||
**CRAIG BOX: The British tech scene. We're a very small world. What will your advice to James be?**
|
||||
|
||||
REY LEJANO: What I would tell James for 1.24 is use teachable moments in the release team meetings. When you're a shadow for the first time, it's very daunting. It's very difficult, because you don't know the repos. You don't know the release process. Everyone around you seems like they know the release process, and very familiar with what the release process is. But as a first-time shadow, you don't know all the vernacular for the community. I just advise to use teachable moments. Take a few minutes in the release team meetings to make it a little easier for new shadows to ramp up and to be familiar with the release process.
|
||||
|
||||
**CRAIG BOX: Has there been major evolution in the process in the time that you've been involved? Or do you think that it's effectively doing what it needs to do?**
|
||||
|
||||
REY LEJANO: It's always evolving. I remember my first time in release notes, 1.18, we said that our goal was to automate and program our way out so that we don't have a release notes team anymore. That's changed [CHUCKLES] quite a bit. Although there's been significant advancements in the release notes process by Adolfo and also James, they've created a subcommand in krel to generate release notes.
|
||||
|
||||
But nowadays, all their release notes are richer. Still not there at the automation process yet. Every release cycle, there is something a little bit different. For this release cycle, we had a production readiness review deadline. It was a soft deadline. A production readiness review is a review by several people in the community. It's actually been required since 1.21, and it ensures that the enhancements are observable, scalable, supportable, and it's safe to operate in production, and could also be disabled or rolled back. In 1.23, we had a deadline to have the production readiness review completed by a specific date.
|
||||
|
||||
**CRAIG BOX: How have you found the change of schedule to three releases per year rather than four?**
|
||||
|
||||
REY LEJANO: Moving to three releases a year from four, in my opinion, has been an improvement, because we support the last three releases, and now those three supported releases cover a full calendar year instead of only 9 months out of 12.
|
||||
|
||||
**CRAIG BOX: The next event on the calendar is a [Kubernetes contributor celebration](https://www.kubernetes.dev/events/kcc2021/) starting next Monday. What can we expect from that event?**
|
||||
|
||||
REY LEJANO: This is our second time running this virtual event. It's a virtual celebration to recognize the whole community and all of our accomplishments of the year, and also contributors. There's a number of events during this week of celebration. It starts the week of December 13.
|
||||
|
||||
There's events like the Kubernetes Contributor Awards, where SIGs honor and recognize the hard work of the community and contributors. There's also a DevOps party game as well. There is a cloud native bake-off. I do highly suggest people to go to [kubernetes.dev/celebration](https://www.kubernetes.dev/events/past-events/2021/kcc2021/) to learn more.
|
||||
|
||||
**CRAIG BOX: How exactly does one judge a virtual bake-off?**
|
||||
|
||||
REY LEJANO: That I don't know. [CHUCKLES]
|
||||
|
||||
**CRAIG BOX: I tasted my scones. I think they're the best. I rate them 10 out of 10.**
|
||||
|
||||
REY LEJANO: Yeah. That is very difficult to do virtually. I would have to say, probably what the dish is, how closely it is tied with Kubernetes or open source or to CNCF. There's a few judges. I know Josh Berkus and Rin Oliver are a few of the judges running the bake-off.
|
||||
|
||||
**CRAIG BOX: Yes. We spoke with Josh about his love of the kitchen, and so he seems like a perfect fit for that role.**
|
||||
|
||||
REY LEJANO: He is.
|
||||
|
||||
**CRAIG BOX: Finally, your wife and yourself are expecting your first child in January. Have you had a production readiness review for that?**
|
||||
|
||||
REY LEJANO: I think we failed that review. [CHUCKLES]
|
||||
|
||||
**CRAIG BOX: There's still time.**
|
||||
|
||||
REY LEJANO: We are working on refactoring. We're going to refactor a little bit in December, and `--apply` again.
|
||||
|
||||
---
|
||||
|
||||
_[Rey Lejano](https://twitter.com/reylejano) is a field engineer at SUSE, by way of Rancher Labs, and was the release team lead for Kubernetes 1.23. He is now also a co-chair for SIG Docs. His son Liam is now 3 and a half months old._
|
||||
|
||||
_You can find the [Kubernetes Podcast from Google](http://www.kubernetespodcast.com/) at [@KubernetesPod](https://twitter.com/KubernetesPod) on Twitter, and you can [subscribe](https://kubernetespodcast.com/subscribe/) so you never miss an episode._
|
|
@ -0,0 +1,25 @@
|
|||
---
|
||||
layout: blog
|
||||
title: "Dockershim: The Historical Context"
|
||||
date: 2022-05-03
|
||||
slug: dockershim-historical-context
|
||||
---
|
||||
|
||||
**Author:** Kat Cosgrove
|
||||
|
||||
|
||||
Dockershim has been removed as of Kubernetes v1.24, and this is a positive move for the project. However, context is important for fully understanding something, be it socially or in software development, and this deserves a more in-depth review. Alongside the dockershim removal in Kubernetes v1.24, we’ve seen some confusion (sometimes at a panic level) and dissatisfaction with this decision in the community, largely due to a lack of context around this removal. The decision to deprecate and eventually remove dockershim from Kubernetes was not made quickly or lightly. Still, it’s been in the works for so long that many of today’s users are newer than that decision, and certainly newer than the choices that led to the dockershim being necessary in the first place.
|
||||
|
||||
So what is the dockershim, and why is it going away?
|
||||
|
||||
In the early days of Kubernetes, we only supported one container runtime. That runtime was Docker Engine. Back then, there weren’t really a lot of other options out there and Docker was the dominant tool for working with containers, so this was not a controversial choice. Eventually, we started adding more container runtimes, like rkt and hypernetes, and it became clear that Kubernetes users want a choice of runtimes that work best for them. So Kubernetes needed a way to allow cluster operators the flexibility to use whatever runtime they choose.
|
||||
|
||||
The [Container Runtime Interface](/blog/2016/12/container-runtime-interface-cri-in-kubernetes/) (CRI) was released to allow that flexibility. The introduction of CRI was great for the project and users alike, but it did introduce a problem: Docker Engine’s use as a container runtime predates CRI, and Docker Engine is not CRI-compatible. To solve this issue, a small software shim (dockershim) was introduced as part of the kubelet component specifically to fill in the gaps between Docker Engine and CRI, allowing cluster operators to continue using Docker Engine as their container runtime largely uninterrupted.
|
||||
|
||||
However, this little software shim was never intended to be a permanent solution. Over the course of years, its existence has introduced a lot of unnecessary complexity to the kubelet itself. Some integrations are inconsistently implemented for Docker because of this shim, resulting in an increased burden on maintainers, and maintaining vendor-specific code is not in line with our open source philosophy. To reduce this maintenance burden and move towards a more collaborative community in support of open standards, [KEP-2221 was introduced](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2221-remove-dockershim), proposing the removal of the dockershim. With the release of Kubernetes v1.20, the deprecation was official.
|
||||
|
||||
We didn’t do a great job communicating this, and unfortunately, the deprecation announcement led to some panic within the community. Confusion around what this meant for Docker as a company, if container images built by Docker would still run, and what Docker Engine actually is led to a conflagration on social media. This was our fault; we should have more clearly communicated what was happening and why at the time. To combat this, we released [a blog](/blog/2020/12/02/dont-panic-kubernetes-and-docker/) and [accompanying FAQ](/blog/2020/12/02/dockershim-faq/) to allay the community’s fears and correct some misconceptions about what Docker is and how containers work within Kubernetes. As a result of the community’s concerns, Docker and Mirantis jointly agreed to continue supporting the dockershim code in the form of [cri-dockerd](https://www.mirantis.com/blog/the-future-of-dockershim-is-cri-dockerd/), allowing you to continue using Docker Engine as your container runtime if need be. For the interest of users who want to try other runtimes, like containerd or cri-o, [migration documentation was written](/docs/tasks/administer-cluster/migrating-from-dockershim/change-runtime-containerd/).
|
||||
|
||||
We later [surveyed the community](https://kubernetes.io/blog/2021/11/12/are-you-ready-for-dockershim-removal/) and [discovered that there are still many users with questions and concerns](/blog/2022/01/07/kubernetes-is-moving-on-from-dockershim). In response, Kubernetes maintainers and the CNCF committed to addressing these concerns by extending documentation and other programs. In fact, this blog post is a part of this program. With so many end users successfully migrated to other runtimes, and improved documentation, we believe that everyone has a paved way to migration now.
|
||||
|
||||
Docker is not going away, either as a tool or as a company. It’s an important part of the cloud native community and the history of the Kubernetes project. We wouldn’t be where we are without them. That said, removing dockershim from kubelet is ultimately good for the community, the ecosystem, the project, and open source at large. This is an opportunity for all of us to come together to support open standards, and we’re glad to be doing so with the help of Docker and the community.
|
|
@ -0,0 +1,242 @@
|
|||
---
|
||||
layout: blog
|
||||
title: "Kubernetes 1.24: Stargazer"
|
||||
date: 2022-05-03
|
||||
slug: kubernetes-1-24-release-announcement
|
||||
---
|
||||
|
||||
**Authors**: [Kubernetes 1.24 Release Team](https://github.com/kubernetes/sig-release/blob/master/releases/release-1.24/release-team.md)
|
||||
|
||||
We are excited to announce the release of Kubernetes 1.24, the first release of 2022!
|
||||
|
||||
This release consists of 46 enhancements: fourteen enhancements have graduated to stable,
|
||||
fifteen enhancements are moving to beta, and thirteen enhancements are entering alpha.
|
||||
Also, two features have been deprecated, and two features have been removed.
|
||||
|
||||
## Major Themes
|
||||
|
||||
### Dockershim Removed from kubelet
|
||||
|
||||
After its deprecation in v1.20, the dockershim component has been removed from the kubelet in Kubernetes v1.24.
|
||||
From v1.24 onwards, you will need to either use one of the other [supported runtimes](/docs/setup/production-environment/container-runtimes/) (such as containerd or CRI-O)
|
||||
or use cri-dockerd if you are relying on Docker Engine as your container runtime.
|
||||
For more information about ensuring your cluster is ready for this removal, please
|
||||
see [this guide](/blog/2022/03/31/ready-for-dockershim-removal/).
|
||||
|
||||
### Beta APIs Off by Default
|
||||
|
||||
[New beta APIs will not be enabled in clusters by default](https://github.com/kubernetes/enhancements/issues/3136).
|
||||
Existing beta APIs and new versions of existing beta APIs will continue to be enabled by default.
|
||||
|
||||
### Signing Release Artifacts
|
||||
|
||||
Release artifacts are [signed](https://github.com/kubernetes/enhancements/issues/3031) using [cosign](https://github.com/sigstore/cosign)
|
||||
signatures,
|
||||
and there is experimental support for [verifying image signatures](/docs/tasks/administer-cluster/verify-signed-images/).
|
||||
Signing and verification of release artifacts is part of [increasing software supply chain security for the Kubernetes release process](https://github.com/kubernetes/enhancements/issues/3027).
|
||||
|
||||
### OpenAPI v3
|
||||
|
||||
Kubernetes 1.24 offers beta support for publishing its APIs in the [OpenAPI v3 format](https://github.com/kubernetes/enhancements/issues/2896).
|
||||
|
||||
### Storage Capacity and Volume Expansion Are Generally Available
|
||||
|
||||
[Storage capacity tracking](https://github.com/kubernetes/enhancements/issues/1472)
|
||||
supports exposing currently available storage capacity via [CSIStorageCapacity objects](/docs/concepts/storage/storage-capacity/#api)
|
||||
and enhances scheduling of pods that use CSI volumes with late binding.
|
||||
|
||||
[Volume expansion](https://github.com/kubernetes/enhancements/issues/284) adds support
|
||||
for resizing existing persistent volumes.
|
||||
|
||||
### NonPreemptingPriority to Stable
|
||||
|
||||
This feature adds [a new option to PriorityClasses](https://github.com/kubernetes/enhancements/issues/902),
|
||||
which can enable or disable pod preemption.
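As a minimal sketch (the class name and value are illustrative), a non-preempting PriorityClass sets `preemptionPolicy: Never`, so pods that use it are placed ahead of lower-priority pods in the scheduling queue but never evict running pods:

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority-nonpreempting   # illustrative name
value: 100000
preemptionPolicy: Never    # pods with this class never preempt running pods
globalDefault: false
description: "High priority that does not preempt other pods."
```

Pods opt in by referencing the class through `priorityClassName` in their spec.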
|
||||
|
||||
### Storage Plugin Migration
|
||||
|
||||
Work is underway to [migrate the internals of in-tree storage plugins](https://github.com/kubernetes/enhancements/issues/625) to call out to CSI Plugins
|
||||
while maintaining the original API.
|
||||
The [Azure Disk](https://github.com/kubernetes/enhancements/issues/1490)
|
||||
and [OpenStack Cinder](https://github.com/kubernetes/enhancements/issues/1489) plugins
|
||||
have both been migrated.
|
||||
|
||||
### gRPC Probes Graduate to Beta
|
||||
|
||||
With Kubernetes 1.24, the [gRPC probes functionality](https://github.com/kubernetes/enhancements/issues/2727)
|
||||
has entered beta and is available by default. You can now [configure startup, liveness, and readiness probes](/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#configure-probes) for your gRPC app
|
||||
natively within Kubernetes without exposing an HTTP endpoint or
|
||||
using an extra executable.
|
||||
|
||||
### Kubelet Credential Provider Graduates to Beta
|
||||
|
||||
Originally released as Alpha in Kubernetes 1.20, the kubelet's support for
|
||||
[image credential providers](/docs/tasks/kubelet-credential-provider/kubelet-credential-provider/)
|
||||
has now graduated to Beta.
|
||||
This allows the kubelet to dynamically retrieve credentials for a container image registry
|
||||
using exec plugins rather than storing credentials on the node's filesystem.
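As a rough sketch of what that looks like, the kubelet is pointed at a configuration file via its `--image-credential-provider-config` flag; the plugin name, image pattern, and exact API versions below are assumptions, so check the linked documentation for the versions that match your cluster:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: CredentialProviderConfig
providers:
  - name: example-credential-plugin       # name of the exec plugin binary (illustrative)
    matchImages:
      - "*.registry.example.com"          # images this plugin can provide credentials for
    defaultCacheDuration: "12h"           # how long the kubelet caches returned credentials
    apiVersion: credentialprovider.kubelet.k8s.io/v1beta1
```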
|
||||
|
||||
### Contextual Logging in Alpha
|
||||
|
||||
Kubernetes 1.24 has introduced [contextual logging](https://github.com/kubernetes/enhancements/issues/3077)
|
||||
that enables the caller of a function to control all aspects of logging (output formatting, verbosity, additional values, and names).
|
||||
|
||||
### Avoiding Collisions in IP allocation to Services
|
||||
|
||||
Kubernetes 1.24 introduces a new opt-in feature that allows you to
|
||||
[soft-reserve a range for static IP address assignments](/docs/concepts/services-networking/service/#service-ip-static-sub-range)
|
||||
to Services.
|
||||
With the manual enablement of this feature, the cluster will prefer automatic assignment from
|
||||
the pool of Service IP addresses, thereby reducing the risk of collision.
|
||||
|
||||
A Service `ClusterIP` can be assigned:
|
||||
|
||||
* dynamically, which means the cluster will automatically pick a free IP within the configured Service IP range.
|
||||
* statically, which means the user will set one IP within the configured Service IP range.
|
||||
|
||||
Service `ClusterIP` addresses are unique; hence, trying to create a Service with a `ClusterIP` that has already been allocated will return an error.
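For example, a statically assigned address is requested through the `spec.clusterIP` field; the name, selector, and IP below are illustrative, and the IP must fall inside your cluster's configured Service IP range:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-static-ip-service   # illustrative name
spec:
  clusterIP: 10.96.100.50      # statically requested address within the Service CIDR
  selector:
    app: my-app
  ports:
    - port: 80
      targetPort: 8080
```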
|
||||
|
||||
### Dynamic Kubelet Configuration is Removed from the Kubelet
|
||||
|
||||
After being deprecated in Kubernetes 1.22, Dynamic Kubelet Configuration has been removed from the kubelet. The feature will be removed from the API server in Kubernetes 1.26.
|
||||
|
||||
## CNI Version-Related Breaking Change
|
||||
|
||||
Before you upgrade to Kubernetes 1.24, please verify that you are using/upgrading to a container
|
||||
runtime that has been tested to work correctly with this release.
|
||||
|
||||
For example, the following container runtimes are being prepared, or have already been prepared, for Kubernetes:
|
||||
|
||||
* containerd v1.6.4 and later, v1.5.11 and later
|
||||
* CRI-O 1.24 and later
|
||||
|
||||
Service issues exist for pod CNI network setup and tear down in containerd
|
||||
v1.6.0–v1.6.3 when the CNI plugins have not been upgraded and/or the CNI config
|
||||
version is not declared in the CNI config files. The containerd team reports, "these issues are resolved in containerd v1.6.4."
|
||||
|
||||
With containerd v1.6.0–v1.6.3, if you do not upgrade the CNI plugins and/or
|
||||
declare the CNI config version, you might encounter the following "Incompatible
|
||||
CNI versions" or "Failed to destroy network for sandbox" error conditions.
|
||||
|
||||
## CSI Snapshot
|
||||
|
||||
_This information was added after initial publication._
|
||||
|
||||
[VolumeSnapshot v1beta1 CRD has been removed](https://github.com/kubernetes/enhancements/issues/177).
|
||||
Volume snapshot and restore functionality for Kubernetes and the Container Storage Interface (CSI), which provides a standardized API design (CRDs) and adds PV snapshot/restore support for CSI volume drivers, moved to GA in v1.20. VolumeSnapshot v1beta1 was deprecated in v1.20 and is now unsupported. Refer to [KEP-177: CSI Snapshot](https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/177-volume-snapshot#kep-177-csi-snapshot) and [Volume Snapshot GA blog](https://kubernetes.io/blog/2020/12/10/kubernetes-1.20-volume-snapshot-moves-to-ga/) for more information.
|
||||
|
||||
## Other Updates
|
||||
|
||||
### Graduations to Stable
|
||||
|
||||
This release saw fourteen enhancements promoted to stable:
|
||||
|
||||
* [Container Storage Interface (CSI) Volume Expansion](https://github.com/kubernetes/enhancements/issues/284)
|
||||
* [Pod Overhead](https://github.com/kubernetes/enhancements/issues/688): Account for resources tied to the pod sandbox but not specific containers.
|
||||
* [Add non-preempting option to PriorityClasses](https://github.com/kubernetes/enhancements/issues/902)
|
||||
* [Storage Capacity Tracking](https://github.com/kubernetes/enhancements/issues/1472)
|
||||
* [OpenStack Cinder In-Tree to CSI Driver Migration](https://github.com/kubernetes/enhancements/issues/1489)
|
||||
* [Azure Disk In-Tree to CSI Driver Migration](https://github.com/kubernetes/enhancements/issues/1490)
|
||||
* [Efficient Watch Resumption](https://github.com/kubernetes/enhancements/issues/1904): Watch can be efficiently resumed after kube-apiserver reboot.
|
||||
* [Service Type=LoadBalancer Class Field](https://github.com/kubernetes/enhancements/issues/1959): Introduce a new Service annotation `service.kubernetes.io/load-balancer-class` that allows multiple implementations of `type: LoadBalancer` Services in the same cluster.
|
||||
* [Indexed Job](https://github.com/kubernetes/enhancements/issues/2214): Add a completion index to Pods of Jobs with a fixed completion count.
|
||||
* [Add Suspend Field to Jobs API](https://github.com/kubernetes/enhancements/issues/2232): Add a suspend field to the Jobs API to allow orchestrators to create jobs with more control over when pods are created.
|
||||
* [Pod Affinity NamespaceSelector](https://github.com/kubernetes/enhancements/issues/2249): Add a `namespaceSelector` field to the pod affinity/anti-affinity spec.
|
||||
* [Leader Migration for Controller Managers](https://github.com/kubernetes/enhancements/issues/2436): kube-controller-manager and cloud-controller-manager can apply a new controller-to-controller-manager assignment in an HA control plane without downtime.
|
||||
* [CSR Duration](https://github.com/kubernetes/enhancements/issues/2784): Extend the CertificateSigningRequest API with a mechanism to allow clients to request a specific duration for the issued certificate.
|
||||
|
||||
### Major Changes
|
||||
|
||||
This release saw two major changes:
|
||||
|
||||
* [Dockershim Removal](https://github.com/kubernetes/enhancements/issues/2221)
|
||||
* [Beta APIs are off by Default](https://github.com/kubernetes/enhancements/issues/3136)
|
||||
|
||||
### Release Notes
|
||||
|
||||
Check out the full details of the Kubernetes 1.24 release in our [release notes](https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.24.md).
|
||||
|
||||
### Availability
|
||||
|
||||
Kubernetes 1.24 is available for download on [GitHub](https://github.com/kubernetes/kubernetes/releases/tag/v1.24.0).
|
||||
To get started with Kubernetes, check out these [interactive tutorials](/docs/tutorials/) or run local
|
||||
Kubernetes clusters using containers as “nodes”, with [kind](https://kind.sigs.k8s.io/).
|
||||
You can also easily install 1.24 using [kubeadm](/docs/setup/independent/create-cluster-kubeadm/).
|
||||
|
||||
### Release Team
|
||||
|
||||
This release would not have been possible without the combined efforts of committed individuals
|
||||
comprising the Kubernetes 1.24 release team. This team came together to deliver all of the components
|
||||
that go into each Kubernetes release, including code, documentation, release notes, and more.
|
||||
|
||||
Special thanks to James Laverack, our release lead, for guiding us through a successful release cycle,
|
||||
and to all of the release team members for the time and effort they put in to deliver the v1.24
|
||||
release for the Kubernetes community.
|
||||
|
||||
### Release Theme and Logo
|
||||
|
||||
**Kubernetes 1.24: Stargazer**
|
||||
|
||||
{{< figure src="/images/blog/2022-05-03-kubernetes-release-1.24/kubernetes-1.24.png" alt="" class="release-logo" >}}
|
||||
|
||||
The theme for Kubernetes 1.24 is _Stargazer_.
|
||||
|
||||
Generations of people have looked to the stars in awe and wonder, from ancient astronomers to the
|
||||
scientists who built the James Webb Space Telescope. The stars have inspired us, set our imagination
|
||||
alight, and guided us through long nights on difficult seas.
|
||||
|
||||
With this release we gaze upwards, to what is possible when our community comes together. Kubernetes
|
||||
is the work of hundreds of contributors across the globe and thousands of end-users supporting
|
||||
applications that serve millions. Every one is a star in our sky, helping us chart our course.
|
||||
|
||||
The release logo is made by [Britnee Laverack](https://www.instagram.com/artsyfie/), and depicts a telescope set upon starry skies and the
|
||||
[Pleiades](https://en.wikipedia.org/wiki/Pleiades), often known in mythology as the “Seven Sisters”. The number seven is especially auspicious
|
||||
for the Kubernetes project, and is a reference back to our original “Project Seven” name.
|
||||
|
||||
This release of Kubernetes is named for those that would look towards the night sky and wonder — for
|
||||
all the stargazers out there. ✨
|
||||
|
||||
### User Highlights
|
||||
|
||||
* Check out how leading retail e-commerce company [La Redoute used Kubernetes, alongside other CNCF projects, to transform and streamline its software delivery lifecycle](https://www.cncf.io/case-studies/la-redoute/) - from development to operations.
|
||||
* Trying to ensure no change to an API call would cause any breaks, [Salt Security built its microservices entirely on Kubernetes, and it communicates via gRPC while Linkerd ensures messages are encrypted](https://www.cncf.io/case-studies/salt-security/).
|
||||
* In their effort to migrate from private to public cloud, [Allianz Direct engineers redesigned their CI/CD pipeline in just three months while managing to condense 200 workflows down to 10-15](https://www.cncf.io/case-studies/allianz/).
|
||||
* Check out how [Bink, a UK based fintech company, updated its in-house Kubernetes distribution with Linkerd to build a cloud-agnostic platform that scales as needed whilst allowing them to keep a close eye on performance and stability](https://www.cncf.io/case-studies/bink/).
|
||||
* Using Kubernetes, the Dutch organization [Stichting Open Nederland](http://www.stichtingopennederland.nl/) created a testing portal in just one-and-a-half months to help safely reopen events in the Netherlands. The [Testing for Entry (Testen voor Toegang)](https://www.testenvoortoegang.org/) platform [leveraged the performance and scalability of Kubernetes to help individuals book over 400,000 COVID-19 testing appointments per day](https://www.cncf.io/case-studies/true/).
|
||||
* Working alongside SparkFabrik and utilizing Backstage, [Santagostino created the developer platform Samaritan to centralize services and documentation, manage the entire lifecycle of services, and simplify the work of Santagostino developers](https://www.cncf.io/case-studies/santagostino/).
|
||||
|
||||
### Ecosystem Updates
|
||||
|
||||
* KubeCon + CloudNativeCon Europe 2022 will take place in Valencia, Spain, from 16 – 20 May 2022! You can find more information about the conference and registration on the [event site](https://events.linuxfoundation.org/kubecon-cloudnativecon-europe/).
|
||||
* In the [2021 Cloud Native Survey](https://www.cncf.io/announcements/2022/02/10/cncf-sees-record-kubernetes-and-container-adoption-in-2021-cloud-native-survey/), the CNCF saw record Kubernetes and container adoption. Take a look at the [results of the survey](https://www.cncf.io/reports/cncf-annual-survey-2021/).
|
||||
* The [Linux Foundation](https://www.linuxfoundation.org/) and [The Cloud Native Computing Foundation](https://www.cncf.io/) (CNCF) announced the availability of a new [Cloud Native Developer Bootcamp](https://training.linuxfoundation.org/training/cloudnativedev-bootcamp/?utm_source=lftraining&utm_medium=pr&utm_campaign=clouddevbc0322) to provide participants with the knowledge and skills to design, build, and deploy cloud native applications. Check out the [announcement](https://www.cncf.io/announcements/2022/03/15/new-cloud-native-developer-bootcamp-provides-a-clear-path-to-cloud-native-careers/) to learn more.
|
||||
|
||||
### Project Velocity
|
||||
|
||||
The [CNCF K8s DevStats](https://k8s.devstats.cncf.io/d/12/dashboards?orgId=1&refresh=15m) project
|
||||
aggregates a number of interesting data points related to the velocity of Kubernetes and various
|
||||
sub-projects. This includes everything from individual contributions to the number of companies that
|
||||
are contributing, and is an illustration of the depth and breadth of effort that goes into evolving this ecosystem.
|
||||
|
||||
In the v1.24 release cycle, which [ran for 17 weeks](https://github.com/kubernetes/sig-release/tree/master/releases/release-1.24) (January 10 to May 3), we saw contributions from [1029 companies](https://k8s.devstats.cncf.io/d/9/companies-table?orgId=1&var-period_name=v1.23.0%20-%20v1.24.0&var-metric=contributions) and [1179 individuals](https://k8s.devstats.cncf.io/d/66/developer-activity-counts-by-companies?orgId=1&var-period_name=v1.23.0%20-%20v1.24.0&var-metric=contributions&var-repogroup_name=Kubernetes&var-country_name=All&var-companies=All&var-repo_name=kubernetes%2Fkubernetes).
|
||||
|
||||
## Upcoming Release Webinar
|
||||
|
||||
Join members of the Kubernetes 1.24 release team on Tue May 24, 2022 9:45am – 11am PT to learn about
|
||||
the major features of this release, as well as deprecations and removals to help plan for upgrades.
|
||||
For more information and registration, visit the [event page](https://community.cncf.io/e/mck3kd/)
|
||||
on the CNCF Online Programs site.
|
||||
|
||||
## Get Involved
|
||||
|
||||
The simplest way to get involved with Kubernetes is by joining one of the many [Special Interest Groups](https://github.com/kubernetes/community/blob/master/sig-list.md) (SIGs) that align with your interests.
|
||||
Have something you’d like to broadcast to the Kubernetes community? Share your voice at our weekly [community meeting](https://github.com/kubernetes/community/tree/master/communication), and through the channels below:
|
||||
|
||||
* Find out more about contributing to Kubernetes at the [Kubernetes Contributors](https://www.kubernetes.dev/) website
|
||||
* Follow us on Twitter [@Kubernetesio](https://twitter.com/kubernetesio) for the latest updates
|
||||
* Join the community discussion on [Discuss](https://discuss.kubernetes.io/)
|
||||
* Join the community on [Slack](http://slack.k8s.io/)
|
||||
* Post questions (or answer questions) on [Server Fault](https://serverfault.com/questions/tagged/kubernetes).
|
||||
* Share your Kubernetes [story](https://docs.google.com/a/linuxfoundation.org/forms/d/e/1FAIpQLScuI7Ye3VQHQTwBASrgkjQDSS5TP0g3AXfFhwSM9YpHgxRKFA/viewform)
|
||||
* Read more about what’s happening with Kubernetes on the [blog](https://kubernetes.io/blog/)
|
||||
* Learn more about the [Kubernetes Release Team](https://github.com/kubernetes/sig-release/tree/master/release-team)
|
|
@ -0,0 +1,103 @@
|
|||
---
|
||||
layout: blog
|
||||
title: "Kubernetes 1.24: Volume Expansion Now A Stable Feature"
|
||||
date: 2022-05-05
|
||||
slug: volume-expansion-ga
|
||||
---
|
||||
|
||||
**Author:** Hemant Kumar (Red Hat)
|
||||
|
||||
Volume expansion was introduced as an alpha feature in Kubernetes 1.8, went beta in 1.11, and with Kubernetes 1.24 we are excited to announce general availability (GA)
|
||||
of volume expansion.
|
||||
|
||||
This feature allows Kubernetes users to simply edit their `PersistentVolumeClaim` objects and specify a new size in the PVC spec; Kubernetes will then automatically expand the volume
|
||||
using the storage backend and also expand the underlying file system in use by the Pod, without requiring any downtime at all if possible.
|
||||
|
||||
|
||||
### How to use volume expansion
|
||||
|
||||
You can trigger expansion for a PersistentVolume by editing the `spec` field of a PVC, specifying a different
|
||||
(and larger) storage request. For example, given following PVC:
|
||||
|
||||
```
|
||||
kind: PersistentVolumeClaim
|
||||
apiVersion: v1
|
||||
metadata:
|
||||
name: myclaim
|
||||
spec:
|
||||
accessModes:
|
||||
- ReadWriteOnce
|
||||
resources:
|
||||
requests:
|
||||
storage: 1Gi # specify new size here
|
||||
```
|
||||
|
||||
You can request expansion of the underlying PersistentVolume by specifying a new value instead of the old `1Gi` size.
|
||||
Once you've changed the requested size, watch the `status.conditions` field of the PVC to see if the
|
||||
resize has completed.
|
||||
|
||||
When Kubernetes starts expanding the volume, it will add a `Resizing` condition to the PVC, which will be removed once expansion completes. More information about the progress of the
|
||||
expansion operation can also be obtained by monitoring events associated with the PVC:
|
||||
|
||||
```
|
||||
kubectl describe pvc <pvc>
|
||||
```
|
||||
|
||||
### Storage driver support
|
||||
|
||||
Not every volume type, however, is expandable by default. Some volume types, such as in-tree hostPath volumes, are not expandable at all. For CSI volumes, the CSI driver
|
||||
must have the `EXPAND_VOLUME` capability in the controller or node service (or both, if appropriate). Please refer to the documentation of your CSI driver to find out
|
||||
if it supports volume expansion.
|
||||
|
||||
Please refer to the volume expansion documentation for the in-tree volume types that support volume expansion: [Expanding Persistent Volumes](/docs/concepts/storage/persistent-volumes/#expanding-persistent-volumes-claims).
|
||||
|
||||
|
||||
In general, to provide some degree of control over which volumes can be expanded, only dynamically provisioned PVCs whose storage class has the `allowVolumeExpansion` parameter set to `true` are expandable.
|
||||
|
||||
A Kubernetes cluster administrator must edit the appropriate StorageClass object and set
|
||||
the `allowVolumeExpansion` field to `true`. For example:
|
||||
|
||||
```
|
||||
apiVersion: storage.k8s.io/v1
|
||||
kind: StorageClass
|
||||
metadata:
|
||||
name: gp2-default
|
||||
provisioner: kubernetes.io/aws-ebs
|
||||
parameters:
|
||||
secretNamespace: ""
|
||||
secretName: ""
|
||||
allowVolumeExpansion: true
|
||||
```
|
||||
|
||||
### Online expansion compared to offline expansion
|
||||
|
||||
By default, Kubernetes attempts to expand volumes immediately after a user requests a resize.
|
||||
If one or more Pods are using the volume, Kubernetes tries to expand the volume using an online resize;
|
||||
as a result volume expansion usually requires no application downtime.
|
||||
Filesystem expansion on the node is also performed online and hence does not require shutting
|
||||
down any Pod that was using the PVC.
|
||||
|
||||
If you expand a PersistentVolume that is not in use, Kubernetes does an offline resize (and,
|
||||
because the volume isn't in use, there is again no workload disruption).
|
||||
|
||||
In some cases though, if the underlying storage driver can only support offline expansion, users of the PVC must take down their Pod before expansion can succeed. Please refer to the documentation of your storage
|
||||
provider to find out what mode of volume expansion it supports.
|
||||
|
||||
When volume expansion was introduced as an alpha feature, Kubernetes only supported offline filesystem
|
||||
expansion on the node and hence required users to restart their pods for file system resizing to finish.
|
||||
This behaviour has since changed, and Kubernetes tries its best to fulfil any resize request regardless
|
||||
of whether the underlying PersistentVolume volume is online or offline. If your storage provider supports
|
||||
online expansion then no Pod restart should be necessary for volume expansion to finish.
|
||||
|
||||
## Next steps
|
||||
|
||||
Although volume expansion is now stable as part of the recent v1.24 release,
|
||||
SIG Storage are working to make it even simpler for users of Kubernetes to expand their persistent storage.
|
||||
Kubernetes 1.23 introduced features for triggering recovery from failed volume expansion, allowing users
|
||||
to attempt self-service healing after a failed resize.
|
||||
See [Recovering from volume expansion failure](/docs/concepts/storage/persistent-volumes/#recovering-from-failure-when-expanding-volumes) for more details.
|
||||
|
||||
The Kubernetes contributor community is also discussing the potential for StatefulSet-driven storage expansion. This proposed
|
||||
feature would let you trigger expansion for all underlying PVs that are providing storage to a StatefulSet,
|
||||
by directly editing the StatefulSet object.
|
||||
See the [Support Volume Expansion Through StatefulSets](https://github.com/kubernetes/enhancements/issues/661) enhancement proposal for more details.
|
|
@ -0,0 +1,79 @@
|
|||
---
|
||||
layout: blog
|
||||
title: "Storage Capacity Tracking reaches GA in Kubernetes 1.24"
|
||||
date: 2022-05-06
|
||||
slug: storage-capacity-ga
|
||||
---
|
||||
|
||||
**Author:** Patrick Ohly (Intel)
|
||||
|
||||
The v1.24 release of Kubernetes brings [storage capacity](/docs/concepts/storage/storage-capacity/)
|
||||
tracking as a generally available feature.
|
||||
|
||||
## Problems we have solved
|
||||
|
||||
As explained in more detail in the [previous blog post about this
|
||||
feature](/blog/2021/04/14/local-storage-features-go-beta/), storage capacity
|
||||
tracking allows a CSI driver to publish information about remaining
|
||||
capacity. The kube-scheduler then uses that information to pick suitable nodes
|
||||
for a Pod when that Pod has volumes that still need to be provisioned.
|
||||
|
||||
Without this information, a Pod may get stuck without ever being scheduled onto
|
||||
a suitable node because kube-scheduler has to choose blindly and always ends up
|
||||
picking a node for which the volume cannot be provisioned because the
|
||||
underlying storage system managed by the CSI driver does not have sufficient
|
||||
capacity left.
|
||||
|
||||
Because CSI drivers publish storage capacity information that gets used at a
|
||||
later time when it might not be up-to-date anymore, it can still happen that a
|
||||
node is picked that doesn't work out after all. Volume provisioning recovers
|
||||
from that by informing the scheduler that it needs to try again with a
|
||||
different node.
|
||||
|
||||
[Load
|
||||
tests](https://github.com/kubernetes-csi/csi-driver-host-path/blob/master/docs/storage-capacity-tracking.md)
|
||||
that were done again for promotion to GA confirmed that all storage in a
|
||||
cluster can be consumed by Pods with storage capacity tracking whereas Pods got
|
||||
stuck without it.
|
||||
|
||||
## Problems we have *not* solved
|
||||
|
||||
Recovery from a failed volume provisioning attempt has one known limitation: if a Pod
|
||||
uses two volumes and only one of them could be provisioned, then all future
|
||||
scheduling decisions are limited by the already provisioned volume. If that
|
||||
volume is local to a node and the other volume cannot be provisioned there, the
|
||||
Pod is stuck. This problem pre-dates storage capacity tracking and while the
|
||||
additional information makes it less likely to occur, it cannot be avoided in
|
||||
all cases, except of course by only using one volume per Pod.
|
||||
|
||||
An idea for solving this was proposed in a [KEP
|
||||
draft](https://github.com/kubernetes/enhancements/pull/1703): volumes that were
|
||||
provisioned and haven't been used yet cannot have any valuable data and
|
||||
therefore could be freed and provisioned again elsewhere. SIG Storage is
|
||||
looking for interested developers who want to continue working on this.
|
||||
|
||||
Also not solved is support in Cluster Autoscaler for Pods with volumes. For CSI
|
||||
drivers with storage capacity tracking, a prototype was developed and discussed
|
||||
in [a PR](https://github.com/kubernetes/autoscaler/pull/3887). It was meant to
|
||||
work with arbitrary CSI drivers, but that flexibility made it hard to configure
|
||||
and slowed down scale up operations: because autoscaler was unable to simulate
|
||||
volume provisioning, it only scaled the cluster by one node at a time, which
|
||||
was seen as insufficient.
|
||||
|
||||
Therefore that PR was not merged and a different approach with tighter coupling
|
||||
between autoscaler and CSI driver will be needed. For this a better
|
||||
understanding is needed about which local storage CSI drivers are used in
|
||||
combination with cluster autoscaling. Should this lead to a new KEP, then users
|
||||
will have to try out an implementation in practice before it can move to beta
|
||||
or GA. So please reach out to SIG Storage if you have an interest in this
|
||||
topic.
|
||||
|
||||
## Acknowledgements
|
||||
|
||||
Thanks a lot to the members of the community who have contributed to this
|
||||
feature or given feedback including members of [SIG
|
||||
Scheduling](https://github.com/kubernetes/community/tree/master/sig-scheduling),
|
||||
[SIG
|
||||
Autoscaling](https://github.com/kubernetes/community/tree/master/sig-autoscaling),
|
||||
and of course [SIG
|
||||
Storage](https://github.com/kubernetes/community/tree/master/sig-storage)!
|
|
@ -0,0 +1,209 @@
|
|||
---
|
||||
layout: blog
|
||||
title: "Kubernetes 1.24: gRPC container probes in beta"
|
||||
date: 2022-05-13
|
||||
slug: grpc-probes-now-in-beta
|
||||
---
|
||||
|
||||
**Author**: Sergey Kanzhelev (Google)
|
||||
|
||||
|
||||
With Kubernetes 1.24 the gRPC probes functionality entered beta and is available by default.
|
||||
Now you can configure startup, liveness, and readiness probes for your gRPC app
|
||||
without exposing any HTTP endpoint, nor do you need an executable. Kubernetes can natively connect to your workload via gRPC and query its status.
|
||||
|
||||
## Some history
|
||||
|
||||
It's useful to let the system managing your workload check that the app is
|
||||
healthy, has started OK, and whether the app considers itself good to accept
|
||||
traffic. Before the gRPC support was added, Kubernetes already allowed you to
|
||||
check for health based on running an executable from inside the container image,
|
||||
by making an HTTP request, or by checking whether a TCP connection succeeded.
|
||||
|
||||
For most apps, those checks are enough. If your app provides a gRPC endpoint
|
||||
for a health (or readiness) check, it is easy
|
||||
to repurpose the `exec` probe to use it for gRPC health checking.
|
||||
In the blog article [Health checking gRPC servers on Kubernetes](/blog/2018/10/01/health-checking-grpc-servers-on-kubernetes/),
|
||||
Ahmet Alp Balkan described how you can do that — a mechanism that still works today.
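That older workaround looks roughly like this fragment of a container spec (the binary path and port are assumptions); the probe shells out to a `grpc_health_probe` binary baked into the image:

```yaml
readinessProbe:
  exec:
    command:
      - /bin/grpc_health_probe   # binary shipped inside the container image (illustrative path)
      - -addr=:5000              # gRPC port served by the application (illustrative)
  initialDelaySeconds: 5
```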
|
||||
|
||||
There is a commonly used tool to enable this that was [created](https://github.com/grpc-ecosystem/grpc-health-probe/commit/2df4478982e95c9a57d5fe3f555667f4365c025d)
|
||||
on August 21, 2018, with
|
||||
its first release on [Sep 19, 2018](https://github.com/grpc-ecosystem/grpc-health-probe/releases/tag/v0.1.0-alpha.1).
|
||||
|
||||
This approach to health checking gRPC apps is very popular. There are [3,626 Dockerfiles](https://github.com/search?l=Dockerfile&q=grpc_health_probe&type=code)
|
||||
with the `grpc_health_probe` and [6,621 yaml](https://github.com/search?l=YAML&q=grpc_health_probe&type=Code) files that are discovered with the
|
||||
basic search on GitHub (at the moment of writing). This is a good indication of the tool's popularity
|
||||
and the need to support this natively.
|
||||
|
||||
Kubernetes v1.23 introduced an alpha-quality implementation of native support for
|
||||
querying a workload status using gRPC. Because it was an alpha feature,
|
||||
this was disabled by default for the v1.23 release.
|
||||
|
||||
## Using the feature
|
||||
|
||||
We built gRPC health checking in a similar way to other probes and believe
|
||||
it will be [easy to use](/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-a-grpc-liveness-probe)
|
||||
if you are familiar with other probe types in Kubernetes.
|
||||
The natively supported health probe has many benefits over the workaround involving the `grpc_health_probe` executable.
|
||||
|
||||
With the native gRPC support you don't need to download and carry `10MB` of an additional executable with your image.
|
||||
Exec probes are generally slower than a gRPC call as they require instantiating a new process to run an executable.
|
||||
This also makes the checks less reliable in edge cases, such as when the Pod is running at its resource limits and has trouble
|
||||
instantiating new processes.
|
||||
|
||||
There are a few limitations though. Since configuring a client certificate for probes is hard,
|
||||
services that require client authentication are not supported. The built-in probes also do
|
||||
not check server certificates, and they ignore related problems.
|
||||
|
||||
Built-in checks also cannot be configured to ignore certain types of errors
|
||||
(`grpc_health_probe` returns different exit codes for different errors),
|
||||
and cannot be "chained" to run the health check on multiple services in a single probe.
|
||||
|
||||
But all these limitations are quite standard for gRPC and there are easy workarounds
|
||||
for those.
|
||||
|
||||
## Try it for yourself
|
||||
|
||||
### Cluster-level setup
|
||||
|
||||
You can try this feature today. To try native gRPC probes, you can spin up a Kubernetes cluster
|
||||
yourself with the `GRPCContainerProbe` feature gate enabled; there are many [tools available](/docs/tasks/tools/).
|
||||
|
||||
Since the feature gate `GRPCContainerProbe` is enabled by default in 1.24,
|
||||
many vendors will have this functionality working out of the box.
|
||||
So you may just create a 1.24 cluster on the platform of your choice. Some vendors
|
||||
allow you to enable alpha features on 1.23 clusters.
|
||||
|
||||
For example, at the moment of writing, you can spin up the test cluster on GKE for a quick test.
|
||||
Other vendors may also have similar capabilities, especially if you
|
||||
are reading this blog post long after the Kubernetes 1.24 release.
|
||||
|
||||
On GKE, use the following command (note that the version is `1.23` and `--enable-kubernetes-alpha` is specified).
|
||||
|
||||
```shell
|
||||
gcloud container clusters create test-grpc \
|
||||
--enable-kubernetes-alpha \
|
||||
--no-enable-autorepair \
|
||||
--no-enable-autoupgrade \
|
||||
--release-channel=rapid \
|
||||
--cluster-version=1.23
|
||||
```
|
||||
|
||||
You will also need to configure `kubectl` to access the cluster:
|
||||
|
||||
```shell
|
||||
gcloud container clusters get-credentials test-grpc
|
||||
```
|
||||
|
||||
### Trying the feature out
|
||||
|
||||
Let's create the pod to test how gRPC probes work. For this test we will use the `agnhost` image.
|
||||
This is a Kubernetes-maintained image that can be used for all sorts of workload testing.
|
||||
For example, it has a useful [grpc-health-checking](https://github.com/kubernetes/kubernetes/blob/b2c5bd2a278288b5ef19e25bf7413ecb872577a4/test/images/agnhost/README.md#grpc-health-checking) module
|
||||
that exposes two ports: one serves the health checking service,
|
||||
and the other is an HTTP port that reacts to the commands `make-serving` and `make-not-serving`.
|
||||
|
||||
Here is an example pod definition. It starts the `grpc-health-checking` module,
|
||||
exposes ports `5000` and `8080`, and configures a gRPC readiness probe:
|
||||
|
||||
``` yaml
|
||||
---
|
||||
apiVersion: v1
|
||||
kind: Pod
|
||||
metadata:
|
||||
name: test-grpc
|
||||
spec:
|
||||
containers:
|
||||
- name: agnhost
|
||||
image: k8s.gcr.io/e2e-test-images/agnhost:2.35
|
||||
command: ["/agnhost", "grpc-health-checking"]
|
||||
ports:
|
||||
- containerPort: 5000
|
||||
- containerPort: 8080
|
||||
readinessProbe:
|
||||
grpc:
|
||||
port: 5000
|
||||
```
|
||||
|
||||
If the file is called `test.yaml`, you can create the pod and check its status.
|
||||
The pod will be in the ready state, as indicated by this snippet of the output.
|
||||
|
||||
```shell
|
||||
kubectl apply -f test.yaml
|
||||
kubectl describe pod test-grpc
|
||||
```
|
||||
|
||||
The output will contain something like this:
|
||||
|
||||
```
|
||||
Conditions:
|
||||
Type Status
|
||||
Initialized True
|
||||
Ready True
|
||||
ContainersReady True
|
||||
PodScheduled True
|
||||
```
|
||||
|
||||
Now let's change the health checking endpoint status to NOT_SERVING.
|
||||
In order to call the http port of the Pod, let's create a port forward:
|
||||
|
||||
```shell
|
||||
kubectl port-forward test-grpc 8080:8080
|
||||
```
|
||||
|
||||
You can use `curl` to call the command...
|
||||
|
||||
```shell
|
||||
curl http://localhost:8080/make-not-serving
|
||||
```
|
||||
|
||||
... and in a few seconds the pod status will switch to not ready.
|
||||
|
||||
```shell
|
||||
kubectl describe pod test-grpc
|
||||
```
|
||||
|
||||
The output now will have:
|
||||
|
||||
```
|
||||
Conditions:
|
||||
Type Status
|
||||
Initialized True
|
||||
Ready False
|
||||
ContainersReady False
|
||||
PodScheduled True
|
||||
|
||||
...
|
||||
|
||||
Warning Unhealthy 2s (x6 over 42s) kubelet Readiness probe failed: service unhealthy (responded with "NOT_SERVING")
|
||||
```
|
||||
|
||||
Once it is switched back, in about one second the Pod will get back to ready status:
|
||||
|
||||
```shell
|
||||
curl http://localhost:8080/make-serving
|
||||
kubectl describe pod test-grpc
|
||||
```
|
||||
|
||||
The output indicates that the Pod went back to being `Ready`:
|
||||
|
||||
```
|
||||
Conditions:
|
||||
Type Status
|
||||
Initialized True
|
||||
Ready True
|
||||
ContainersReady True
|
||||
PodScheduled True
|
||||
```
|
||||
|
||||
This new built-in gRPC health probing on Kubernetes makes implementing a health-check via gRPC
|
||||
much easier than the older approach that relied on using a separate `exec` probe. Read through
|
||||
the official
|
||||
[documentation](/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-a-grpc-liveness-probe)
|
||||
to learn more and provide feedback before the feature is promoted to GA.
|
||||
|
||||
## Summary
|
||||
|
||||
Kubernetes is a popular workload orchestration platform and we add features based on feedback and demand.
|
||||
A feature like gRPC probe support is a minor improvement that will make the lives of many app developers
|
||||
easier and their apps more resilient. Try it today and give feedback before the feature goes GA.
|
|
@ -0,0 +1,162 @@
|
|||
---
|
||||
layout: blog
|
||||
title: "Kubernetes 1.24: Volume Populators Graduate to Beta"
|
||||
date: 2022-05-16
|
||||
slug: volume-populators-beta
|
||||
---
|
||||
|
||||
**Author:**
|
||||
Ben Swartzlander (NetApp)
|
||||
|
||||
The volume populators feature is now two releases old and entering beta! The `AnyVolumeDataSource` feature
|
||||
gate defaults to enabled in Kubernetes v1.24, which means that users can specify any custom resource
|
||||
as the data source of a PVC.
|
||||
|
||||
An [earlier blog article](/blog/2021/08/30/volume-populators-redesigned/) detailed how the
|
||||
volume populators feature works. In short, a cluster administrator can install a CRD and
|
||||
associated populator controller in the cluster, and any user who can create instances of
|
||||
the CR can create pre-populated volumes by taking advantage of the populator.
|
||||
|
||||
Multiple populators can be installed side by side for different purposes. The SIG storage
|
||||
community is already seeing some implementations in public, and more prototypes should
|
||||
appear soon.
|
||||
|
||||
Cluster administrators are **strongly encouraged** to install the
|
||||
volume-data-source-validator controller and associated `VolumePopulator` CRD before installing
|
||||
any populators so that users can get feedback about invalid PVC data sources.
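Registering a populator with the validator amounts to creating a `VolumePopulator` object that names the source kind. A sketch for the sample "hello" populator used later in this post might look like the following (treat the API version shown as an assumption and check the validator release you install):

```yaml
apiVersion: populator.storage.k8s.io/v1beta1
kind: VolumePopulator
metadata:
  name: hello-populator      # illustrative; matches the sample populator used below
sourceKind:
  group: hello.example.com   # API group of the data source custom resource
  kind: Hello                # kind that PVCs may reference in dataSourceRef
```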
|
||||
|
||||
## New Features
|
||||
|
||||
The [lib-volume-populator](https://github.com/kubernetes-csi/lib-volume-populator) library
|
||||
on which populators are built now includes metrics to help operators monitor and detect
|
||||
problems. This library is now beta and its latest release is v1.0.1.
|
||||
|
||||
The [volume data source validator](https://github.com/kubernetes-csi/volume-data-source-validator)
|
||||
controller also has metrics support added, and is in beta. The `VolumePopulator` CRD is
|
||||
beta and the latest release is v1.0.1.
|
||||
|
||||
## Trying it out
|
||||
|
||||
To see how this works, you can install the sample "hello" populator and try it
|
||||
out.
|
||||
|
||||
First install the volume-data-source-validator controller.
|
||||
|
||||
```shell
|
||||
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/volume-data-source-validator/v1.0.1/client/config/crd/populator.storage.k8s.io_volumepopulators.yaml
|
||||
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/volume-data-source-validator/v1.0.1/deploy/kubernetes/rbac-data-source-validator.yaml
|
||||
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/volume-data-source-validator/v1.0.1/deploy/kubernetes/setup-data-source-validator.yaml
|
||||
```
|
||||
|
||||
Next install the example populator.
|
||||
|
||||
```shell
|
||||
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/lib-volume-populator/v1.0.1/example/hello-populator/crd.yaml
|
||||
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/lib-volume-populator/87a47467b86052819e9ad13d15036d65b9a32fbb/example/hello-populator/deploy.yaml
|
||||
```
|
||||
|
||||
Your cluster now has a new CustomResourceDefinition that provides a test API named Hello.
|
||||
Create an instance of the `Hello` custom resource, with some text:
|
||||
|
||||
```yaml
|
||||
apiVersion: hello.example.com/v1alpha1
|
||||
kind: Hello
|
||||
metadata:
|
||||
name: example-hello
|
||||
spec:
|
||||
fileName: example.txt
|
||||
fileContents: Hello, world!
|
||||
```
|
||||
|
||||
Create a PVC that refers to that CR as its data source.
|
||||
|
||||
```yaml
|
||||
apiVersion: v1
|
||||
kind: PersistentVolumeClaim
|
||||
metadata:
|
||||
name: example-pvc
|
||||
spec:
|
||||
accessModes:
|
||||
- ReadWriteOnce
|
||||
resources:
|
||||
requests:
|
||||
storage: 10Mi
|
||||
dataSourceRef:
|
||||
apiGroup: hello.example.com
|
||||
kind: Hello
|
||||
name: example-hello
|
||||
volumeMode: Filesystem
|
||||
```
|
||||
|
||||
Next, run a Job that reads the file in the PVC.
|
||||
|
||||
```yaml
|
||||
apiVersion: batch/v1
|
||||
kind: Job
|
||||
metadata:
|
||||
name: example-job
|
||||
spec:
|
||||
template:
|
||||
spec:
|
||||
containers:
|
||||
- name: example-container
|
||||
image: busybox:latest
|
||||
command:
|
||||
- cat
|
||||
- /mnt/example.txt
|
||||
volumeMounts:
|
||||
- name: vol
|
||||
mountPath: /mnt
|
||||
restartPolicy: Never
|
||||
volumes:
|
||||
- name: vol
|
||||
persistentVolumeClaim:
|
||||
claimName: example-pvc
|
||||
```
|
||||
|
||||
Wait for the job to complete (including all of its dependencies).
|
||||
|
||||
```shell
|
||||
kubectl wait --for=condition=Complete job/example-job
|
||||
```
|
||||
|
||||
And last examine the log from the job.
|
||||
|
||||
```shell
|
||||
kubectl logs job/example-job
|
||||
```
|
||||
|
||||
The output should be:
|
||||
|
||||
```terminal
|
||||
Hello, world!
|
||||
```
|
||||
|
||||
Note that the volume already contained a text file with the string contents from
|
||||
the CR. This is only the simplest example. Actual populators can set up the volume
|
||||
to contain arbitrary contents.
|
||||
|
||||
## How to write your own volume populator
|
||||
|
||||
Developers interested in writing new populators are encouraged to use the
|
||||
[lib-volume-populator](https://github.com/kubernetes-csi/lib-volume-populator) library
|
||||
and to only supply a small controller wrapper around the library, and a pod image
|
||||
capable of attaching to volumes and writing the appropriate data to the volume.
|
||||
|
||||
Individual populators can be extremely generic such that they work with every type
|
||||
of PVC, or they can do vendor specific things to rapidly fill a volume with data
|
||||
if the volume was provisioned by a specific CSI driver from the same vendor, for
|
||||
example, by communicating directly with the storage for that volume.
|
||||
|
||||
## How can I learn more?
|
||||
|
||||
The enhancement proposal,
|
||||
[Volume Populators](https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/1495-volume-populators), includes lots of detail about the history and technical implementation
|
||||
of this feature.
|
||||
|
||||
[Volume populators and data sources](/docs/concepts/storage/persistent-volumes/#volume-populators-and-data-sources), within the documentation topic about persistent volumes,
|
||||
explains how to use this feature in your cluster.
|
||||
|
||||
Please get involved by joining the Kubernetes storage SIG to help us enhance this
|
||||
feature. There are a lot of good ideas already and we'd be thrilled to have more!
|
||||
|
|
@ -0,0 +1,117 @@
|
|||
---
|
||||
layout: blog
|
||||
title: 'Kubernetes 1.24: Prevent unauthorised volume mode conversion'
|
||||
date: 2022-05-18
|
||||
slug: prevent-unauthorised-volume-mode-conversion-alpha
|
||||
---
|
||||
|
||||
**Author:** Raunak Pradip Shah (Mirantis)
|
||||
|
||||
Kubernetes v1.24 introduces a new alpha-level feature that prevents unauthorised users
|
||||
from modifying the volume mode of a [`PersistentVolumeClaim`](/docs/concepts/storage/persistent-volumes/) created from an
|
||||
existing [`VolumeSnapshot`](/docs/concepts/storage/volume-snapshots/) in the Kubernetes cluster.
|
||||
|
||||
|
||||
|
||||
### The problem
|
||||
|
||||
The [Volume Mode](/docs/concepts/storage/persistent-volumes/#volume-mode) determines whether a volume
|
||||
is formatted into a filesystem or presented as a raw block device.
|
||||
|
||||
Users can leverage the `VolumeSnapshot` feature, which has been stable since Kubernetes v1.20,
|
||||
to create a `PersistentVolumeClaim` (shortened as PVC) from an existing `VolumeSnapshot` in
|
||||
the Kubernetes cluster. The PVC spec includes a `dataSource` field, which can point to an
|
||||
existing `VolumeSnapshot` instance.
|
||||
Visit [Create a PersistentVolumeClaim from a Volume Snapshot](/docs/concepts/storage/persistent-volumes/#create-persistent-volume-claim-from-volume-snapshot) for more details.
|
||||
|
||||
When leveraging the above capability, there is no logic that validates whether the mode of the
|
||||
original volume, whose snapshot was taken, matches the mode of the newly created volume.
|
||||
|
||||
This presents a security gap that allows malicious users to potentially exploit an
|
||||
as-yet-unknown vulnerability in the host operating system.
|
||||
|
||||
Many popular storage backup vendors convert the volume mode during the course of a
|
||||
backup operation, for efficiency purposes, which prevents Kubernetes from blocking
|
||||
the operation completely and presents a challenge in distinguishing trusted
|
||||
users from malicious ones.
|
||||
|
||||
### Preventing unauthorised users from converting the volume mode
|
||||
|
||||
In this context, an authorised user is one who has access rights to perform `Update`
|
||||
or `Patch` operations on `VolumeSnapshotContents`, which is a cluster-level resource.
|
||||
It is up to the cluster administrator to provide these rights only to trusted users
|
||||
or applications, like backup vendors.
|
||||
|
||||
If the alpha feature is [enabled](https://kubernetes-csi.github.io/docs/) in
|
||||
`snapshot-controller`, `snapshot-validation-webhook` and `external-provisioner`,
|
||||
then unauthorised users will not be allowed to modify the volume mode of a PVC
|
||||
when it is being created from a `VolumeSnapshot`.
|
||||
|
||||
To convert the volume mode, an authorised user must do the following:
|
||||
|
||||
1. Identify the `VolumeSnapshot` that is to be used as the data source for a newly
|
||||
created PVC in the given namespace.
|
||||
2. Identify the `VolumeSnapshotContent` bound to the above `VolumeSnapshot`.
|
||||
|
||||
```shell
|
||||
kubectl get volumesnapshot -n <namespace>
|
||||
```
|
||||
|
||||
3. Add the annotation [`snapshot.storage.kubernetes.io/allowVolumeModeChange`](/docs/reference/labels-annotations-taints/#snapshot-storage-kubernetes-io-allowvolumemodechange)
|
||||
to the `VolumeSnapshotContent`.
|
||||
|
||||
4. This annotation can be added either via software or manually by the authorised
|
||||
user. The `VolumeSnapshotContent` annotation must look like the following manifest fragment:
|
||||
|
||||
```yaml
|
||||
kind: VolumeSnapshotContent
|
||||
metadata:
|
||||
annotations:
|
||||
    snapshot.storage.kubernetes.io/allowVolumeModeChange: "true"
|
||||
...
|
||||
```
|
||||
|
||||
**Note**: For pre-provisioned `VolumeSnapshotContents`, you must take an extra
|
||||
step of setting the `spec.sourceVolumeMode` field to either `Filesystem` or `Block`,
|
||||
depending on the mode of the volume from which this snapshot was taken.
|
||||
|
||||
An example is shown below:
|
||||
|
||||
```yaml
|
||||
apiVersion: snapshot.storage.k8s.io/v1
|
||||
kind: VolumeSnapshotContent
|
||||
metadata:
|
||||
annotations:
|
||||
    snapshot.storage.kubernetes.io/allowVolumeModeChange: "true"
|
||||
name: new-snapshot-content-test
|
||||
spec:
|
||||
deletionPolicy: Delete
|
||||
driver: hostpath.csi.k8s.io
|
||||
source:
|
||||
snapshotHandle: 7bdd0de3-aaeb-11e8-9aae-0242ac110002
|
||||
sourceVolumeMode: Filesystem
|
||||
volumeSnapshotRef:
|
||||
name: new-snapshot-test
|
||||
namespace: default
|
||||
```
|
||||
|
||||
Repeat steps 1 to 3 for all `VolumeSnapshotContents` whose volume mode needs to be
|
||||
converted during a backup or restore operation.
|
||||
|
||||
If the annotation shown in step 4 above is present on a `VolumeSnapshotContent`
|
||||
object, Kubernetes will not prevent the volume mode from being converted.
|
||||
Users should keep this in mind before they attempt to add the annotation
|
||||
to any `VolumeSnapshotContent`.
|
||||
|
||||
|
||||
### What's next
|
||||
|
||||
[Enable this feature](https://kubernetes-csi.github.io/docs/) and let us know
|
||||
what you think!
|
||||
|
||||
We hope this feature causes no disruption to existing workflows while preventing
|
||||
malicious users from exploiting security vulnerabilities in their clusters.
|
||||
|
||||
For any queries or issues, join [Kubernetes on Slack](https://slack.k8s.io/) and
|
||||
create a thread in the #sig-storage channel. Alternately, create an issue in the
|
||||
CSI external-snapshotter [repository](https://github.com/kubernetes-csi/external-snapshotter).
|
|
@ -0,0 +1,96 @@
|
|||
---
|
||||
layout: blog
|
||||
title: "Kubernetes 1.24: Introducing Non-Graceful Node Shutdown Alpha"
|
||||
date: 2022-05-20
|
||||
slug: kubernetes-1-24-non-graceful-node-shutdown-alpha
|
||||
---
|
||||
|
||||
**Authors:** Xing Yang and Yassine Tijani (VMware)
|
||||
|
||||
Kubernetes v1.24 introduces alpha support for [Non-Graceful Node Shutdown](https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/2268-non-graceful-shutdown). This feature allows stateful workloads to fail over to a different node after the original node is shut down or in a non-recoverable state, such as a hardware failure or a broken OS.
|
||||
|
||||
## How is this different from Graceful Node Shutdown
|
||||
|
||||
You might have heard about the [Graceful Node Shutdown](/docs/concepts/architecture/nodes/#graceful-node-shutdown) capability of Kubernetes,
|
||||
and are wondering how the Non-Graceful Node Shutdown feature is different from that. Graceful Node Shutdown
|
||||
allows Kubernetes to detect when a node is shutting down cleanly, and handles that situation appropriately.
|
||||
A Node Shutdown can be "graceful" only if the node shutdown action can be detected by the kubelet ahead
|
||||
of the actual shutdown. However, there are cases where a node shutdown action may not be detected by
|
||||
the kubelet. This could happen either because the shutdown command does not trigger the systemd inhibitor
|
||||
locks mechanism that kubelet relies upon, or because of a configuration error
|
||||
(the `ShutdownGracePeriod` and `ShutdownGracePeriodCriticalPods` are not configured properly).
|
||||
|
||||
Graceful node shutdown relies on Linux-specific support. The kubelet does not watch for upcoming
|
||||
shutdowns on Windows nodes (this may change in a future Kubernetes release).
|
||||
|
||||
When a node is shut down without the kubelet detecting it, pods on that node
|
||||
also shut down ungracefully. For stateless apps, that's often not a problem (a ReplicaSet adds a new pod once
|
||||
the cluster detects that the affected node or pod has failed). For stateful apps, the story is more complicated.
|
||||
If you use a StatefulSet and have a pod from that StatefulSet on a node that fails uncleanly, that affected pod
|
||||
will be marked as terminating; the StatefulSet cannot create a replacement pod because the pod
|
||||
still exists in the cluster.
|
||||
As a result, the application running on the StatefulSet may be degraded or even offline. If the original, shut
|
||||
down node comes up again, the kubelet on that original node reports in, deletes the existing pods, and
|
||||
the control plane makes a replacement pod for that StatefulSet on a different running node.
|
||||
If the original node has failed and does not come up, those stateful pods would be stuck in a
|
||||
terminating status on that failed node indefinitely.
|
||||
|
||||
```
|
||||
$ kubectl get pod -o wide
|
||||
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
|
||||
web-0 1/1 Running 0 100m 10.244.2.4 k8s-node-876-1639279816 <none> <none>
|
||||
web-1 1/1 Terminating 0 100m 10.244.1.3 k8s-node-433-1639279804 <none> <none>
|
||||
```
|
||||
|
||||
## Try out the new non-graceful shutdown handling
|
||||
|
||||
To use the non-graceful node shutdown handling, you must enable the `NodeOutOfServiceVolumeDetach`
|
||||
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/) for the `kube-controller-manager`
|
||||
component.
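As a hedged sketch (the exact way you pass flags to kube-controller-manager depends on how your control plane was deployed), enabling the gate looks like this:

```shell
# Add the alpha gate to the kube-controller-manager flags (other flags omitted)
kube-controller-manager --feature-gates=NodeOutOfServiceVolumeDetach=true
```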
|
||||
|
||||
In the case of a node shutdown, you can manually taint that node as out of service. You should make certain that
|
||||
the node is truly shut down (not in the middle of restarting) before you add that taint. You could add that
|
||||
taint following a shutdown that the kubelet did not detect and handle in advance; another case where you
|
||||
can use that taint is when the node is in a non-recoverable state due to a hardware failure or a broken OS.
|
||||
The values you set for that taint can be `node.kubernetes.io/out-of-service=nodeshutdown:NoExecute`
|
||||
or `node.kubernetes.io/out-of-service=nodeshutdown:NoSchedule`.
|
||||
Provided you have enabled the feature gate mentioned earlier, setting the out-of-service taint on a Node
|
||||
means that pods on the node will be deleted unless there are matching tolerations on the pods.
|
||||
Persistent volumes attached to the shutdown node will be detached, and for StatefulSets, replacement pods will
|
||||
be created successfully on a different running node.
|
||||
|
||||
```
|
||||
$ kubectl taint nodes <node-name> node.kubernetes.io/out-of-service=nodeshutdown:NoExecute
|
||||
|
||||
$ kubectl get pod -o wide
|
||||
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
|
||||
web-0 1/1 Running 0 150m 10.244.2.4 k8s-node-876-1639279816 <none> <none>
|
||||
web-1 1/1 Running 0 10m 10.244.1.7 k8s-node-433-1639279804 <none> <none>
|
||||
```
|
||||
|
||||
Note: Before applying the out-of-service taint, you **must** verify that a node is already in a shutdown or powered-off state (not in the middle of restarting), either because the user intentionally shut it down or the node is down due to a hardware failure, OS issues, etc.
|
||||
|
||||
Once all the workload pods that are linked to the out-of-service node are moved to a new running node, and the shutdown node has been recovered, you should remove
|
||||
that taint from the affected node.
|
||||
If you know that the node will not return to service, you could instead delete the node from the cluster.
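For reference, those two options can look like this; the trailing `-` in the first command is the usual `kubectl taint` syntax for removing a taint:

```shell
# Remove the out-of-service taint once the node has recovered
kubectl taint nodes <node-name> node.kubernetes.io/out-of-service=nodeshutdown:NoExecute-

# Or, if the node will never come back, remove it from the cluster
kubectl delete node <node-name>
```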
|
||||
|
||||
## What’s next?
|
||||
|
||||
Depending on feedback and adoption, the Kubernetes team plans to push the Non-Graceful Node Shutdown implementation to Beta in either 1.25 or 1.26.
|
||||
|
||||
This feature requires a user to manually add a taint to the node to trigger workloads failover and remove the taint after the node is recovered. In the future, we plan to find ways to automatically detect and fence nodes that are shutdown/failed and automatically failover workloads to another node.
|
||||
|
||||
## How can I learn more?
|
||||
|
||||
Check out the [documentation](/docs/concepts/architecture/nodes/#non-graceful-node-shutdown)
|
||||
for non-graceful node shutdown.
|
||||
|
||||
## How to get involved?
|
||||
|
||||
This feature has a long story. Yassine Tijani ([yastij](https://github.com/yastij)) started the KEP more than two years ago. Xing Yang ([xing-yang](https://github.com/xing-yang)) continued to drive the effort. There were many discussions among SIG Storage, SIG Node, and API reviewers to nail down the design details. Ashutosh Kumar ([sonasingh46](https://github.com/sonasingh46)) did most of the implementation and brought it to Alpha in Kubernetes 1.24.
|
||||
|
||||
We want to thank the following people for their insightful reviews: Tim Hockin ([thockin](https://github.com/thockin)) for his guidance on the design, Jing Xu ([jingxu97](https://github.com/jingxu97)), Hemant Kumar ([gnufied](https://github.com/gnufied)), and Michelle Au ([msau42](https://github.com/msau42)) for reviews from SIG Storage side, and Mrunal Patel ([mrunalp](https://github.com/mrunalp)), David Porter ([bobbypage](https://github.com/bobbypage)), Derek Carr ([derekwaynecarr](https://github.com/derekwaynecarr)), and Danielle Endocrimes ([endocrimes](https://github.com/endocrimes)) for reviews from SIG Node side.
|
||||
|
||||
There are many people who have helped review the design and implementation along the way. We want to thank everyone who has contributed to this effort including the about 30 people who have reviewed the [KEP](https://github.com/kubernetes/enhancements/pull/1116) and implementation over the last couple of years.
|
||||
|
||||
This feature is a collaboration between SIG Storage and SIG Node. For those interested in getting involved with the design and development of any part of the Kubernetes Storage system, join the [Kubernetes Storage Special Interest Group](https://github.com/kubernetes/community/tree/master/sig-storage) (SIG). For those interested in getting involved with the design and development of the components that support the controlled interactions between pods and host resources, join the [Kubernetes Node SIG](https://github.com/kubernetes/community/tree/master/sig-node).
|
|
@ -0,0 +1,137 @@
|
|||
---
|
||||
layout: blog
|
||||
title: "Kubernetes 1.24: Avoid Collisions Assigning IP Addresses to Services"
|
||||
date: 2022-05-23
|
||||
slug: service-ip-dynamic-and-static-allocation
|
||||
---
|
||||
|
||||
**Author:** Antonio Ojea (Red Hat)
|
||||
|
||||
|
||||
In Kubernetes, [Services](/docs/concepts/services-networking/service/) are an abstract way to expose
|
||||
an application running on a set of Pods. Services
|
||||
can have a cluster-scoped virtual IP address (using a Service of `type: ClusterIP`).
|
||||
Clients can connect using that virtual IP address, and Kubernetes then load-balances traffic to that
|
||||
Service across the different backing Pods.
|
||||
|
||||
## How are Service ClusterIPs allocated?
|
||||
|
||||
A Service `ClusterIP` can be assigned:
|
||||
|
||||
_dynamically_
|
||||
: the cluster's control plane automatically picks a free IP address from within the configured IP range for `type: ClusterIP` Services.
|
||||
|
||||
_statically_
|
||||
: you specify an IP address of your choice, from within the configured IP range for Services.
|
||||
|
||||
Across your whole cluster, every Service `ClusterIP` must be unique.
|
||||
Trying to create a Service with a specific `ClusterIP` that has already
|
||||
been allocated will return an error.
|
||||
|
||||
## Why do you need to reserve Service Cluster IPs?
|
||||
|
||||
Sometimes you may want to have Services running at well-known IP addresses, so other components and
|
||||
users in the cluster can use them.
|
||||
|
||||
The best example is the DNS Service for the cluster. Some Kubernetes installers assign the 10th address from
|
||||
the Service IP range to the DNS service. Assuming you configured your cluster with Service IP range
|
||||
10.96.0.0/16 and you want your DNS Service IP to be 10.96.0.10, you'd have to create a Service like
|
||||
this:
|
||||
|
||||
```yaml
|
||||
apiVersion: v1
|
||||
kind: Service
|
||||
metadata:
|
||||
labels:
|
||||
k8s-app: kube-dns
|
||||
kubernetes.io/cluster-service: "true"
|
||||
kubernetes.io/name: CoreDNS
|
||||
name: kube-dns
|
||||
namespace: kube-system
|
||||
spec:
|
||||
clusterIP: 10.96.0.10
|
||||
ports:
|
||||
- name: dns
|
||||
port: 53
|
||||
protocol: UDP
|
||||
targetPort: 53
|
||||
- name: dns-tcp
|
||||
port: 53
|
||||
protocol: TCP
|
||||
targetPort: 53
|
||||
selector:
|
||||
k8s-app: kube-dns
|
||||
type: ClusterIP
|
||||
```
|
||||
|
||||
However, as I explained before, the IP address 10.96.0.10 has not been reserved. If other Services are created
|
||||
before it or in parallel with dynamic allocation, there is a chance they will take this IP; in that case
|
||||
you will not be able to create the DNS Service, because creating it will fail with a conflict error.
|
||||
|
||||
## How can you avoid Service ClusterIP conflicts? {#avoid-ClusterIP-conflict}
|
||||
|
||||
In Kubernetes 1.24, you can enable a new feature gate `ServiceIPStaticSubrange`.
|
||||
Turning this on allows you to use a different IP
|
||||
allocation strategy for Services, reducing the risk of collision.
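As a hedged sketch (exactly how you pass flags to the API server depends on how your cluster was deployed), enabling the gate looks like this:

```shell
# Add the alpha gate to the kube-apiserver flags (other flags omitted)
kube-apiserver --feature-gates=ServiceIPStaticSubrange=true
```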
|
||||
|
||||
The `ClusterIP` range will be divided, based on the formula `min(max(16, cidrSize / 16), 256)`,
|
||||
described as _never less than 16 or more than 256 with a graduated step between them_.
|
||||
|
||||
Dynamic IP assignment will use the upper band by default; once this has been exhausted, it will
|
||||
use the lower range. This will allow users to use static allocations on the lower band with a low
|
||||
risk of collision.
|
||||
|
||||
Examples:
|
||||
|
||||
#### Service IP CIDR block: 10.96.0.0/24
|
||||
|
||||
Range Size: 2<sup>8</sup> - 2 = 254
|
||||
Band Offset: `min(max(16,256/16),256)` = `min(16,256)` = 16
|
||||
Static band start: 10.96.0.1
|
||||
Static band end: 10.96.0.16
|
||||
Range end: 10.96.0.254
|
||||
|
||||
{{< mermaid >}}
|
||||
pie showData
|
||||
title 10.96.0.0/24
|
||||
"Static" : 16
|
||||
"Dynamic" : 238
|
||||
{{< /mermaid >}}
|
||||
|
||||
#### Service IP CIDR block: 10.96.0.0/20
|
||||
|
||||
Range Size: 2<sup>12</sup> - 2 = 4094
|
||||
Band Offset: `min(max(16,4096/16),256)` = `min(256,256)` = 256
|
||||
Static band start: 10.96.0.1
|
||||
Static band end: 10.96.1.0
|
||||
Range end: 10.96.15.254
|
||||
|
||||
{{< mermaid >}}
|
||||
pie showData
|
||||
title 10.96.0.0/20
|
||||
"Static" : 256
|
||||
"Dynamic" : 3838
|
||||
{{< /mermaid >}}
|
||||
|
||||
#### Service IP CIDR block: 10.96.0.0/16
|
||||
|
||||
Range Size: 2<sup>16</sup> - 2 = 65534
|
||||
Band Offset: `min(max(16,65536/16),256)` = `min(4096,256)` = 256
|
||||
Static band start: 10.96.0.1
|
||||
Static band end: 10.96.1.0
|
||||
Range end: 10.96.255.254
|
||||
|
||||
{{< mermaid >}}
|
||||
pie showData
|
||||
title 10.96.0.0/16
|
||||
"Static" : 256
|
||||
"Dynamic" : 65278
|
||||
{{< /mermaid >}}
|
||||
|
||||
## Get involved with SIG Network
|
||||
|
||||
The current SIG-Network [KEPs](https://github.com/orgs/kubernetes/projects/10) and [issues](https://github.com/kubernetes/kubernetes/issues?q=is%3Aopen+is%3Aissue+label%3Asig%2Fnetwork) on GitHub illustrate the SIG’s areas of emphasis.
|
||||
|
||||
[SIG Network meetings](https://github.com/kubernetes/community/tree/master/sig-network) are a friendly, welcoming venue for you to connect with the community and share your ideas.
|
||||
Looking forward to hearing from you!
|
||||
|
|
@ -0,0 +1,251 @@
|
|||
---
|
||||
layout: blog
|
||||
title: "Contextual Logging in Kubernetes 1.24"
|
||||
date: 2022-05-25
|
||||
slug: contextual-logging
|
||||
canonicalUrl: https://kubernetes.dev/blog/2022/05/25/contextual-logging/
|
||||
---
|
||||
|
||||
**Author:** Patrick Ohly (Intel)
|
||||
|
||||
The [Structured Logging Working
|
||||
Group](https://github.com/kubernetes/community/blob/master/wg-structured-logging/README.md)
|
||||
has added new capabilities to the logging infrastructure in Kubernetes
|
||||
1.24. This blog post explains how developers can take advantage of those to
|
||||
make log output more useful and how they can get involved with improving Kubernetes.
|
||||
|
||||
## Structured logging
|
||||
|
||||
The goal of [structured
|
||||
logging](https://github.com/kubernetes/enhancements/blob/master/keps/sig-instrumentation/1602-structured-logging/README.md)
|
||||
is to replace C-style formatting and the resulting opaque log strings with log
|
||||
entries that have a well-defined syntax for storing message and parameters
|
||||
separately, for example as a JSON struct.
|
||||
|
||||
When using the traditional klog text output format for structured log calls,
|
||||
strings were originally printed with `\n` escape sequences, except when
|
||||
embedded inside a struct. For structs, log entries could still span multiple
|
||||
lines, with no clean way to split the log stream into individual entries:
|
||||
|
||||
```
|
||||
I1112 14:06:35.783529 328441 structured_logging.go:51] "using InfoS" longData={Name:long Data:Multiple
|
||||
lines
|
||||
with quite a bit
|
||||
of text. internal:0}
|
||||
I1112 14:06:35.783549 328441 structured_logging.go:52] "using InfoS with\nthe message across multiple lines" int=1 stringData="long: Multiple\nlines\nwith quite a bit\nof text." str="another value"
|
||||
```
|
||||
|
||||
Now, the `<` and `>` markers along with indentation are used to ensure that splitting at a
|
||||
klog header at the start of a line is reliable and the resulting output is human-readable:
|
||||
|
||||
```
|
||||
I1126 10:31:50.378204 121736 structured_logging.go:59] "using InfoS" longData=<
|
||||
{Name:long Data:Multiple
|
||||
lines
|
||||
with quite a bit
|
||||
of text. internal:0}
|
||||
>
|
||||
I1126 10:31:50.378228 121736 structured_logging.go:60] "using InfoS with\nthe message across multiple lines" int=1 stringData=<
|
||||
long: Multiple
|
||||
lines
|
||||
with quite a bit
|
||||
of text.
|
||||
> str="another value"
|
||||
```
|
||||
|
||||
Note that the log message itself is printed with quoting. It is meant to be a
|
||||
fixed string that identifies a log entry, so newlines should be avoided there.
|
||||
|
||||
Before Kubernetes 1.24, some log calls in kube-scheduler still used `klog.Info`
|
||||
for multi-line strings to avoid the unreadable output. Now all log calls have
|
||||
been updated to support structured logging.
|
||||
|
||||
## Contextual logging
|
||||
|
||||
[Contextual logging](https://github.com/kubernetes/enhancements/blob/master/keps/sig-instrumentation/3077-contextual-logging/README.md)
|
||||
is based on the [go-logr API](https://github.com/go-logr/logr#a-minimal-logging-api-for-go). The key
|
||||
idea is that libraries are passed a logger instance by their caller and use
|
||||
that for logging instead of accessing a global logger. The binary decides about
|
||||
the logging implementation, not the libraries. The go-logr API is designed
|
||||
around structured logging and supports attaching additional information to a
|
||||
logger.
|
||||
|
||||
This enables additional use cases:
|
||||
|
||||
- The caller can attach additional information to a logger:
|
||||
- [`WithName`](https://pkg.go.dev/github.com/go-logr/logr#Logger.WithName) adds a prefix
|
||||
- [`WithValues`](https://pkg.go.dev/github.com/go-logr/logr#Logger.WithValues) adds key/value pairs
|
||||
|
||||
When passing this extended logger into a function and a function uses it
|
||||
instead of the global logger, the additional information is
|
||||
then included in all log entries, without having to modify the code that
|
||||
generates the log entries. This is useful in highly parallel applications
|
||||
where it can become hard to identify all log entries for a certain operation
|
||||
because the output from different operations gets interleaved.
|
||||
|
||||
- When running unit tests, log output can be associated with the current test.
|
||||
Then when a test fails, only the log output of the failed test gets shown
|
||||
by `go test`. That output can also be more verbose by default because it
|
||||
will not get shown for successful tests. Tests can be run in parallel
|
||||
without interleaving their output.
|
||||
|
||||
One of the design decisions for contextual logging was to allow attaching a
|
||||
logger as value to a `context.Context`. Since the logger encapsulates all
|
||||
aspects of the intended logging for the call, it is *part* of the context and
|
||||
not just *using* it. A practical advantage is that many APIs already have a
|
||||
`ctx` parameter or adding one has additional advantages, like being able to get
|
||||
rid of `context.TODO()` calls inside the functions.
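To make this concrete, here is a minimal, hypothetical sketch (the `reconcilePod` function and all names in it are made up for illustration, not taken from any Kubernetes component) of a caller that attaches extra information to a logger and stores it in a `context.Context`, and of a callee that retrieves it:

```go
package main

import (
	"context"

	"k8s.io/klog/v2"
)

// reconcilePod is a hypothetical worker function. It does not know which
// logging backend the binary configured; it uses whatever is in the context.
func reconcilePod(ctx context.Context, podName string) {
	logger := klog.FromContext(ctx)
	logger.Info("reconciling", "pod", podName)
}

func main() {
	defer klog.Flush()

	// The caller decorates the logger with a name and key/value pairs ...
	logger := klog.Background()
	logger = klog.LoggerWithName(logger, "example")
	logger = klog.LoggerWithValues(logger, "foo", "bar")

	// ... stores it in a context ...
	ctx := klog.NewContext(context.Background(), logger)

	// ... and every log entry emitted further down automatically carries
	// that extra information, roughly:
	//   ... "example: reconciling" foo="bar" pod="mypod"
	reconcilePod(ctx, "mypod")
}
```

Inside Kubernetes binaries, whether wrapper calls such as `LoggerWithName`, `LoggerWithValues` and `NewContext` actually do anything is controlled by the `ContextualLogging` feature gate described further down.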
|
||||
|
||||
Another decision was to not break compatibility with klog v2:
|
||||
|
||||
- Libraries that use the traditional klog logging calls in a binary that has
|
||||
set up contextual logging will work and log through the logging backend
|
||||
chosen by the binary. However, such log output will not include the
|
||||
additional information and will not work well in unit tests, so libraries
|
||||
should be modified to support contextual logging. The [migration guide](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-instrumentation/migration-to-structured-logging.md)
|
||||
for structured logging has been extended to also cover contextual logging.
|
||||
|
||||
- When a library supports contextual logging and retrieves a logger from its
|
||||
context, it will still work in a binary that does not initialize contextual
|
||||
logging because it will get a logger that logs through klog.
|
||||
|
||||
In Kubernetes 1.24, contextual logging is a new alpha feature with
|
||||
`ContextualLogging` as feature gate. When disabled (the default), the new klog
|
||||
API calls for contextual logging (see below) become no-ops to avoid performance
|
||||
or functional regressions.
|
||||
|
||||
No Kubernetes component has been converted yet. An [example program](https://github.com/kubernetes/kubernetes/blob/v1.24.0-beta.0/staging/src/k8s.io/component-base/logs/example/cmd/logger.go)
|
||||
in the Kubernetes repository demonstrates how to enable contextual logging in a
|
||||
binary and how the output depends on the binary's parameters:
|
||||
|
||||
```console
|
||||
$ cd $GOPATH/src/k8s.io/kubernetes/staging/src/k8s.io/component-base/logs/example/cmd/
|
||||
$ go run . --help
|
||||
...
|
||||
--feature-gates mapStringBool A set of key=value pairs that describe feature gates for alpha/experimental features. Options are:
|
||||
AllAlpha=true|false (ALPHA - default=false)
|
||||
AllBeta=true|false (BETA - default=false)
|
||||
ContextualLogging=true|false (ALPHA - default=false)
|
||||
$ go run . --feature-gates ContextualLogging=true
|
||||
...
|
||||
I0404 18:00:02.916429 451895 logger.go:94] "example/myname: runtime" foo="bar" duration="1m0s"
|
||||
I0404 18:00:02.916447 451895 logger.go:95] "example: another runtime" foo="bar" duration="1m0s"
|
||||
```
|
||||
|
||||
The `example` prefix and `foo="bar"` were added by the caller of the function
|
||||
which logs the `runtime` message and `duration="1m0s"` value.
|
||||
|
||||
The sample code for klog includes an
|
||||
[example](https://github.com/kubernetes/klog/blob/v2.60.1/ktesting/example/example_test.go)
|
||||
for a unit test with per-test output.
|
||||
|
||||
## klog enhancements
|
||||
|
||||
### Contextual logging API
|
||||
|
||||
The following calls manage the lookup of a logger:
|
||||
|
||||
[`FromContext`](https://pkg.go.dev/k8s.io/klog/v2#FromContext)
|
||||
: from a `context` parameter, with fallback to the global logger
|
||||
|
||||
[`Background`](https://pkg.go.dev/k8s.io/klog/v2#Background)
|
||||
: the global fallback, with no intention to support contextual logging
|
||||
|
||||
[`TODO`](https://pkg.go.dev/k8s.io/klog/v2#TODO)
|
||||
: the global fallback, but only as a temporary solution until the function gets extended to accept
|
||||
a logger through its parameters
|
||||
|
||||
[`SetLoggerWithOptions`](https://pkg.go.dev/k8s.io/klog/v2#SetLoggerWithOptions)
|
||||
: changes the fallback logger; when called with [`ContextualLogger(true)`](https://pkg.go.dev/k8s.io/klog/v2#ContextualLogger),
|
||||
the logger is ready to be called directly, in which case logging will be done
|
||||
without going through klog
|
||||
|
||||
To support the feature gate mechanism in Kubernetes, klog has wrapper calls for
|
||||
the corresponding go-logr calls and a global boolean controlling their behavior:
|
||||
|
||||
- [`LoggerWithName`](https://pkg.go.dev/k8s.io/klog/v2#LoggerWithName)
|
||||
- [`LoggerWithValues`](https://pkg.go.dev/k8s.io/klog/v2#LoggerWithValues)
|
||||
- [`NewContext`](https://pkg.go.dev/k8s.io/klog/v2#NewContext)
|
||||
- [`EnableContextualLogging`](https://pkg.go.dev/k8s.io/klog/v2#EnableContextualLogging)
|
||||
|
||||
Usage of those functions in Kubernetes code is enforced with a linter
|
||||
check. The klog default for contextual logging is to enable the functionality
|
||||
because it is considered stable in klog. It is only in Kubernetes binaries
|
||||
where that default gets overridden and (in some binaries) controlled via the
|
||||
`--feature-gates` parameter.
|
||||
|
||||
### ktesting logger
|
||||
|
||||
The new [ktesting](https://pkg.go.dev/k8s.io/klog/v2@v2.60.1/ktesting) package
|
||||
implements logging through `testing.T` using klog's text output format. It has
|
||||
a [single API call](https://pkg.go.dev/k8s.io/klog/v2@v2.60.1/ktesting#NewTestContext) for
|
||||
instrumenting a test case and [support for command line flags](https://pkg.go.dev/k8s.io/klog/v2@v2.60.1/ktesting/init).
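A minimal sketch of what that can look like in a unit test (assuming klog v2.60.x, where `ktesting.NewTestContext` returns a logger plus a context wired to the test; the test and package names are made up):

```go
package mypackage

import (
	"testing"

	"k8s.io/klog/v2/ktesting"
)

func TestReconcile(t *testing.T) {
	// Output produced through this logger (or retrieved from ctx via
	// klog.FromContext) is attached to t, so it is only shown for failing
	// tests or when running go test -v.
	logger, ctx := ktesting.NewTestContext(t)
	logger.Info("starting test")

	_ = ctx // pass ctx into the code under test
}
```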
|
||||
|
||||
### klogr
|
||||
|
||||
[`klog/klogr`](https://pkg.go.dev/k8s.io/klog/v2@v2.60.1/klogr) continues to be
|
||||
supported and its default behavior is unchanged: it formats structured log
|
||||
entries using its own, custom format and prints the result via klog.
|
||||
|
||||
However, this usage is discouraged because that format is neither
|
||||
machine-readable (in contrast to real JSON output as produced by zapr, the
|
||||
go-logr implementation used by Kubernetes) nor human-friendly (in contrast to
|
||||
the klog text format).
|
||||
|
||||
Instead, a klogr instance should be created with
|
||||
[`WithFormat(FormatKlog)`](https://pkg.go.dev/k8s.io/klog/v2@v2.60.1/klogr#WithFormat)
|
||||
which chooses the klog text format. A simpler construction method with the same
|
||||
result is the new
|
||||
[`klog.NewKlogr`](https://pkg.go.dev/k8s.io/klog/v2#NewKlogr). That is the
|
||||
logger that klog returns as fallback when nothing else is configured.
|
||||
|
||||
### Reusable output test
|
||||
|
||||
A lot of go-logr implementations have very similar unit tests where they check
|
||||
the result of certain log calls. If a developer didn't know about certain
|
||||
caveats, for example a `String` function that panics when called, then it
|
||||
is likely that both the handling of such caveats and the unit test are missing.
|
||||
|
||||
[`klog.test`](https://pkg.go.dev/k8s.io/klog/v2@v2.60.1/test) is a reusable set
|
||||
of test cases that can be applied to a go-logr implementation.
|
||||
|
||||
### Output flushing
|
||||
|
||||
klog used to start a goroutine unconditionally during `init` which flushed
|
||||
buffered data at a hard-coded interval. Now that goroutine is only started on
|
||||
demand (i.e. when writing to files with buffering) and can be controlled with
|
||||
[`StopFlushDaemon`](https://pkg.go.dev/k8s.io/klog/v2#StopFlushDaemon) and
|
||||
[`StartFlushDaemon`](https://pkg.go.dev/k8s.io/klog/v2#StartFlushDaemon).
|
||||
|
||||
When a go-logr implementation buffers data, flushing that data can be
|
||||
integrated into [`klog.Flush`](https://pkg.go.dev/k8s.io/klog/v2#Flush) by
|
||||
registering the logger with the
|
||||
[`FlushLogger`](https://pkg.go.dev/k8s.io/klog/v2#FlushLogger) option.
|
||||
|
||||
### Various other changes
|
||||
|
||||
For a description of all other enhancements see in the [release notes](https://github.com/kubernetes/klog/releases).
|
||||
|
||||
## logcheck
|
||||
|
||||
Originally designed as a linter for structured log calls, the
|
||||
[`logcheck`](https://github.com/kubernetes/klog/tree/788efcdee1e9be0bfbe5b076343d447314f2377e/hack/tools/logcheck)
|
||||
tool has been enhanced to also support contextual logging and traditional klog
|
||||
log calls. These enhanced checks already found bugs in Kubernetes, like calling
|
||||
`klog.Info` instead of `klog.Infof` with a format string and parameters.
|
||||
|
||||
It can be included as a plugin in a `golangci-lint` invocation, which is how
|
||||
[Kubernetes uses it now](https://github.com/kubernetes/kubernetes/commit/17e3c555c5115f8c9176bae10ba45baa04d23a7b),
|
||||
or be invoked stand-alone.
|
||||
|
||||
We are in the process of [moving the tool](https://github.com/kubernetes/klog/issues/312) into a new repository because it isn't
|
||||
really related to klog and its releases should be tracked and tagged properly.
|
||||
|
||||
## Next steps
|
||||
|
||||
The [Structured Logging WG](https://github.com/kubernetes/community/tree/master/wg-structured-logging)
|
||||
is always looking for new contributors. The migration
|
||||
away from C-style logging is now going to target structured, contextual logging
|
||||
in one step to reduce the overall code churn and number of PRs. Changing log
|
||||
calls is a good first contribution to Kubernetes and an opportunity to get to
|
||||
know the code in various areas.
|
|
@ -0,0 +1,148 @@
|
|||
---
|
||||
layout: blog
|
||||
title: 'Kubernetes 1.24: Maximum Unavailable Replicas for StatefulSet'
|
||||
date: 2022-05-27
|
||||
slug: maxunavailable-for-statefulset
|
||||
---
|
||||
|
||||
**Author:** Mayank Kumar (Salesforce)
|
||||
|
||||
Kubernetes [StatefulSets](/docs/concepts/workloads/controllers/statefulset/), since their introduction in
|
||||
1.5 and becoming stable in 1.9, have been widely used to run stateful applications. They provide stable pod identity, persistent
|
||||
per-pod storage, and ordered, graceful deployment, scaling, and rolling updates. You can think of a StatefulSet as the atomic building
|
||||
block for running complex stateful applications. As the use of Kubernetes has grown, so has the number of scenarios requiring
|
||||
StatefulSets. Many of these scenarios require faster rolling updates than the currently supported one-pod-at-a-time updates, in the
|
||||
case where you're using the `OrderedReady` Pod management policy for a StatefulSet.
|
||||
|
||||
|
||||
Here are some examples:
|
||||
|
||||
- I am using a StatefulSet to orchestrate a multi-instance, cache based application where the size of the cache is large. The cache
|
||||
starts cold and requires a significant amount of time before the container can start. There could be more initial startup tasks
|
||||
that are required. A RollingUpdate on this StatefulSet would take a lot of time before the application is fully updated. If the
|
||||
StatefulSet supported updating more than one pod at a time, it would result in a much faster update.
|
||||
|
||||
- My stateful application is composed of leaders and followers or one writer and multiple readers. I have multiple readers or
|
||||
followers and my application can tolerate multiple pods going down at the same time. I want to update this application more than
|
||||
one pod at a time so that I get the new updates rolled out quickly, especially if the number of instances of my application is
|
||||
large. Note that my application still requires unique identity per pod.
|
||||
|
||||
|
||||
In order to support such scenarios, Kubernetes 1.24 includes a new alpha feature to help. Before you can use the new feature you must
|
||||
enable the `MaxUnavailableStatefulSet` feature flag. Once you enable that, you can specify a new field called `maxUnavailable`, part
|
||||
of the `spec` for a StatefulSet. For example:
|
||||
|
||||
```yaml
|
||||
apiVersion: apps/v1
|
||||
kind: StatefulSet
|
||||
metadata:
|
||||
name: web
|
||||
namespace: default
|
||||
spec:
|
||||
podManagementPolicy: OrderedReady # you must set OrderedReady
|
||||
replicas: 5
|
||||
selector:
|
||||
matchLabels:
|
||||
app: nginx
|
||||
template:
|
||||
metadata:
|
||||
labels:
|
||||
app: nginx
|
||||
spec:
|
||||
containers:
|
||||
- image: k8s.gcr.io/nginx-slim:0.8
|
||||
imagePullPolicy: IfNotPresent
|
||||
name: nginx
|
||||
updateStrategy:
|
||||
rollingUpdate:
|
||||
maxUnavailable: 2 # this is the new alpha field, whose default value is 1
|
||||
partition: 0
|
||||
type: RollingUpdate
|
||||
```
|
||||
|
||||
If you enable the new feature and you don't specify a value for `maxUnavailable` in a StatefulSet, Kubernetes applies a default
|
||||
`maxUnavailable: 1`. This matches the behavior you would see if you don't enable the new feature.
|
||||
|
||||
I'll run through a scenario based on that example manifest to demonstrate how this feature works. I will deploy a StatefulSet that
|
||||
has 5 replicas, with `maxUnavailable` set to 2 and `partition` set to 0.
|
||||
|
||||
I can trigger a rolling update by changing the image to `k8s.gcr.io/nginx-slim:0.9`. Once I initiate the rolling update, I can
|
||||
watch the pods update 2 at a time as the current value of maxUnavailable is 2. The below output shows a span of time and is not
|
||||
complete. The `maxUnavailable` value can be an absolute number (for example, 2) or a percentage of desired Pods (for example, 10%). The
|
||||
absolute number is calculated from percentage by rounding down.
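One way to make that image change (a hedged example; editing the manifest and re-applying it works just as well) is:

```shell
kubectl set image statefulset/web nginx=k8s.gcr.io/nginx-slim:0.9
```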
|
||||
```
|
||||
kubectl get pods --watch
|
||||
```
|
||||
|
||||
```
|
||||
NAME READY STATUS RESTARTS AGE
|
||||
web-0 1/1 Running 0 85s
|
||||
web-1 1/1 Running 0 2m6s
|
||||
web-2 1/1 Running 0 106s
|
||||
web-3 1/1 Running 0 2m47s
|
||||
web-4 1/1 Running 0 2m27s
|
||||
web-4 1/1 Terminating 0 5m43s ----> start terminating 4
|
||||
web-3 1/1 Terminating 0 6m3s ----> start terminating 3
|
||||
web-3 0/1 Terminating 0 6m7s
|
||||
web-3 0/1 Pending 0 0s
|
||||
web-3 0/1 Pending 0 0s
|
||||
web-4 0/1 Terminating 0 5m48s
|
||||
web-4 0/1 Terminating 0 5m48s
|
||||
web-3 0/1 ContainerCreating 0 2s
|
||||
web-3 1/1 Running 0 2s
|
||||
web-4 0/1 Pending 0 0s
|
||||
web-4 0/1 Pending 0 0s
|
||||
web-4 0/1 ContainerCreating 0 0s
|
||||
web-4 1/1 Running 0 1s
|
||||
web-2 1/1 Terminating 0 5m46s ----> start terminating 2 (only after both 4 and 3 are running)
|
||||
web-1 1/1 Terminating 0 6m6s ----> start terminating 1
|
||||
web-2 0/1 Terminating 0 5m47s
|
||||
web-1 0/1 Terminating 0 6m7s
|
||||
web-1 0/1 Pending 0 0s
|
||||
web-1 0/1 Pending 0 0s
|
||||
web-1 0/1 ContainerCreating 0 1s
|
||||
web-1 1/1 Running 0 2s
|
||||
web-2 0/1 Pending 0 0s
|
||||
web-2 0/1 Pending 0 0s
|
||||
web-2 0/1 ContainerCreating 0 0s
|
||||
web-2 1/1 Running 0 1s
|
||||
web-0 1/1 Terminating 0 6m6s ----> start terminating 0 (only after 2 and 1 are running)
|
||||
web-0 0/1 Terminating 0 6m7s
|
||||
web-0 0/1 Pending 0 0s
|
||||
web-0 0/1 Pending 0 0s
|
||||
web-0 0/1 ContainerCreating 0 0s
|
||||
web-0 1/1 Running 0 1s
|
||||
```
|
||||
Note that as soon as the rolling update starts, both 4 and 3 (the two highest ordinal pods) start terminating at the same time. Pods
|
||||
with ordinal 4 and 3 may become ready at their own pace. As soon as both pods 4 and 3 are ready, pods 2 and 1 start terminating at the
|
||||
same time. When pods 2 and 1 are both running and ready, pod 0 starts terminating.
|
||||
|
||||
In Kubernetes, updates to StatefulSets follow a strict ordering when updating Pods. In this example, the update starts at replica 4, then
|
||||
replica 3, then replica 2, and so on, one pod at a time. When going one pod at a time, it's not possible for 3 to be running and ready
|
||||
before 4. When `maxUnavailable` is more than 1 (in the example scenario I set `maxUnavailable` to 2), it is possible that replica 3 becomes
|
||||
ready and running before replica 4 is ready, and that is OK. If you're a developer and you set `maxUnavailable` to more than 1, you should
|
||||
know that this outcome is possible and you must ensure that your application is able to handle any such ordering issues,
|
||||
if they occur. When you set `maxUnavailable` greater than 1, the ordering is guaranteed between each batch of pods being updated. That guarantee
|
||||
means that pods in the second update batch (replicas 2 and 1) cannot start updating until the pods from the first batch (replicas 4 and 3) are ready.
|
||||
|
||||
Although Kubernetes refers to these as _replicas_, your stateful application may have a different view and each pod of the StatefulSet may
|
||||
be holding completely different data than other pods. The important thing here is that updates to StatefulSets happen in batches, and you can
|
||||
now have a batch size larger than 1 (as an alpha feature).
|
||||
|
||||
Also note that the above behavior is with `podManagementPolicy: OrderedReady`. If you defined a StatefulSet as `podManagementPolicy: Parallel`,
|
||||
not only are `maxUnavailable` replicas terminated at the same time, but `maxUnavailable` replicas also start in the `ContainerCreating`
|
||||
phase at the same time. This is called bursting.
|
||||
|
||||
So, now you may have a lot of questions, such as:
|
||||
- What is the behavior when you set `podManagementPolicy: Parallel`?
|
||||
- What is the behavior when `partition` is set to a value other than `0`?
|
||||
|
||||
It might be better to try and see it for yourself. This is an alpha feature, and the Kubernetes contributors are looking for feedback on this feature. Did
|
||||
this help you achieve your stateful scenarios? Did you find a bug, or do you think the behavior as implemented is not intuitive or can
|
||||
break applications or catch them by surprise? Please [open an issue](https://github.com/kubernetes/kubernetes/issues) to let us know.
|
||||
|
||||
## Further reading and next steps {#next-steps}
|
||||
- [Maximum unavailable Pods](/docs/concepts/workloads/controllers/statefulset/#maximum-unavailable-pods)
|
||||
- [KEP for MaxUnavailable for StatefulSet](https://github.com/kubernetes/enhancements/tree/master/keps/sig-apps/961-maxunavailable-for-statefulset)
|
||||
- [Implementation](https://github.com/kubernetes/kubernetes/pull/82162/files)
|
||||
- [Enhancement Tracking Issue](https://github.com/kubernetes/enhancements/issues/961)
|
|
@ -0,0 +1,19 @@
|
|||
---
|
||||
layout: blog
|
||||
title: "Annual Report Summary 2021"
|
||||
date: 2022-06-01
|
||||
slug: annual-report-summary-2021
|
||||
---
|
||||
|
||||
**Author:** Paris Pittman (Steering Committee)
|
||||
|
||||
Last year, we published our first [Annual Report Summary](/blog/2021/06/28/announcing-kubernetes-community-group-annual-reports/) for 2020 and it's already time for our second edition!
|
||||
|
||||
[2021 Annual Report Summary](https://www.cncf.io/reports/kubernetes-annual-report-2021/)
|
||||
|
||||
This summary reflects the work that has been done in 2021 and the initiatives on deck for the rest of 2022. Please forward to organizations and individuals participating in upstream activities, planning cloud native strategies, and/or those looking to help out. To find a specific community group's complete report, go to the [kubernetes/community repo](https://github.com/kubernetes/community) under the groups folder. Example: [sig-api-machinery/annual-report-2021.md](https://github.com/kubernetes/community/blob/master/sig-api-machinery/annual-report-2021.md)
|
||||
|
||||
You’ll see that this report summary is a growth area in itself. It takes us roughly 6 months to prepare and execute, which isn’t helpful or valuable to anyone in a fast-moving project with short- and long-term needs. How can we make this better? Provide your feedback here: https://github.com/kubernetes/steering/issues/242
|
||||
|
||||
Reference:
|
||||
[Annual Report Documentation](https://github.com/kubernetes/community/blob/master/committee-steering/governance/annual-reports.md)
|
|
@ -312,16 +312,18 @@ controller deletes the node from its list of nodes.
|
|||
The third is monitoring the nodes' health. The node controller is
|
||||
responsible for:
|
||||
|
||||
- In the case that a node becomes unreachable, updating the NodeReady condition
|
||||
of within the Node's `.status`. In this case the node controller sets the
|
||||
NodeReady condition to `ConditionUnknown`.
|
||||
- In the case that a node becomes unreachable, updating the `Ready` condition
|
||||
in the Node's `.status` field. In this case the node controller sets the
|
||||
`Ready` condition to `Unknown`.
|
||||
- If a node remains unreachable: triggering
|
||||
[API-initiated eviction](/docs/concepts/scheduling-eviction/api-eviction/)
|
||||
for all of the Pods on the unreachable node. By default, the node controller
|
||||
waits 5 minutes between marking the node as `ConditionUnknown` and submitting
|
||||
waits 5 minutes between marking the node as `Unknown` and submitting
|
||||
the first eviction request.
|
||||
|
||||
The node controller checks the state of each node every `--node-monitor-period` seconds.
|
||||
By default, the node controller checks the state of each node every 5 seconds.
|
||||
This period can be configured using the `--node-monitor-period` flag on the
|
||||
`kube-controller-manager` component.
|
||||
|
||||
### Rate limits on eviction
|
||||
|
||||
|
@ -331,7 +333,7 @@ from more than 1 node per 10 seconds.
|
|||
|
||||
The node eviction behavior changes when a node in a given availability zone
|
||||
becomes unhealthy. The node controller checks what percentage of nodes in the zone
|
||||
are unhealthy (NodeReady condition is `ConditionUnknown` or `ConditionFalse`) at
|
||||
are unhealthy (the `Ready` condition is `Unknown` or `False`) at
|
||||
the same time:
|
||||
|
||||
- If the fraction of unhealthy nodes is at least `--unhealthy-zone-threshold`
|
||||
|
@ -384,7 +386,7 @@ If you want to explicitly reserve resources for non-Pod processes, see
|
|||
|
||||
## Node topology
|
||||
|
||||
{{< feature-state state="alpha" for_k8s_version="v1.16" >}}
|
||||
{{< feature-state state="beta" for_k8s_version="v1.18" >}}
|
||||
|
||||
If you have enabled the `TopologyManager`
|
||||
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/), then
|
||||
|
@ -412,7 +414,7 @@ enabled by default in 1.21.
|
|||
|
||||
Note that by default, both configuration options described below,
|
||||
`shutdownGracePeriod` and `shutdownGracePeriodCriticalPods` are set to zero,
|
||||
thus not activating Graceful node shutdown functionality.
|
||||
thus not activating the graceful node shutdown functionality.
|
||||
To activate the feature, the two kubelet config settings should be configured appropriately and
|
||||
set to non-zero values.
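For example, a kubelet configuration that turns the feature on could contain the following (the durations are illustrative only):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
shutdownGracePeriod: 30s
shutdownGracePeriodCriticalPods: 10s
```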
|
||||
|
||||
|
@ -450,6 +452,56 @@ Reason: Terminated
|
|||
Message: Pod was terminated in response to imminent node shutdown.
|
||||
```
|
||||
|
||||
{{< /note >}}
|
||||
|
||||
## Non Graceful node shutdown {#non-graceful-node-shutdown}
|
||||
|
||||
{{< feature-state state="alpha" for_k8s_version="v1.24" >}}
|
||||
|
||||
A node shutdown action may not be detected by kubelet's Node Shutdown Manager,
|
||||
either because the command does not trigger the inhibitor locks mechanism used by
|
||||
kubelet or because of a user error, i.e., the `ShutdownGracePeriod` and
|
||||
`ShutdownGracePeriodCriticalPods` are not configured properly. Please refer to the
|
||||
section on [Graceful Node Shutdown](#graceful-node-shutdown) above for more details.
|
||||
|
||||
When a node is shutdown but not detected by kubelet's Node Shutdown Manager, the pods
|
||||
that are part of a StatefulSet will be stuck in terminating status on
|
||||
the shutdown node and cannot move to a new running node. This is because kubelet on
|
||||
the shutdown node is not available to delete the pods so the StatefulSet cannot
|
||||
create a new pod with the same name. If there are volumes used by the pods, the
|
||||
VolumeAttachments will not be deleted from the original shutdown node so the volumes
|
||||
used by these pods cannot be attached to a new running node. As a result, the
|
||||
application running on the StatefulSet cannot function properly. If the original
|
||||
shutdown node comes up, the pods will be deleted by kubelet and new pods will be
|
||||
created on a different running node. If the original shutdown node does not come up,
|
||||
these pods will be stuck in terminating status on the shutdown node forever.
|
||||
|
||||
To mitigate the above situation, a user can manually add the taint
|
||||
`node.kubernetes.io/out-of-service` with either `NoExecute` or `NoSchedule` effect to
|
||||
a Node marking it out-of-service.
|
||||
If the `NodeOutOfServiceVolumeDetach`
|
||||
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/) is enabled on
|
||||
`kube-controller-manager`, and a Node is marked out-of-service with this taint, the
|
||||
pods on the node will be forcefully deleted if there are no matching tolerations on
|
||||
it, and volume detach operations for the pods terminating on the node will happen
|
||||
immediately. This allows the Pods on the out-of-service node to recover quickly on a
|
||||
different node.
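For example (only after you have verified that the node is genuinely down, as the note below stresses):

```shell
kubectl taint nodes <node-name> node.kubernetes.io/out-of-service=nodeshutdown:NoExecute
```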
|
||||
|
||||
During a non-graceful shutdown, Pods are terminated in two phases:
|
||||
|
||||
1. Force delete the Pods that do not have matching `out-of-service` tolerations.
|
||||
2. Immediately perform detach volume operation for such pods.
|
||||
|
||||
|
||||
{{< note >}}
|
||||
- Before adding the taint `node.kubernetes.io/out-of-service`, it should be verified
|
||||
that the node is already in a shutdown or powered-off state (not in the middle of
|
||||
restarting).
|
||||
- The user is required to manually remove the out-of-service taint after the pods are
|
||||
moved to a new node and the user has checked that the shutdown node has been
|
||||
recovered, since the user was the one who originally added the taint.
|
||||
|
||||
|
||||
{{< /note >}}
|
||||
|
||||
### Pod Priority based graceful node shutdown {#pod-priority-graceful-node-shutdown}
|
||||
|
@ -534,10 +586,18 @@ next priority class value range.
|
|||
If this feature is enabled and no configuration is provided, then no ordering
|
||||
action will be taken.
|
||||
|
||||
Using this feature, requires enabling the
|
||||
`GracefulNodeShutdownBasedOnPodPriority` feature gate, and setting the kubelet
|
||||
config's `ShutdownGracePeriodByPodPriority` to the desired configuration
|
||||
containing the pod priority class values and their respective shutdown periods.
|
||||
Using this feature requires enabling the `GracefulNodeShutdownBasedOnPodPriority`
|
||||
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/),
|
||||
and setting `ShutdownGracePeriodByPodPriority` in the
|
||||
[kubelet config](/docs/reference/config-api/kubelet-config.v1beta1/)
|
||||
to the desired configuration containing the pod priority class values and
|
||||
their respective shutdown periods.
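As a hedged sketch of the shape of that configuration (the priority values and periods below are illustrative, not recommendations):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
shutdownGracePeriodByPodPriority:
  - priority: 100000
    shutdownGracePeriodSeconds: 10
  - priority: 10000
    shutdownGracePeriodSeconds: 180
  - priority: 0
    shutdownGracePeriodSeconds: 60
```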
|
||||
|
||||
{{< note >}}
|
||||
The ability to take Pod priority into account during graceful node shutdown was introduced
|
||||
as an Alpha feature in Kubernetes v1.23. In Kubernetes {{< skew currentVersion >}}
|
||||
the feature is Beta and is enabled by default.
|
||||
{{< /note >}}
|
||||
|
||||
Metrics `graceful_shutdown_start_time_seconds` and `graceful_shutdown_end_time_seconds`
|
||||
are emitted under the kubelet subsystem to monitor node shutdowns.
|
||||
|
|
|
@ -59,7 +59,7 @@ Before choosing a guide, here are some considerations:
|
|||
|
||||
* [Using Sysctls in a Kubernetes Cluster](/docs/tasks/administer-cluster/sysctl-cluster/) describes to an administrator how to use the `sysctl` command-line tool to set kernel parameters.
|
||||
|
||||
* [Auditing](/docs/tasks/debug-application-cluster/audit/) describes how to interact with Kubernetes' audit logs.
|
||||
* [Auditing](/docs/tasks/debug/debug-cluster/audit/) describes how to interact with Kubernetes' audit logs.
|
||||
|
||||
### Securing the kubelet
|
||||
* [Control Plane-Node communication](/docs/concepts/architecture/control-plane-node-communication/)
|
||||
|
|
|
@ -20,7 +20,7 @@ This page lists some of the available add-ons and links to their respective inst
|
|||
* [Calico](https://docs.projectcalico.org/latest/introduction/) is a networking and network policy provider. Calico supports a flexible set of networking options so you can choose the most efficient option for your situation, including non-overlay and overlay networks, with or without BGP. Calico uses the same engine to enforce network policy for hosts, pods, and (if using Istio & Envoy) applications at the service mesh layer.
|
||||
* [Canal](https://github.com/tigera/canal/tree/master/k8s-install) unites Flannel and Calico, providing networking and network policy.
|
||||
* [Cilium](https://github.com/cilium/cilium) is a L3 network and network policy plugin that can enforce HTTP/API/L7 policies transparently. Both routing and overlay/encapsulation mode are supported, and it can work on top of other CNI plugins.
|
||||
* [CNI-Genie](https://github.com/Huawei-PaaS/CNI-Genie) enables Kubernetes to seamlessly connect to a choice of CNI plugins, such as Calico, Canal, Flannel, Romana, or Weave.
|
||||
* [CNI-Genie](https://github.com/Huawei-PaaS/CNI-Genie) enables Kubernetes to seamlessly connect to a choice of CNI plugins, such as Calico, Canal, Flannel, or Weave.
|
||||
* [Contrail](https://www.juniper.net/us/en/products-services/sdn/contrail/contrail-networking/), based on [Tungsten Fabric](https://tungsten.io), is an open source, multi-cloud network virtualization and policy management platform. Contrail and Tungsten Fabric are integrated with orchestration systems such as Kubernetes, OpenShift, OpenStack and Mesos, and provide isolation modes for virtual machines, containers/pods and bare metal workloads.
|
||||
* [Flannel](https://github.com/flannel-io/flannel#deploying-flannel-manually) is an overlay network provider that can be used with Kubernetes.
|
||||
* [Knitter](https://github.com/ZTE/Knitter/) is a plugin to support multiple network interfaces in a Kubernetes pod.
|
||||
|
@ -29,7 +29,7 @@ This page lists some of the available add-ons and links to their respective inst
|
|||
* [OVN4NFV-K8S-Plugin](https://github.com/opnfv/ovn4nfv-k8s-plugin) is OVN based CNI controller plugin to provide cloud native based Service function chaining(SFC), Multiple OVN overlay networking, dynamic subnet creation, dynamic creation of virtual networks, VLAN Provider network, Direct provider network and pluggable with other Multi-network plugins, ideal for edge based cloud native workloads in Multi-cluster networking
|
||||
* [NSX-T](https://docs.vmware.com/en/VMware-NSX-T/2.0/nsxt_20_ncp_kubernetes.pdf) Container Plug-in (NCP) provides integration between VMware NSX-T and container orchestrators such as Kubernetes, as well as integration between NSX-T and container-based CaaS/PaaS platforms such as Pivotal Container Service (PKS) and OpenShift.
|
||||
* [Nuage](https://github.com/nuagenetworks/nuage-kubernetes/blob/v5.1.1-1/docs/kubernetes-1-installation.rst) is an SDN platform that provides policy-based networking between Kubernetes Pods and non-Kubernetes environments with visibility and security monitoring.
|
||||
* **Romana** is a Layer 3 networking solution for pod networks that also supports the [NetworkPolicy API](/docs/concepts/services-networking/network-policies/). Kubeadm add-on installation details available [here](https://github.com/romana/romana/tree/master/containerize).
|
||||
* [Romana](https://github.com/romana) is a Layer 3 networking solution for pod networks that also supports the [NetworkPolicy](/docs/concepts/services-networking/network-policies/) API.
|
||||
* [Weave Net](https://www.weave.works/docs/net/latest/kubernetes/kube-addon/) provides networking and network policy, will carry on working on both sides of a network partition, and does not require an external database.
|
||||
|
||||
## Service Discovery
|
||||
|
|
|
@ -331,7 +331,7 @@ Thus, in a situation with a mixture of servers of different versions
|
|||
there may be thrashing as long as different servers have different
|
||||
opinions of the proper content of these objects.
|
||||
|
||||
Each `kube-apiserver` makes an inital maintenance pass over the
|
||||
Each `kube-apiserver` makes an initial maintenance pass over the
|
||||
mandatory and suggested configuration objects, and after that does
|
||||
periodic maintenance (once per minute) of those objects.
|
||||
|
||||
|
|
|
@ -461,7 +461,7 @@ That's it! The Deployment will declaratively update the deployed nginx applicati
|
|||
## {{% heading "whatsnext" %}}
|
||||
|
||||
|
||||
- Learn about [how to use `kubectl` for application introspection and debugging](/docs/tasks/debug-application-cluster/debug-application-introspection/).
|
||||
- Learn about [how to use `kubectl` for application introspection and debugging](/docs/tasks/debug/debug-application/debug-running-pod/).
|
||||
- See [Configuration Best Practices and Tips](/docs/concepts/configuration/overview/).
|
||||
|
||||
|
||||
|
|
|
@ -110,6 +110,55 @@ I1025 00:15:15.525108 1 example.go:116] "Example" data="This is text with
|
|||
second line.}
|
||||
```
|
||||
|
||||
### Contextual Logging
|
||||
|
||||
{{< feature-state for_k8s_version="v1.24" state="alpha" >}}
|
||||
|
||||
Contextual logging builds on top of structured logging. It is primarily about
|
||||
how developers use logging calls: code based on that concept is more flexible
|
||||
and supports additional use cases as described in the [Contextual Logging
|
||||
KEP](https://github.com/kubernetes/enhancements/tree/master/keps/sig-instrumentation/3077-contextual-logging).
|
||||
|
||||
If developers use additional functions like `WithValues` or `WithName` in
|
||||
their components, then log entries contain additional information that gets
|
||||
passed into functions by their caller.
|
||||
|
||||
Currently this is gated behind the `ContextualLogging` feature gate and
|
||||
disabled by default. The infrastructure for this was added in 1.24 without
|
||||
modifying components. The
|
||||
[`component-base/logs/example`](https://github.com/kubernetes/kubernetes/blob/v1.24.0-beta.0/staging/src/k8s.io/component-base/logs/example/cmd/logger.go)
|
||||
command demonstrates how to use the new logging calls and how a component
|
||||
behaves that supports contextual logging.
|
||||
|
||||
```console
|
||||
$ cd $GOPATH/src/k8s.io/kubernetes/staging/src/k8s.io/component-base/logs/example/cmd/
|
||||
$ go run . --help
|
||||
...
|
||||
--feature-gates mapStringBool A set of key=value pairs that describe feature gates for alpha/experimental features. Options are:
|
||||
AllAlpha=true|false (ALPHA - default=false)
|
||||
AllBeta=true|false (BETA - default=false)
|
||||
ContextualLogging=true|false (ALPHA - default=false)
|
||||
$ go run . --feature-gates ContextualLogging=true
|
||||
...
|
||||
I0404 18:00:02.916429 451895 logger.go:94] "example/myname: runtime" foo="bar" duration="1m0s"
|
||||
I0404 18:00:02.916447 451895 logger.go:95] "example: another runtime" foo="bar" duration="1m0s"
|
||||
```
|
||||
|
||||
The `example` prefix and `foo="bar"` were added by the caller of the function
|
||||
which logs the `runtime` message and `duration="1m0s"` value, without having to
|
||||
modify that function.
|
||||
|
||||
With contextual logging disabled, `WithValues` and `WithName` do nothing and log
|
||||
calls go through the global klog logger. Therefore this additional information
|
||||
is not in the log output anymore:
|
||||
|
||||
```console
|
||||
$ go run . --feature-gates ContextualLogging=false
|
||||
...
|
||||
I0404 18:03:31.171945 452150 logger.go:94] "runtime" duration="1m0s"
|
||||
I0404 18:03:31.171962 452150 logger.go:95] "another runtime" duration="1m0s"
|
||||
```
|
||||
|
||||
### JSON log format
|
||||
|
||||
{{< feature-state for_k8s_version="v1.19" state="alpha" >}}
|
||||
|
@ -150,27 +199,6 @@ List of components currently supporting JSON format:
|
|||
* {{< glossary_tooltip term_id="kube-scheduler" text="kube-scheduler" >}}
|
||||
* {{< glossary_tooltip term_id="kubelet" text="kubelet" >}}
|
||||
|
||||
### Log sanitization
|
||||
|
||||
{{< feature-state for_k8s_version="v1.20" state="alpha" >}}
|
||||
|
||||
{{<warning >}}
|
||||
Log sanitization might incur significant computation overhead and therefore should not be enabled in production.
|
||||
{{< /warning >}}
|
||||
|
||||
The `--experimental-logging-sanitization` flag enables the klog sanitization filter.
|
||||
If enabled all log arguments are inspected for fields tagged as sensitive data (e.g. passwords, keys, tokens) and logging of these fields will be prevented.
|
||||
|
||||
List of components currently supporting log sanitization:
|
||||
* kube-controller-manager
|
||||
* kube-apiserver
|
||||
* kube-scheduler
|
||||
* kubelet
|
||||
|
||||
{{< note >}}
|
||||
The Log sanitization filter does not prevent user workload logs from leaking sensitive data.
|
||||
{{< /note >}}
|
||||
|
||||
### Log verbosity level
|
||||
|
||||
The `-v` flag controls log verbosity. Increasing the value increases the number of logged events. Decreasing the value decreases the number of logged events.
|
||||
|
@ -197,5 +225,6 @@ The `logrotate` tool rotates logs daily, or once the log size is greater than 10
|
|||
|
||||
* Read about the [Kubernetes Logging Architecture](/docs/concepts/cluster-administration/logging/)
|
||||
* Read about [Structured Logging](https://github.com/kubernetes/enhancements/tree/master/keps/sig-instrumentation/1602-structured-logging)
|
||||
* Read about [Contextual Logging](https://github.com/kubernetes/enhancements/tree/master/keps/sig-instrumentation/3077-contextual-logging)
|
||||
* Read about [deprecation of klog flags](https://github.com/kubernetes/enhancements/tree/master/keps/sig-instrumentation/2845-deprecate-klog-specific-flags-in-k8s-components)
|
||||
* Read about the [Conventions for logging severity](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-instrumentation/logging.md)
|
||||
|
|
|
@ -279,5 +279,6 @@ to the deleted ConfigMap, it is recommended to recreate these pods.
|
|||
|
||||
* Read about [Secrets](/docs/concepts/configuration/secret/).
|
||||
* Read [Configure a Pod to Use a ConfigMap](/docs/tasks/configure-pod-container/configure-pod-configmap/).
|
||||
* Read about [changing a ConfigMap (or any other Kubernetes object)](/docs/tasks/manage-kubernetes-objects/update-api-object-kubectl-patch/)
|
||||
* Read [The Twelve-Factor App](https://12factor.net/) to understand the motivation for
|
||||
separating code from configuration.
|
||||
|
|
|
@ -47,10 +47,9 @@ or by enforcement (the system prevents the container from ever exceeding the lim
|
|||
runtimes can have different ways to implement the same restrictions.
|
||||
|
||||
{{< note >}}
|
||||
If a container specifies its own memory limit, but does not specify a memory request, Kubernetes
|
||||
automatically assigns a memory request that matches the limit. Similarly, if a container specifies its own
|
||||
CPU limit, but does not specify a CPU request, Kubernetes automatically assigns a CPU request that matches
|
||||
the limit.
|
||||
If you specify a limit for a resource, but do not specify any request, and no admission-time
|
||||
mechanism has applied a default request for that resource, then Kubernetes copies the limit
|
||||
you specified and uses it as the requested value for the resource.
|
||||
{{< /note >}}
|
||||
|
||||
## Resource types
|
||||
|
@ -229,9 +228,9 @@ see the [Troubleshooting](#troubleshooting) section.
|
|||
The kubelet reports the resource usage of a Pod as part of the Pod
|
||||
[`status`](/docs/concepts/overview/working-with-objects/kubernetes-objects/#object-spec-and-status).
|
||||
|
||||
If optional [tools for monitoring](/docs/tasks/debug-application-cluster/resource-usage-monitoring/)
|
||||
If optional [tools for monitoring](/docs/tasks/debug/debug-cluster/resource-usage-monitoring/)
|
||||
are available in your cluster, then Pod resource usage can be retrieved either
|
||||
from the [Metrics API](/docs/tasks/debug-application-cluster/resource-metrics-pipeline/#metrics-api)
|
||||
from the [Metrics API](/docs/tasks/debug/debug-cluster/resource-metrics-pipeline/#metrics-api)
|
||||
directly or from your monitoring tools.
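For example, if a metrics pipeline such as [metrics-server](https://github.com/kubernetes-sigs/metrics-server) is installed, you can query current usage with a command like the following sketch (placeholder names):

```shell
# Show current CPU and memory usage for a Pod via the Metrics API.
kubectl top pod <pod-name> --namespace=<namespace>
```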
|
||||
|
||||
## Local ephemeral storage
|
||||
|
|
|
@ -247,6 +247,8 @@ You can still [manually create](/docs/tasks/configure-pod-container/configure-se
|
|||
a service account token Secret; for example, if you need a token that never expires.
|
||||
However, using the [TokenRequest](/docs/reference/kubernetes-api/authentication-resources/token-request-v1/)
|
||||
subresource to obtain a token to access the API is recommended instead.
|
||||
You can use the [`kubectl create token`](/docs/reference/generated/kubectl/kubectl-commands#-em-token-em-)
|
||||
command to obtain a token from the `TokenRequest` API.
|
||||
{{< /note >}}
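For example (a sketch; the ServiceAccount name is hypothetical), you can request a short-lived token that is never persisted in a Secret:

```shell
# Obtain a time-limited token for the "build-robot" ServiceAccount
# via the TokenRequest API.
kubectl create token build-robot
```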
|
||||
|
||||
#### Projection of Secret keys to specific paths
|
||||
|
@ -886,15 +888,30 @@ In this case, `0` means you have created an empty Secret.
|
|||
### Service account token Secrets
|
||||
|
||||
A `kubernetes.io/service-account-token` type of Secret is used to store a
|
||||
token that identifies a
|
||||
token credential that identifies a
|
||||
{{< glossary_tooltip text="service account" term_id="service-account" >}}.
|
||||
|
||||
Since 1.22, this type of Secret is no longer used to mount credentials into Pods,
|
||||
and obtaining tokens via the [TokenRequest](/docs/reference/kubernetes-api/authentication-resources/token-request-v1/)
|
||||
API is recommended instead of using service account token Secret objects.
|
||||
Tokens obtained from the `TokenRequest` API are more secure than ones stored in Secret objects,
|
||||
because they have a bounded lifetime and are not readable by other API clients.
|
||||
You can use the [`kubectl create token`](/docs/reference/generated/kubectl/kubectl-commands#-em-token-em-)
|
||||
command to obtain a token from the `TokenRequest` API.
|
||||
|
||||
You should only create a service account token Secret object
|
||||
if you can't use the `TokenRequest` API to obtain a token,
|
||||
and the security exposure of persisting a non-expiring token credential
|
||||
in a readable API object is acceptable to you.
|
||||
|
||||
When using this Secret type, you need to ensure that the
|
||||
`kubernetes.io/service-account.name` annotation is set to an existing
|
||||
service account name. A Kubernetes
|
||||
{{< glossary_tooltip text="controller" term_id="controller" >}} fills in some
|
||||
other fields such as the `kubernetes.io/service-account.uid` annotation, and the
|
||||
`token` key in the `data` field, which is set to contain an authentication
|
||||
token.
|
||||
service account name. If you are creating both the ServiceAccount and
|
||||
the Secret objects, you should create the ServiceAccount object first.
|
||||
|
||||
After the Secret is created, a Kubernetes {{< glossary_tooltip text="controller" term_id="controller" >}}
|
||||
fills in some other fields such as the `kubernetes.io/service-account.uid` annotation, and the
|
||||
`token` key in the `data` field, which is populated with an authentication token.
|
||||
|
||||
The following example configuration declares a service account token Secret:
|
||||
|
||||
|
@ -911,20 +928,14 @@ data:
|
|||
extra: YmFyCg==
|
||||
```
|
||||
|
||||
When creating a `Pod`, Kubernetes automatically finds or creates a service account
|
||||
Secret and then automatically modifies your Pod to use this Secret. The service account
|
||||
token Secret contains credentials for accessing the Kubernetes API.
|
||||
|
||||
The automatic creation and use of API credentials can be disabled or
|
||||
overridden if desired. However, if all you need to do is securely access the
|
||||
API server, this is the recommended workflow.
|
||||
After creating the Secret, wait for Kubernetes to populate the `token` key in the `data` field.
|
||||
|
||||
See the [ServiceAccount](/docs/tasks/configure-pod-container/configure-service-account/)
|
||||
documentation for more information on how service accounts work.
|
||||
You can also check the `automountServiceAccountToken` field and the
|
||||
`serviceAccountName` field of the
|
||||
[`Pod`](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#pod-v1-core)
|
||||
for information on referencing service account from Pods.
|
||||
for information on referencing service account credentials from within Pods.
|
||||
|
||||
### Docker config Secrets
|
||||
|
||||
|
@ -982,7 +993,7 @@ kubectl create secret docker-registry secret-tiger-docker \
|
|||
```
|
||||
|
||||
That command creates a Secret of type `kubernetes.io/dockerconfigjson`.
|
||||
If you dump the `.data.dockercfgjson` field from that new Secret and then
|
||||
If you dump the `.data.dockerconfigjson` field from that new Secret and then
|
||||
decode it from base64:
|
||||
|
||||
```shell
|
||||
|
@ -1291,7 +1302,7 @@ on that node.
|
|||
- When deploying applications that interact with the Secret API, you should
|
||||
limit access using
|
||||
[authorization policies](/docs/reference/access-authn-authz/authorization/) such as
|
||||
[RBAC]( /docs/reference/access-authn-authz/rbac/).
|
||||
[RBAC](/docs/reference/access-authn-authz/rbac/).
|
||||
- In the Kubernetes API, `watch` and `list` requests for Secrets within a namespace
|
||||
are extremely powerful capabilities. Avoid granting this access where feasible, since
|
||||
listing Secrets allows the clients to inspect the values of every Secret in that
|
||||
|
@ -1310,7 +1321,7 @@ have access to run a Pod that then exposes the Secret.
|
|||
- When deploying applications that interact with the Secret API, you should
|
||||
limit access using
|
||||
[authorization policies](/docs/reference/access-authn-authz/authorization/) such as
|
||||
[RBAC]( /docs/reference/access-authn-authz/rbac/).
|
||||
[RBAC](/docs/reference/access-authn-authz/rbac/).
|
||||
- In the API server, objects (including Secrets) are persisted into
|
||||
{{< glossary_tooltip term_id="etcd" >}}; therefore:
|
||||
- only allow cluster administrators to access etcd (this includes read-only access);
|
||||
|
|
|
@ -0,0 +1,83 @@
|
|||
---
|
||||
reviewers:
|
||||
- jayunit100
|
||||
- jsturtevant
|
||||
- marosset
|
||||
- perithompson
|
||||
title: Resource Management for Windows nodes
|
||||
content_type: concept
|
||||
weight: 75
|
||||
---
|
||||
|
||||
<!-- overview -->
|
||||
|
||||
This page outlines the differences in how resources are managed between Linux and Windows.
|
||||
|
||||
<!-- body -->
|
||||
|
||||
On Linux nodes, {{< glossary_tooltip text="cgroups" term_id="cgroup" >}} are used
|
||||
as a pod boundary for resource control. Containers are created within that boundary
|
||||
for network, process and file system isolation. The Linux cgroup APIs can be used to
|
||||
gather CPU, I/O, and memory use statistics.
|
||||
|
||||
In contrast, Windows uses a [_job object_](https://docs.microsoft.com/windows/win32/procthread/job-objects) per container with a system namespace filter
|
||||
to contain all processes in a container and provide logical isolation from the
|
||||
host.
|
||||
(Job objects are a Windows process isolation mechanism and are different from
|
||||
what Kubernetes refers to as a {{< glossary_tooltip term_id="job" text="Job" >}}).
|
||||
|
||||
There is no way to run a Windows container without the namespace filtering in
|
||||
place. This means that system privileges cannot be asserted in the context of the
|
||||
host, and thus privileged containers are not available on Windows.
|
||||
Containers cannot assume an identity from the host because the Security Account Manager
|
||||
(SAM) is separate.
|
||||
|
||||
## Memory reservations {#resource-management-memory}
|
||||
|
||||
Windows does not have an out-of-memory process killer as Linux does. Windows always
|
||||
treats all user-mode memory allocations as virtual, and pagefiles are mandatory.
|
||||
|
||||
Windows nodes do not overcommit memory for processes running in containers. The
|
||||
net effect is that Windows won't reach out of memory conditions the same way Linux
|
||||
does, and processes page to disk instead of being subject to out of memory (OOM)
|
||||
termination. If memory is over-provisioned and all physical memory is exhausted,
|
||||
then paging can slow down performance.
|
||||
|
||||
You can place bounds on memory use for workloads using the kubelet
|
||||
parameters `--kube-reserved` and/or `--system-reserved`; these account
|
||||
for memory usage on the node (outside of containers), and reduce
|
||||
[NodeAllocatable](/docs/tasks/administer-cluster/reserve-compute-resources/#node-allocatable).
|
||||
As you deploy workloads, set resource limits on containers. This also subtracts from
|
||||
`NodeAllocatable` and prevents the scheduler from adding more pods once a node is full.
|
||||
|
||||
{{< note >}}
|
||||
When you set memory resource limits for Windows containers, you should either set a
|
||||
limit and leave the memory request unspecified, or set the request equal to the limit.
|
||||
{{< /note >}}
|
||||
|
||||
On Windows, a good practice to avoid over-provisioning is to configure the kubelet
|
||||
with a system reserved memory of at least 2GiB to account for Windows, Kubernetes
|
||||
and container runtime overheads.
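One way to express that reservation, assuming you manage the kubelet through a configuration file rather than command-line flags, is sketched below:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Reserve memory for Windows, Kubernetes and container runtime overheads,
# reducing the node's allocatable memory accordingly.
systemReserved:
  memory: 2Gi
```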
|
||||
|
||||
## CPU reservations {#resource-management-cpu}
|
||||
|
||||
To account for CPU use by the operating system, the container runtime, and by
|
||||
Kubernetes host processes such as the kubelet, you can (and should) reserve a
|
||||
percentage of total CPU. You should determine this CPU reservation taking into account
the number of CPU cores available on the node. To decide on the CPU percentage to
|
||||
reserve, identify the maximum pod density for each node and monitor the CPU usage of
|
||||
the system services running there, then choose a value that meets your workload needs.
|
||||
|
||||
You can place bounds on CPU usage for workloads using the
|
||||
kubelet parameters `--kube-reserved` and/or `--system-reserved` to
|
||||
account for CPU usage on the node (outside of containers).
|
||||
This reduces `NodeAllocatable`.
|
||||
The cluster-wide scheduler then takes this reservation into account when determining
|
||||
pod placement.
|
||||
|
||||
On Windows, the kubelet supports a command-line flag to set the priority of the
|
||||
kubelet process: `--windows-priorityclass`. This flag allows the kubelet process to get
|
||||
more CPU time slices when compared to other processes running on the Windows host.
|
||||
More information on the allowable values and their meaning is available at
|
||||
[Windows Priority Classes](https://docs.microsoft.com/en-us/windows/win32/procthread/scheduling-priorities#priority-class).
|
||||
To ensure that running Pods do not starve the kubelet of CPU cycles, set this flag to `ABOVE_NORMAL_PRIORITY_CLASS` or above.
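A sketch of setting that flag is shown below; the full set of kubelet arguments, and how they are passed, depends on how the kubelet service is installed and managed on your Windows nodes:

```shell
# Sketch only: give the kubelet process a higher Windows scheduling priority.
kubelet --windows-priorityclass=ABOVE_NORMAL_PRIORITY_CLASS
```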
|
|
@ -1,7 +1,7 @@
|
|||
---
|
||||
reviewers:
|
||||
- tallclair
|
||||
- dchen1107
|
||||
- tallclair
|
||||
- dchen1107
|
||||
title: Runtime Class
|
||||
content_type: concept
|
||||
weight: 20
|
||||
|
@ -16,9 +16,6 @@ This page describes the RuntimeClass resource and runtime selection mechanism.
|
|||
RuntimeClass is a feature for selecting the container runtime configuration. The container runtime
|
||||
configuration is used to run a Pod's containers.
|
||||
|
||||
|
||||
|
||||
|
||||
<!-- body -->
|
||||
|
||||
## Motivation
|
||||
|
@ -62,12 +59,15 @@ The RuntimeClass resource currently only has 2 significant fields: the RuntimeCl
|
|||
(`metadata.name`) and the handler (`handler`). The object definition looks like this:
|
||||
|
||||
```yaml
|
||||
apiVersion: node.k8s.io/v1 # RuntimeClass is defined in the node.k8s.io API group
|
||||
# RuntimeClass is defined in the node.k8s.io API group
|
||||
apiVersion: node.k8s.io/v1
|
||||
kind: RuntimeClass
|
||||
metadata:
|
||||
name: myclass # The name the RuntimeClass will be referenced by
|
||||
# RuntimeClass is a non-namespaced resource
|
||||
handler: myconfiguration # The name of the corresponding CRI configuration
|
||||
# The name the RuntimeClass will be referenced by.
|
||||
# RuntimeClass is a non-namespaced resource.
|
||||
name: myclass
|
||||
# The name of the corresponding CRI configuration
|
||||
handler: myconfiguration
|
||||
```
|
||||
|
||||
The name of a RuntimeClass object must be a valid
|
||||
|
@ -75,14 +75,14 @@ The name of a RuntimeClass object must be a valid
|
|||
|
||||
{{< note >}}
|
||||
It is recommended that RuntimeClass write operations (create/update/patch/delete) be
|
||||
restricted to the cluster administrator. This is typically the default. See [Authorization
|
||||
Overview](/docs/reference/access-authn-authz/authorization/) for more details.
|
||||
restricted to the cluster administrator. This is typically the default. See
|
||||
[Authorization Overview](/docs/reference/access-authn-authz/authorization/) for more details.
|
||||
{{< /note >}}
|
||||
|
||||
## Usage
|
||||
|
||||
Once RuntimeClasses are configured for the cluster, using them is very simple. Specify a
|
||||
`runtimeClassName` in the Pod spec. For example:
|
||||
Once RuntimeClasses are configured for the cluster, you can specify a
|
||||
`runtimeClassName` in the Pod spec to use it. For example:
|
||||
|
||||
```yaml
|
||||
apiVersion: v1
|
||||
|
@ -97,7 +97,7 @@ spec:
|
|||
This will instruct the kubelet to use the named RuntimeClass to run this pod. If the named
|
||||
RuntimeClass does not exist, or the CRI cannot run the corresponding handler, the pod will enter the
|
||||
`Failed` terminal [phase](/docs/concepts/workloads/pods/pod-lifecycle/#pod-phase). Look for a
|
||||
corresponding [event](/docs/tasks/debug-application-cluster/debug-application-introspection/) for an
|
||||
corresponding [event](/docs/tasks/debug/debug-application/debug-running-pod/) for an
|
||||
error message.
|
||||
|
||||
If no `runtimeClassName` is specified, the default RuntimeHandler will be used, which is equivalent
|
||||
|
@ -107,16 +107,6 @@ to the behavior when the RuntimeClass feature is disabled.
|
|||
|
||||
For more details on setting up CRI runtimes, see [CRI installation](/docs/setup/production-environment/container-runtimes/).
|
||||
|
||||
#### dockershim
|
||||
|
||||
{{< feature-state for_k8s_version="v1.20" state="deprecated" >}}
|
||||
|
||||
Dockershim is deprecated as of Kubernetes v1.20, and will be removed in v1.24. For more information on the deprecation,
|
||||
see [dockershim deprecation](/blog/2020/12/08/kubernetes-1-20-release-announcement/#dockershim-deprecation)
|
||||
|
||||
RuntimeClasses with dockershim must set the runtime handler to `docker`. Dockershim does not support
|
||||
custom configurable runtime handlers.
|
||||
|
||||
#### {{< glossary_tooltip term_id="containerd" >}}
|
||||
|
||||
Runtime handlers are configured through containerd's configuration at
|
||||
|
@ -126,14 +116,14 @@ Runtime handlers are configured through containerd's configuration at
|
|||
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.${HANDLER_NAME}]
|
||||
```
|
||||
|
||||
See containerd's config documentation for more details:
|
||||
https://github.com/containerd/cri/blob/master/docs/config.md
|
||||
See containerd's [config documentation](https://github.com/containerd/cri/blob/master/docs/config.md)
|
||||
for more details.
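For illustration, a hypothetical handler named `gvisor` (assuming the gVisor `runsc` shim is installed on the node) could be declared as sketched below, and then referenced from a RuntimeClass with `handler: gvisor`:

```
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.gvisor]
  runtime_type = "io.containerd.runsc.v1"
```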
|
||||
|
||||
#### {{< glossary_tooltip term_id="cri-o" >}}
|
||||
|
||||
Runtime handlers are configured through CRI-O's configuration at `/etc/crio/crio.conf`. Valid
|
||||
handlers are configured under the [crio.runtime
|
||||
table](https://github.com/cri-o/cri-o/blob/master/docs/crio.conf.5.md#crioruntime-table):
|
||||
handlers are configured under the
|
||||
[crio.runtime table](https://github.com/cri-o/cri-o/blob/master/docs/crio.conf.5.md#crioruntime-table):
|
||||
|
||||
```
|
||||
[crio.runtime.runtimes.${HANDLER_NAME}]
|
||||
|
@ -161,27 +151,24 @@ can add `tolerations` to the RuntimeClass. As with the `nodeSelector`, the toler
|
|||
with the pod's tolerations in admission, effectively taking the union of the set of nodes tolerated
|
||||
by each.
|
||||
|
||||
To learn more about configuring the node selector and tolerations, see [Assigning Pods to
|
||||
Nodes](/docs/concepts/scheduling-eviction/assign-pod-node/).
|
||||
To learn more about configuring the node selector and tolerations, see
|
||||
[Assigning Pods to Nodes](/docs/concepts/scheduling-eviction/assign-pod-node/).
|
||||
|
||||
### Pod Overhead
|
||||
|
||||
{{< feature-state for_k8s_version="v1.18" state="beta" >}}
|
||||
{{< feature-state for_k8s_version="v1.24" state="stable" >}}
|
||||
|
||||
You can specify _overhead_ resources that are associated with running a Pod. Declaring overhead allows
|
||||
the cluster (including the scheduler) to account for it when making decisions about Pods and resources.
|
||||
To use Pod overhead, you must have the PodOverhead [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
|
||||
enabled (it is on by default).
|
||||
|
||||
Pod overhead is defined in RuntimeClass through the `overhead` fields. Through the use of these fields,
|
||||
Pod overhead is defined in RuntimeClass through the `overhead` field. Through the use of this field,
|
||||
you can specify the overhead of running pods utilizing this RuntimeClass and ensure these overheads
|
||||
are accounted for in Kubernetes.
|
||||
|
||||
|
||||
## {{% heading "whatsnext" %}}
|
||||
|
||||
|
||||
- [RuntimeClass Design](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/585-runtime-class/README.md)
|
||||
- [RuntimeClass Scheduling Design](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/585-runtime-class/README.md#runtimeclass-scheduling)
|
||||
- Read about the [Pod Overhead](/docs/concepts/scheduling-eviction/pod-overhead/) concept
|
||||
- [PodOverhead Feature Design](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/688-pod-overhead)
|
||||
|
||||
|
|
|
@ -346,8 +346,6 @@ Here are some examples of device plugin implementations:
|
|||
* The [AMD GPU device plugin](https://github.com/RadeonOpenCompute/k8s-device-plugin)
|
||||
* The [Intel device plugins](https://github.com/intel/intel-device-plugins-for-kubernetes) for Intel GPU, FPGA, QAT, VPU, SGX, DSA, DLB and IAA devices
|
||||
* The [KubeVirt device plugins](https://github.com/kubevirt/kubernetes-device-plugins) for hardware-assisted virtualization
|
||||
* The [NVIDIA GPU device plugin](https://github.com/NVIDIA/k8s-device-plugin)
|
||||
* Requires [nvidia-docker](https://github.com/NVIDIA/nvidia-docker) 2.0, which allows you to run GPU-enabled Docker containers.
|
||||
* The [NVIDIA GPU device plugin for Container-Optimized OS](https://github.com/GoogleCloudPlatform/container-engine-accelerators/tree/master/cmd/nvidia_gpu)
|
||||
* The [RDMA device plugin](https://github.com/hustcat/k8s-rdma-device-plugin)
|
||||
* The [SocketCAN device plugin](https://github.com/collabora/k8s-socketcan)
|
||||
|
|
|
@ -11,36 +11,52 @@ weight: 10
|
|||
|
||||
<!-- overview -->
|
||||
|
||||
Network plugins in Kubernetes come in a few flavors:
|
||||
Kubernetes {{< skew currentVersion >}} supports [Container Network Interface](https://github.com/containernetworking/cni)
|
||||
(CNI) plugins for cluster networking. You must use a CNI plugin that is compatible with your cluster and that suits your needs. Different plugins are available (both open- and closed-source) in the wider Kubernetes ecosystem.
|
||||
|
||||
* CNI plugins: adhere to the [Container Network Interface](https://github.com/containernetworking/cni) (CNI) specification, designed for interoperability.
|
||||
* Kubernetes follows the [v0.4.0](https://github.com/containernetworking/cni/blob/spec-v0.4.0/SPEC.md) release of the CNI specification.
|
||||
* Kubenet plugin: implements basic `cbr0` using the `bridge` and `host-local` CNI plugins
|
||||
A CNI plugin is required to implement the [Kubernetes network model](/docs/concepts/services-networking/#the-kubernetes-network-model).
|
||||
|
||||
You must use a CNI plugin that is compatible with the
|
||||
[v0.4.0](https://github.com/containernetworking/cni/blob/spec-v0.4.0/SPEC.md) or later
|
||||
releases of the CNI specification. The Kubernetes project recommends using a plugin that is
|
||||
compatible with the [v1.0.0](https://github.com/containernetworking/cni/blob/spec-v1.0.0/SPEC.md)
|
||||
CNI specification (plugins can be compatible with multiple spec versions).
|
||||
|
||||
<!-- body -->
|
||||
|
||||
## Installation
|
||||
|
||||
The kubelet has a single default network plugin, and a default network common to the entire cluster. It probes for plugins when it starts up, remembers what it finds, and executes the selected plugin at appropriate times in the pod lifecycle (this is only true for Docker, as CRI manages its own CNI plugins). There are two Kubelet command line parameters to keep in mind when using plugins:
|
||||
A Container Runtime, in the networking context, is a daemon on a node configured to provide CRI Services for kubelet. In particular, the Container Runtime must be configured to load the CNI plugins required to implement the Kubernetes network model.
|
||||
|
||||
* `cni-bin-dir`: Kubelet probes this directory for plugins on startup
|
||||
* `network-plugin`: The network plugin to use from `cni-bin-dir`. It must match the name reported by a plugin probed from the plugin directory. For CNI plugins, this is `cni`.
|
||||
{{< note >}}
|
||||
Prior to Kubernetes 1.24, the CNI plugins could also be managed by the kubelet using the `cni-bin-dir` and `network-plugin` command-line parameters.
|
||||
These command-line parameters were removed in Kubernetes 1.24, with management of the CNI no longer in scope for kubelet.
|
||||
|
||||
See [Troubleshooting CNI plugin-related errors](/docs/tasks/administer-cluster/migrating-from-dockershim/troubleshooting-cni-plugin-related-errors/)
|
||||
if you are facing issues following the removal of dockershim.
|
||||
{{< /note >}}
|
||||
|
||||
For specific information about how a Container Runtime manages the CNI plugins, see the documentation for that Container Runtime, for example:
|
||||
- [containerd](https://github.com/containerd/containerd/blob/main/script/setup/install-cni)
|
||||
- [CRI-O](https://github.com/cri-o/cri-o/blob/main/contrib/cni/README.md)
|
||||
|
||||
For specific information about how to install and manage a CNI plugin, see the documentation for that plugin or [networking provider](/docs/concepts/cluster-administration/networking/#how-to-implement-the-kubernetes-networking-model).
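As an illustration only, a basic CNI network configuration placed under `/etc/cni/net.d/` often looks similar to the sketch below; this assumes the reference `bridge`, `host-local` and `portmap` plugins from the containernetworking/plugins project are installed. In practice, your chosen network plugin or provider usually installs an equivalent file for you:

```json
{
  "cniVersion": "1.0.0",
  "name": "example-net",
  "plugins": [
    {
      "type": "bridge",
      "bridge": "cni0",
      "isGateway": true,
      "ipMasq": true,
      "ipam": {
        "type": "host-local",
        "ranges": [[{ "subnet": "10.85.0.0/16" }]],
        "routes": [{ "dst": "0.0.0.0/0" }]
      }
    },
    {
      "type": "portmap",
      "capabilities": { "portMappings": true }
    }
  ]
}
```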
|
||||
|
||||
## Network Plugin Requirements
|
||||
|
||||
Besides providing the [`NetworkPlugin` interface](https://github.com/kubernetes/kubernetes/tree/{{< param "fullversion" >}}/pkg/kubelet/dockershim/network/plugins.go) to configure and clean up pod networking, the plugin may also need specific support for kube-proxy. The iptables proxy obviously depends on iptables, and the plugin may need to ensure that container traffic is made available to iptables. For example, if the plugin connects containers to a Linux bridge, the plugin must set the `net/bridge/bridge-nf-call-iptables` sysctl to `1` to ensure that the iptables proxy functions correctly. If the plugin does not use a Linux bridge (but instead something like Open vSwitch or some other mechanism) it should ensure container traffic is appropriately routed for the proxy.
|
||||
For plugin developers and users who regularly build or deploy Kubernetes, the plugin may also need specific configuration to support kube-proxy.
|
||||
The iptables proxy depends on iptables, and the plugin may need to ensure that container traffic is made available to iptables.
|
||||
For example, if the plugin connects containers to a Linux bridge, the plugin must set the `net/bridge/bridge-nf-call-iptables` sysctl to `1` to ensure that the iptables proxy functions correctly.
|
||||
If the plugin does not use a Linux bridge, but uses something like Open vSwitch or some other mechanism instead, it should ensure container traffic is appropriately routed for the proxy.
|
||||
|
||||
By default if no kubelet network plugin is specified, the `noop` plugin is used, which sets `net/bridge/bridge-nf-call-iptables=1` to ensure simple configurations (like Docker with a bridge) work correctly with the iptables proxy.
|
||||
By default, if no kubelet network plugin is specified, the `noop` plugin is used, which sets `net/bridge/bridge-nf-call-iptables=1` to ensure simple configurations (like Docker with a bridge) work correctly with the iptables proxy.
|
||||
|
||||
### CNI
|
||||
### Loopback CNI
|
||||
|
||||
The CNI plugin is selected by passing Kubelet the `--network-plugin=cni` command-line option. Kubelet reads a file from `--cni-conf-dir` (default `/etc/cni/net.d`) and uses the CNI configuration from that file to set up each pod's network. The CNI configuration file must match the [CNI specification](https://github.com/containernetworking/cni/blob/master/SPEC.md#network-configuration), and any required CNI plugins referenced by the configuration must be present in `--cni-bin-dir` (default `/opt/cni/bin`).
|
||||
In addition to the CNI plugin installed on the nodes for implementing the Kubernetes network model, Kubernetes also requires the container runtimes to provide a loopback interface `lo`, which is used for each sandbox (pod sandboxes, vm sandboxes, ...).
|
||||
Implementing the loopback interface can be accomplished by re-using the [CNI loopback plugin](https://github.com/containernetworking/plugins/blob/master/plugins/main/loopback/loopback.go), or by developing your own code to achieve this (see [this example from CRI-O](https://github.com/cri-o/ocicni/blob/release-1.24/pkg/ocicni/util_linux.go#L91)).
|
||||
|
||||
If there are multiple CNI configuration files in the directory, the kubelet uses the configuration file that comes first by name in lexicographic order.
|
||||
|
||||
In addition to the CNI plugin specified by the configuration file, Kubernetes requires the standard CNI [`lo`](https://github.com/containernetworking/plugins/blob/master/plugins/main/loopback/loopback.go) plugin, at minimum version 0.2.0
|
||||
|
||||
#### Support hostPort
|
||||
### Support hostPort
|
||||
|
||||
The CNI networking plugin supports `hostPort`. You can use the official [portmap](https://github.com/containernetworking/plugins/tree/master/plugins/meta/portmap)
|
||||
plugin offered by the CNI plugin team or use your own plugin with portMapping functionality.
|
||||
|
@ -77,7 +93,7 @@ For example:
|
|||
}
|
||||
```
|
||||
|
||||
#### Support traffic shaping
|
||||
### Support traffic shaping
|
||||
|
||||
**Experimental Feature**
|
||||
|
||||
|
@ -129,37 +145,4 @@ metadata:
|
|||
...
|
||||
```
|
||||
|
||||
### kubenet
|
||||
|
||||
Kubenet is a very basic, simple network plugin, on Linux only. It does not, of itself, implement more advanced features like cross-node networking or network policy. It is typically used together with a cloud provider that sets up routing rules for communication between nodes, or in single-node environments.
|
||||
|
||||
Kubenet creates a Linux bridge named `cbr0` and creates a veth pair for each pod with the host end of each pair connected to `cbr0`. The pod end of the pair is assigned an IP address allocated from a range assigned to the node either through configuration or by the controller-manager. `cbr0` is assigned an MTU matching the smallest MTU of an enabled normal interface on the host.
|
||||
|
||||
The plugin requires a few things:
|
||||
|
||||
* The standard CNI `bridge`, `lo` and `host-local` plugins are required, at minimum version 0.2.0. Kubenet will first search for them in `/opt/cni/bin`. Specify `cni-bin-dir` to supply additional search path. The first found match will take effect.
|
||||
* Kubelet must be run with the `--network-plugin=kubenet` argument to enable the plugin
|
||||
* Kubelet should also be run with the `--non-masquerade-cidr=<clusterCidr>` argument to ensure traffic to IPs outside this range will use IP masquerade.
|
||||
* The node must be assigned an IP subnet through either the `--pod-cidr` kubelet command-line option or the `--allocate-node-cidrs=true --cluster-cidr=<cidr>` controller-manager command-line options.
|
||||
|
||||
### Customizing the MTU (with kubenet)
|
||||
|
||||
The MTU should always be configured correctly to get the best networking performance. Network plugins will usually try
|
||||
to infer a sensible MTU, but sometimes the logic will not result in an optimal MTU. For example, if the
|
||||
Docker bridge or another interface has a small MTU, kubenet will currently select that MTU. Or if you are
|
||||
using IPSEC encapsulation, the MTU must be reduced, and this calculation is out-of-scope for
|
||||
most network plugins.
|
||||
|
||||
Where needed, you can specify the MTU explicitly with the `network-plugin-mtu` kubelet option. For example,
|
||||
on AWS the `eth0` MTU is typically 9001, so you might specify `--network-plugin-mtu=9001`. If you're using IPSEC you
|
||||
might reduce it to allow for encapsulation overhead; for example: `--network-plugin-mtu=8873`.
|
||||
|
||||
This option is provided to the network-plugin; currently **only kubenet supports `network-plugin-mtu`**.
|
||||
|
||||
## Usage Summary
|
||||
|
||||
* `--network-plugin=cni` specifies that we use the `cni` network plugin with actual CNI plugin binaries located in `--cni-bin-dir` (default `/opt/cni/bin`) and CNI plugin configuration located in `--cni-conf-dir` (default `/etc/cni/net.d`).
|
||||
* `--network-plugin=kubenet` specifies that we use the `kubenet` network plugin with CNI `bridge`, `lo` and `host-local` plugins placed in `/opt/cni/bin` or `cni-bin-dir`.
|
||||
* `--network-plugin-mtu=9001` specifies the MTU to use, currently only used by the `kubenet` network plugin.
|
||||
|
||||
## {{% heading "whatsnext" %}}
|
||||
|
|
|
@ -111,6 +111,7 @@ Operator.
|
|||
{{% thirdparty-content %}}
|
||||
|
||||
* [Charmed Operator Framework](https://juju.is/)
|
||||
* [Java Operator SDK](https://github.com/java-operator-sdk/java-operator-sdk)
|
||||
* [Kopf](https://github.com/nolar/kopf) (Kubernetes Operator Pythonic Framework)
|
||||
* [kubebuilder](https://book.kubebuilder.io/)
|
||||
* [KubeOps](https://buehler.github.io/dotnet-operator-sdk/) (.NET operator SDK)
|
||||
|
|
|
@ -114,7 +114,7 @@ Containers started by Kubernetes automatically include this DNS server in their
|
|||
|
||||
### Container Resource Monitoring
|
||||
|
||||
[Container Resource Monitoring](/docs/tasks/debug-application-cluster/resource-usage-monitoring/) records generic time-series metrics
|
||||
[Container Resource Monitoring](/docs/tasks/debug/debug-cluster/resource-usage-monitoring/) records generic time-series metrics
|
||||
about containers in a central database, and provides a UI for browsing that data.
|
||||
|
||||
### Cluster-level Logging
|
||||
|
|
|
@ -82,18 +82,42 @@ packages that define the API objects.
|
|||
|
||||
### OpenAPI V3
|
||||
|
||||
{{< feature-state state="alpha" for_k8s_version="v1.23" >}}
|
||||
{{< feature-state state="beta" for_k8s_version="v1.24" >}}
|
||||
|
||||
Kubernetes v1.23 offers initial support for publishing its APIs as OpenAPI v3; this is an
|
||||
alpha feature that is disabled by default.
|
||||
You can enable the alpha feature by turning on the
|
||||
Kubernetes {{< param "version" >}} offers beta support for publishing its APIs as OpenAPI v3; this is a
|
||||
beta feature that is enabled by default.
|
||||
You can disable the beta feature by turning off the
|
||||
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/) named `OpenAPIV3`
|
||||
for the kube-apiserver component.
|
||||
|
||||
With the feature enabled, the Kubernetes API server serves an
|
||||
aggregated OpenAPI v3 spec per Kubernetes group version at the
|
||||
`/openapi/v3/apis/<group>/<version>` endpoint. Please refer to the
|
||||
table below for accepted request headers.
|
||||
A discovery endpoint `/openapi/v3` is provided to see a list of all
|
||||
group/versions available. This endpoint only returns JSON. These group/versions
|
||||
are provided in the following format:
|
||||
```json
|
||||
{
|
||||
"paths": {
|
||||
...
|
||||
"api/v1": {
|
||||
"serverRelativeURL": "/openapi/v3/api/v1?hash=CC0E9BFD992D8C59AEC98A1E2336F899E8318D3CF4C68944C3DEC640AF5AB52D864AC50DAA8D145B3494F75FA3CFF939FCBDDA431DAD3CA79738B297795818CF"
|
||||
},
|
||||
"apis/admissionregistration.k8s.io/v1": {
|
||||
"serverRelativeURL": "/openapi/v3/apis/admissionregistration.k8s.io/v1?hash=E19CC93A116982CE5422FC42B590A8AFAD92CDE9AE4D59B5CAAD568F083AD07946E6CB5817531680BCE6E215C16973CD39003B0425F3477CFD854E89A9DB6597"
|
||||
},
|
||||
...
|
||||
}
|
||||
```
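One way to fetch this discovery document is through `kubectl`'s raw API access (a sketch; any authenticated API client can make the same request):

```shell
# List the group/versions for which the API server publishes OpenAPI v3 documents.
kubectl get --raw /openapi/v3
```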
|
||||
|
||||
The relative URLs are pointing to immutable OpenAPI descriptions, in
|
||||
order to improve client-side caching. The proper HTTP caching headers
|
||||
are also set by the API server for that purpose (`Expires` to 1 year in
|
||||
the future, and `Cache-Control` to `immutable`). When an obsolete URL is
|
||||
used, the API server returns a redirect to the newest URL.
|
||||
|
||||
The Kubernetes API server publishes an OpenAPI v3 spec per Kubernetes
|
||||
group version at the `/openapi/v3/apis/<group>/<version>?hash=<hash>`
|
||||
endpoint.
|
||||
|
||||
Refer to the table below for accepted request headers.
|
||||
|
||||
<table>
|
||||
<caption style="display:none">Valid request header values for OpenAPI v3 queries</caption>
|
||||
|
@ -126,9 +150,6 @@ table below for accepted request headers.
|
|||
</tbody>
|
||||
</table>
|
||||
|
||||
A discovery endpoint `/openapi/v3` is provided to see a list of all
|
||||
group/versions available. This endpoint only returns JSON.
|
||||
|
||||
## Persistence
|
||||
|
||||
Kubernetes stores the serialized state of objects by writing them into
|
||||
|
|
|
@ -442,7 +442,7 @@ pods 0 10
|
|||
|
||||
### Cross-namespace Pod Affinity Quota
|
||||
|
||||
{{< feature-state for_k8s_version="v1.22" state="beta" >}}
|
||||
{{< feature-state for_k8s_version="v1.24" state="stable" >}}
|
||||
|
||||
Operators can use `CrossNamespacePodAffinity` quota scope to limit which namespaces are allowed to
|
||||
have pods with affinity terms that cross namespaces. Specifically, it controls which pods are allowed
|
||||
|
@ -493,10 +493,6 @@ With the above configuration, pods can use `namespaces` and `namespaceSelector`
|
|||
if the namespace where they are created has a resource quota object with
`CrossNamespacePodAffinity` scope and a hard limit greater than or equal to the number of pods using those fields.
|
||||
|
||||
This feature is beta and enabled by default. You can disable it using the
|
||||
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
|
||||
`PodAffinityNamespaceSelector` in both kube-apiserver and kube-scheduler.
|
||||
|
||||
## Requests compared to Limits {#requests-vs-limits}
|
||||
|
||||
When allocating compute resources, each container may specify a request and a limit value for either CPU or memory.
|
||||
|
|
|
@ -120,7 +120,7 @@ your Pod spec.
|
|||
|
||||
For example, consider the following Pod spec:
|
||||
|
||||
{{<codenew file="pods/pod-with-node-affinity.yaml">}}
|
||||
{{< codenew file="pods/pod-with-node-affinity.yaml" >}}
|
||||
|
||||
In this example, the following rules apply:
|
||||
|
||||
|
@ -167,7 +167,7 @@ scheduling decision for the Pod.
|
|||
|
||||
For example, consider the following Pod spec:
|
||||
|
||||
{{<codenew file="pods/pod-with-affinity-anti-affinity.yaml">}}
|
||||
{{< codenew file="pods/pod-with-affinity-anti-affinity.yaml" >}}
|
||||
|
||||
If there are two possible nodes that match the
|
||||
`requiredDuringSchedulingIgnoredDuringExecution` rule, one with the
|
||||
|
@ -302,9 +302,8 @@ the Pod onto a node that is in the same zone as one or more Pods with the label
|
|||
`topology.kubernetes.io/zone=R` label if there are other nodes in the
|
||||
same zone currently running Pods with the `Security=S2` Pod label.
|
||||
|
||||
See the
|
||||
[design doc](https://git.k8s.io/community/contributors/design-proposals/scheduling/podaffinity.md)
|
||||
for many more examples of Pod affinity and anti-affinity.
|
||||
To get yourself more familiar with the examples of Pod affinity and anti-affinity,
|
||||
refer to the [design proposal](https://github.com/kubernetes/design-proposals-archive/blob/main/scheduling/podaffinity.md).
|
||||
|
||||
You can use the `In`, `NotIn`, `Exists` and `DoesNotExist` values in the
|
||||
`operator` field for Pod affinity and anti-affinity.
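For example, a sketch of an anti-affinity term (hypothetical label key and value) as it would appear under a Pod's `.spec.affinity`:

```yaml
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: app              # hypothetical label key
          operator: NotIn
          values:
          - experimental        # hypothetical label value
      topologyKey: topology.kubernetes.io/zone
```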
|
||||
|
@ -326,19 +325,13 @@ If omitted or empty, `namespaces` defaults to the namespace of the Pod where the
|
|||
affinity/anti-affinity definition appears.
|
||||
|
||||
#### Namespace selector
|
||||
{{< feature-state for_k8s_version="v1.22" state="beta" >}}
|
||||
{{< feature-state for_k8s_version="v1.24" state="stable" >}}
|
||||
|
||||
You can also select matching namespaces using `namespaceSelector`, which is a label query over the set of namespaces.
|
||||
The affinity term is applied to namespaces selected by both `namespaceSelector` and the `namespaces` field.
|
||||
Note that an empty `namespaceSelector` ({}) matches all namespaces, while a null or empty `namespaces` list and
|
||||
null `namespaceSelector` matches the namespace of the Pod where the rule is defined.
|
||||
|
||||
{{<note>}}
|
||||
This feature is beta and enabled by default. You can disable it via the
|
||||
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
|
||||
`PodAffinityNamespaceSelector` in both kube-apiserver and kube-scheduler.
|
||||
{{</note>}}
|
||||
|
||||
#### More practical use-cases
|
||||
|
||||
Inter-pod affinity and anti-affinity can be even more useful when they are used with higher
|
||||
|
|
|
@ -10,17 +10,12 @@ weight: 30
|
|||
|
||||
<!-- overview -->
|
||||
|
||||
{{< feature-state for_k8s_version="v1.18" state="beta" >}}
|
||||
|
||||
{{< feature-state for_k8s_version="v1.24" state="stable" >}}
|
||||
|
||||
When you run a Pod on a Node, the Pod itself takes an amount of system resources. These
|
||||
resources are additional to the resources needed to run the container(s) inside the Pod.
|
||||
_Pod Overhead_ is a feature for accounting for the resources consumed by the Pod infrastructure
|
||||
on top of the container requests & limits.
|
||||
|
||||
|
||||
|
||||
|
||||
In Kubernetes, _Pod Overhead_ is a way to account for the resources consumed by the Pod
|
||||
infrastructure on top of the container requests & limits.
|
||||
|
||||
<!-- body -->
|
||||
|
||||
|
@ -29,26 +24,23 @@ In Kubernetes, the Pod's overhead is set at
|
|||
time according to the overhead associated with the Pod's
|
||||
[RuntimeClass](/docs/concepts/containers/runtime-class/).
|
||||
|
||||
When Pod Overhead is enabled, the overhead is considered in addition to the sum of container
|
||||
resource requests when scheduling a Pod. Similarly, the kubelet will include the Pod overhead when sizing
|
||||
the Pod cgroup, and when carrying out Pod eviction ranking.
|
||||
A pod's overhead is considered in addition to the sum of container resource requests when
|
||||
scheduling a Pod. Similarly, the kubelet will include the Pod overhead when sizing the Pod cgroup,
|
||||
and when carrying out Pod eviction ranking.
|
||||
|
||||
## Enabling Pod Overhead {#set-up}
|
||||
## Configuring Pod overhead {#set-up}
|
||||
|
||||
You need to make sure that the `PodOverhead`
|
||||
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/) is enabled (it is on by default as of 1.18)
|
||||
across your cluster, and a `RuntimeClass` is utilized which defines the `overhead` field.
|
||||
You need to make sure a `RuntimeClass` is utilized which defines the `overhead` field.
|
||||
|
||||
## Usage example
|
||||
|
||||
To use the PodOverhead feature, you need a RuntimeClass that defines the `overhead` field. As
|
||||
an example, you could use the following RuntimeClass definition with a virtualizing container runtime
|
||||
that uses around 120MiB per Pod for the virtual machine and the guest OS:
|
||||
To work with Pod overhead, you need a RuntimeClass that defines the `overhead` field. As
|
||||
an example, you could use the following RuntimeClass definition with a virtualization container
|
||||
runtime that uses around 120MiB per Pod for the virtual machine and the guest OS:
|
||||
|
||||
```yaml
|
||||
---
|
||||
kind: RuntimeClass
|
||||
apiVersion: node.k8s.io/v1
|
||||
kind: RuntimeClass
|
||||
metadata:
|
||||
name: kata-fc
|
||||
handler: kata-fc
|
||||
|
@ -92,13 +84,15 @@ updates the workload's PodSpec to include the `overhead` as described in the Run
|
|||
the Pod will be rejected. In the given example, since only the RuntimeClass name is specified, the admission controller mutates the Pod
|
||||
to include an `overhead`.
|
||||
|
||||
After the RuntimeClass admission controller, you can check the updated PodSpec:
|
||||
After the RuntimeClass admission controller has made modifications, you can check the updated
|
||||
Pod overhead value:
|
||||
|
||||
```bash
|
||||
kubectl get pod test-pod -o jsonpath='{.spec.overhead}'
|
||||
```
|
||||
|
||||
The output is:
|
||||
|
||||
```
|
||||
map[cpu:250m memory:120Mi]
|
||||
```
|
||||
|
@ -110,33 +104,38 @@ When the kube-scheduler is deciding which node should run a new Pod, the schedul
|
|||
`overhead` as well as the sum of container requests for that Pod. For this example, the scheduler adds the
|
||||
requests and the overhead, then looks for a node that has 2.25 CPU and 320 MiB of memory available.
|
||||
|
||||
Once a Pod is scheduled to a node, the kubelet on that node creates a new {{< glossary_tooltip text="cgroup" term_id="cgroup" >}}
|
||||
for the Pod. It is within this pod that the underlying container runtime will create containers.
|
||||
Once a Pod is scheduled to a node, the kubelet on that node creates a new {{< glossary_tooltip
|
||||
text="cgroup" term_id="cgroup" >}} for the Pod. It is within this pod that the underlying
|
||||
container runtime will create containers.
|
||||
|
||||
If the resource has a limit defined for each container (Guaranteed QoS or Bustrable QoS with limits defined),
|
||||
If the resource has a limit defined for each container (Guaranteed QoS or Burstable QoS with limits defined),
|
||||
the kubelet will set an upper limit for the pod cgroup associated with that resource (cpu.cfs_quota_us for CPU
|
||||
and memory.limit_in_bytes memory). This upper limit is based on the sum of the container limits plus the `overhead`
|
||||
defined in the PodSpec.
|
||||
|
||||
For CPU, if the Pod is Guaranteed or Burstable QoS, the kubelet will set `cpu.shares` based on the sum of container
|
||||
requests plus the `overhead` defined in the PodSpec.
|
||||
For CPU, if the Pod is Guaranteed or Burstable QoS, the kubelet will set `cpu.shares` based on the
|
||||
sum of container requests plus the `overhead` defined in the PodSpec.
|
||||
|
||||
Looking at our example, verify the container requests for the workload:
|
||||
|
||||
```bash
|
||||
kubectl get pod test-pod -o jsonpath='{.spec.containers[*].resources.limits}'
|
||||
```
|
||||
|
||||
The total container requests are 2000m CPU and 200MiB of memory:
|
||||
|
||||
```
|
||||
map[cpu: 500m memory:100Mi] map[cpu:1500m memory:100Mi]
|
||||
```
|
||||
|
||||
Check this against what is observed by the node:
|
||||
|
||||
```bash
|
||||
kubectl describe node | grep test-pod -B2
|
||||
```
|
||||
|
||||
The output shows 2250m CPU and 320MiB of memory are requested, which includes PodOverhead:
|
||||
The output shows requests for 2250m CPU, and for 320MiB of memory. The requests include Pod overhead:
|
||||
|
||||
```
|
||||
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
|
||||
--------- ---- ------------ ---------- --------------- ------------- ---
|
||||
|
@ -145,9 +144,10 @@ The output shows 2250m CPU and 320MiB of memory are requested, which includes Po
|
|||
|
||||
## Verify Pod cgroup limits
|
||||
|
||||
Check the Pod's memory cgroups on the node where the workload is running. In the following example, [`crictl`](https://github.com/kubernetes-sigs/cri-tools/blob/master/docs/crictl.md)
|
||||
Check the Pod's memory cgroups on the node where the workload is running. In the following example,
|
||||
[`crictl`](https://github.com/kubernetes-sigs/cri-tools/blob/master/docs/crictl.md)
|
||||
is used on the node, which provides a CLI for CRI-compatible container runtimes. This is an
|
||||
advanced example to show PodOverhead behavior, and it is not expected that users should need to check
|
||||
advanced example to show Pod overhead behavior, and it is not expected that users should need to check
|
||||
cgroups directly on the node.
|
||||
|
||||
First, on the particular node, determine the Pod identifier:
|
||||
|
@ -158,17 +158,21 @@ POD_ID="$(sudo crictl pods --name test-pod -q)"
|
|||
```
|
||||
|
||||
From this, you can determine the cgroup path for the Pod:
|
||||
|
||||
```bash
|
||||
# Run this on the node where the Pod is scheduled
|
||||
sudo crictl inspectp -o=json $POD_ID | grep cgroupsPath
|
||||
```
|
||||
|
||||
The resulting cgroup path includes the Pod's `pause` container. The Pod level cgroup is one directory above.
|
||||
|
||||
```
|
||||
"cgroupsPath": "/kubepods/podd7f4b509-cf94-4951-9417-d1087c92a5b2/7ccf55aee35dd16aca4189c952d83487297f3cd760f1bbf09620e206e7d0c27a"
|
||||
```
|
||||
|
||||
In this specific case, the pod cgroup path is `kubepods/podd7f4b509-cf94-4951-9417-d1087c92a5b2`. Verify the Pod level cgroup setting for memory:
|
||||
In this specific case, the pod cgroup path is `kubepods/podd7f4b509-cf94-4951-9417-d1087c92a5b2`.
|
||||
Verify the Pod level cgroup setting for memory:
|
||||
|
||||
```bash
|
||||
# Run this on the node where the Pod is scheduled.
|
||||
# Also, change the name of the cgroup to match the cgroup allocated for your pod.
|
||||
|
@ -176,22 +180,20 @@ In this specific case, the pod cgroup path is `kubepods/podd7f4b509-cf94-4951-94
|
|||
```
|
||||
|
||||
This is 320 MiB, as expected:
|
||||
|
||||
```
|
||||
335544320
|
||||
```
|
||||
|
||||
### Observability
|
||||
|
||||
A `kube_pod_overhead` metric is available in [kube-state-metrics](https://github.com/kubernetes/kube-state-metrics)
|
||||
to help identify when PodOverhead is being utilized and to help observe stability of workloads
|
||||
running with a defined Overhead. This functionality is not available in the 1.9 release of
|
||||
kube-state-metrics, but is expected in a following release. Users will need to build kube-state-metrics
|
||||
from source in the meantime.
|
||||
|
||||
|
||||
Some `kube_pod_overhead_*` metrics are available in [kube-state-metrics](https://github.com/kubernetes/kube-state-metrics)
|
||||
to help identify when Pod overhead is being utilized and to help observe stability of workloads
|
||||
running with a defined overhead.
|
||||
|
||||
## {{% heading "whatsnext" %}}
|
||||
|
||||
* Learn more about [RuntimeClass](/docs/concepts/containers/runtime-class/)
|
||||
* Read the [PodOverhead Design](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/688-pod-overhead)
|
||||
enhancement proposal for extra context
|
||||
|
||||
* [RuntimeClass](/docs/concepts/containers/runtime-class/)
|
||||
* [PodOverhead Design](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/688-pod-overhead)
|
||||
|
|
|
@ -104,7 +104,7 @@ description: "This priority class should be used for XYZ service pods only."
|
|||
|
||||
## Non-preempting PriorityClass {#non-preempting-priority-class}
|
||||
|
||||
{{< feature-state for_k8s_version="v1.19" state="beta" >}}
|
||||
{{< feature-state for_k8s_version="v1.24" state="stable" >}}
|
||||
|
||||
Pods with `preemptionPolicy: Never` will be placed in the scheduling queue
|
||||
ahead of lower-priority pods,
|
||||
|
@ -203,9 +203,10 @@ resources reserved for Pod P and also gives users information about preemptions
|
|||
in their clusters.
|
||||
|
||||
Please note that Pod P is not necessarily scheduled to the "nominated Node".
|
||||
The scheduler always tries the "nominated Node" before iterating over any other nodes.
|
||||
After victim Pods are preempted, they get their graceful termination period. If
|
||||
another node becomes available while the scheduler is waiting for the victim Pods to
|
||||
terminate, scheduler will use the other node to schedule Pod P. As a result
|
||||
terminate, scheduler may use the other node to schedule Pod P. As a result
|
||||
`nominatedNodeName` and `nodeName` of Pod spec are not always the same. Also, if
|
||||
scheduler preempts Pods on Node N, but then a higher priority Pod than Pod P
|
||||
arrives, scheduler may give Node N to the new higher priority Pod. In such a
|
||||
|
|
|
@ -134,7 +134,7 @@ for the corresponding API object, and then written to the object store (shown as
|
|||
Kubernetes auditing provides a security-relevant, chronological set of records documenting the sequence of actions in a cluster.
|
||||
The cluster audits the activities generated by users, by applications that use the Kubernetes API, and by the control plane itself.
|
||||
|
||||
For more information, see [Auditing](/docs/tasks/debug-application-cluster/audit/).
|
||||
For more information, see [Auditing](/docs/tasks/debug/debug-cluster/audit/).
|
||||
|
||||
## API server ports and IPs
|
||||
|
||||
|
|
|
@ -19,7 +19,7 @@ The Kubernetes [Pod Security Standards](/docs/concepts/security/pod-security-sta
|
|||
different isolation levels for Pods. These standards let you define how you want to restrict the
|
||||
behavior of pods in a clear, consistent fashion.
|
||||
|
||||
As a Beta feature, Kubernetes offers a built-in _Pod Security_ {{< glossary_tooltip
|
||||
As a beta feature, Kubernetes offers a built-in _Pod Security_ {{< glossary_tooltip
|
||||
text="admission controller" term_id="admission-controller" >}}, the successor
|
||||
to [PodSecurityPolicies](/docs/concepts/security/pod-security-policy/). Pod security restrictions
|
||||
are applied at the {{< glossary_tooltip text="namespace" term_id="namespace" >}} level when pods
|
||||
|
@ -30,25 +30,21 @@ The PodSecurityPolicy API is deprecated and will be
|
|||
[removed](/docs/reference/using-api/deprecation-guide/#v1-25) from Kubernetes in v1.25.
|
||||
{{< /note >}}
|
||||
|
||||
<!-- body -->
|
||||
|
||||
## Enabling the `PodSecurity` admission plugin
|
||||
## {{% heading "prerequisites" %}}
|
||||
|
||||
In v1.23, the `PodSecurity` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
|
||||
is a Beta feature and is enabled by default.
|
||||
To use this mechanism, your cluster must enforce Pod Security admission.
|
||||
|
||||
In v1.22, the `PodSecurity` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
|
||||
is an Alpha feature and must be enabled in `kube-apiserver` in order to use the built-in admission plugin.
|
||||
### Built-in Pod Security admission enforcement
|
||||
|
||||
```shell
|
||||
--feature-gates="...,PodSecurity=true"
|
||||
```
|
||||
In Kubernetes v{{< skew currentVersion >}}, the `PodSecurity` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
|
||||
is a beta feature and is enabled by default. You must have this feature gate enabled.
|
||||
If you are running a different version of Kubernetes, consult the documentation for that release.
|
||||
|
||||
## Alternative: installing the `PodSecurity` admission webhook {#webhook}
|
||||
### Alternative: installing the `PodSecurity` admission webhook {#webhook}
|
||||
|
||||
For environments where the built-in `PodSecurity` admission plugin cannot be used,
|
||||
either because the cluster is older than v1.22, or the `PodSecurity` feature cannot be enabled,
|
||||
the `PodSecurity` admission logic is also available as a Beta [validating admission webhook](https://git.k8s.io/pod-security-admission/webhook).
|
||||
The `PodSecurity` admission logic is also available as a [validating admission webhook](https://git.k8s.io/pod-security-admission/webhook). This implementation is also beta.
|
||||
For environments where the built-in `PodSecurity` admission plugin cannot be enabled, you can instead enable that logic via a validating admission webhook.
|
||||
|
||||
A pre-built container image, certificate generation scripts, and example manifests
|
||||
are available at [https://git.k8s.io/pod-security-admission/webhook](https://git.k8s.io/pod-security-admission/webhook).
|
||||
|
@ -66,6 +62,8 @@ The generated certificate is valid for 2 years. Before it expires,
|
|||
regenerate the certificate or remove the webhook in favor of the built-in admission plugin.
|
||||
{{< /note >}}
|
||||
|
||||
<!-- body -->
|
||||
|
||||
## Pod Security levels
|
||||
|
||||
Pod Security admission places requirements on a Pod's [Security
|
||||
|
@ -88,7 +86,7 @@ takes if a potential violation is detected:
|
|||
Mode | Description
|
||||
:---------|:------------
|
||||
**enforce** | Policy violations will cause the pod to be rejected.
|
||||
**audit** | Policy violations will trigger the addition of an audit annotation to the event recorded in the [audit log](/docs/tasks/debug-application-cluster/audit/), but are otherwise allowed.
|
||||
**audit** | Policy violations will trigger the addition of an audit annotation to the event recorded in the [audit log](/docs/tasks/debug/debug-cluster/audit/), but are otherwise allowed.
|
||||
**warn** | Policy violations will trigger a user-facing warning, but are otherwise allowed.
|
||||
{{< /table >}}
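For example, the mode (and the level it checks against) is selected per namespace with labels; a sketch with a hypothetical namespace name:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: my-app                 # hypothetical namespace
  labels:
    # Reject pods that violate the baseline standard.
    pod-security.kubernetes.io/enforce: baseline
    # Record audit annotations and surface warnings for violations of the restricted standard.
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
```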
|
||||
|
||||
|
|
|
@ -658,8 +658,7 @@ added. Capabilities listed in `RequiredDropCapabilities` must not be included in
|
|||
|
||||
**DefaultAddCapabilities** - The capabilities which are added to containers by
|
||||
default, in addition to the runtime defaults. See the
|
||||
[Docker documentation](https://docs.docker.com/engine/reference/run/#runtime-privilege-and-linux-capabilities)
|
||||
for the default list of capabilities when using the Docker runtime.
|
||||
documentation for your container runtime for information on working with Linux capabilities.
|
||||
|
||||
### SELinux
|
||||
|
||||
|
|
|
@ -29,10 +29,9 @@ This guide outlines the requirements of each policy.
|
|||
**The _Privileged_ policy is purposely-open, and entirely unrestricted.** This type of policy is
|
||||
typically aimed at system- and infrastructure-level workloads managed by privileged, trusted users.
|
||||
|
||||
The Privileged policy is defined by an absence of restrictions. For allow-by-default enforcement
|
||||
mechanisms (such as gatekeeper), the Privileged policy may be an absence of applied constraints
|
||||
rather than an instantiated profile. In contrast, for a deny-by-default mechanism (such as Pod
|
||||
Security Policy) the Privileged policy should enable all controls (disable all restrictions).
|
||||
The Privileged policy is defined by an absence of restrictions. Allow-by-default
|
||||
mechanisms (such as gatekeeper) may be Privileged by default. In contrast, for a deny-by-default mechanism (such as Pod
|
||||
Security Policy) the Privileged policy should disable all restrictions.
|
||||
|
||||
### Baseline
|
||||
|
||||
|
@ -458,6 +457,16 @@ of individual policies are not defined here.
|
|||
- {{< example file="policy/baseline-psp.yaml" >}}Baseline{{< /example >}}
|
||||
- {{< example file="policy/restricted-psp.yaml" >}}Restricted{{< /example >}}
|
||||
|
||||
### Alternatives
|
||||
|
||||
{{% thirdparty-content %}}
|
||||
|
||||
Other alternatives for enforcing policies are being developed in the Kubernetes ecosystem, such as:
|
||||
- [Kubewarden](https://github.com/kubewarden)
|
||||
- [Kyverno](https://kyverno.io/policies/pod-security/)
|
||||
- [OPA Gatekeeper](https://github.com/open-policy-agent/gatekeeper)
|
||||
|
||||
|
||||
## FAQ
|
||||
|
||||
### Why isn't there a profile between privileged and baseline?
|
||||
|
@ -481,14 +490,6 @@ as well as other related parameters outside the Security Context. As of July 202
|
|||
[Pod Security Policies](/docs/concepts/security/pod-security-policy/) are deprecated in favor of the
|
||||
built-in [Pod Security Admission Controller](/docs/concepts/security/pod-security-admission/).
|
||||
|
||||
{{% thirdparty-content %}}
|
||||
|
||||
Other alternatives for enforcing security profiles are being developed in the Kubernetes
|
||||
ecosystem, such as:
|
||||
- [OPA Gatekeeper](https://github.com/open-policy-agent/gatekeeper).
|
||||
- [Kubewarden](https://github.com/kubewarden).
|
||||
- [Kyverno](https://kyverno.io/policies/pod-security/).
|
||||
|
||||
### What profiles should I apply to my Windows Pods?
|
||||
|
||||
Windows in Kubernetes has some limitations and differentiators from standard Linux-based
|
||||
|
|
|
@ -0,0 +1,179 @@
|
|||
---
|
||||
reviewers:
|
||||
title: Role Based Access Control Good Practices
|
||||
description: >
|
||||
Principles and practices for good RBAC design for cluster operators.
|
||||
content_type: concept
|
||||
---
|
||||
|
||||
<!-- overview -->
|
||||
|
||||
Kubernetes {{< glossary_tooltip text="RBAC" term_id="rbac" >}} is a key security control
|
||||
to ensure that cluster users and workloads have only the access to resources required to
|
||||
execute their roles. It is important to ensure that, when designing permissions for cluster
|
||||
users, the cluster administrator understands the areas where privilege escalation could occur,
|
||||
to reduce the risk of excessive access leading to security incidents.
|
||||
|
||||
The good practices laid out here should be read in conjunction with the general [RBAC documentation](/docs/reference/access-authn-authz/rbac/#restrictions-on-role-creation-or-update).
|
||||
|
||||
<!-- body -->
|
||||
|
||||
## General good practice
|
||||
|
||||
### Least privilege
|
||||
|
||||
Ideally minimal RBAC rights should be assigned to users and service accounts. Only permissions
|
||||
explicitly required for their operation should be used. Whilst each cluster will be different,
|
||||
some general rules that can be applied are:
|
||||
|
||||
- Assign permissions at the namespace level where possible. Use RoleBindings as opposed to
|
||||
ClusterRoleBindings to give users rights only within a specific namespace (see the example RoleBinding after this list).
|
||||
- Avoid providing wildcard permissions when possible, especially to all resources.
|
||||
As Kubernetes is an extensible system, providing wildcard access gives rights
|
||||
not just to all object types currently present in the cluster, but also to any object types
|
||||
which are created in the future.
|
||||
- Administrators should not use `cluster-admin` accounts except where specifically needed.
|
||||
Providing a low privileged account with [impersonation rights](/docs/reference/access-authn-authz/authentication/#user-impersonation)
|
||||
can avoid accidental modification of cluster resources.
|
||||
- Avoid adding users to the `system:masters` group. Any user who is a member of this group
|
||||
bypasses all RBAC rights checks and will always have unrestricted superuser access, which cannot be
|
||||
revoked by removing RoleBindings or ClusterRoleBindings. As an aside, if a cluster is
|
||||
using an authorization webhook, membership of this group also bypasses that webhook (requests
|
||||
from users who are members of that group are never sent to the webhook).
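As a sketch of the first point in this list, the following RoleBinding grants the built-in `view` ClusterRole to a user only within one namespace rather than cluster-wide; the user name `jane` and the namespace `dev` are illustrative:

```yaml
# Read-only access, limited to the dev namespace.
# User and namespace names are placeholders.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-only-dev
  namespace: dev
subjects:
- kind: User
  name: jane
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: view
  apiGroup: rbac.authorization.k8s.io
```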
|
||||
|
||||
### Minimize distribution of privileged tokens
|
||||
|
||||
Ideally, pods shouldn't be assigned service accounts that have been granted powerful permissions (for example, any of the rights listed under
|
||||
[privilege escalation risks](#privilege-escalation-risks)).
|
||||
In cases where a workload requires powerful permissions, consider the following practices:
|
||||
|
||||
- Limit the number of nodes running powerful pods. Ensure that any DaemonSets you run
|
||||
are necessary and are run with least privilege to limit the blast radius of container escapes.
|
||||
- Avoid running powerful pods alongside untrusted or publicly-exposed ones. Consider using
|
||||
[Taints and Toleration](/docs/concepts/scheduling-eviction/taint-and-toleration/), [NodeAffinity](/docs/concepts/scheduling-eviction/assign-pod-node/#node-affinity), or [PodAntiAffinity](/docs/concepts/scheduling-eviction/assign-pod-node/#inter-pod-affinity-and-anti-affinity) to ensure
|
||||
pods don't run alongside untrusted or less-trusted Pods. Pay special attention to
|
||||
situations where less-trustworthy Pods are not meeting the **Restricted** Pod Security Standard.
|
||||
|
||||
### Hardening
|
||||
|
||||
Kubernetes defaults to providing access which may not be required in every cluster. Reviewing
|
||||
the RBAC rights provided by default can provide opportunities for security hardening.
|
||||
In general, changes should not be made to rights provided to `system:` accounts. However, some options
|
||||
to harden cluster rights exist:
|
||||
|
||||
- Review bindings for the `system:unauthenticated` group and remove where possible, as this gives
|
||||
access to anyone who can contact the API server at a network level.
|
||||
- Avoid the default auto-mounting of service account tokens by setting
|
||||
`automountServiceAccountToken: false`. For more details, see
|
||||
[using default service account token](/docs/tasks/configure-pod-container/configure-service-account/#use-the-default-service-account-to-access-the-api-server).
|
||||
Setting this value for a Pod overrides the service account setting; workloads
|
||||
which require service account tokens can still mount them.
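For example, a sketch of a ServiceAccount that opts out of automatic token mounting; a Pod that genuinely needs the token can still set `automountServiceAccountToken: true` in its own spec (the names are illustrative):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-sa        # placeholder name
  namespace: dev      # placeholder namespace
automountServiceAccountToken: false
```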
|
||||
|
||||
### Periodic review
|
||||
|
||||
It is vital to periodically review the Kubernetes RBAC settings for redundant entries and
|
||||
possible privilege escalations.
|
||||
If an attacker is able to create a user account with the same name as a deleted user,
|
||||
they can automatically inherit all of the
|
||||
rights assigned to that user.
|
||||
|
||||
## Kubernetes RBAC - privilege escalation risks {#privilege-escalation-risks}
|
||||
|
||||
Within Kubernetes RBAC there are a number of privileges which, if granted, can allow a user or a service account
|
||||
to escalate their privileges in the cluster or affect systems outside the cluster.
|
||||
|
||||
This section is intended to provide visibility of the areas where cluster operators
|
||||
should take care, to ensure that they do not inadvertently allow for more access to clusters than intended.
|
||||
|
||||
### Listing secrets
|
||||
|
||||
It is generally clear that allowing `get` access on Secrets will allow a user to read their contents.
|
||||
It is also important to note that `list` and `watch` access effectively allow users to reveal the Secret contents.
|
||||
For example, when a List response is returned (such as via `kubectl get secrets -A -o yaml`), the response
|
||||
includes the contents of all Secrets.
|
||||
|
||||
### Workload creation
|
||||
|
||||
Users who are able to create workloads (either Pods, or
|
||||
[workload resources](/docs/concepts/workloads/controllers/) that manage Pods) will
|
||||
be able to gain access to the underlying node unless restrictions based on the Kubernetes
|
||||
[Pod Security Standards](/docs/concepts/security/pod-security-standards/) are in place.
|
||||
|
||||
Users who can run privileged Pods can use that access to gain node access and potentially to
|
||||
further elevate their privileges. Where you do not fully trust a user or other principal
|
||||
with the ability to create suitably secure and isolated Pods, you should enforce either the
|
||||
**Baseline** or **Restricted** Pod Security Standard.
|
||||
You can use [Pod Security admission](/docs/concepts/security/pod-security-admission/)
|
||||
or other (third party) mechanisms to implement that enforcement.
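As an illustration, Pod Security admission is driven by namespace labels; a sketch that enforces the **Baseline** level and warns about **Restricted** violations for a hypothetical `apps` namespace might look like this:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: apps    # placeholder namespace
  labels:
    pod-security.kubernetes.io/enforce: baseline
    pod-security.kubernetes.io/warn: restricted
```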
|
||||
|
||||
You can also use the deprecated [PodSecurityPolicy](/docs/concepts/policy/pod-security-policy/) mechanism
|
||||
to restrict users' abilities to create privileged Pods (N.B. PodSecurityPolicy is scheduled for removal
|
||||
in version 1.25).
|
||||
|
||||
Creating a workload in a namespace also grants indirect access to Secrets in that namespace.
|
||||
Creating a Pod in `kube-system` or a similarly privileged namespace can grant a user access to
|
||||
Secrets they would not have through RBAC directly.
|
||||
|
||||
### Persistent volume creation
|
||||
|
||||
As noted in the [PodSecurityPolicy](/docs/concepts/policy/pod-security-policy/#volumes-and-file-systems) documentation, access to create PersistentVolumes can allow for escalation of access to the underlying host. Where access to persistent storage is required, trusted administrators should create
|
||||
PersistentVolumes, and constrained users should use PersistentVolumeClaims to access that storage.
|
||||
|
||||
### Access to `proxy` subresource of Nodes
|
||||
|
||||
Users with access to the proxy sub-resource of node objects have rights to the Kubelet API,
|
||||
which allows for command execution on every pod on the node(s) to which they have rights.
|
||||
This access bypasses audit logging and admission control, so care should be taken before
|
||||
granting rights to this resource.
|
||||
|
||||
### Escalate verb
|
||||
|
||||
Generally, the RBAC system prevents users from creating ClusterRoles with more rights than
|
||||
they possess. The exception to this is the `escalate` verb. As noted in the [RBAC documentation](/docs/reference/access-authn-authz/rbac/#restrictions-on-role-creation-or-update),
|
||||
users with this right can effectively escalate their privileges.
|
||||
|
||||
### Bind verb
|
||||
|
||||
Similar to the `escalate` verb, granting users this right allows for bypass of Kubernetes
|
||||
built-in protections against privilege escalation, allowing users to create bindings to
|
||||
roles with rights they do not already have.
|
||||
|
||||
### Impersonate verb
|
||||
|
||||
This verb allows users to impersonate and gain the rights of other users in the cluster.
|
||||
Care should be taken when granting it, to ensure that excessive permissions cannot be gained
|
||||
via one of the impersonated accounts.
|
||||
|
||||
### CSRs and certificate issuing
|
||||
|
||||
The CSR API allows for users with `create` rights to CSRs and `update` rights on `certificatesigningrequests/approval`
|
||||
where the signer is `kubernetes.io/kube-apiserver-client` to create new client certificates
|
||||
which allow users to authenticate to the cluster. Those client certificates can have arbitrary
|
||||
names including duplicates of Kubernetes system components. This will effectively allow for privilege escalation.
|
||||
|
||||
### Token request
|
||||
|
||||
Users with `create` rights on `serviceaccounts/token` can create TokenRequests to issue
|
||||
tokens for existing service accounts.
|
||||
|
||||
### Control admission webhooks
|
||||
|
||||
Users with control over `validatingwebhookconfigurations` or `mutatingwebhookconfigurations`
|
||||
can control webhooks that can read any object admitted to the cluster, and in the case of
|
||||
mutating webhooks, also mutate admitted objects.
|
||||
|
||||
|
||||
## Kubernetes RBAC - denial of service risks {#denial-of-service-risks}
|
||||
|
||||
### Object creation denial-of-service {#object-creation-dos}
|
||||
Users who have rights to create objects in a cluster may be able to create sufficiently large
|
||||
objects to create a denial of service condition either based on the size or number of objects, as discussed in
|
||||
[etcd used by Kubernetes is vulnerable to OOM attack](https://github.com/kubernetes/kubernetes/issues/107325). This may be
|
||||
specifically relevant in multi-tenant clusters if semi-trusted or untrusted users
|
||||
are allowed limited access to a system.
|
||||
|
||||
One option for mitigation of this issue would be to use [resource quotas](/docs/concepts/policy/resource-quotas/#object-count-quota)
|
||||
to limit the quantity of objects which can be created.
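A sketch of such a quota, limiting object counts in a single (hypothetical) tenant namespace:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: object-counts
  namespace: tenant-a    # placeholder namespace
spec:
  hard:
    configmaps: "20"
    secrets: "20"
    count/deployments.apps: "10"
```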
|
||||
|
||||
## {{% heading "whatsnext" %}}
|
||||
* To learn more about RBAC, see the [RBAC documentation](/docs/reference/access-authn-authz/rbac/).
|
|
@ -0,0 +1,55 @@
|
|||
---
|
||||
reviewers:
|
||||
- jayunit100
|
||||
- jsturtevant
|
||||
- marosset
|
||||
- perithompson
|
||||
title: Security For Windows Nodes
|
||||
content_type: concept
|
||||
weight: 75
|
||||
---
|
||||
|
||||
<!-- overview -->
|
||||
|
||||
This page describes security considerations and best practices specific to the Windows operating system.
|
||||
|
||||
<!-- body -->
|
||||
|
||||
## Protection for Secret data on nodes
|
||||
|
||||
On Windows, data from Secrets are written out in clear text onto the node's local
|
||||
storage (as compared to using tmpfs / in-memory filesystems on Linux). As a cluster
|
||||
operator, you should take both of the following additional measures:
|
||||
|
||||
1. Use file ACLs to secure the Secrets' file location.
|
||||
1. Apply volume-level encryption using [BitLocker](https://docs.microsoft.com/windows/security/information-protection/bitlocker/bitlocker-how-to-deploy-on-windows-server).
|
||||
|
||||
## Container users
|
||||
|
||||
[RunAsUsername](/docs/tasks/configure-pod-container/configure-runasusername)
|
||||
can be specified for Windows Pods or containers to execute the container
|
||||
processes as a specific user. This is roughly equivalent to
|
||||
[RunAsUser](/docs/concepts/policy/pod-security-policy/#users-and-groups).
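For example, a minimal sketch of a Windows Pod that runs its container processes as the built-in `ContainerUser` account (the Pod name and image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: win-example
spec:
  securityContext:
    windowsOptions:
      runAsUserName: "ContainerUser"
  containers:
  - name: app
    image: mcr.microsoft.com/windows/servercore:ltsc2022
  nodeSelector:
    kubernetes.io/os: windows
```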
|
||||
|
||||
Windows containers offer two default user accounts, ContainerUser and ContainerAdministrator.
|
||||
The differences between these two user accounts are covered in
|
||||
[When to use ContainerAdmin and ContainerUser user accounts](https://docs.microsoft.com/virtualization/windowscontainers/manage-containers/container-security#when-to-use-containeradmin-and-containeruser-user-accounts) within Microsoft's _Secure Windows containers_ documentation.
|
||||
|
||||
Local users can be added to container images during the container build process.
|
||||
|
||||
{{< note >}}
|
||||
|
||||
* [Nano Server](https://hub.docker.com/_/microsoft-windows-nanoserver) based images run as `ContainerUser` by default
|
||||
* [Server Core](https://hub.docker.com/_/microsoft-windows-servercore) based images run as `ContainerAdministrator` by default
|
||||
|
||||
{{< /note >}}
|
||||
|
||||
Windows containers can also run as Active Directory identities by utilizing [Group Managed Service Accounts](/docs/tasks/configure-pod-container/configure-gmsa/).
|
||||
|
||||
## Pod-level security isolation
|
||||
|
||||
Linux-specific pod security context mechanisms (such as SELinux, AppArmor, Seccomp, or custom
|
||||
POSIX capabilities) are not supported on Windows nodes.
|
||||
|
||||
Privileged containers are [not supported](/docs/concepts/windows/intro/#compatibility-v1-pod-spec-containers-securitycontext) on Windows.
|
||||
Instead [HostProcess containers](/docs/tasks/configure-pod-container/create-hostprocess-pod) can be used on Windows to perform many of the tasks performed by privileged containers on Linux.
|
|
@ -7,26 +7,25 @@ description: >
|
|||
|
||||
## The Kubernetes network model
|
||||
|
||||
Every [`Pod`](/docs/concepts/workloads/pods/) gets its own IP address.
|
||||
Every [`Pod`](/docs/concepts/workloads/pods/) in a cluster gets its own unique cluster-wide IP address.
|
||||
This means you do not need to explicitly create links between `Pods` and you
|
||||
almost never need to deal with mapping container ports to host ports.
|
||||
This creates a clean, backwards-compatible model where `Pods` can be treated
|
||||
much like VMs or physical hosts from the perspectives of port allocation,
|
||||
naming, service discovery, [load balancing](/docs/concepts/services-networking/ingress/#load-balancing), application configuration,
|
||||
and migration.
|
||||
naming, service discovery, [load balancing](/docs/concepts/services-networking/ingress/#load-balancing),
|
||||
application configuration, and migration.
|
||||
|
||||
Kubernetes imposes the following fundamental requirements on any networking
|
||||
implementation (barring any intentional network segmentation policies):
|
||||
|
||||
* pods on a [node](/docs/concepts/architecture/nodes/) can communicate with all pods on all nodes without NAT
|
||||
* pods can communicate with all other pods on any other [node](/docs/concepts/architecture/nodes/)
|
||||
without NAT
|
||||
* agents on a node (e.g. system daemons, kubelet) can communicate with all
|
||||
pods on that node
|
||||
|
||||
Note: For those platforms that support `Pods` running in the host network (e.g.
|
||||
Linux):
|
||||
|
||||
* pods in the host network of a node can communicate with all pods on all
|
||||
nodes without NAT
|
||||
Linux), when pods are attached to the host network of a node they can still communicate
|
||||
with all pods on all nodes without NAT.
|
||||
|
||||
This model is not only less complex overall, but it is principally compatible
|
||||
with the desire for Kubernetes to enable low-friction porting of apps from VMs
|
||||
|
|
|
@ -8,8 +8,8 @@ weight: 20
|
|||
---
|
||||
<!-- overview -->
|
||||
|
||||
Kubernetes creates DNS records for services and pods. You can contact
|
||||
services with consistent DNS names instead of IP addresses.
|
||||
Kubernetes creates DNS records for Services and Pods. You can contact
|
||||
Services with consistent DNS names instead of IP addresses.
|
||||
|
||||
<!-- body -->
|
||||
|
||||
|
@ -25,20 +25,20 @@ Pod's own namespace and the cluster's default domain.
|
|||
|
||||
### Namespaces of Services
|
||||
|
||||
A DNS query may return different results based on the namespace of the pod making
|
||||
it. DNS queries that don't specify a namespace are limited to the pod's
|
||||
namespace. Access services in other namespaces by specifying it in the DNS query.
|
||||
A DNS query may return different results based on the namespace of the Pod making
|
||||
it. DNS queries that don't specify a namespace are limited to the Pod's
|
||||
namespace. Access Services in other namespaces by specifying it in the DNS query.
|
||||
|
||||
For example, consider a pod in a `test` namespace. A `data` service is in
|
||||
For example, consider a Pod in a `test` namespace. A `data` Service is in
|
||||
the `prod` namespace.
|
||||
|
||||
A query for `data` returns no results, because it uses the pod's `test` namespace.
|
||||
A query for `data` returns no results, because it uses the Pod's `test` namespace.
|
||||
|
||||
A query for `data.prod` returns the intended result, because it specifies the
|
||||
namespace.
|
||||
|
||||
DNS queries may be expanded using the pod's `/etc/resolv.conf`. Kubelet
|
||||
sets this file for each pod. For example, a query for just `data` may be
|
||||
DNS queries may be expanded using the Pod's `/etc/resolv.conf`. Kubelet
|
||||
sets this file for each Pod. For example, a query for just `data` may be
|
||||
expanded to `data.test.svc.cluster.local`. The values of the `search` option
|
||||
are used to expand queries. To learn more about DNS queries, see
|
||||
[the `resolv.conf` manual page.](https://www.man7.org/linux/man-pages/man5/resolv.conf.5.html)
|
||||
|
@ -49,7 +49,7 @@ search <namespace>.svc.cluster.local svc.cluster.local cluster.local
|
|||
options ndots:5
|
||||
```
|
||||
|
||||
In summary, a pod in the _test_ namespace can successfully resolve either
|
||||
In summary, a Pod in the _test_ namespace can successfully resolve either
|
||||
`data.prod` or `data.prod.svc.cluster.local`.
|
||||
|
||||
### DNS Records
|
||||
|
@ -70,14 +70,14 @@ For more up-to-date specification, see
|
|||
### A/AAAA records
|
||||
|
||||
"Normal" (not headless) Services are assigned a DNS A or AAAA record,
|
||||
depending on the IP family of the service, for a name of the form
|
||||
depending on the IP family of the Service, for a name of the form
|
||||
`my-svc.my-namespace.svc.cluster-domain.example`. This resolves to the cluster IP
|
||||
of the Service.
|
||||
|
||||
"Headless" (without a cluster IP) Services are also assigned a DNS A or AAAA record,
|
||||
depending on the IP family of the service, for a name of the form
|
||||
depending on the IP family of the Service, for a name of the form
|
||||
`my-svc.my-namespace.svc.cluster-domain.example`. Unlike normal
|
||||
Services, this resolves to the set of IPs of the pods selected by the Service.
|
||||
Services, this resolves to the set of IPs of the Pods selected by the Service.
|
||||
Clients are expected to consume the set or else use standard round-robin
|
||||
selection from the set.
|
||||
|
||||
|
@ -87,36 +87,36 @@ SRV Records are created for named ports that are part of normal or [Headless
|
|||
Services](/docs/concepts/services-networking/service/#headless-services).
|
||||
For each named port, the SRV record would have the form
|
||||
`_my-port-name._my-port-protocol.my-svc.my-namespace.svc.cluster-domain.example`.
|
||||
For a regular service, this resolves to the port number and the domain name:
|
||||
For a regular Service, this resolves to the port number and the domain name:
|
||||
`my-svc.my-namespace.svc.cluster-domain.example`.
|
||||
For a headless service, this resolves to multiple answers, one for each pod
|
||||
that is backing the service, and contains the port number and the domain name of the pod
|
||||
For a headless Service, this resolves to multiple answers, one for each Pod
|
||||
that is backing the Service, and contains the port number and the domain name of the Pod
|
||||
of the form `auto-generated-name.my-svc.my-namespace.svc.cluster-domain.example`.
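For reference, a sketch of a headless Service with a named port, for which the A/AAAA and SRV records described above would be created (the selector is illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-svc
  namespace: my-namespace
spec:
  clusterIP: None        # headless: DNS resolves to the backing Pod IPs
  selector:
    app: my-app          # placeholder selector
  ports:
  - name: my-port-name
    protocol: TCP
    port: 8080
```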
|
||||
|
||||
## Pods
|
||||
|
||||
### A/AAAA records
|
||||
|
||||
In general a pod has the following DNS resolution:
|
||||
In general a Pod has the following DNS resolution:
|
||||
|
||||
`pod-ip-address.my-namespace.pod.cluster-domain.example`.
|
||||
|
||||
For example, if a pod in the `default` namespace has the IP address 172.17.0.3,
|
||||
For example, if a Pod in the `default` namespace has the IP address 172.17.0.3,
|
||||
and the domain name for your cluster is `cluster.local`, then the Pod has a DNS name:
|
||||
|
||||
`172-17-0-3.default.pod.cluster.local`.
|
||||
|
||||
Any pods exposed by a Service have the following DNS resolution available:
|
||||
Any Pods exposed by a Service have the following DNS resolution available:
|
||||
|
||||
`pod-ip-address.service-name.my-namespace.svc.cluster-domain.example`.
|
||||
|
||||
### Pod's hostname and subdomain fields
|
||||
|
||||
Currently when a pod is created, its hostname is the Pod's `metadata.name` value.
|
||||
Currently when a Pod is created, its hostname is the Pod's `metadata.name` value.
|
||||
|
||||
The Pod spec has an optional `hostname` field, which can be used to specify the
|
||||
Pod's hostname. When specified, it takes precedence over the Pod's name to be
|
||||
the hostname of the pod. For example, given a Pod with `hostname` set to
|
||||
the hostname of the Pod. For example, given a Pod with `hostname` set to
|
||||
"`my-host`", the Pod will have its hostname set to "`my-host`".
|
||||
|
||||
The Pod spec also has an optional `subdomain` field which can be used to specify
|
||||
|
@ -173,14 +173,14 @@ spec:
|
|||
name: busybox
|
||||
```
|
||||
|
||||
If there exists a headless service in the same namespace as the pod and with
|
||||
If there exists a headless Service in the same namespace as the Pod and with
|
||||
the same name as the subdomain, the cluster's DNS Server also returns an A or AAAA
|
||||
record for the Pod's fully qualified hostname.
|
||||
For example, given a Pod with the hostname set to "`busybox-1`" and the subdomain set to
|
||||
"`default-subdomain`", and a headless Service named "`default-subdomain`" in
|
||||
the same namespace, the pod will see its own FQDN as
|
||||
the same namespace, the Pod will see its own FQDN as
|
||||
"`busybox-1.default-subdomain.my-namespace.svc.cluster-domain.example`". DNS serves an
|
||||
A or AAAA record at that name, pointing to the Pod's IP. Both pods "`busybox1`" and
|
||||
A or AAAA record at that name, pointing to the Pod's IP. Both Pods "`busybox1`" and
|
||||
"`busybox2`" can have their distinct A or AAAA records.
|
||||
|
||||
The Endpoints object can specify the `hostname` for any endpoint addresses,
|
||||
|
@ -189,7 +189,7 @@ along with its IP.
|
|||
{{< note >}}
|
||||
Because A or AAAA records are not created for Pod names, `hostname` is required for the Pod's A or AAAA
|
||||
record to be created. A Pod with no `hostname` but with `subdomain` will only create the
|
||||
A or AAAA record for the headless service (`default-subdomain.my-namespace.svc.cluster-domain.example`),
|
||||
A or AAAA record for the headless Service (`default-subdomain.my-namespace.svc.cluster-domain.example`),
|
||||
pointing to the Pod's IP address. Also, the Pod needs to become ready in order to have a
|
||||
record unless `publishNotReadyAddresses=True` is set on the Service.
|
||||
{{< /note >}}
|
||||
|
@ -205,17 +205,17 @@ When you set `setHostnameAsFQDN: true` in the Pod spec, the kubelet writes the P
|
|||
{{< note >}}
|
||||
In Linux, the hostname field of the kernel (the `nodename` field of `struct utsname`) is limited to 64 characters.
|
||||
|
||||
If a Pod enables this feature and its FQDN is longer than 64 character, it will fail to start. The Pod will remain in `Pending` status (`ContainerCreating` as seen by `kubectl`) generating error events, such as Failed to construct FQDN from pod hostname and cluster domain, FQDN `long-FQDN` is too long (64 characters is the max, 70 characters requested). One way of improving user experience for this scenario is to create an [admission webhook controller](/docs/reference/access-authn-authz/extensible-admission-controllers/#admission-webhooks) to control FQDN size when users create top level objects, for example, Deployment.
|
||||
If a Pod enables this feature and its FQDN is longer than 64 character, it will fail to start. The Pod will remain in `Pending` status (`ContainerCreating` as seen by `kubectl`) generating error events, such as Failed to construct FQDN from Pod hostname and cluster domain, FQDN `long-FQDN` is too long (64 characters is the max, 70 characters requested). One way of improving user experience for this scenario is to create an [admission webhook controller](/docs/reference/access-authn-authz/extensible-admission-controllers/#admission-webhooks) to control FQDN size when users create top level objects, for example, Deployment.
|
||||
{{< /note >}}
|
||||
|
||||
### Pod's DNS Policy
|
||||
|
||||
DNS policies can be set on a per-pod basis. Currently Kubernetes supports the
|
||||
following pod-specific DNS policies. These policies are specified in the
|
||||
DNS policies can be set on a per-Pod basis. Currently Kubernetes supports the
|
||||
following Pod-specific DNS policies. These policies are specified in the
|
||||
`dnsPolicy` field of a Pod Spec.
|
||||
|
||||
- "`Default`": The Pod inherits the name resolution configuration from the node
|
||||
that the pods run on.
|
||||
that the Pods run on.
|
||||
See [related discussion](/docs/tasks/administer-cluster/dns-custom-nameservers)
|
||||
for more details.
|
||||
- "`ClusterFirst`": Any DNS query that does not match the configured cluster
|
||||
|
@ -226,6 +226,7 @@ following pod-specific DNS policies. These policies are specified in the
|
|||
for details on how DNS queries are handled in those cases.
|
||||
- "`ClusterFirstWithHostNet`": For Pods running with hostNetwork, you should
|
||||
explicitly set its DNS policy "`ClusterFirstWithHostNet`".
|
||||
- Note: This is not supported on Windows. See [below](#dns-windows) for details
|
||||
- "`None`": It allows a Pod to ignore DNS settings from the Kubernetes
|
||||
environment. All DNS settings are supposed to be provided using the
|
||||
`dnsConfig` field in the Pod Spec.
|
||||
|
@ -306,7 +307,7 @@ For IPv6 setup, search path and name server should be setup like this:
|
|||
kubectl exec -it dns-example -- cat /etc/resolv.conf
|
||||
```
|
||||
The output is similar to this:
|
||||
```shell
|
||||
```
|
||||
nameserver fd00:79:30::a
|
||||
search default.svc.cluster-domain.example svc.cluster-domain.example cluster-domain.example
|
||||
options ndots:5
|
||||
|
@ -323,8 +324,25 @@ If the feature gate `ExpandedDNSConfig` is enabled for the kube-apiserver and
|
|||
the kubelet, it is allowed for Kubernetes to have at most 32 search domains and
|
||||
a list of search domains of up to 2048 characters.
|
||||
|
||||
## {{% heading "whatsnext" %}}
|
||||
## DNS resolution on Windows nodes {#dns-windows}
|
||||
|
||||
- ClusterFirstWithHostNet is not supported for Pods that run on Windows nodes.
|
||||
Windows treats all names with a `.` as an FQDN and skips PQDN resolution.
|
||||
- On Windows, there are multiple DNS resolvers that can be used. As these come with
|
||||
slightly different behaviors, using the
|
||||
[`Resolve-DNSName`](https://docs.microsoft.com/powershell/module/dnsclient/resolve-dnsname)
|
||||
powershell cmdlet for name query resolutions is recommended.
|
||||
- On Linux, you have a DNS suffix list, which is used after resolution of a name as fully
|
||||
qualified has failed.
|
||||
On Windows, you can only have 1 DNS suffix, which is the DNS suffix associated with that
|
||||
Pod's namespace (example: `mydns.svc.cluster.local`). Windows can resolve FQDNs, Services,
|
||||
or network names which can be resolved with this single suffix. For example, a Pod spawned
|
||||
in the `default` namespace will have the DNS suffix `default.svc.cluster.local`.
|
||||
Inside a Windows Pod, you can resolve both `kubernetes.default.svc.cluster.local`
|
||||
and `kubernetes`, but not the partially qualified names (`kubernetes.default` or
|
||||
`kubernetes.default.svc`).
|
||||
|
||||
## {{% heading "whatsnext" %}}
|
||||
|
||||
For guidance on administering DNS configurations, check
|
||||
[Configure DNS Service](/docs/tasks/administer-cluster/dns-custom-nameservers/)
|
||||
|
|
|
@ -43,7 +43,7 @@ The following prerequisites are needed in order to utilize IPv4/IPv6 dual-stack
|
|||
Kubernetes versions, refer to the documentation for that version
|
||||
of Kubernetes.
|
||||
* Provider support for dual-stack networking (Cloud provider or otherwise must be able to provide Kubernetes nodes with routable IPv4/IPv6 network interfaces)
|
||||
* A network plugin that supports dual-stack (such as Kubenet or Calico)
|
||||
* A [network plugin](/docs/concepts/extend-kubernetes/compute-storage-net/network-plugins/) that supports dual-stack networking.
|
||||
|
||||
## Configure IPv4/IPv6 dual-stack
|
||||
|
||||
|
@ -239,6 +239,21 @@ If you want to enable egress traffic in order to reach off-cluster destinations
|
|||
Ensure your {{< glossary_tooltip text="CNI" term_id="cni" >}} provider supports IPv6.
|
||||
{{< /note >}}
|
||||
|
||||
## Windows support
|
||||
|
||||
Kubernetes on Windows does not support single-stack "IPv6-only" networking. However,
|
||||
dual-stack IPv4/IPv6 networking for pods and nodes with single-family services
|
||||
is supported.
|
||||
|
||||
You can use IPv4/IPv6 dual-stack networking with `l2bridge` networks.
|
||||
|
||||
{{< note >}}
|
||||
Overlay (VXLAN) networks on Windows **do not** support dual-stack networking.
|
||||
{{< /note >}}
|
||||
|
||||
You can read more about the different network modes for Windows within the
|
||||
[Networking on Windows](/docs/concepts/services-networking/windows-networking#network-modes) topic.
|
||||
|
||||
## {{% heading "whatsnext" %}}
|
||||
|
||||
|
||||
|
|
|
@ -30,23 +30,8 @@ For clarity, this guide defines the following terms:
|
|||
Traffic routing is controlled by rules defined on the Ingress resource.
|
||||
|
||||
Here is a simple example where an Ingress sends all its traffic to one Service:
|
||||
{{< mermaid >}}
|
||||
graph LR;
|
||||
client([client])-. Ingress-managed <br> load balancer .->ingress[Ingress];
|
||||
ingress-->|routing rule|service[Service];
|
||||
subgraph cluster
|
||||
ingress;
|
||||
service-->pod1[Pod];
|
||||
service-->pod2[Pod];
|
||||
end
|
||||
classDef plain fill:#ddd,stroke:#fff,stroke-width:4px,color:#000;
|
||||
classDef k8s fill:#326ce5,stroke:#fff,stroke-width:4px,color:#fff;
|
||||
classDef cluster fill:#fff,stroke:#bbb,stroke-width:2px,color:#326ce5;
|
||||
class ingress,service,pod1,pod2 k8s;
|
||||
class client plain;
|
||||
class cluster cluster;
|
||||
{{</ mermaid >}}
|
||||
|
||||
{{< figure src="/docs/images/ingress.svg" alt="ingress-diagram" class="diagram-large" caption="Figure. Ingress" link="https://mermaid.live/edit#pako:eNqNkstuwyAQRX8F4U0r2VHqPlSRKqt0UamLqlnaWWAYJygYLB59KMm_Fxcix-qmGwbuXA7DwAEzzQETXKutof0Ovb4vaoUQkwKUu6pi3FwXM_QSHGBt0VFFt8DRU2OWSGrKUUMlVQwMmhVLEV1Vcm9-aUksiuXRaO_CEhkv4WjBfAgG1TrGaLa-iaUw6a0DcwGI-WgOsF7zm-pN881fvRx1UDzeiFq7ghb1kgqFWiElyTjnuXVG74FkbdumefEpuNuRu_4rZ1pqQ7L5fL6YQPaPNiFuywcG9_-ihNyUkm6YSONWkjVNM8WUIyaeOJLO3clTB_KhL8NQDmVe-OJjxgZM5FhFiiFTK5zjDkxHBQ9_4zB4a-x20EGNSZhyaKmXrg7f5hSsvufUwTMXThtMWiot5Jh6p9ffimHijIezaSVoeN0uiqcfMJvf7w" >}}
|
||||
|
||||
An Ingress may be configured to give Services externally-reachable URLs, load balance traffic, terminate SSL / TLS, and offer name-based virtual hosting. An [Ingress controller](/docs/concepts/services-networking/ingress-controllers) is responsible for fulfilling the Ingress, usually with a load balancer, though it may also configure your edge router or additional frontends to help handle the traffic.
|
||||
|
||||
|
@ -74,7 +59,7 @@ A minimal Ingress resource example:
|
|||
|
||||
{{< codenew file="service/networking/minimal-ingress.yaml" >}}
|
||||
|
||||
As with all other Kubernetes resources, an Ingress needs `apiVersion`, `kind`, and `metadata` fields.
|
||||
An Ingress needs `apiVersion`, `kind`, `metadata` and `spec` fields.
|
||||
The name of an Ingress object must be a valid
|
||||
[DNS subdomain name](/docs/concepts/overview/working-with-objects/names#dns-subdomain-names).
|
||||
For general information about working with config files, see [deploying applications](/docs/tasks/run-application/run-stateless-application-deployment/), [configuring containers](/docs/tasks/configure-pod-container/configure-pod-configmap/), [managing resources](/docs/concepts/cluster-administration/manage-deployment/).
|
||||
|
@ -398,25 +383,8 @@ A fanout configuration routes traffic from a single IP address to more than one
|
|||
based on the HTTP URI being requested. An Ingress allows you to keep the number of load balancers
|
||||
down to a minimum. For example, a setup like:
|
||||
|
||||
{{< mermaid >}}
|
||||
graph LR;
|
||||
client([client])-. Ingress-managed <br> load balancer .->ingress[Ingress, 178.91.123.132];
|
||||
ingress-->|/foo|service1[Service service1:4200];
|
||||
ingress-->|/bar|service2[Service service2:8080];
|
||||
subgraph cluster
|
||||
ingress;
|
||||
service1-->pod1[Pod];
|
||||
service1-->pod2[Pod];
|
||||
service2-->pod3[Pod];
|
||||
service2-->pod4[Pod];
|
||||
end
|
||||
classDef plain fill:#ddd,stroke:#fff,stroke-width:4px,color:#000;
|
||||
classDef k8s fill:#326ce5,stroke:#fff,stroke-width:4px,color:#fff;
|
||||
classDef cluster fill:#fff,stroke:#bbb,stroke-width:2px,color:#326ce5;
|
||||
class ingress,service1,service2,pod1,pod2,pod3,pod4 k8s;
|
||||
class client plain;
|
||||
class cluster cluster;
|
||||
{{</ mermaid >}}
|
||||
{{< figure src="/docs/images/ingressFanOut.svg" alt="ingress-fanout-diagram" class="diagram-large" caption="Figure. Ingress Fan Out" link="https://mermaid.live/edit#pako:eNqNUslOwzAQ_RXLvYCUhMQpUFzUUzkgcUBwbHpw4klr4diR7bCo8O8k2FFbFomLPZq3jP00O1xpDpjijWHtFt09zAuFUCUFKHey8vf6NE7QrdoYsDZumGIb4Oi6NAskNeOoZJKpCgxK4oXwrFVgRyi7nCVXWZKRPMlysv5yD6Q4Xryf1Vq_WzDPooJs9egLNDbolKTpT03JzKgh3zWEztJZ0Niu9L-qZGcdmAMfj4cxvWmreba613z9C0B-AMQD-V_AdA-A4j5QZu0SatRKJhSqhZR0wjmPrDP6CeikrutQxy-Cuy2dtq9RpaU2dJKm6fzI5Glmg0VOLio4_5dLjx27hFSC015KJ2VZHtuQvY2fuHcaE43G0MaCREOow_FV5cMxHZ5-oPX75UM5avuXhXuOI9yAaZjg_aLuBl6B3RYaKDDtSw4166QrcKE-emrXcubghgunDaY1kxYizDqnH99UhakzHYykpWD9hjS--fEJoIELqQ" >}}
|
||||
|
||||
|
||||
would require an Ingress such as:
|
||||
|
||||
|
@ -460,25 +428,7 @@ you are using, you may need to create a default-http-backend
|
|||
|
||||
Name-based virtual hosts support routing HTTP traffic to multiple host names at the same IP address.
|
||||
|
||||
{{< mermaid >}}
|
||||
graph LR;
|
||||
client([client])-. Ingress-managed <br> load balancer .->ingress[Ingress, 178.91.123.132];
|
||||
ingress-->|Host: foo.bar.com|service1[Service service1:80];
|
||||
ingress-->|Host: bar.foo.com|service2[Service service2:80];
|
||||
subgraph cluster
|
||||
ingress;
|
||||
service1-->pod1[Pod];
|
||||
service1-->pod2[Pod];
|
||||
service2-->pod3[Pod];
|
||||
service2-->pod4[Pod];
|
||||
end
|
||||
classDef plain fill:#ddd,stroke:#fff,stroke-width:4px,color:#000;
|
||||
classDef k8s fill:#326ce5,stroke:#fff,stroke-width:4px,color:#fff;
|
||||
classDef cluster fill:#fff,stroke:#bbb,stroke-width:2px,color:#326ce5;
|
||||
class ingress,service1,service2,pod1,pod2,pod3,pod4 k8s;
|
||||
class client plain;
|
||||
class cluster cluster;
|
||||
{{</ mermaid >}}
|
||||
{{< figure src="/docs/images/ingressNameBased.svg" alt="ingress-namebase-diagram" class="diagram-large" caption="Figure. Ingress Name Based Virtual hosting" link="https://mermaid.live/edit#pako:eNqNkl9PwyAUxb8KYS-atM1Kp05m9qSJJj4Y97jugcLtRqTQAPVPdN_dVlq3qUt8gZt7zvkBN7xjbgRgiteW1Rt0_zjLNUJcSdD-ZBn21WmcoDu9tuBcXDHN1iDQVWHnSBkmUMEU0xwsSuK5DK5l745QejFNLtMkJVmSZmT1Re9NcTz_uDXOU1QakxTMJtxUHw7ss-SQLhehQEODTsdH4l20Q-zFyc84-Y67pghv5apxHuweMuj9eS2_NiJdPhix-kMgvwQShOyYMNkJoEUYM3PuGkpUKyY1KqVSdCSEiJy35gnoqCzLvo5fpPAbOqlfI26UsXQ0Ho9nB5CnqesRGTnncPYvSqsdUvqp9KRdlI6KojjEkB0mnLgjDRONhqENBYm6oXbLV5V1y6S7-l42_LowlIN2uFm_twqOcAW2YlK0H_i9c-bYb6CCHNO2FFCyRvkc53rbWptaMA83QnpjMS2ZchBh1nizeNMcU28bGEzXkrV_pArN7Sc0rBTu" >}}
|
||||
|
||||
|
||||
The following Ingress tells the backing load balancer to route requests based on
|
||||
|
|
|
@ -45,42 +45,7 @@ See the [NetworkPolicy](/docs/reference/generated/kubernetes-api/{{< param "vers
|
|||
|
||||
An example NetworkPolicy might look like this:
|
||||
|
||||
```yaml
|
||||
apiVersion: networking.k8s.io/v1
|
||||
kind: NetworkPolicy
|
||||
metadata:
|
||||
name: test-network-policy
|
||||
namespace: default
|
||||
spec:
|
||||
podSelector:
|
||||
matchLabels:
|
||||
role: db
|
||||
policyTypes:
|
||||
- Ingress
|
||||
- Egress
|
||||
ingress:
|
||||
- from:
|
||||
- ipBlock:
|
||||
cidr: 172.17.0.0/16
|
||||
except:
|
||||
- 172.17.1.0/24
|
||||
- namespaceSelector:
|
||||
matchLabels:
|
||||
project: myproject
|
||||
- podSelector:
|
||||
matchLabels:
|
||||
role: frontend
|
||||
ports:
|
||||
- protocol: TCP
|
||||
port: 6379
|
||||
egress:
|
||||
- to:
|
||||
- ipBlock:
|
||||
cidr: 10.0.0.0/24
|
||||
ports:
|
||||
- protocol: TCP
|
||||
port: 5978
|
||||
```
|
||||
{{< codenew file="service/networking/networkpolicy.yaml" >}}
|
||||
|
||||
{{< note >}}
|
||||
POSTing this to the API server for your cluster will have no effect unless your chosen networking solution supports network policy.
|
||||
|
@ -89,7 +54,7 @@ POSTing this to the API server for your cluster will have no effect unless your
|
|||
__Mandatory Fields__: As with all other Kubernetes config, a NetworkPolicy
|
||||
needs `apiVersion`, `kind`, and `metadata` fields. For general information
|
||||
about working with config files, see
|
||||
[Configure Containers Using a ConfigMap](/docs/tasks/configure-pod-container/configure-pod-configmap/),
|
||||
[Configure a Pod to Use a ConfigMap](/docs/tasks/configure-pod-container/configure-pod-configmap/),
|
||||
and [Object Management](/docs/concepts/overview/working-with-objects/object-management).
|
||||
|
||||
__spec__: NetworkPolicy [spec](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#spec-and-status) has all the information needed to define a particular network policy in the given namespace.
|
||||
|
|
|
@ -122,7 +122,7 @@ metadata:
|
|||
spec:
|
||||
containers:
|
||||
- name: nginx
|
||||
image: nginx:11.14.2
|
||||
image: nginx:stable
|
||||
ports:
|
||||
- containerPort: 80
|
||||
name: http-web-svc
|
||||
|
@ -192,6 +192,7 @@ where it's running, by adding an Endpoints object manually:
|
|||
apiVersion: v1
|
||||
kind: Endpoints
|
||||
metadata:
|
||||
# the name here should match the name of the Service
|
||||
name: my-service
|
||||
subsets:
|
||||
- addresses:
|
||||
|
@ -203,6 +204,10 @@ subsets:
|
|||
The name of the Endpoints object must be a valid
|
||||
[DNS subdomain name](/docs/concepts/overview/working-with-objects/names#dns-subdomain-names).
|
||||
|
||||
When you create an [Endpoints](/docs/reference/kubernetes-api/service-resources/endpoints-v1/)
|
||||
object for a Service, you set the name of the new object to be the same as that
|
||||
of the Service.
|
||||
|
||||
{{< note >}}
|
||||
The endpoint IPs _must not_ be: loopback (127.0.0.0/8 for IPv4, ::1/128 for IPv6), or
|
||||
link-local (169.254.0.0/16 and 224.0.0.0/24 for IPv4, fe80::/64 for IPv6).
|
||||
|
@ -394,6 +399,10 @@ You can also set the maximum session sticky time by setting
|
|||
`service.spec.sessionAffinityConfig.clientIP.timeoutSeconds` appropriately.
|
||||
(the default value is 10800, which works out to be 3 hours).
|
||||
|
||||
{{< note >}}
|
||||
On Windows, setting the maximum session sticky time for Services is not supported.
|
||||
{{< /note >}}
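A sketch of a Service using client-IP based session affinity with a shorter sticky time than the default (the Service name and selector are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  selector:
    app: my-app
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 3600   # one hour instead of the three hour default
```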
|
||||
|
||||
## Multi-Port Services
|
||||
|
||||
For some Services, you need to expose more than one port.
|
||||
|
@ -447,7 +456,7 @@ server will return a 422 HTTP status code to indicate that there's a problem.
|
|||
|
||||
You can set the `spec.externalTrafficPolicy` field to control how traffic from external sources is routed.
|
||||
Valid values are `Cluster` and `Local`. Set the field to `Cluster` to route external traffic to all ready endpoints
|
||||
and `Local` to only route to ready node-local endpoints. If the traffic policy is `Local` and there are are no node-local
|
||||
and `Local` to only route to ready node-local endpoints. If the traffic policy is `Local` and there are no node-local
|
||||
endpoints, the kube-proxy does not forward any traffic for the relevant Service.
|
||||
|
||||
{{< note >}}
|
||||
|
@ -701,23 +710,25 @@ Specify the assigned IP address as loadBalancerIP. Ensure that you have updated
|
|||
|
||||
#### Load balancers with mixed protocol types
|
||||
|
||||
{{< feature-state for_k8s_version="v1.20" state="alpha" >}}
|
||||
{{< feature-state for_k8s_version="v1.24" state="beta" >}}
|
||||
|
||||
By default, for LoadBalancer type of Services, when there is more than one port defined, all
|
||||
ports must have the same protocol, and the protocol must be one which is supported
|
||||
by the cloud provider.
|
||||
|
||||
If the feature gate `MixedProtocolLBService` is enabled for the kube-apiserver it is allowed to use different protocols when there is more than one port defined.
|
||||
The feature gate `MixedProtocolLBService` (enabled by default for the kube-apiserver as of v1.24) allows the use of
|
||||
different protocols for LoadBalancer type of Services, when there is more than one port defined.
|
||||
|
||||
{{< note >}}
|
||||
|
||||
The set of protocols that can be used for LoadBalancer type of Services is still defined by the cloud provider.
|
||||
The set of protocols that can be used for LoadBalancer type of Services is still defined by the cloud provider. If a
|
||||
cloud provider does not support mixed protocols, they will provide only a single protocol.
|
||||
|
||||
{{< /note >}}
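For illustration, a sketch of a `LoadBalancer` Service exposing the same port over both TCP and UDP; whether a cloud provider accepts this still depends on its support for mixed protocols (names are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: mixed-protocol-lb
spec:
  type: LoadBalancer
  selector:
    app: my-app
  ports:
  - name: dns-tcp
    protocol: TCP
    port: 53
    targetPort: 53
  - name: dns-udp
    protocol: UDP
    port: 53
    targetPort: 53
```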
|
||||
|
||||
#### Disabling load balancer NodePort allocation {#load-balancer-nodeport-allocation}
|
||||
|
||||
{{< feature-state for_k8s_version="v1.22" state="beta" >}}
|
||||
{{< feature-state for_k8s_version="v1.24" state="stable" >}}
|
||||
|
||||
You can optionally disable node port allocation for a Service of `type=LoadBalancer`, by setting
|
||||
the field `spec.allocateLoadBalancerNodePorts` to `false`. This should only be used for load balancer implementations
|
||||
|
@ -725,20 +736,12 @@ that route traffic directly to pods as opposed to using node ports. By default,
|
|||
is `true` and type LoadBalancer Services will continue to allocate node ports. If `spec.allocateLoadBalancerNodePorts`
|
||||
is set to `false` on an existing Service with allocated node ports, those node ports will **not** be de-allocated automatically.
|
||||
You must explicitly remove the `nodePorts` entry in every Service port to de-allocate those node ports.
|
||||
Your cluster must have the `ServiceLBNodePortControl`
|
||||
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
|
||||
enabled to use this field.
|
||||
For Kubernetes v{{< skew currentVersion >}}, this feature gate is enabled by default,
|
||||
and you can use the `spec.allocateLoadBalancerNodePorts` field. For clusters running
|
||||
other versions of Kubernetes, check the documentation for that release.
|
||||
|
||||
#### Specifying class of load balancer implementation {#load-balancer-class}
|
||||
|
||||
{{< feature-state for_k8s_version="v1.22" state="beta" >}}
|
||||
{{< feature-state for_k8s_version="v1.24" state="stable" >}}
|
||||
|
||||
`spec.loadBalancerClass` enables you to use a load balancer implementation other than the cloud provider default.
|
||||
Your cluster must have the `ServiceLoadBalancerClass` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) enabled to use this field. For Kubernetes v{{< skew currentVersion >}}, this feature gate is enabled by default. For clusters running
|
||||
other versions of Kubernetes, check the documentation for that release.
|
||||
By default, `spec.loadBalancerClass` is `nil` and a `LoadBalancer` type of Service uses
|
||||
the cloud provider's default load balancer implementation if the cluster is configured with
|
||||
a cloud provider using the `--cloud-provider` component flag.
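A sketch of selecting a non-default implementation via `loadBalancerClass`; the class name is illustrative and must match whatever the chosen load balancer controller watches for:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: internal-lb
spec:
  type: LoadBalancer
  loadBalancerClass: example.com/internal-vip   # placeholder class name
  selector:
    app: my-app
  ports:
  - protocol: TCP
    port: 443
    targetPort: 8443
```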
|
||||
|
@ -1254,7 +1257,8 @@ someone else's choice. That is an isolation failure.
|
|||
|
||||
In order to allow you to choose a port number for your Services, we must
|
||||
ensure that no two Services can collide. Kubernetes does that by allocating each
|
||||
Service its own IP address.
|
||||
Service its own IP address from within the `service-cluster-ip-range`
|
||||
CIDR range that is configured for the API server.
|
||||
|
||||
To ensure each Service receives a unique IP, an internal allocator atomically
|
||||
updates a global allocation map in {{< glossary_tooltip term_id="etcd" >}}
|
||||
|
@ -1268,6 +1272,25 @@ in-memory locking). Kubernetes also uses controllers to check for invalid
|
|||
assignments (eg due to administrator intervention) and for cleaning up allocated
|
||||
IP addresses that are no longer used by any Services.
|
||||
|
||||
#### IP address ranges for `type: ClusterIP` Services {#service-ip-static-sub-range}
|
||||
|
||||
{{< feature-state for_k8s_version="v1.24" state="alpha" >}}
|
||||
However, there is a problem with this `ClusterIP` allocation strategy, because a user
|
||||
can also [choose their own address for the service](#choosing-your-own-ip-address).
|
||||
This could result in a conflict if the internal allocator selects the same IP address
|
||||
for another Service.
|
||||
|
||||
If you enable the `ServiceIPStaticSubrange`
|
||||
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/),
|
||||
the allocation strategy divides the `ClusterIP` range into two bands, based on
|
||||
the size of the configured `service-cluster-ip-range` by using the following formula
|
||||
`min(max(16, cidrSize / 16), 256)`, described as _never less than 16 or more than 256,
|
||||
with a graduated step function between them_. Dynamic IP allocations will be preferentially
|
||||
chosen from the upper band, reducing risks of conflicts with the IPs
|
||||
assigned from the lower band.
|
||||
This allows users to use the lower band of the `service-cluster-ip-range` for their
|
||||
Services with static IPs assigned with a very low risk of running into conflicts.
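For example, with a `service-cluster-ip-range` of `10.96.0.0/16` (65,536 addresses), the formula gives `min(max(16, 65536/16), 256) = 256`: the lowest 256 addresses of the range form the band intended for static assignment, while dynamic allocations are preferentially drawn from the remaining upper band.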
|
||||
|
||||
### Service IP addresses {#ips-and-vips}
|
||||
|
||||
Unlike Pod IP addresses, which actually route to a fixed destination,
|
||||
|
|
|
@ -0,0 +1,164 @@
|
|||
---
|
||||
reviewers:
|
||||
- aravindhp
|
||||
- jayunit100
|
||||
- jsturtevant
|
||||
- marosset
|
||||
title: Networking on Windows
|
||||
content_type: concept
|
||||
weight: 75
|
||||
---
|
||||
|
||||
<!-- overview -->
|
||||
|
||||
Kubernetes supports running nodes on either Linux or Windows. You can mix both kinds of node
|
||||
within a single cluster.
|
||||
This page provides an overview to networking specific to the Windows operating system.
|
||||
|
||||
<!-- body -->
|
||||
## Container networking on Windows {#networking}
|
||||
|
||||
Networking for Windows containers is exposed through
|
||||
[CNI plugins](/docs/concepts/extend-kubernetes/compute-storage-net/network-plugins/).
|
||||
Windows containers function similarly to virtual machines in regards to
|
||||
networking. Each container has a virtual network adapter (vNIC) which is connected
|
||||
to a Hyper-V virtual switch (vSwitch). The Host Networking Service (HNS) and the
|
||||
Host Compute Service (HCS) work together to create containers and attach container
|
||||
vNICs to networks. HCS is responsible for the management of containers whereas HNS
|
||||
is responsible for the management of networking resources such as:
|
||||
|
||||
* Virtual networks (including creation of vSwitches)
|
||||
* Endpoints / vNICs
|
||||
* Namespaces
|
||||
* Policies including packet encapsulations, load-balancing rules, ACLs, and NAT rules.
|
||||
|
||||
The Windows HNS and vSwitch implement namespacing and can
|
||||
create virtual NICs as needed for a pod or container. However, many configurations such
|
||||
as DNS, routes, and metrics are stored in the Windows registry database rather than as
|
||||
files inside `/etc`, which is how Linux stores those configurations. The Windows registry for the container
|
||||
is separate from that of the host, so concepts like mapping `/etc/resolv.conf` from
|
||||
the host into a container don't have the same effect they would on Linux. These must
|
||||
be configured using Windows APIs run in the context of that container. Therefore
|
||||
CNI implementations need to call the HNS instead of relying on file mappings to pass
|
||||
network details into the pod or container.
|
||||
|
||||
## Network modes
|
||||
|
||||
Windows supports five different networking drivers/modes: L2bridge, L2tunnel,
|
||||
Overlay (Beta), Transparent, and NAT. In a heterogeneous cluster with Windows and Linux
|
||||
worker nodes, you need to select a networking solution that is compatible on both
|
||||
Windows and Linux. The following table lists the out-of-tree plugins supported on Windows,
|
||||
with recommendations on when to use each CNI:
|
||||
|
||||
| Network Driver | Description | Container Packet Modifications | Network Plugins | Network Plugin Characteristics |
|
||||
| -------------- | ----------- | ------------------------------ | --------------- | ------------------------------ |
|
||||
| L2bridge | Containers are attached to an external vSwitch. Containers are attached to the underlay network, although the physical network doesn't need to learn the container MACs because they are rewritten on ingress/egress. | MAC is rewritten to host MAC, IP may be rewritten to host IP using HNS OutboundNAT policy. | [win-bridge](https://github.com/containernetworking/plugins/tree/master/plugins/main/windows/win-bridge), [Azure-CNI](https://github.com/Azure/azure-container-networking/blob/master/docs/cni.md), Flannel host-gateway uses win-bridge | win-bridge uses L2bridge network mode, connects containers to the underlay of hosts, offering best performance. Requires user-defined routes (UDR) for inter-node connectivity. |
|
||||
| L2Tunnel | This is a special case of l2bridge, but only used on Azure. All packets are sent to the virtualization host where SDN policy is applied. | MAC rewritten, IP visible on the underlay network | [Azure-CNI](https://github.com/Azure/azure-container-networking/blob/master/docs/cni.md) | Azure-CNI allows integration of containers with Azure vNET, and allows them to leverage the set of capabilities that [Azure Virtual Network provides](https://azure.microsoft.com/en-us/services/virtual-network/). For example, securely connect to Azure services or use Azure NSGs. See [azure-cni for some examples](https://docs.microsoft.com/azure/aks/concepts-network#azure-cni-advanced-networking) |
|
||||
| Overlay | Containers are given a vNIC connected to an external vSwitch. Each overlay network gets its own IP subnet, defined by a custom IP prefix. The overlay network driver uses VXLAN encapsulation. | Encapsulated with an outer header. | [win-overlay](https://github.com/containernetworking/plugins/tree/master/plugins/main/windows/win-overlay), Flannel VXLAN (uses win-overlay) | win-overlay should be used when virtual container networks are desired to be isolated from underlay of hosts (e.g. for security reasons). Allows for IPs to be re-used for different overlay networks (which have different VNID tags) if you are restricted on IPs in your datacenter. This option requires [KB4489899](https://support.microsoft.com/help/4489899) on Windows Server 2019. |
|
||||
| Transparent (special use case for [ovn-kubernetes](https://github.com/openvswitch/ovn-kubernetes)) | Requires an external vSwitch. Containers are attached to an external vSwitch which enables intra-pod communication via logical networks (logical switches and routers). | Packet is encapsulated either via [GENEVE](https://datatracker.ietf.org/doc/draft-gross-geneve/) or [STT](https://datatracker.ietf.org/doc/draft-davie-stt/) tunneling to reach pods which are not on the same host. <br/> Packets are forwarded or dropped via the tunnel metadata information supplied by the ovn network controller. <br/> NAT is done for north-south communication. | [ovn-kubernetes](https://github.com/openvswitch/ovn-kubernetes) | [Deploy via ansible](https://github.com/openvswitch/ovn-kubernetes/tree/master/contrib). Distributed ACLs can be applied via Kubernetes policies. IPAM support. Load-balancing can be achieved without kube-proxy. NATing is done without using iptables/netsh. |
|
||||
| NAT (*not used in Kubernetes*) | Containers are given a vNIC connected to an internal vSwitch. DNS/DHCP is provided using an internal component called [WinNAT](https://techcommunity.microsoft.com/t5/virtualization/windows-nat-winnat-capabilities-and-limitations/ba-p/382303) | MAC and IP is rewritten to host MAC/IP. | [nat](https://github.com/Microsoft/windows-container-networking/tree/master/plugins/nat) | Included here for completeness |
|
||||
|
||||
As outlined above, the [Flannel](https://github.com/coreos/flannel)
|
||||
[CNI plugin](https://github.com/flannel-io/cni-plugin)
|
||||
is also [supported](https://github.com/flannel-io/cni-plugin#windows-support-experimental) on Windows via the
|
||||
[VXLAN network backend](https://github.com/coreos/flannel/blob/master/Documentation/backends.md#vxlan) (**Beta support** ; delegates to win-overlay)
|
||||
and [host-gateway network backend](https://github.com/coreos/flannel/blob/master/Documentation/backends.md#host-gw) (stable support; delegates to win-bridge).
|
||||
|
||||
This plugin supports delegating to one of the reference CNI plugins (win-overlay,
|
||||
win-bridge), to work in conjunction with Flannel daemon on Windows (Flanneld) for
|
||||
automatic node subnet lease assignment and HNS network creation. This plugin reads
|
||||
in its own configuration file (cni.conf), and aggregates it with the environment
|
||||
variables from the FlannelD generated subnet.env file. It then delegates to one of
|
||||
the reference CNI plugins for network plumbing, and sends the correct configuration
|
||||
containing the node-assigned subnet to the IPAM plugin (for example: `host-local`).
|
||||
|
||||
For Node, Pod, and Service objects, the following network flows are supported for
|
||||
TCP/UDP traffic:
|
||||
|
||||
* Pod → Pod (IP)
|
||||
* Pod → Pod (Name)
|
||||
* Pod → Service (Cluster IP)
|
||||
* Pod → Service (PQDN, but only if there are no ".")
|
||||
* Pod → Service (FQDN)
|
||||
* Pod → external (IP)
|
||||
* Pod → external (DNS)
|
||||
* Node → Pod
|
||||
* Pod → Node
|
||||
|
||||
## IP address management (IPAM) {#ipam}
|
||||
|
||||
The following IPAM options are supported on Windows:
|
||||
|
||||
* [host-local](https://github.com/containernetworking/plugins/tree/master/plugins/ipam/host-local)
|
||||
* [azure-vnet-ipam](https://github.com/Azure/azure-container-networking/blob/master/docs/ipam.md) (for azure-cni only)
|
||||
* [Windows Server IPAM](https://docs.microsoft.com/windows-server/networking/technologies/ipam/ipam-top) (fallback option if no IPAM is set)
|
||||
|
||||
## Load balancing and Services
|
||||
|
||||
A Kubernetes {{< glossary_tooltip text="Service" term_id="service" >}} is an abstraction
|
||||
that defines a logical set of Pods and a means to access them over a network.
|
||||
In a cluster that includes Windows nodes, you can use the following types of Service:
|
||||
|
||||
* `NodePort`
|
||||
* `ClusterIP`
|
||||
* `LoadBalancer`
|
||||
* `ExternalName`
|
||||
|
||||
Windows container networking differs in some important ways from Linux networking.
|
||||
The [Microsoft documentation for Windows Container Networking](https://docs.microsoft.com/en-us/virtualization/windowscontainers/container-networking/architecture)
|
||||
provides additional details and background.
|
||||
|
||||
On Windows, you can use the following settings to configure Services and load
|
||||
balancing behavior:
|
||||
|
||||
{{< table caption="Windows Service Settings" >}}
|
||||
| Feature | Description | Minimum Supported Windows OS build | How to enable |
|
||||
| ------- | ----------- | -------------------------- | ------------- |
|
||||
| Session affinity | Ensures that connections from a particular client are passed to the same Pod each time. | Windows Server 2022 | Set `service.spec.sessionAffinity` to "ClientIP" |
|
||||
| Direct Server Return (DSR) | Load balancing mode where the IP address fixups and the LBNAT occurs at the container vSwitch port directly; service traffic arrives with the source IP set as the originating pod IP. | Windows Server 2019 | Set the following flags in kube-proxy: `--feature-gates="WinDSR=true" --enable-dsr=true` |
|
||||
| Preserve-Destination | Skips DNAT of service traffic, thereby preserving the virtual IP of the target service in packets reaching the backend Pod. Also disables node-node forwarding. | Windows Server, version 1903 | Set `"preserve-destination": "true"` in service annotations and enable DSR in kube-proxy. |
|
||||
| IPv4/IPv6 dual-stack networking | Native IPv4-to-IPv4 in parallel with IPv6-to-IPv6 communications to, from, and within a cluster | Windows Server 2019 | See [IPv4/IPv6 dual-stack](#ipv4ipv6-dual-stack) |
|
||||
| Client IP preservation | Ensures that source IP of incoming ingress traffic gets preserved. Also disables node-node forwarding. | Windows Server 2019 | Set `service.spec.externalTrafficPolicy` to "Local" and enable DSR in kube-proxy |
|
||||
{{< /table >}}
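As a rough sketch of how these settings appear in a Service manifest, the example below combines ClientIP session affinity with `externalTrafficPolicy: Local`; the Service name, selector, and ports are placeholders, and client IP preservation additionally requires DSR to be enabled in kube-proxy as described in the table.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: win-webserver            # hypothetical Service name
spec:
  type: LoadBalancer
  selector:
    app: win-webserver           # hypothetical Pod label
  sessionAffinity: ClientIP      # connections from one client go to the same Pod
  externalTrafficPolicy: Local   # preserve the client source IP (needs DSR in kube-proxy)
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
```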
|
||||
|
||||
{{< warning >}}
|
||||
There is a known issue with NodePort Services on overlay networking when the destination node is running Windows Server 2022.
|
||||
To avoid the issue entirely, you can configure the service with `externalTrafficPolicy: Local`.
|
||||
|
||||
There are known issues with Pod to Pod connectivity on the l2bridge network on Windows Server 2022 with KB5005619 or higher installed.
|
||||
To work around the issue and restore Pod to Pod connectivity, you can disable the WinDSR feature in kube-proxy.
|
||||
|
||||
These issues require OS fixes.
|
||||
Please follow https://github.com/microsoft/Windows-Containers/issues/204 for updates.
|
||||
{{< /warning >}}
|
||||
|
||||
## Limitations
|
||||
|
||||
The following networking functionality is _not_ supported on Windows nodes:
|
||||
|
||||
* Host networking mode
|
||||
* Local NodePort access from the node itself (works for other nodes or external clients)
|
||||
* More than 64 backend pods (or unique destination addresses) for a single Service
|
||||
* IPv6 communication between Windows pods connected to overlay networks
|
||||
* Local Traffic Policy in non-DSR mode
|
||||
* Outbound communication using the ICMP protocol via the `win-overlay`, `win-bridge`, or Azure-CNI plugins.\
|
||||
Specifically, the Windows data plane ([VFP](https://www.microsoft.com/research/project/azure-virtual-filtering-platform/))
|
||||
doesn't support ICMP packet transpositions, and this means:
|
||||
* ICMP packets directed to destinations within the same network (such as pod to pod communication via ping)
|
||||
work as expected;
|
||||
* TCP/UDP packets work as expected;
|
||||
* ICMP packets directed to pass through a remote network (e.g. pod to external internet communication via ping)
|
||||
cannot be transposed and thus will not be routed back to their source;
|
||||
* Since TCP/UDP packets can still be transposed, you can substitute `ping <destination>` with
|
||||
`curl <destination>` when debugging connectivity with the outside world.
|
||||
|
||||
Other limitations:
|
||||
|
||||
* Windows reference network plugins win-bridge and win-overlay do not implement
|
||||
[CNI spec](https://github.com/containernetworking/cni/blob/master/SPEC.md) v0.4.0,
|
||||
due to a missing `CHECK` implementation.
|
||||
* The Flannel VXLAN CNI plugin has the following limitations on Windows:
|
||||
* Node-pod connectivity is only possible for local pods with Flannel v0.12.0 (or higher).
|
||||
* Flannel is restricted to using VNI 4096 and UDP port 4789. See the official
|
||||
[Flannel VXLAN](https://github.com/coreos/flannel/blob/master/Documentation/backends.md#vxlan)
|
||||
backend docs for more details on these parameters.
|
|
@ -127,14 +127,17 @@ instructions.
|
|||
|
||||
### CSI driver restrictions
|
||||
|
||||
{{< feature-state for_k8s_version="v1.21" state="deprecated" >}}
|
||||
CSI ephemeral volumes allow users to provide `volumeAttributes`
|
||||
directly to the CSI driver as part of the Pod spec. A CSI driver
|
||||
allowing `volumeAttributes` that are typically restricted to
|
||||
administrators is NOT suitable for use in an inline ephemeral volume.
|
||||
For example, parameters that are normally defined in the StorageClass
|
||||
should not be exposed to users through the use of inline ephemeral volumes.
|
||||
|
||||
As a cluster administrator, you can use a [PodSecurityPolicy](/docs/concepts/security/pod-security-policy/) to control which CSI drivers can be used in a Pod, specified with the
|
||||
[`allowedCSIDrivers` field](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#podsecuritypolicyspec-v1beta1-policy).
|
||||
|
||||
{{< note >}}
|
||||
PodSecurityPolicy is deprecated and will be removed in the Kubernetes v1.25 release.
|
||||
{{< /note >}}
|
||||
Cluster administrators who need to restrict the CSI drivers that are
|
||||
allowed to be used as inline volumes within a Pod spec may do so by:
|
||||
- Removing `Ephemeral` from `volumeLifecycleModes` in the CSIDriver spec, which prevents the driver from being used as an inline ephemeral volume.
|
||||
- Using an [admission webhook](/docs/reference/access-authn-authz/extensible-admission-controllers/) to restrict how this driver is used.
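The first option can be expressed declaratively. The sketch below (the driver name is a placeholder) shows a CSIDriver object whose `volumeLifecycleModes` omits `Ephemeral`, so the driver can only be used through PersistentVolumeClaims and not as an inline ephemeral volume:

```yaml
apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  name: example.csi.vendor.example   # hypothetical driver name
spec:
  attachRequired: true
  podInfoOnMount: false
  volumeLifecycleModes:
    - Persistent      # omitting "Ephemeral" disables inline ephemeral use of this driver
```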
|
||||
|
||||
### Generic ephemeral volumes
|
||||
|
||||
|
@ -248,14 +251,8 @@ same namespace, so that these conflicts can't occur.
|
|||
Enabling the GenericEphemeralVolume feature allows users to create
|
||||
PVCs indirectly if they can create Pods, even if they do not have
|
||||
permission to create PVCs directly. Cluster administrators must be
|
||||
aware of this. If this does not fit their security model, they have
|
||||
two choices:
|
||||
- Use an [admission webhook](/docs/reference/access-authn-authz/extensible-admission-controllers/)
|
||||
that rejects objects like Pods that have a generic ephemeral
|
||||
volume.
|
||||
- Use a [Pod Security Policy](/docs/concepts/security/pod-security-policy/)
|
||||
where the `volumes` list does not contain the `ephemeral` volume type
|
||||
(deprecated since Kubernetes 1.21).
|
||||
aware of this. If this does not fit their security model, they should
|
||||
use an [admission webhook](/docs/reference/access-authn-authz/extensible-admission-controllers/) that rejects objects like Pods that have a generic ephemeral volume.
|
||||
|
||||
The normal [namespace quota for PVCs](/docs/concepts/policy/resource-quotas/#storage-resource-quota) still applies, so
|
||||
even if users are allowed to use this new mechanism, they cannot use
|
||||
|
|
|
@ -175,6 +175,74 @@ spec:
|
|||
|
||||
However, the particular path specified in the custom recycler Pod template in the `volumes` part is replaced with the particular path of the volume that is being recycled.
|
||||
|
||||
### PersistentVolume deletion protection finalizer
|
||||
{{< feature-state for_k8s_version="v1.23" state="alpha" >}}
|
||||
|
||||
Finalizers can be added on a PersistentVolume to ensure that PersistentVolumes
|
||||
having a `Delete` reclaim policy are deleted only after the backing storage is deleted.
|
||||
|
||||
The newly introduced finalizers `kubernetes.io/pv-controller` and `external-provisioner.volume.kubernetes.io/finalizer`
|
||||
are only added to dynamically provisioned volumes.
|
||||
|
||||
The finalizer `kubernetes.io/pv-controller` is added to in-tree plugin volumes. The following is an example:
|
||||
|
||||
```shell
|
||||
kubectl describe pv pvc-74a498d6-3929-47e8-8c02-078c1ece4d78
|
||||
Name: pvc-74a498d6-3929-47e8-8c02-078c1ece4d78
|
||||
Labels: <none>
|
||||
Annotations: kubernetes.io/createdby: vsphere-volume-dynamic-provisioner
|
||||
pv.kubernetes.io/bound-by-controller: yes
|
||||
pv.kubernetes.io/provisioned-by: kubernetes.io/vsphere-volume
|
||||
Finalizers: [kubernetes.io/pv-protection kubernetes.io/pv-controller]
|
||||
StorageClass: vcp-sc
|
||||
Status: Bound
|
||||
Claim: default/vcp-pvc-1
|
||||
Reclaim Policy: Delete
|
||||
Access Modes: RWO
|
||||
VolumeMode: Filesystem
|
||||
Capacity: 1Gi
|
||||
Node Affinity: <none>
|
||||
Message:
|
||||
Source:
|
||||
Type: vSphereVolume (a Persistent Disk resource in vSphere)
|
||||
VolumePath: [vsanDatastore] d49c4a62-166f-ce12-c464-020077ba5d46/kubernetes-dynamic-pvc-74a498d6-3929-47e8-8c02-078c1ece4d78.vmdk
|
||||
FSType: ext4
|
||||
StoragePolicyName: vSAN Default Storage Policy
|
||||
Events: <none>
|
||||
```
|
||||
|
||||
The finalizer `external-provisioner.volume.kubernetes.io/finalizer` is added for CSI volumes.
|
||||
The following is an example:
|
||||
```shell
|
||||
kubectl describe pv pvc-2f0bab97-85a8-4552-8044-eb8be45cf48d
Name:            pvc-2f0bab97-85a8-4552-8044-eb8be45cf48d
|
||||
Labels: <none>
|
||||
Annotations: pv.kubernetes.io/provisioned-by: csi.vsphere.vmware.com
|
||||
Finalizers: [kubernetes.io/pv-protection external-provisioner.volume.kubernetes.io/finalizer]
|
||||
StorageClass: fast
|
||||
Status: Bound
|
||||
Claim: demo-app/nginx-logs
|
||||
Reclaim Policy: Delete
|
||||
Access Modes: RWO
|
||||
VolumeMode: Filesystem
|
||||
Capacity: 200Mi
|
||||
Node Affinity: <none>
|
||||
Message:
|
||||
Source:
|
||||
Type: CSI (a Container Storage Interface (CSI) volume source)
|
||||
Driver: csi.vsphere.vmware.com
|
||||
FSType: ext4
|
||||
VolumeHandle: 44830fa8-79b4-406b-8b58-621ba25353fd
|
||||
ReadOnly: false
|
||||
VolumeAttributes: storage.kubernetes.io/csiProvisionerIdentity=1648442357185-8081-csi.vsphere.vmware.com
|
||||
type=vSphere CNS Block Volume
|
||||
Events: <none>
|
||||
```
|
||||
|
||||
Enabling the `CSIMigration` feature for a specific in-tree volume plugin will remove
|
||||
the `kubernetes.io/pv-controller` finalizer, while adding the `external-provisioner.volume.kubernetes.io/finalizer`
|
||||
finalizer. Similarly, disabling `CSIMigration` will remove the `external-provisioner.volume.kubernetes.io/finalizer`
|
||||
finalizer, while adding the `kubernetes.io/pv-controller` finalizer.
|
||||
|
||||
### Reserving a PersistentVolume
|
||||
|
||||
The control plane can [bind PersistentVolumeClaims to matching PersistentVolumes](#binding) in the
|
||||
|
@ -284,18 +352,13 @@ FlexVolumes (deprecated since Kubernetes v1.23) allow resize if the driver is co
|
|||
|
||||
#### Resizing an in-use PersistentVolumeClaim
|
||||
|
||||
{{< feature-state for_k8s_version="v1.15" state="beta" >}}
|
||||
|
||||
{{< note >}}
|
||||
Expanding in-use PVCs is available as beta since Kubernetes 1.15, and as alpha since 1.11. The `ExpandInUsePersistentVolumes` feature must be enabled, which is the case automatically for many clusters for beta features. Refer to the [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) documentation for more information.
|
||||
{{< /note >}}
|
||||
{{< feature-state for_k8s_version="v1.24" state="stable" >}}
|
||||
|
||||
In this case, you don't need to delete and recreate a Pod or deployment that is using an existing PVC.
|
||||
Any in-use PVC automatically becomes available to its Pod as soon as its file system has been expanded.
|
||||
This feature has no effect on PVCs that are not in use by a Pod or deployment. You must create a Pod that
|
||||
uses the PVC before the expansion can complete.
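For example, to expand a claim you edit its storage request in place (for instance with `kubectl edit pvc`); the claim name, StorageClass, and size below are placeholders, and the StorageClass must have `allowVolumeExpansion: true`:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-pvc                # hypothetical existing claim
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: standard    # must allow volume expansion
  resources:
    requests:
      storage: 20Gi             # increased from the previously requested size
```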
|
||||
|
||||
|
||||
Similar to other volume types, FlexVolume volumes can also be expanded when in use by a Pod.
|
||||
|
||||
{{< note >}}
|
||||
|
@ -329,7 +392,7 @@ If expanding underlying storage fails, the cluster administrator can manually re
|
|||
Recovery from failing PVC expansion by users is available as an alpha feature since Kubernetes 1.23. The `RecoverVolumeExpansionFailure` feature must be enabled for this feature to work. Refer to the [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) documentation for more information.
|
||||
{{< /note >}}
|
||||
|
||||
If the feature gates `ExpandPersistentVolumes` and `RecoverVolumeExpansionFailure` are both
|
||||
If the feature gate `RecoverVolumeExpansionFailure` is
|
||||
enabled in your cluster, and expansion has failed for a PVC, you can retry expansion with a
|
||||
smaller size than the previously requested value. To request a new expansion attempt with a
|
||||
smaller proposed size, edit `.spec.resources` for that PVC and choose a value that is less than the
|
||||
|
@ -477,6 +540,15 @@ In the CLI, the access modes are abbreviated to:
|
|||
* RWX - ReadWriteMany
|
||||
* RWOP - ReadWriteOncePod
|
||||
|
||||
{{< note >}}
|
||||
Kubernetes uses volume access modes to match PersistentVolumeClaims and PersistentVolumes.
|
||||
In some cases, the volume access modes also constrain where the PersistentVolume can be mounted.
|
||||
Volume access modes do **not** enforce write protection once the storage has been mounted.
|
||||
Even if the access modes are specified as ReadWriteOnce, ReadOnlyMany, or ReadWriteMany, they don't set any constraints on the volume.
|
||||
For example, even if a PersistentVolume is created as ReadOnlyMany, there is no guarantee that it will be read-only.
|
||||
If the access modes are specified as ReadWriteOncePod, the volume is constrained and can be mounted on only a single Pod.
|
||||
{{< /note >}}
|
||||
|
||||
> __Important!__ A volume can only be mounted using one access mode at a time, even if it supports many. For example, a GCEPersistentDisk can be mounted as ReadWriteOnce by a single node or ReadOnlyMany by many nodes, but not at the same time.
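To make the distinction concrete, the following sketch requests the single-Pod access mode from the note above; the claim name, StorageClass, and size are placeholders, and `ReadWriteOncePod` is only honored by CSI drivers on clusters where that access mode is available:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: single-writer-claim     # hypothetical claim name
spec:
  accessModes:
    - ReadWriteOncePod          # only one Pod in the whole cluster may use this volume
  storageClassName: fast        # hypothetical CSI-backed StorageClass
  resources:
    requests:
      storage: 1Gi
```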
|
||||
|
||||
|
||||
|
@ -849,17 +921,12 @@ spec:
|
|||
|
||||
## Volume populators and data sources
|
||||
|
||||
{{< feature-state for_k8s_version="v1.22" state="alpha" >}}
|
||||
{{< feature-state for_k8s_version="v1.24" state="beta" >}}
|
||||
|
||||
{{< note >}}
|
||||
Kubernetes supports custom volume populators; this alpha feature was introduced
|
||||
in Kubernetes 1.18. Kubernetes 1.22 reimplemented the mechanism with a redesigned API.
|
||||
Check that you are reading the version of the Kubernetes documentation that matches your
|
||||
cluster. {{% version-check %}}
|
||||
Kubernetes supports custom volume populators.
|
||||
To use custom volume populators, you must enable the `AnyVolumeDataSource`
|
||||
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/) for
|
||||
the kube-apiserver and kube-controller-manager.
|
||||
{{< /note >}}
|
||||
|
||||
Volume populators take advantage of a PVC spec field called `dataSourceRef`. Unlike the
|
||||
`dataSource` field, which can only contain either a reference to another PersistentVolumeClaim
|
||||
|
@ -877,6 +944,7 @@ contents.
|
|||
|
||||
There are two differences between the `dataSourceRef` field and the `dataSource` field that
|
||||
users should be aware of:
|
||||
|
||||
* The `dataSource` field ignores invalid values (as if the field was blank) while the
|
||||
`dataSourceRef` field never ignores values and will cause an error if an invalid value is
|
||||
used. Invalid values are any core object (objects with no apiGroup) except for PVCs.
|
||||
|
|
|
@ -16,37 +16,41 @@ Storage capacity is limited and may vary depending on the node on
|
|||
which a pod runs: network-attached storage might not be accessible by
|
||||
all nodes, or storage is local to a node to begin with.
|
||||
|
||||
{{< feature-state for_k8s_version="v1.21" state="beta" >}}
|
||||
{{< feature-state for_k8s_version="v1.24" state="stable" >}}
|
||||
|
||||
This page describes how Kubernetes keeps track of storage capacity and
|
||||
how the scheduler uses that information to schedule Pods onto nodes
|
||||
how the scheduler uses that information to [schedule Pods](/docs/concepts/scheduling-eviction/) onto nodes
|
||||
that have access to enough storage capacity for the remaining missing
|
||||
volumes. Without storage capacity tracking, the scheduler may choose a
|
||||
node that doesn't have enough capacity to provision a volume and
|
||||
multiple scheduling retries will be needed.
|
||||
|
||||
Tracking storage capacity is supported for {{< glossary_tooltip
|
||||
text="Container Storage Interface" term_id="csi" >}} (CSI) drivers and
|
||||
[needs to be enabled](#enabling-storage-capacity-tracking) when installing a CSI driver.
|
||||
## {{% heading "prerequisites" %}}
|
||||
|
||||
Kubernetes v{{< skew currentVersion >}} includes cluster-level API support for
|
||||
storage capacity tracking. To use this you must also be using a CSI driver that
|
||||
supports capacity tracking. Consult the documentation for the CSI drivers that
|
||||
you use to find out whether this support is available and, if so, how to use
|
||||
it. If you are not running Kubernetes v{{< skew currentVersion >}}, check the
|
||||
documentation for that version of Kubernetes.
|
||||
|
||||
<!-- body -->
|
||||
|
||||
## API
|
||||
|
||||
There are two API extensions for this feature:
|
||||
- CSIStorageCapacity objects:
|
||||
- [CSIStorageCapacity](/docs/reference/kubernetes-api/config-and-storage-resources/csi-storage-capacity-v1/) objects:
|
||||
these get produced by a CSI driver in the namespace
|
||||
where the driver is installed. Each object contains capacity
|
||||
information for one storage class and defines which nodes have
|
||||
access to that storage.
|
||||
- [The `CSIDriverSpec.StorageCapacity` field](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#csidriverspec-v1-storage-k8s-io):
|
||||
- [The `CSIDriverSpec.StorageCapacity` field](/docs/reference/kubernetes-api/config-and-storage-resources/csi-driver-v1/#CSIDriverSpec):
|
||||
when set to `true`, the Kubernetes scheduler will consider storage
|
||||
capacity for volumes that use the CSI driver.
|
||||
|
||||
## Scheduling
|
||||
|
||||
Storage capacity information is used by the Kubernetes scheduler if:
|
||||
- the `CSIStorageCapacity` feature gate is true,
|
||||
- a Pod uses a volume that has not been created yet,
|
||||
- that volume uses a {{< glossary_tooltip text="StorageClass" term_id="storage-class" >}} which references a CSI driver and
|
||||
uses `WaitForFirstConsumer` [volume binding
|
||||
|
@ -97,20 +101,9 @@ multiple volumes: one volume might have been created already in a
|
|||
topology segment which then does not have enough capacity left for
|
||||
another volume. Manual intervention is necessary to recover from this,
|
||||
for example by increasing capacity or deleting the volume that was
|
||||
already created. [Further
|
||||
work](https://github.com/kubernetes/enhancements/pull/1703) is needed
|
||||
to handle this automatically.
|
||||
|
||||
## Enabling storage capacity tracking
|
||||
|
||||
Storage capacity tracking is a beta feature and enabled by default in
|
||||
a Kubernetes cluster since Kubernetes 1.21. In addition to having the
|
||||
feature enabled in the cluster, a CSI driver also has to support
|
||||
it. Please refer to the driver's documentation for details.
|
||||
already created.
|
||||
|
||||
## {{% heading "whatsnext" %}}
|
||||
|
||||
- For more information on the design, see the
|
||||
[Storage Capacity Constraints for Pod Scheduling KEP](https://github.com/kubernetes/enhancements/blob/master/keps/sig-storage/1472-storage-capacity-tracking/README.md).
|
||||
- For more information on further development of this feature, see the [enhancement tracking issue #1472](https://github.com/kubernetes/enhancements/issues/1472).
|
||||
- Learn about [Kubernetes Scheduler](/docs/concepts/scheduling-eviction/kube-scheduler/)
|
||||
|
|
|
@ -49,7 +49,7 @@ metadata:
|
|||
name: standard
|
||||
provisioner: kubernetes.io/aws-ebs
|
||||
parameters:
|
||||
type: gp3
|
||||
type: gp2
|
||||
reclaimPolicy: Retain
|
||||
allowVolumeExpansion: true
|
||||
mountOptions:
|
||||
|
@ -271,9 +271,9 @@ parameters:
|
|||
fsType: ext4
|
||||
```
|
||||
|
||||
* `type`: `io1`, `gp2`, `gp3`, `sc1`, `st1`. See
|
||||
* `type`: `io1`, `gp2`, `sc1`, `st1`. See
|
||||
[AWS docs](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSVolumeTypes.html)
|
||||
for details. Default: `gp3`.
|
||||
for details. Default: `gp2`.
|
||||
* `zone` (Deprecated): AWS zone. If neither `zone` nor `zones` is specified, volumes are
|
||||
generally round-robin-ed across all active zones where Kubernetes cluster
|
||||
has a node. `zone` and `zones` parameters must not be used at the same time.
|
||||
|
|
|
@ -24,7 +24,7 @@ If a CSI Driver supports Volume Health Monitoring feature from the controller si
|
|||
|
||||
The External Health Monitor {{< glossary_tooltip text="controller" term_id="controller" >}} also watches for node failure events. You can enable node failure monitoring by setting the `enable-node-watcher` flag to true. When the external health monitor detects a node failure event, the controller reports an Event on the PVC to indicate that pods using this PVC are on a failed node.
|
||||
|
||||
If a CSI Driver supports Volume Health Monitoring feature from the node side, an Event will be reported on every Pod using the PVC when an abnormal volume condition is detected on a CSI volume.
|
||||
If a CSI Driver supports the Volume Health Monitoring feature from the node side, an Event will be reported on every Pod using the PVC when an abnormal volume condition is detected on a CSI volume. In addition, Volume Health information is exposed as kubelet VolumeStats metrics. A new metric, `kubelet_volume_stats_health_status_abnormal`, is added. This metric has two labels: `namespace` and `persistentvolumeclaim`. The count is either 1 or 0: 1 indicates the volume is unhealthy, 0 indicates the volume is healthy. For more information, see the [KEP](https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/1432-volume-health-monitor#kubelet-metrics-changes).
|
||||
|
||||
{{< note >}}
|
||||
You need to enable the `CSIVolumeHealth` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) to use this feature from the node side.
|
||||
|
|
|
@ -120,6 +120,7 @@ spec:
|
|||
driver: hostpath.csi.k8s.io
|
||||
source:
|
||||
volumeHandle: ee0cfb94-f8d4-11e9-b2d8-0242ac110002
|
||||
sourceVolumeMode: Filesystem
|
||||
volumeSnapshotClassName: csi-hostpath-snapclass
|
||||
volumeSnapshotRef:
|
||||
name: new-snapshot-test
|
||||
|
@ -141,6 +142,7 @@ spec:
|
|||
driver: hostpath.csi.k8s.io
|
||||
source:
|
||||
snapshotHandle: 7bdd0de3-aaeb-11e8-9aae-0242ac110002
|
||||
sourceVolumeMode: Filesystem
|
||||
volumeSnapshotRef:
|
||||
name: new-snapshot-test
|
||||
namespace: default
|
||||
|
@ -148,6 +150,51 @@ spec:
|
|||
|
||||
`snapshotHandle` is the unique identifier of the volume snapshot created on the storage backend. This field is required for the pre-provisioned snapshots. It specifies the CSI snapshot id on the storage system that this `VolumeSnapshotContent` represents.
|
||||
|
||||
`sourceVolumeMode` is the mode of the volume whose snapshot is taken. The value
|
||||
of the `sourceVolumeMode` field can be either `Filesystem` or `Block`. If the
|
||||
source volume mode is not specified, Kubernetes treats the snapshot as if the
|
||||
source volume's mode is unknown.
|
||||
|
||||
## Converting the volume mode of a Snapshot {#convert-volume-mode}
|
||||
|
||||
If the `VolumeSnapshots` API installed on your cluster supports the `sourceVolumeMode`
|
||||
field, then the API has the capability to prevent unauthorized users from converting
|
||||
the mode of a volume.
|
||||
|
||||
To check whether your cluster supports this capability, run the following command:
|
||||
|
||||
```shell
|
||||
kubectl get crd volumesnapshotcontents.snapshot.storage.k8s.io -o yaml
|
||||
```
|
||||
|
||||
If you want to allow users to create a `PersistentVolumeClaim` from an existing
|
||||
`VolumeSnapshot`, but with a different volume mode than the source, the annotation
|
||||
`snapshot.storage.kubernetes.io/allowVolumeModeChange: "true"`needs to be added to
|
||||
the `VolumeSnapshotContent` that corresponds to the `VolumeSnapshot`.
|
||||
|
||||
For pre-provisioned snapshots, `Spec.SourceVolumeMode` needs to be populated
|
||||
by the cluster administrator.
|
||||
|
||||
An example `VolumeSnapshotContent` resource with this feature enabled would look like:
|
||||
|
||||
```yaml
|
||||
apiVersion: snapshot.storage.k8s.io/v1
|
||||
kind: VolumeSnapshotContent
|
||||
metadata:
|
||||
name: new-snapshot-content-test
|
||||
annotations:
|
||||
    snapshot.storage.kubernetes.io/allowVolumeModeChange: "true"
|
||||
spec:
|
||||
deletionPolicy: Delete
|
||||
driver: hostpath.csi.k8s.io
|
||||
source:
|
||||
snapshotHandle: 7bdd0de3-aaeb-11e8-9aae-0242ac110002
|
||||
sourceVolumeMode: Filesystem
|
||||
volumeSnapshotRef:
|
||||
name: new-snapshot-test
|
||||
namespace: default
|
||||
```
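With that annotation in place, a user could then request a claim from the corresponding `VolumeSnapshot` with a different volume mode. The following is a rough sketch; the claim name, StorageClass, and size are placeholders, while `new-snapshot-test` matches the `volumeSnapshotRef` above:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: restored-block-claim        # hypothetical claim name
  namespace: default
spec:
  storageClassName: csi-hostpath-sc # hypothetical StorageClass
  dataSource:
    name: new-snapshot-test         # the VolumeSnapshot bound to the content above
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  volumeMode: Block                 # differs from the snapshot's Filesystem source mode
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
```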
|
||||
|
||||
## Provisioning Volumes from Snapshots
|
||||
|
||||
You can provision a new volume, pre-populated with data from a snapshot, by using
|
||||
|
|
|
@ -64,7 +64,9 @@ a different volume.
|
|||
|
||||
Kubernetes supports several types of volumes.
|
||||
|
||||
### awsElasticBlockStore {#awselasticblockstore}
|
||||
### awsElasticBlockStore (deprecated) {#awselasticblockstore}
|
||||
|
||||
{{< feature-state for_k8s_version="v1.17" state="deprecated" >}}
|
||||
|
||||
An `awsElasticBlockStore` volume mounts an Amazon Web Services (AWS)
|
||||
[EBS volume](https://aws.amazon.com/ebs/) into your pod. Unlike
|
||||
|
@ -135,7 +137,9 @@ beta features must be enabled.
|
|||
To disable the `awsElasticBlockStore` storage plugin from being loaded by the controller manager
|
||||
and the kubelet, set the `InTreePluginAWSUnregister` flag to `true`.
|
||||
|
||||
### azureDisk {#azuredisk}
|
||||
### azureDisk (deprecated) {#azuredisk}
|
||||
|
||||
{{< feature-state for_k8s_version="v1.19" state="deprecated" >}}
|
||||
|
||||
The `azureDisk` volume type mounts a Microsoft Azure [Data Disk](https://docs.microsoft.com/en-us/azure/aks/csi-storage-drivers) into a pod.
|
||||
|
||||
|
@ -143,14 +147,13 @@ For more details, see the [`azureDisk` volume plugin](https://github.com/kuberne
|
|||
|
||||
#### azureDisk CSI migration
|
||||
|
||||
{{< feature-state for_k8s_version="v1.19" state="beta" >}}
|
||||
{{< feature-state for_k8s_version="v1.24" state="stable" >}}
|
||||
|
||||
The `CSIMigration` feature for `azureDisk`, when enabled, redirects all plugin operations
|
||||
from the existing in-tree plugin to the `disk.csi.azure.com` Container
|
||||
Storage Interface (CSI) Driver. In order to use this feature, the [Azure Disk CSI
|
||||
Driver](https://github.com/kubernetes-sigs/azuredisk-csi-driver)
|
||||
must be installed on the cluster and the `CSIMigration` and `CSIMigrationAzureDisk`
|
||||
features must be enabled.
|
||||
Storage Interface (CSI) Driver. In order to use this feature, the
|
||||
[Azure Disk CSI Driver](https://github.com/kubernetes-sigs/azuredisk-csi-driver)
|
||||
must be installed on the cluster and the `CSIMigration` feature must be enabled.
|
||||
|
||||
#### azureDisk CSI migration complete
|
||||
|
||||
|
@ -159,7 +162,9 @@ features must be enabled.
|
|||
To disable the `azureDisk` storage plugin from being loaded by the controller manager
|
||||
and the kubelet, set the `InTreePluginAzureDiskUnregister` flag to `true`.
|
||||
|
||||
### azureFile {#azurefile}
|
||||
### azureFile (deprecated) {#azurefile}
|
||||
|
||||
{{< feature-state for_k8s_version="v1.21" state="deprecated" >}}
|
||||
|
||||
The `azureFile` volume type mounts a Microsoft Azure File volume (SMB 2.1 and 3.0)
|
||||
into a pod.
|
||||
|
@ -177,7 +182,8 @@ Driver](https://github.com/kubernetes-sigs/azurefile-csi-driver)
|
|||
must be installed on the cluster and the `CSIMigration` and `CSIMigrationAzureFile`
|
||||
[feature gates](/docs/reference/command-line-tools-reference/feature-gates/) must be enabled.
|
||||
|
||||
Azure File CSI driver does not support using same volume with different fsgroups, if Azurefile CSI migration is enabled, using same volume with different fsgroups won't be supported at all.
|
||||
Azure File CSI driver does not support using the same volume with different fsgroups. If
|
||||
`CSIMigrationAzureFile` is enabled, using the same volume with different fsgroups won't be supported at all.
|
||||
|
||||
#### azureFile CSI migration complete
|
||||
|
||||
|
@ -201,7 +207,9 @@ You must have your own Ceph server running with the share exported before you ca
|
|||
|
||||
See the [CephFS example](https://github.com/kubernetes/examples/tree/master/volumes/cephfs/) for more details.
|
||||
|
||||
### cinder
|
||||
### cinder (deprecated) {#cinder}
|
||||
|
||||
{{< feature-state for_k8s_version="v1.18" state="deprecated" >}}
|
||||
|
||||
{{< note >}}
|
||||
Kubernetes must be configured with the OpenStack cloud provider.
|
||||
|
@ -233,17 +241,17 @@ spec:
|
|||
|
||||
#### OpenStack CSI migration
|
||||
|
||||
{{< feature-state for_k8s_version="v1.21" state="beta" >}}
|
||||
{{< feature-state for_k8s_version="v1.24" state="stable" >}}
|
||||
|
||||
The `CSIMigration` feature for Cinder is enabled by default in Kubernetes 1.21.
|
||||
The `CSIMigration` feature for Cinder is enabled by default since Kubernetes 1.21.
|
||||
It redirects all plugin operations from the existing in-tree plugin to the
|
||||
`cinder.csi.openstack.org` Container Storage Interface (CSI) Driver.
|
||||
[OpenStack Cinder CSI Driver](https://github.com/kubernetes/cloud-provider-openstack/blob/master/docs/cinder-csi-plugin/using-cinder-csi-plugin.md)
|
||||
must be installed on the cluster.
|
||||
You can disable Cinder CSI migration for your cluster by setting the `CSIMigrationOpenStack`
|
||||
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/) to `false`.
|
||||
If you disable the `CSIMigrationOpenStack` feature, the in-tree Cinder volume plugin takes responsibility
|
||||
for all aspects of Cinder volume storage management.
|
||||
|
||||
To disable the in-tree Cinder plugin from being loaded by the controller manager
|
||||
and the kubelet, you can enable the `InTreePluginOpenStackUnregister`
|
||||
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/).
|
||||
|
||||
### configMap
|
||||
|
||||
|
@ -390,7 +398,9 @@ You must have your own Flocker installation running before you can use it.
|
|||
|
||||
See the [Flocker example](https://github.com/kubernetes/examples/tree/master/staging/volumes/flocker) for more details.
|
||||
|
||||
### gcePersistentDisk
|
||||
### gcePersistentDisk (deprecated) {#gcepersistentdisk}
|
||||
|
||||
{{< feature-state for_k8s_version="v1.17" state="deprecated" >}}
|
||||
|
||||
A `gcePersistentDisk` volume mounts a Google Compute Engine (GCE)
|
||||
[persistent disk](https://cloud.google.com/compute/docs/disks) (PD) into your Pod.
|
||||
|
@ -969,66 +979,15 @@ spec:
|
|||
For more information about StorageOS, dynamic provisioning, and PersistentVolumeClaims, see the
|
||||
[StorageOS examples](https://github.com/kubernetes/examples/blob/master/volumes/storageos).
|
||||
|
||||
### vsphereVolume {#vspherevolume}
|
||||
### vsphereVolume (deprecated) {#vspherevolume}
|
||||
|
||||
{{< note >}}
|
||||
You must configure the Kubernetes vSphere Cloud Provider. For cloudprovider
|
||||
configuration, refer to the [vSphere Getting Started guide](https://vmware.github.io/vsphere-storage-for-kubernetes/documentation/).
|
||||
We recommend using the vSphere CSI out-of-tree driver instead.
|
||||
{{< /note >}}
|
||||
|
||||
A `vsphereVolume` is used to mount a vSphere VMDK volume into your Pod. The contents
|
||||
of a volume are preserved when it is unmounted. It supports both VMFS and VSAN datastores.
|
||||
|
||||
{{< note >}}
|
||||
You must create a vSphere VMDK volume using one of the following methods before using it with a Pod.
|
||||
{{< /note >}}
|
||||
|
||||
#### Creating a VMDK volume {#creating-vmdk-volume}
|
||||
|
||||
Choose one of the following methods to create a VMDK.
|
||||
|
||||
{{< tabs name="tabs_volumes" >}}
|
||||
{{% tab name="Create using vmkfstools" %}}
|
||||
First ssh into ESX, then use the following command to create a VMDK:
|
||||
|
||||
```shell
|
||||
vmkfstools -c 2G /vmfs/volumes/DatastoreName/volumes/myDisk.vmdk
|
||||
```
|
||||
|
||||
{{% /tab %}}
|
||||
{{% tab name="Create using vmware-vdiskmanager" %}}
|
||||
Use the following command to create a VMDK:
|
||||
|
||||
```shell
|
||||
vmware-vdiskmanager -c -t 0 -s 40GB -a lsilogic myDisk.vmdk
|
||||
```
|
||||
|
||||
{{% /tab %}}
|
||||
|
||||
{{< /tabs >}}
|
||||
|
||||
#### vSphere VMDK configuration example {#vsphere-vmdk-configuration}
|
||||
|
||||
```yaml
|
||||
apiVersion: v1
|
||||
kind: Pod
|
||||
metadata:
|
||||
name: test-vmdk
|
||||
spec:
|
||||
containers:
|
||||
- image: k8s.gcr.io/test-webserver
|
||||
name: test-container
|
||||
volumeMounts:
|
||||
- mountPath: /test-vmdk
|
||||
name: test-volume
|
||||
volumes:
|
||||
- name: test-volume
|
||||
# This VMDK volume must already exist.
|
||||
vsphereVolume:
|
||||
volumePath: "[DatastoreName] volumes/myDisk"
|
||||
fsType: ext4
|
||||
```
|
||||
|
||||
For more information, see the [vSphere volume](https://github.com/kubernetes/examples/tree/master/staging/volumes/vsphere) examples.
|
||||
|
||||
#### vSphere CSI migration {#vsphere-csi-migration}
|
||||
|
@ -1040,8 +999,15 @@ from the existing in-tree plugin to the `csi.vsphere.vmware.com` {{< glossary_to
|
|||
[vSphere CSI driver](https://github.com/kubernetes-sigs/vsphere-csi-driver)
|
||||
must be installed on the cluster and the `CSIMigration` and `CSIMigrationvSphere`
|
||||
[feature gates](/docs/reference/command-line-tools-reference/feature-gates/) must be enabled.
|
||||
You can find additional advice on how to migrate in VMware's
|
||||
documentation page [Migrating In-Tree vSphere Volumes to vSphere Container Storage Plug-in](https://docs.vmware.com/en/VMware-vSphere-Container-Storage-Plug-in/2.0/vmware-vsphere-csp-getting-started/GUID-968D421F-D464-4E22-8127-6CB9FF54423F.html).
|
||||
|
||||
This also requires minimum vSphere vCenter/ESXi Version to be 7.0u1 and minimum HW Version to be VM version 15.
|
||||
Kubernetes v{{< skew currentVersion >}} requires that you are using vSphere 7.0u2 or later
|
||||
in order to migrate to the out-of-tree CSI driver.
|
||||
If you are running a version of Kubernetes other than v{{< skew currentVersion >}}, consult
|
||||
the documentation for that version of Kubernetes.
|
||||
If you are running Kubernetes v{{< skew currentVersion >}} and an older version of vSphere,
|
||||
consider upgrading to at least vSphere 7.0u2.
|
||||
|
||||
{{< note >}}
|
||||
The following StorageClass parameters from the built-in `vsphereVolume` plugin are not supported by the vSphere CSI driver:
|
||||
|
@ -1211,7 +1177,6 @@ A `csi` volume can be used in a Pod in three different ways:
|
|||
|
||||
* through a reference to a [PersistentVolumeClaim](#persistentvolumeclaim)
|
||||
* with a [generic ephemeral volume](/docs/concepts/storage/ephemeral-volumes/#generic-ephemeral-volume)
|
||||
(alpha feature)
|
||||
* with a [CSI ephemeral volume](/docs/concepts/storage/ephemeral-volumes/#csi-ephemeral-volume)
|
||||
if the driver supports that (beta feature)
|
||||
|
||||
|
@ -1285,6 +1250,20 @@ for more information.
|
|||
For more information on how to develop a CSI driver, refer to the
|
||||
[kubernetes-csi documentation](https://kubernetes-csi.github.io/docs/)
|
||||
|
||||
#### Windows CSI proxy
|
||||
|
||||
{{< feature-state for_k8s_version="v1.22" state="stable" >}}
|
||||
|
||||
CSI node plugins need to perform various privileged
|
||||
operations like scanning of disk devices and mounting of file systems. These operations
|
||||
differ for each host operating system. For Linux worker nodes, containerized CSI node
|
||||
plugins are typically deployed as privileged containers. For Windows worker nodes,
|
||||
privileged operations for containerized CSI node plugins are supported using
|
||||
[csi-proxy](https://github.com/kubernetes-csi/csi-proxy), a community-managed,
|
||||
stand-alone binary that needs to be pre-installed on each Windows node.
|
||||
|
||||
For more details, refer to the deployment guide of the CSI plugin you wish to deploy.
|
||||
|
||||
#### Migrating to CSI drivers from in-tree plugins
|
||||
|
||||
{{< feature-state for_k8s_version="v1.17" state="beta" >}}
|
||||
|
@ -1301,6 +1280,14 @@ provisioning/delete, attach/detach, mount/unmount and resizing of volumes.
|
|||
In-tree plugins that support `CSIMigration` and have a corresponding CSI driver implemented
|
||||
are listed in [Types of Volumes](#volume-types).
|
||||
|
||||
The following in-tree plugins support persistent storage on Windows nodes:
|
||||
|
||||
* [`awsElasticBlockStore`](#awselasticblockstore)
|
||||
* [`azureDisk`](#azuredisk)
|
||||
* [`azureFile`](#azurefile)
|
||||
* [`gcePersistentDisk`](#gcepersistentdisk)
|
||||
* [`vsphereVolume`](#vspherevolume)
|
||||
|
||||
### flexVolume
|
||||
|
||||
{{< feature-state for_k8s_version="v1.23" state="deprecated" >}}
|
||||
|
@ -1312,6 +1299,12 @@ volume plugin path on each node and in some cases the control plane nodes as wel
|
|||
Pods interact with FlexVolume drivers through the `flexVolume` in-tree volume plugin.
|
||||
For more details, see the FlexVolume [README](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-storage/flexvolume.md#readme) document.
|
||||
|
||||
The following FlexVolume [plugins](https://github.com/Microsoft/K8s-Storage-Plugins/tree/master/flexvolume/windows),
|
||||
deployed as PowerShell scripts on the host, support Windows nodes:
|
||||
|
||||
* [SMB](https://github.com/microsoft/K8s-Storage-Plugins/tree/master/flexvolume/windows/plugins/microsoft.com~smb.cmd)
|
||||
* [iSCSI](https://github.com/microsoft/K8s-Storage-Plugins/tree/master/flexvolume/windows/plugins/microsoft.com~iscsi.cmd)
|
||||
|
||||
{{< note >}}
|
||||
FlexVolume is deprecated. Using an out-of-tree CSI driver is the recommended way to integrate external storage with Kubernetes.
|
||||
|
||||
|
|
|
@ -0,0 +1,71 @@
|
|||
---
|
||||
reviewers:
|
||||
- jingxu97
|
||||
- mauriciopoppe
|
||||
- jayunit100
|
||||
- jsturtevant
|
||||
- marosset
|
||||
- aravindhp
|
||||
title: Windows Storage
|
||||
content_type: concept
|
||||
---
|
||||
|
||||
<!-- overview -->
|
||||
|
||||
This page provides a storage overview specific to the Windows operating system.
|
||||
|
||||
<!-- body -->
|
||||
|
||||
## Persistent storage {#storage}
|
||||
|
||||
Windows has a layered filesystem driver to mount container layers and create a copy
|
||||
filesystem based on NTFS. All file paths in the container are resolved only within
|
||||
the context of that container.
|
||||
|
||||
* With Docker, volume mounts can only target a directory in the container, and not
|
||||
an individual file. This limitation does not apply to containerd.
|
||||
* Volume mounts cannot project files or directories back to the host filesystem.
|
||||
* Read-only filesystems are not supported because write access is always required
|
||||
for the Windows registry and SAM database. However, read-only volumes are supported.
|
||||
* Volume user-masks and permissions are not available. Because the SAM is not shared
|
||||
between the host & container, there's no mapping between them. All permissions are
|
||||
resolved within the context of the container.
|
||||
|
||||
As a result, the following storage functionality is not supported on Windows nodes:
|
||||
|
||||
* Volume subpath mounts: only the entire volume can be mounted in a Windows container
|
||||
* Subpath volume mounting for Secrets
|
||||
* Host mount projection
|
||||
* Read-only root filesystem (mapped volumes still support `readOnly`)
|
||||
* Block device mapping
|
||||
* Memory as the storage medium (for example, `emptyDir.medium` set to `Memory`)
|
||||
* File system features like uid/gid; per-user Linux filesystem permissions
|
||||
* Setting [secret permissions with DefaultMode](/docs/concepts/configuration/secret/#secret-files-permissions) (due to UID/GID dependency)
|
||||
* NFS based storage/volume support
|
||||
* Expanding the mounted volume (resizefs)
|
||||
|
||||
Kubernetes {{< glossary_tooltip text="volumes" term_id="volume" >}} enable complex
|
||||
applications, with data persistence and Pod volume sharing requirements, to be deployed
|
||||
on Kubernetes. Management of persistent volumes associated with a specific storage
|
||||
back-end or protocol includes actions such as provisioning/de-provisioning/resizing
|
||||
of volumes, attaching/detaching a volume to/from a Kubernetes node and
|
||||
mounting/dismounting a volume to/from individual containers in a pod that needs to
|
||||
persist data.
|
||||
|
||||
Volume management components are shipped as Kubernetes volume
|
||||
[plugin](/docs/concepts/storage/volumes/#types-of-volumes).
|
||||
The following broad classes of Kubernetes volume plugins are supported on Windows:
|
||||
|
||||
* [`FlexVolume plugins`](/docs/concepts/storage/volumes/#flexVolume)
|
||||
* Please note that FlexVolumes have been deprecated as of 1.23
|
||||
* [`CSI Plugins`](/docs/concepts/storage/volumes/#csi)
|
||||
|
||||
### In-tree volume plugins
|
||||
|
||||
The following in-tree plugins support persistent storage on Windows nodes:
|
||||
|
||||
* [`awsElasticBlockStore`](/docs/concepts/storage/volumes/#awselasticblockstore)
|
||||
* [`azureDisk`](/docs/concepts/storage/volumes/#azuredisk)
|
||||
* [`azureFile`](/docs/concepts/storage/volumes/#azurefile)
|
||||
* [`gcePersistentDisk`](/docs/concepts/storage/volumes/#gcepersistentdisk)
|
||||
* [`vsphereVolume`](/docs/concepts/storage/volumes/#vspherevolume)
|
|
@ -0,0 +1,384 @@
|
|||
---
|
||||
reviewers:
|
||||
- jayunit100
|
||||
- jsturtevant
|
||||
- marosset
|
||||
- perithompson
|
||||
title: Windows containers in Kubernetes
|
||||
content_type: concept
|
||||
weight: 65
|
||||
---
|
||||
|
||||
<!-- overview -->
|
||||
|
||||
Windows applications constitute a large portion of the services and applications that
|
||||
run in many organizations. [Windows containers](https://aka.ms/windowscontainers)
|
||||
provide a way to encapsulate processes and package dependencies, making it easier
|
||||
to use DevOps practices and follow cloud native patterns for Windows applications.
|
||||
|
||||
Organizations with investments in Windows-based applications and Linux-based
|
||||
applications don't have to look for separate orchestrators to manage their workloads,
|
||||
leading to increased operational efficiencies across their deployments, regardless
|
||||
of operating system.
|
||||
|
||||
<!-- body -->
|
||||
|
||||
## Windows nodes in Kubernetes
|
||||
|
||||
To enable the orchestration of Windows containers in Kubernetes, include Windows nodes
|
||||
in your existing Linux cluster. Scheduling Windows containers in
|
||||
{{< glossary_tooltip text="Pods" term_id="pod" >}} on Kubernetes is similar to
|
||||
scheduling Linux-based containers.
|
||||
|
||||
In order to run Windows containers, your Kubernetes cluster must include
|
||||
multiple operating systems.
|
||||
While you can only run the {{< glossary_tooltip text="control plane" term_id="control-plane" >}} on Linux,
|
||||
you can deploy worker nodes running either Windows or Linux.
|
||||
|
||||
Windows {{< glossary_tooltip text="nodes" term_id="node" >}} are
|
||||
[supported](#windows-os-version-support) provided that the operating system is
|
||||
Windows Server 2019.
|
||||
|
||||
This document uses the term *Windows containers* to mean Windows containers with
|
||||
process isolation. Kubernetes does not support running Windows containers with
|
||||
[Hyper-V isolation](https://docs.microsoft.com/en-us/virtualization/windowscontainers/manage-containers/hyperv-container).
|
||||
|
||||
## Compatibility and limitations {#limitations}
|
||||
|
||||
Some node features are only available if you use a specific
|
||||
[container runtime](#container-runtime); others are not available on Windows nodes,
|
||||
including:
|
||||
|
||||
* HugePages: not supported for Windows containers
|
||||
* Privileged containers: not supported for Windows containers
|
||||
* TerminationGracePeriod: requires containerD
|
||||
|
||||
Not all features of shared namespaces are supported. See [API compatibility](#api)
|
||||
for more details.
|
||||
|
||||
See [Windows OS version compatibility](#windows-os-version-support) for details on
|
||||
the Windows versions that Kubernetes is tested against.
|
||||
|
||||
From an API and kubectl perspective, Windows containers behave in much the same
|
||||
way as Linux-based containers. However, there are some notable differences in key
|
||||
functionality which are outlined in this section.
|
||||
|
||||
### Comparison with Linux {#compatibility-linux-similarities}
|
||||
|
||||
Key Kubernetes elements work the same way in Windows as they do in Linux. This
|
||||
section refers to several key workload abstractions and how they map to Windows.
|
||||
|
||||
* [Pods](/docs/concepts/workloads/pods/)
|
||||
|
||||
A Pod is the basic building block of Kubernetes–the smallest and simplest unit in
|
||||
the Kubernetes object model that you create or deploy. You may not deploy Windows and
|
||||
Linux containers in the same Pod. All containers in a Pod are scheduled onto a single
|
||||
Node where each Node represents a specific platform and architecture. The following
|
||||
Pod capabilities, properties and events are supported with Windows containers:
|
||||
|
||||
* Single or multiple containers per Pod with process isolation and volume sharing
|
||||
* Pod `status` fields
|
||||
* Readiness and Liveness probes
|
||||
* postStart & preStop container lifecycle hooks
|
||||
* ConfigMap, Secrets: as environment variables or volumes
|
||||
* `emptyDir` volumes
|
||||
* Named pipe host mounts
|
||||
* Resource limits
|
||||
* OS field:
|
||||
|
||||
The `.spec.os.name` field should be set to `windows` to indicate that the current Pod uses Windows containers (see the example manifest after this list).
|
||||
The `IdentifyPodOS` feature gate needs to be enabled for this field to be recognized.
|
||||
|
||||
{{< note >}}
|
||||
Starting from 1.24, the `IdentifyPodOS` feature gate is in Beta stage and defaults to be enabled.
|
||||
{{< /note >}}
|
||||
|
||||
If the `IdentifyPodOS` feature gate is enabled and you set the `.spec.os.name` field to `windows`,
|
||||
you must not set the following fields in the `.spec` of that Pod:
|
||||
|
||||
* `spec.hostPID`
|
||||
* `spec.hostIPC`
|
||||
* `spec.securityContext.seLinuxOptions`
|
||||
* `spec.securityContext.seccompProfile`
|
||||
* `spec.securityContext.fsGroup`
|
||||
* `spec.securityContext.fsGroupChangePolicy`
|
||||
* `spec.securityContext.sysctls`
|
||||
* `spec.shareProcessNamespace`
|
||||
* `spec.securityContext.runAsUser`
|
||||
* `spec.securityContext.runAsGroup`
|
||||
* `spec.securityContext.supplementalGroups`
|
||||
* `spec.containers[*].securityContext.seLinuxOptions`
|
||||
* `spec.containers[*].securityContext.seccompProfile`
|
||||
* `spec.containers[*].securityContext.capabilities`
|
||||
* `spec.containers[*].securityContext.readOnlyRootFilesystem`
|
||||
* `spec.containers[*].securityContext.privileged`
|
||||
* `spec.containers[*].securityContext.allowPrivilegeEscalation`
|
||||
* `spec.containers[*].securityContext.procMount`
|
||||
* `spec.containers[*].securityContext.runAsUser`
|
||||
* `spec.containers[*].securityContext.runAsGroup`
|
||||
|
||||
In the above list, wildcards (`*`) indicate all elements in a list.
|
||||
For example, `spec.containers[*].securityContext` refers to the SecurityContext object
|
||||
for all containers. If any of these fields is specified, the Pod will
|
||||
not be admitted by the API server.
|
||||
|
||||
* [Workload resources](/docs/concepts/workloads/controllers/) including:
|
||||
* ReplicaSet
|
||||
* Deployment
|
||||
* StatefulSet
|
||||
* DaemonSet
|
||||
* Job
|
||||
* CronJob
|
||||
* ReplicationController
|
||||
* {{< glossary_tooltip text="Services" term_id="service" >}}
|
||||
See [Load balancing and Services](#load-balancing-and-services) for more details.
|
||||
|
||||
Pods, workload resources, and Services are critical elements to managing Windows
|
||||
workloads on Kubernetes. However, on their own they are not enough to enable
|
||||
the proper lifecycle management of Windows workloads in a dynamic cloud native
|
||||
environment. Kubernetes also supports:
|
||||
|
||||
* `kubectl exec`
|
||||
* Pod and container metrics
|
||||
* {{< glossary_tooltip text="Horizontal pod autoscaling" term_id="horizontal-pod-autoscaler" >}}
|
||||
* {{< glossary_tooltip text="Resource quotas" term_id="resource-quota" >}}
|
||||
* Scheduler preemption
|
||||
|
||||
### Command line options for the kubelet {#kubelet-compatibility}
|
||||
|
||||
Some kubelet command line options behave differently on Windows, as described below:
|
||||
|
||||
* The `--windows-priorityclass` flag lets you set the scheduling priority of the kubelet process
|
||||
(see [CPU resource management](/docs/concepts/configuration/windows-resource-management/#resource-management-cpu))
|
||||
* The `--kube-reserved`, `--system-reserved`, and `--eviction-hard` flags update
|
||||
[NodeAllocatable](/docs/tasks/administer-cluster/reserve-compute-resources/#node-allocatable)
|
||||
* Eviction by using `--enforce-node-allocatable` is not implemented
|
||||
* Eviction by using `--eviction-hard` and `--eviction-soft` are not implemented
|
||||
* A kubelet running on a Windows node does not have memory
|
||||
restrictions. `--kube-reserved` and `--system-reserved` do not set limits on
|
||||
kubelet or processes running on the host. This means kubelet or a process on the host
|
||||
could cause memory resource starvation outside the node-allocatable and scheduler.
|
||||
* The `MemoryPressure` Condition is not implemented
|
||||
* The kubelet does not take OOM eviction actions
|
||||
|
||||
### API compatibility {#api}
|
||||
|
||||
There are subtle differences in the way the Kubernetes APIs work for Windows due to the OS
|
||||
and container runtime. Some workload properties were designed for Linux, and fail to run on Windows.
|
||||
|
||||
At a high level, these OS concepts are different:
|
||||
|
||||
* Identity - Linux uses userID (UID) and groupID (GID) which
|
||||
are represented as integer types. User and group names
|
||||
are not canonical - they are just an alias in `/etc/group`
|
||||
or `/etc/passwd` back to UID+GID. Windows uses a larger binary
|
||||
[security identifier](https://docs.microsoft.com/en-us/windows/security/identity-protection/access-control/security-identifiers) (SID)
|
||||
which is stored in the Windows Security Access Manager (SAM) database. This
|
||||
database is not shared between the host and containers, or between containers.
|
||||
* File permissions - Windows uses an access control list based on SIDs, whereas
|
||||
POSIX systems such as Linux use a bitmask based on object permissions and UID+GID,
|
||||
plus _optional_ access control lists.
|
||||
* File paths - the convention on Windows is to use `\` instead of `/`. The Go IO
|
||||
libraries typically accept both and just make it work, but when you're setting a
|
||||
path or command line that's interpreted inside a container, `\` may be needed.
|
||||
* Signals - Windows interactive apps handle termination differently, and can
|
||||
implement one or more of these:
|
||||
* A UI thread handles well-defined messages including `WM_CLOSE`.
|
||||
* Console apps handle Ctrl-C or Ctrl-break using a Control Handler.
|
||||
* Services register a Service Control Handler function that can accept
|
||||
`SERVICE_CONTROL_STOP` control codes.
|
||||
|
||||
Container exit codes follow the same convention where 0 is success, and nonzero is failure.
|
||||
The specific error codes may differ across Windows and Linux. However, exit codes
|
||||
passed from the Kubernetes components (kubelet, kube-proxy) are unchanged.
|
||||
|
||||
#### Field compatibility for container specifications {#compatibility-v1-pod-spec-containers}
|
||||
|
||||
The following list documents differences between how Pod container specifications
|
||||
work between Windows and Linux:
|
||||
|
||||
* Huge pages are not implemented in the Windows container
|
||||
runtime, and are not available. They require [asserting a user
|
||||
privilege](https://docs.microsoft.com/en-us/windows/desktop/Memory/large-page-support)
|
||||
that's not configurable for containers.
|
||||
* `requests.cpu` and `requests.memory` - requests are subtracted
|
||||
from node available resources, so they can be used to avoid overprovisioning a
|
||||
node. However, they cannot be used to guarantee resources in an overprovisioned
|
||||
node. They should be applied to all containers as a best practice if the operator
|
||||
wants to avoid overprovisioning entirely.
|
||||
* `securityContext.allowPrivilegeEscalation` -
|
||||
not possible on Windows; none of the capabilities are hooked up
|
||||
* `securityContext.capabilities` -
|
||||
POSIX capabilities are not implemented on Windows
|
||||
* `securityContext.privileged` -
|
||||
Windows doesn't support privileged containers
|
||||
* `securityContext.procMount` -
|
||||
Windows doesn't have a `/proc` filesystem
|
||||
* `securityContext.readOnlyRootFilesystem` -
|
||||
not possible on Windows; write access is required for registry & system
|
||||
processes to run inside the container
|
||||
* `securityContext.runAsGroup` -
|
||||
not possible on Windows as there is no GID support
|
||||
* `securityContext.runAsNonRoot` -
|
||||
this setting will prevent containers from running as `ContainerAdministrator`
|
||||
which is the closest equivalent to a root user on Windows.
|
||||
* `securityContext.runAsUser` -
|
||||
use [`runAsUserName`](/docs/tasks/configure-pod-container/configure-runasusername)
|
||||
instead (see the sketch after this list)
|
||||
* `securityContext.seLinuxOptions` -
|
||||
not possible on Windows as SELinux is Linux-specific
|
||||
* `terminationMessagePath` -
|
||||
this has some limitations in that Windows doesn't support mapping single files. The
|
||||
default value is `/dev/termination-log`, which does work because it does not
|
||||
exist on Windows by default.
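As a sketch of the Windows alternative mentioned for `securityContext.runAsUser` in the list above, a Pod (or an individual container) can request a Windows identity through `windowsOptions.runAsUserName`; the Pod name and image are placeholders, and `ContainerUser` is one of the built-in Windows container accounts:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: run-as-username-demo          # hypothetical Pod name
spec:
  os:
    name: windows
  nodeSelector:
    kubernetes.io/os: windows
  securityContext:
    windowsOptions:
      runAsUserName: "ContainerUser"  # run as the non-administrative built-in account
  containers:
  - name: app
    image: mcr.microsoft.com/windows/servercore:ltsc2019   # hypothetical image tag
```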
|
||||
|
||||
#### Field compatibility for Pod specifications {#compatibility-v1-pod}
|
||||
|
||||
The following list documents differences between how Pod specifications work between Windows and Linux:
|
||||
|
||||
* `hostIPC` and `hostPID` - host namespace sharing is not possible on Windows
|
||||
* `hostNetwork` - There is no Windows OS support to share the host network
|
||||
* `dnsPolicy` - setting the Pod `dnsPolicy` to `ClusterFirstWithHostNet` is
|
||||
not supported on Windows because host networking is not provided. Pods always
|
||||
run with a container network.
|
||||
* `podSecurityContext` (see below)
|
||||
* `shareProcessNamespace` - this is a beta feature, and depends on Linux namespaces
|
||||
which are not implemented on Windows. Windows cannot share process namespaces or
|
||||
the container's root filesystem. Only the network can be shared.
|
||||
* `terminationGracePeriodSeconds` - this is not fully implemented in Docker on Windows,
|
||||
see the [GitHub issue](https://github.com/moby/moby/issues/25982).
|
||||
The behavior today is that the ENTRYPOINT process is sent CTRL_SHUTDOWN_EVENT,
|
||||
then Windows waits 5 seconds by default, and finally shuts down
|
||||
all processes using the normal Windows shutdown behavior. The 5
|
||||
second default is actually in the Windows registry
|
||||
[inside the container](https://github.com/moby/moby/issues/25982#issuecomment-426441183),
|
||||
so it can be overridden when the container is built.
|
||||
* `volumeDevices` - this is a beta feature, and is not implemented on Windows.
|
||||
Windows cannot attach raw block devices to pods.
|
||||
* `volumes`
|
||||
* If you define an `emptyDir` volume, you cannot set its volume source to `memory`.
|
||||
* You cannot enable `mountPropagation` for volume mounts as this is not
|
||||
supported on Windows.
|
||||
|
||||
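
For illustration (volume, mount, and Pod names below are placeholders), an `emptyDir` volume for a
Windows Pod must use the default disk-backed medium rather than `medium: Memory`:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: emptydir-demo            # illustrative name
spec:
  nodeSelector:
    kubernetes.io/os: windows
  containers:
  - name: app
    image: mcr.microsoft.com/windows/servercore:ltsc2019
    volumeMounts:
    - name: scratch
      mountPath: "C:\\scratch"
  volumes:
  - name: scratch
    emptyDir: {}                 # leave `medium` unset; `medium: Memory` is not supported on Windows
```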

##### Field compatibility for Pod security context {#compatibility-v1-pod-spec-containers-securitycontext}

None of the Pod [`securityContext`](/docs/reference/kubernetes-api/workload-resources/pod-v1/#security-context) fields work on Windows.

### Node problem detector

The node problem detector (see
[Monitor Node Health](/docs/tasks/debug/debug-cluster/monitor-node-health/))
is not compatible with Windows.

### Pause container

In a Kubernetes Pod, an infrastructure or "pause" container is first created
to host the container. In Linux, the cgroups and namespaces that make up a pod
need a process to maintain their continued existence; the pause process provides
this. Containers that belong to the same pod, including infrastructure and worker
containers, share a common network endpoint (same IPv4 and / or IPv6 address, same
network port spaces). Kubernetes uses pause containers to allow for worker containers
crashing or restarting without losing any of the networking configuration.

Kubernetes maintains a multi-architecture image that includes support for Windows.
For Kubernetes v{{< skew currentVersion >}} the recommended pause image is `k8s.gcr.io/pause:3.6`.
The [source code](https://github.com/kubernetes/kubernetes/tree/master/build/pause)
is available on GitHub.

Microsoft maintains a different multi-architecture image, with Linux and Windows
amd64 support, that you can find as `mcr.microsoft.com/oss/kubernetes/pause:3.6`.
This image is built from the same source as the Kubernetes maintained image but
all of the Windows binaries are [authenticode signed](https://docs.microsoft.com/en-us/windows-hardware/drivers/install/authenticode) by Microsoft.
The Kubernetes project recommends using the Microsoft maintained image if you are
deploying to a production or production-like environment that requires signed
binaries.

### Container runtimes {#container-runtime}

You need to install a
{{< glossary_tooltip text="container runtime" term_id="container-runtime" >}}
into each node in the cluster so that Pods can run there.

The following container runtimes work with Windows:

{{% thirdparty-content %}}

#### cri-containerd

{{< feature-state for_k8s_version="v1.20" state="stable" >}}

You can use {{< glossary_tooltip term_id="containerd" text="ContainerD" >}} 1.4.0+
as the container runtime for Kubernetes nodes that run Windows.

Learn how to [install ContainerD on a Windows node](/docs/setup/production-environment/container-runtimes/#install-containerd).

{{< note >}}
There is a [known limitation](/docs/tasks/configure-pod-container/configure-gmsa/#gmsa-limitations)
when using GMSA with containerd to access Windows network shares, which requires a
kernel patch.
{{< /note >}}

#### Mirantis Container Runtime {#mcr}

[Mirantis Container Runtime](https://docs.mirantis.com/mcr/20.10/overview.html) (MCR) is available as a container runtime for all Windows Server 2019 and later versions.

See [Install MCR on Windows Servers](https://docs.mirantis.com/mcr/20.10/install/mcr-windows.html) for more information.

## Windows OS version compatibility {#windows-os-version-support}

On Windows nodes, strict compatibility rules apply where the host OS version must
match the container base image OS version. Only Windows containers with a container
operating system of Windows Server 2019 are fully supported.

For Kubernetes v{{< skew currentVersion >}}, operating system compatibility for Windows nodes (and Pods)
is as follows:

Windows Server LTSC release
: Windows Server 2019
: Windows Server 2022

Windows Server SAC release
: Windows Server version 20H2

The Kubernetes [version-skew policy](/docs/setup/release/version-skew-policy/) also applies.

## Getting help and troubleshooting {#troubleshooting}

Your main source of help for troubleshooting your Kubernetes cluster should start
with the [Troubleshooting](/docs/tasks/debug/) page.

Some additional, Windows-specific troubleshooting help is included
in this section. Logs are an important element of troubleshooting
issues in Kubernetes. Make sure to include them any time you seek
troubleshooting assistance from other contributors. Follow the
instructions in the
SIG Windows [contributing guide on gathering logs](https://github.com/kubernetes/community/blob/master/sig-windows/CONTRIBUTING.md#gathering-logs).

### Reporting issues and feature requests

If you have what looks like a bug, or you would like to
make a feature request, please follow the [SIG Windows contributing guide](https://github.com/kubernetes/community/blob/master/sig-windows/CONTRIBUTING.md#reporting-issues-and-feature-requests) to create a new issue.
You should first search the list of issues in case it was
reported previously and comment with your experience on the issue and add additional
logs. SIG-Windows Slack is also a great avenue to get some initial support and
troubleshooting ideas prior to creating a ticket.

## {{% heading "whatsnext" %}}

### Deployment tools

The kubeadm tool helps you to deploy a Kubernetes cluster, providing the control
plane to manage the cluster, and nodes to run your workloads.
[Adding Windows nodes](/docs/tasks/administer-cluster/kubeadm/adding-windows-nodes/)
explains how to deploy Windows nodes to your cluster using kubeadm.

The Kubernetes [cluster API](https://cluster-api.sigs.k8s.io/) project also provides means to automate deployment of Windows nodes.

### Windows distribution channels

For a detailed explanation of Windows distribution channels see the [Microsoft documentation](https://docs.microsoft.com/en-us/windows-server/get-started-19/servicing-channels-19).

Information on the different Windows Server servicing channels
including their support models can be found at
[Windows Server servicing channels](https://docs.microsoft.com/en-us/windows-server/get-started/servicing-channels-comparison).

@ -3,7 +3,6 @@ reviewers:
- jayunit100
- jsturtevant
- marosset
- perithompson
title: Guide for scheduling Windows containers in Kubernetes
content_type: concept
weight: 75

@ -12,16 +11,14 @@ weight: 75
<!-- overview -->

Windows applications constitute a large portion of the services and applications that run in many organizations.
This guide walks you through the steps to configure and deploy Windows containers in Kubernetes.

<!-- body -->

## Objectives

* Configure an example deployment to run Windows containers on the Windows node
* (Optional) Configure an Active Directory Identity for your Pod using Group Managed Service Accounts (GMSA)
* Highlight Windows specific functionality in Kubernetes

## Before you begin

@ -34,8 +31,8 @@ The example in the section below is provided to jumpstart your experience with W

## Getting Started: Deploying a Windows container

To deploy a Windows container on Kubernetes, you must first create an example application.
The example YAML file below deploys a simple webserver application running inside a Windows container.

Create a service spec named `win-webserver.yaml` with the contents below:

```yaml

@ -83,8 +80,8 @@ spec:
```

{{< note >}}
Port mapping is also supported, but for simplicity this example exposes
port 80 of the container directly to the Service.
{{< /note >}}

1. Check that all nodes are healthy:

@ -104,7 +101,6 @@ the container port 80 is exposed directly to the service.

1. Check that the deployment succeeded. To verify:

   * Two containers per pod on the Windows node, use `docker ps`
   * Two pods listed from the Linux control plane node, use `kubectl get pods`
   * Node-to-pod communication across the network, `curl` port 80 of your pod IPs from the Linux control plane node
     to check for a web server response

@ -139,42 +135,49 @@ piping them to STDOUT for consumption by `kubectl logs <pod>`.
Follow the instructions in the LogMonitor GitHub page to copy its binaries and configuration files
to all your containers and add the necessary entrypoints for LogMonitor to push your logs to STDOUT.

## Configuring container user

### Using configurable Container usernames

Windows containers can be configured to run their entrypoints and processes
with different usernames than the image defaults.
The way this is achieved is a bit different from the way it is done for Linux containers.
Learn more about it [here](/docs/tasks/configure-pod-container/configure-runasusername/).
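
As a quick illustration (the Pod name and image are placeholders, and `ContainerUser` is just one of
the built-in Windows accounts you might choose), the username is set through
`securityContext.windowsOptions.runAsUserName`:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: run-as-username-demo            # illustrative name
spec:
  securityContext:
    windowsOptions:
      runAsUserName: "ContainerUser"    # Pod-level default; can be overridden per container
  containers:
  - name: app
    image: mcr.microsoft.com/windows/servercore:ltsc2019
  nodeSelector:
    kubernetes.io/os: windows
```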

### Managing Workload Identity with Group Managed Service Accounts

Windows container workloads can be configured to use Group Managed Service Accounts (GMSA).
Group Managed Service Accounts are a specific type of Active Directory account that provides automatic password management,
simplified service principal name (SPN) management, and the ability to delegate the management to other administrators across multiple servers.
Containers configured with a GMSA can access external Active Directory Domain resources while carrying the identity configured with the GMSA.
Learn more about configuring and using GMSA for Windows containers [here](/docs/tasks/configure-pod-container/configure-gmsa/).
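
As a sketch only (the name `gmsa-webapp1` is a placeholder and must reference a GMSA credential spec
resource you have already created and authorized), a Pod selects a GMSA through
`securityContext.windowsOptions.gmsaCredentialSpecName`:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gmsa-demo                            # illustrative name
spec:
  securityContext:
    windowsOptions:
      gmsaCredentialSpecName: gmsa-webapp1   # placeholder; must match an existing GMSA credential spec
  containers:
  - name: app
    image: mcr.microsoft.com/windows/servercore:ltsc2019
  nodeSelector:
    kubernetes.io/os: windows
```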

## Taints and Tolerations

Users need to use some combination of taints and node selectors in order to
schedule Linux and Windows workloads to their respective OS-specific nodes.
The recommended approach is outlined below,
with one of its main goals being that this approach should not break compatibility for existing Linux workloads.

{{< note >}}
If the `IdentifyPodOS` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) is
enabled, you can (and should) set `.spec.os.name` for a Pod to indicate the operating system
that the containers in that Pod are designed for. For Pods that run Linux containers, set
`.spec.os.name` to `linux`. For Pods that run Windows containers, set `.spec.os.name`
to `windows`. Starting from Kubernetes 1.24, the `IdentifyPodOS` feature is in Beta stage and
is enabled by default.

The scheduler does not use the value of `.spec.os.name` when assigning Pods to nodes. You should
use normal Kubernetes mechanisms for
[assigning pods to nodes](/docs/concepts/scheduling-eviction/assign-pod-node/)
to ensure that the control plane for your cluster places pods onto nodes that are running the
appropriate operating system.

The `.spec.os.name` value has no effect on the scheduling of the Windows pods,
so taints and tolerations and node selectors are still required
to ensure that the Windows pods land onto appropriate Windows nodes.
{{< /note >}}
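
For instance, a minimal sketch of a Windows Pod that declares its operating system (the Pod name and
image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: os-field-demo          # illustrative name
spec:
  os:
    name: windows              # declares the target OS; does not by itself influence scheduling
  nodeSelector:
    kubernetes.io/os: windows
  containers:
  - name: app
    image: mcr.microsoft.com/windows/servercore:ltsc2019
```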

### Ensuring OS-specific workloads land on the appropriate container host

Users can ensure Windows containers can be scheduled on the appropriate host using Taints and Tolerations.
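
For example, assuming a Windows node has been tainted with `os=windows:NoSchedule` (the taint key and
value here are an illustrative convention, not something this guide mandates), a Windows workload
would carry a matching toleration together with a node selector:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: windows-toleration-demo     # illustrative name
spec:
  nodeSelector:
    kubernetes.io/os: windows       # only consider Windows nodes
  tolerations:
  - key: "os"                       # matches the example `os=windows:NoSchedule` taint on the node
    operator: "Equal"
    value: "windows"
    effect: "NoSchedule"
  containers:
  - name: app
    image: mcr.microsoft.com/windows/servercore:ltsc2019
```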
|
@ -225,16 +228,14 @@ Here are values used today for each Windows Server version.
|
|||
| Product Name | Build Number(s) |
|
||||
|--------------------------------------|------------------------|
|
||||
| Windows Server 2019 | 10.0.17763 |
|
||||
| Windows Server version 1809 | 10.0.17763 |
|
||||
| Windows Server version 1903 | 10.0.18362 |
|
||||
|
||||
| Windows Server, Version 20H2 | 10.0.19042 |
|
||||
| Windows Server 2022 | 10.0.20348 |
|
||||
|
||||
### Simplifying with RuntimeClass
|
||||
|
||||
[RuntimeClass] can be used to simplify the process of using taints and tolerations.
|
||||
A cluster administrator can create a `RuntimeClass` object which is used to encapsulate these taints and tolerations.
|
||||
|
||||
|
||||
1. Save this file to `runtimeClasses.yml`. It includes the appropriate `nodeSelector`
|
||||
for the Windows OS, architecture, and version.
|
||||
|
||||
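
The file itself is not shown in this diff; as a sketch only (the class name, handler, and build
number are illustrative and should match your own cluster), such a RuntimeClass might look like:

```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: windows-2019                               # illustrative name
handler: 'docker'                                  # must match a handler your CRI runtime exposes
scheduling:
  nodeSelector:
    kubernetes.io/os: 'windows'
    kubernetes.io/arch: 'amd64'
    node.kubernetes.io/windows-build: '10.0.17763' # pin to a specific Windows Server build
  tolerations:
  - effect: NoSchedule
    key: os
    operator: Equal
    value: "windows"
```

A workload then opts in by setting `runtimeClassName: windows-2019` in its Pod template spec.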
|
@ -306,7 +307,4 @@ spec:
|
|||
app: iis-2019
|
||||
```
|
||||
|
||||
|
||||
|
||||
|
||||
[RuntimeClass]: https://kubernetes.io/docs/concepts/containers/runtime-class/
|